First of all, thank you for this great (!) module.

I am importing content into my local Drupal installation, and I'm having problems with the paths of Cyrillic categories and nodes.
I noticed a "translations" array in "pathauto.module" that converts characters. Is that where I should add my transliteration rules?

Would this make a useful contribution/patch? Or is there a more general approach you plan to take to address this problem?

Ciao. Mic

Comments

tomamic

I tried to find a generic, standard solution to the transliteration problem when generating URLs from titles. Let me stress that this is quite important if we don't want Drupal to be limited to Latin alphabets.

I eventually wrote a small wrapper around the iconv function:

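$separator = '_';
$pattern = '/[^a-zA-Z0-9 ]+/';

// Transliterate to plain ASCII (returns FALSE on failure; characters
// without an ASCII equivalent may come out as '?').
$txt = iconv("UTF-8", "ASCII//TRANSLIT", $txt);
// Lowercase, then drop everything except letters, digits and spaces.
$txt = strtolower($txt);
$txt = trim(preg_replace($pattern, '', $txt));
// Replace each run of spaces with the separator.
$txt = preg_replace('/ +/', $separator, $txt);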

I'm not sure how reliably this works. Besides, iconv is not supported on the host where my site runs :(
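Since iconv may be missing on a host, and the output of //TRANSLIT depends on the iconv implementation and the current locale, one option is to apply an explicit character map first and use iconv only for whatever is left. This is a minimal sketch under that assumption; `slug_from_title` and the tiny map are hypothetical and not part of pathauto:

```php
<?php
// Hypothetical helper (not part of pathauto): build a URL slug from a title.
// An explicit map is applied first so results do not depend on iconv;
// iconv, when installed, then handles whatever the map does not cover.
function slug_from_title($txt, array $map, $separator = '_') {
  // Replace mapped characters (e.g. Cyrillic letters) with ASCII strings.
  $txt = strtr($txt, $map);
  if (function_exists('iconv')) {
    // //TRANSLIT output depends on the iconv implementation and locale;
    // if conversion fails, keep the map-only result.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $txt);
    if ($converted !== false) {
      $txt = $converted;
    }
  }
  $txt = strtolower($txt);
  // Drop anything that is not an ASCII letter, digit or space
  // (including leftover multibyte bytes and iconv's '?').
  $txt = trim(preg_replace('/[^a-z0-9 ]+/', '', $txt));
  // Collapse runs of spaces into a single separator.
  return preg_replace('/ +/', $separator, $txt);
}

// Illustrative map fragment only; a real map would be much larger.
$map = array('М' => 'M', 'о' => 'o', 'с' => 's', 'к' => 'k', 'в' => 'v', 'а' => 'a');
echo slug_from_title('Москва City', $map); // moskva_city
```

Characters covered by neither the map nor iconv are simply dropped, so the quality of the result depends mostly on how complete the map is.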

Could you have a look at how Textpattern solves this problem?
It has an external transliteration map which, though not complete, can be extended without touching the code. It already includes rules for accented characters, Cyrillic and Greek.

http://dev.textpattern.com/browser/development/4.0/textpattern/lib/txplib_misc.php?rev=840
http://dev.textpattern.com/browser/development/4.0/textpattern/lib/i18n-ascii.txt

The most interesting function is dumbDown(). It's very simple and could be merged with the existing code in a few minutes.
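The heart of a map-based approach like this is a single strtr() call over a character map. A minimal sketch, assuming a small in-code map; it is written in the spirit of dumbDown() but is not Textpattern's actual code, and `translit_with_map` is a hypothetical name:

```php
<?php
// Hypothetical sketch of map-based transliteration (not Textpattern's code).
// strtr() with an array tries the longest matching key first and does not
// re-scan replaced text, so multi-character outputs like 'Shch' are safe.
function translit_with_map($str, array $map) {
  return strtr($str, $map);
}

// Illustrative fragment: accented Latin, Cyrillic and Greek entries.
$map = array(
  'é' => 'e', 'ü' => 'ue', 'ß' => 'ss',
  'Щ' => 'Shch', 'и' => 'i', 'т' => 't',
  'θ' => 'th',
);
echo translit_with_map('Щит Müller', $map); // Shchit Mueller
```

Unlike iconv, the output is fully determined by the map, so it behaves the same on every host.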
(Textpattern is free, open source software, distributed under the GNU General Public License.)

Thank you! Ciao. Mic

tomamic

Since I don't know of a Windows version of diff that supports Unicode, I'm posting my 'patch' to transliterate URLs here (it works for nodes, users, and everything else).

The following lines should be inserted right after the definition of $translations (in cleanstring, in pathauto.module). You also need to add the i18n-ascii.txt file to the pathauto folder.

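  // Load the external transliteration map only once per request, but merge
  // it on every call, since $translations may be rebuilt on each call.
  static $i18n = NULL;
  if (!isset($i18n)) {
    $i18n = array();
    $file = drupal_get_path('module', 'pathauto') . '/i18n-ascii.txt';
    if (is_file($file)) {
      $parsed = parse_ini_file($file);
      // parse_ini_file() returns FALSE on a malformed file.
      if (is_array($parsed)) {
        $i18n = $parsed;
      }
    }
  }
  // Entries from the file override built-in ones with the same key.
  $translations = array_merge($translations, $i18n);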
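To try the same mechanism outside Drupal, here is a standalone sketch: read an INI map from disk, let it extend and override a built-in map, then apply strtr(). It assumes the map file is a plain list of `character = "replacement"` lines; `load_translations` and the temporary file are hypothetical. Note that parse_ini_file() called without its second argument ignores section headers and flattens all keys into one array.

```php
<?php
// Standalone sketch (outside Drupal): load an INI map from disk and let it
// extend/override a built-in translation array, then apply strtr().
function load_translations(array $builtin, $file) {
  if (is_file($file)) {
    $extra = parse_ini_file($file);
    if (is_array($extra)) {
      // String keys: entries from the file replace built-in ones.
      $builtin = array_merge($builtin, $extra);
    }
  }
  return $builtin;
}

$builtin = array('ä' => 'a', 'ö' => 'o');

// Hypothetical map file, written to a temporary path for the demo.
$file = tempnam(sys_get_temp_dir(), 'i18n');
file_put_contents($file, "ä = \"ae\"\nщ = \"shch\"\nи = \"i\"\n");

$translations = load_translations($builtin, $file);
unlink($file);

echo strtr('щи Käse', $translations); // shchi Kaese
```

Because the file is merged last, a site admin can change an existing rule (here 'ä' becomes 'ae') without editing the module code.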

greggles

Status: Active » Closed (duplicate)

tomamic: I really, really like this idea. I've contacted the Textpattern folks about collaborating on it.

I'm closing this issue in favor of http://drupal.org/node/61815, which has more ideas about making the file easier for site admins to edit.

greggles

Just to close the loop: I have committed this patch. Thanks, tomamic!