The substitution of German characters in pathauto_cleanstring is wrong.

It should be:

'Ä'=>'Ae'
'ä'=>'ae'
'Ö'=>'Oe'
'ö'=>'oe'
'Ü'=>'Ue'
'ü'=>'ue'
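
As a rough illustration of the proposed table (a Python sketch, not Pathauto's actual PHP code; the names `GERMAN_UMLAUTS` and `clean_german` are made up for this example), the mapping can be applied character by character:

```python
# Illustrative sketch only: Pathauto itself is written in PHP.
# The table mirrors the six substitutions proposed above.
GERMAN_UMLAUTS = {
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
}


def clean_german(text: str) -> str:
    """Replace each umlaut with its two-letter ASCII form."""
    return text.translate(str.maketrans(GERMAN_UMLAUTS))


print(clean_german("Über schöne Äpfel"))  # Ueber schoene Aepfel
```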

Comments

rstamm’s picture

Status: Active » Needs review
Tommy Sundstrom’s picture

This would break substitution in other languages, for example Swedish, where the normal way is to substitute by just dropping the accents, like this:

ä -> a
ö -> o

etc. Actually, to my knowledge this way of substituting is also quite common for German URLs.
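
For comparison, "just dropping the accents" can be sketched in Python with Unicode decomposition. This is one possible way to do it, not what Pathauto does, and the function name `drop_accents` is hypothetical:

```python
import unicodedata


def drop_accents(text: str) -> str:
    """Decompose characters (NFD) and discard the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(drop_accents("smörgåsbord"))  # smorgasbord
print(drop_accents("schön"))        # schon
```

Note that the second result, "schon", is itself a different German word from "schön", which is the kind of side effect a single global rule can have.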

rstamm’s picture

Sorry, I didn't know that.

No, dropping the accents is not common for German; it is very unusual.

But maybe we can find a solution by adding another option to the settings that lets you choose a substitution style if you want to substitute characters differently from the default.

frostschutz’s picture

The table posted by Flanker is the most common way to substitute umlauts in Germany if you are restricted to US-ASCII ('ß'=>'ss' is missing from the list). This kind of substitution will be understood by almost anyone in Germany. Simply dropping the accents is not done (this is the first time I have heard of it). It makes things hard or even impossible to understand, and changes the meaning in some cases.

There are other methods, like using "A/"a, "O/"o, "U/"u, and sz instead of ß. "A/"a is extremely uncommon and not suited for URLs anyway. Quite a few people promote the use of 'sz' instead of 'ss' as a replacement for 'ß', because 'ß'->'ss' can also cause misunderstandings in rare cases.

However, 'ss' is still the most commonly understood replacement, so I would not recommend using anything else.
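
A small Python sketch of the two 'ß' replacements mentioned above (the helper `replace_eszett` is hypothetical and only for illustration). It also shows one of the rare collisions: "Maße" (measurements) becomes "Masse", which is already a separate German word (mass).

```python
def replace_eszett(text: str, replacement: str = "ss") -> str:
    """Replace 'ß' with the chosen ASCII digraph ('ss' by default)."""
    return text.replace("ß", replacement)


print(replace_eszett("Maße"))        # Masse
print(replace_eszett("Maße", "sz"))  # Masze
```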

greggles’s picture

Marked http://drupal.org/node/75971 as a duplicate of this issue.

petrescs’s picture

greggles, thank you for your reply on http://drupal.org/node/77230.

I agree it would be good to have the conversion list exposed at admin/settings/pathauto, maybe using an interface similar to the one provided by locale.module (on the "manage strings" tab).

Or maybe adopt the approach described in http://drupal.org/node/63924 for clean URLs. I have tried it, but it does not work in my case (PHP 5.1.12). After reading more about iconv() at http://ro.php.net/manual/en/ref.iconv.php and http://ro.php.net/manual/en/function.iconv.php, I found this might be because my web host's PHP was not compiled with the --with-iconv option. I am not a programmer, so I cannot debug that code, but I guess this fix would completely eliminate the need for a separate transliteration table.

greggles’s picture

Title: substitution of german characters wrong » overhaul pathauto_cleanstring to expose control to the admin
Status: Needs review » Active

Renaming this issue to reflect what I'd like to do here and resetting the status to active (since the patch isn't relevant to the new goal).

petrescs suggested the idea of using an interface similar to the locale module to manage these values. That feels overly difficult to me. I'm thinking of something like sticking the $translations array into a large textarea and then letting people edit that example. It would seem reasonable to create some recommended arrays in either the handbooks or CVS (or CVS linked from the handbooks) that people can use as a base for their sites. One would be the current array that tries to cover everything. Other choices might be focused on countries, or languages, or both.

If people agree with this conceptually then we can get to work on implementing it.

greggles’s picture

Here's yet another idea on the subject. http://drupal.org/node/46039

It's my new favorite idea: collaborate with other groups that need to do this, and put the table in a text file instead of an admin interface. A text file is still easy to edit and doesn't require "coding".

Nilard’s picture

Here is my 'useful comments or assistance' for this issue. I just want this code of Russian transliteration to be submitted into the pathauto.module.

'А'=>'A', 'а'=>'a',
'Б'=>'B', 'б'=>'b',
'В'=>'V', 'в'=>'v',
'Г'=>'G', 'г'=>'g',
'Д'=>'D', 'д'=>'d',
'Е'=>'E', 'е'=>'e',
'Ё'=>'YO', 'ё'=>'yo',
'Ж'=>'ZH', 'ж'=>'zh',
'З'=>'Z', 'з'=>'z',
'И'=>'I', 'и'=>'i',
'Й'=>'Y', 'й'=>'y',
'К'=>'K', 'к'=>'k',
'Л'=>'L', 'л'=>'l',
'М'=>'M', 'м'=>'m',
'Н'=>'N', 'н'=>'n',
'О'=>'O', 'о'=>'o',
'П'=>'P', 'п'=>'p',
'Р'=>'R', 'р'=>'r',
'С'=>'S', 'с'=>'s',
'Т'=>'T', 'т'=>'t',
'У'=>'U', 'у'=>'u',
'Ф'=>'F', 'ф'=>'f',
'Х'=>'KH', 'х'=>'kh',
'Ц'=>'TS', 'ц'=>'ts',
'Ч'=>'CH', 'ч'=>'ch',
'Ш'=>'SH', 'ш'=>'sh',
'Щ'=>'SHCH', 'щ'=>'shch',
'Ъ'=>'', 'ъ'=>'',
'Ы'=>'Y', 'ы'=>'y',
'Ь'=>'', 'ь'=>'',
'Э'=>'E', 'э'=>'e',
'Ю'=>'YU', 'ю'=>'yu',
'Я'=>'YA', 'я'=>'ya',
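
To illustrate how a table like this behaves, here is a Python sketch using a handful of the entries above (the names `RUSSIAN_SAMPLE` and `transliterate` are made up for this example, and Pathauto itself is PHP):

```python
# A few entries copied from the table above, for illustration only.
RUSSIAN_SAMPLE = {
    "П": "P", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t",
    "Щ": "SHCH", "щ": "shch",
    "Ь": "", "ь": "",  # soft sign is dropped entirely
}


def transliterate(text: str, table: dict) -> str:
    """Replace every character found in the table; leave others as-is."""
    return text.translate(str.maketrans(table))


print(transliterate("Привет", RUSSIAN_SAMPLE))  # Privet
print(transliterate("Щи", RUSSIAN_SAMPLE))      # SHCHi
```

The second line shows a side effect of mapping capitals to all-uppercase sequences: a capitalized word comes out as "SHCHi" rather than "Shchi". That may not matter if the alias is lowercased afterwards.
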
meba’s picture

Is there any progress on this? Characters like š, ť, ň are still missing in pathauto. I can provide a patch to add them, or we can implement text files. What do you prefer?

greggles’s picture

My preference is definitely for a solution like that mentioned in comment #8 of this issue: Use a txt file for the translations that is NOT part of the distribution (though provide an example file that people can copy into place). I want to work on it but haven't had the time.

I think this provides the best upgrade path and the easiest system for admins to edit without adding too much code to Pathauto.
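
The text-file idea could look roughly like the following Python sketch. The file format (`character = replacement` lines, `;` for comments, optional double quotes) is an assumption made for this example, as are the function names; the file name i18n-ascii.txt follows the later comments in this thread.

```python
from pathlib import Path


def parse_transliterations(text: str) -> dict:
    """Parse 'char = replacement' lines into a dict (assumed format)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Error parsing line: {line!r}")
        table[key.strip()] = value.strip().strip('"')
    return table


def load_transliterations(path: str) -> dict:
    """Return the admin's table, or an empty one if no file is installed."""
    file = Path(path)
    if not file.is_file():
        return {}  # no file copied into place: nothing is substituted
    return parse_transliterations(file.read_text(encoding="utf-8"))


print(parse_transliterations('; German\nÄ = "Ae"\nß = ss\n'))
```

Because the file lives outside the distribution, an upgrade of the module would not overwrite a site's own table.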

Also, note that for 5.x there will be an option to not even bother with transliteration: http://drupal.org/node/98964

meba’s picture

You are right. But the patch is so simple that it should still be integrated. I am attaching a small patch that adds š and ť to the transliterations...

greggles’s picture

It's easy to add your patch right up until I actually do it and someone says "no, that's the wrong transliteration for my language" and asks me to change it back. Please don't clutter the issue with off-topic posts.

meba’s picture

OK. But you are transliterating Š to S, so why not š to s? It's just the lowercase version, and it's a bug, not a "transliteration for my language". Should I open another issue? Thanks.
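
The inconsistency described here (an uppercase letter is mapped but its lowercase form is not) can be detected mechanically. A Python sketch, with a hypothetical sample table and function name:

```python
def missing_lowercase(table: dict) -> list:
    """List lowercase forms absent from the table although the uppercase key exists."""
    return sorted(
        key.lower()
        for key in table
        if key.lower() != key and key.lower() not in table
    )


# Hypothetical excerpt: 'Š' is mapped, but 'š' was forgotten.
sample = {"Š": "S", "Ť": "T", "ť": "t"}
print(missing_lowercase(sample))  # ['š']
```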

greggles’s picture

Status: Active » Needs review
FileSize
11.53 KB

Here is a patch with documentation which allows a site admin to control their own text file full of transliterations. I provide an example transliterations file which is from the textpattern i18n-ascii.txt (under GPL).

Please test out this patch, review the code, and see if it works for you. Note that the install.txt has been updated as well to explain how to use this patch.

The i18n-ascii.example.txt file as provided in the patch seems like it got messed up a bit. You can get a valid version from http://dev.textpattern.com/browser/development/4.0/textpattern/lib/i18n-...

greggles’s picture

Version: 4.7.x-1.x-dev » 5.x-1.x-dev

Also, this is for 5.x.
Also, I plan to commit this in the next few days regardless of comments - then I'll get some testers whether they want to or not :)
Also, a port for 4.7 will depend on the demand.

meba@drupal.org’s picture

Status: Needs review » Reviewed & tested by the community

I can confirm this patch works using i18n-ascii.txt from dev.textpattern.com

greggles’s picture

Status: Reviewed & tested by the community » Needs review

I appreciate that you tested it, but please don't go RTBC unless you are the maintainer or the maintainer has told you to do so.

Did you review the patch in addition to testing? What scenarios did you test?

meba@drupal.org’s picture

Sorry.
I tested adding all Czech diacritic characters and watched output - all transliterated correctly.
I also tested adding bogus characters to i18n-ascii.txt, which (correctly) printed "Error parsing..." and created node without alias.
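
The behaviour tested here (a broken file produces an error and the node is saved without an alias) can be sketched in Python as follows. This is not the patch's code; the functions `parse_table` and `make_alias`, the `char = replacement` format, and the alias rules (lowercase, spaces to hyphens) are assumptions for illustration.

```python
def parse_table(text: str) -> dict:
    """Parse 'char = replacement' lines; raise ValueError on anything else."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or len(key.strip()) != 1:
            raise ValueError(f"Error parsing transliteration line: {line!r}")
        table[key.strip()] = value.strip()
    return table


def make_alias(title: str, table_text: str):
    """Return a URL alias, or None if the table cannot be parsed."""
    try:
        table = parse_table(table_text)
    except ValueError as err:
        print(err)   # report the problem to the admin
        return None  # the content is still saved, just without an alias
    cleaned = title.translate(str.maketrans(table))
    return cleaned.lower().replace(" ", "-")


print(make_alias("Šťastný den", "Š = S\nť = t\ný = y"))  # stastny-den
print(make_alias("Šťastný den", "bogus line"))           # None (after the error message)
```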

greggles’s picture

Great, thanks, meba.

I should also mention that it would be really helpful for people to test taxonomies (simple ones, anyway, complex ones don't work) and also creating user names with international characters.

Thanks again for your help.

meba@drupal.org’s picture

Creating taxonomies with international characters works. Please note that creating terms still prints a warning ( http://drupal.org/node/92900 ), which is not related to this bug.

Creating users works too.

greggles’s picture

Status: Needs review » Fixed

Great, I committed this just now.

Thanks meba for your input.

ludwikg’s picture

I can confirm it works correctly also for the Polish letters.

Anonymous’s picture

Status: Fixed » Closed (fixed)
tobiasr’s picture

This doesn't seem to work from the latest CVS... German umlauts are not replaced correctly.

Also, wasn't it brought up as well to have an option to decide whether to substitute or leave special characters as-is?