Closed (fixed)
Project:
Pathauto
Version:
4.6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Reporter:
Created:
14 Aug 2005 at 11:54 UTC
Updated:
27 Sep 2006 at 22:45 UTC
When certain accented characters are converted to ASCII, extra dashes are added.
Some other characters are simply stripped.
E.g. Ăă Îî Ââ Şş Ţţ Iñtërnâţiônàlizætiønş is converted to ii-aa-in-te-rna-t-io-na-lizaetions.
I attached a file with a conversion "table" for all the accented characters in the Latin-1 and Latin Extended-A tables of the Unicode spec.
I would be glad to make a patch... if someone can show me how :-)
Thanks.
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | ru_translit.patch | 1.8 KB | restyler |
| #3 | pathauto_0.module | 12.6 KB | restyler |
| | unicode to ascii conversion table.txt | 990 bytes | Gabriel R. |
Comments
Comment #1
mikeryan commented
I can reproduce the problem, but that table isn't in a useful form. Here's the code pathauto uses, borrowed from commentary in the online PHP manual:
Any thoughts on modifying this to take care of the reported problem cases?
Thanks.
Comment #2
Gabriel R. commented
I found that bit of code on my own too, but I couldn't find a way to convert a text file into \xxx escapes. If anything comes up, I'll keep this issue in mind.
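For the conversion step mentioned above, here is a minimal sketch in Python (not pathauto's PHP). It assumes that "\xxx" means backslash-octal byte escapes, as accepted in PHP double-quoted strings; the function name `php_octal_escape` and the choice of encodings are illustrative assumptions, not anything taken from the module.

```python
def php_octal_escape(text: str, encoding: str = "utf-8") -> str:
    """Return text as backslash-octal byte escapes (one \\ooo group per byte)."""
    # Encode first: the escapes describe bytes, so the result depends on
    # whether the table is stored as UTF-8 or as a single-byte Latin-1 file.
    return "".join(f"\\{byte:03o}" for byte in text.encode(encoding))


print(php_octal_escape("ñ"))             # \303\261  (two UTF-8 bytes)
print(php_octal_escape("ñ", "latin-1"))  # \361      (one Latin-1 byte)
```

Characters from Latin Extended-A (such as Ş or Ţ) have no Latin-1 byte, so they can only be escaped in the UTF-8 form.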
Comment #3
restyler commented
I've edited pathauto.module a bit to create transliterated aliases for Russian node titles, but it doesn't use these codes; it uses actual Cyrillic letters. So I couldn't patch the original pathauto.module with diff.exe from unxutils.sourceforge.net because of the Cyrillic letters, though I have created a patch.
Comment #4
restyler commented
And the patch.
Comment #5
bilgehan commented
This bug should not be treated so narrowly that the fix covers only accented characters. Many languages have a few special characters (e.g. Spanish: ñ, Catalan: ç, Turkish: ü, ö, ş, etc.) that are also converted to extra dashes or simply stripped. There must be a solution that covers all of these.
One solution is to use the percent-encoded UTF-8 form of the characters, as Wikipedia does. For example, the URL path of the term "españa" is http://es.wikipedia.org/wiki/Espa%C3%B1a
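As a rough illustration of that approach, the Python sketch below percent-encodes the UTF-8 bytes of a title using the standard library. The helper name `wiki_style_url` and the hard-coded base URL are assumptions for the example only.

```python
from urllib.parse import quote, unquote


def wiki_style_url(title: str, base: str = "http://es.wikipedia.org/wiki/") -> str:
    """Append the title to base, percent-encoding its UTF-8 bytes."""
    # quote() encodes to UTF-8 by default and leaves ASCII letters untouched.
    return base + quote(title)


print(wiki_style_url("España"))  # http://es.wikipedia.org/wiki/Espa%C3%B1a
print(unquote("Espa%C3%B1a"))    # España
```

The original characters survive a round trip, but the resulting path is no longer plain readable ASCII, which is the trade-off against transliteration.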
Comment #6
restyler commented
Well, I think the easiest solution (the best?) is to increase the number of elements in the array.
You can find all the symbol codes in charmap.exe (Win32): Cyrillic, accented, etc.
If I have time, I'll create the patch soon.
Comment #7
Gabriel R. commented
bilgehan: My table contains all diacritics, not just accented characters.
_Troy_: I know the hard way to do it; I was looking for a converter that would do it without wasting half a day and my love for Drupal :-)
I may end up paying for this; it's too important.
Comment #8
mikeryan commented
Partial fix: I've implemented a translation using the table you provided, which seems to improve the situation but doesn't cleanly translate all the characters from your original example (Ăă Îî Ââ Şş Ţţ Iñtërnâţiônàlizætiøns). Any advice on further refinement is welcome...
I'll commit this to CVS later today.
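For reference, the table-based approach can be sketched as follows in Python. This is not the committed pathauto code: the `TRANSLIT` dictionary is a hypothetical excerpt covering only the characters in the reporter's example, and `make_alias` is an illustrative name.

```python
import re

# Hypothetical excerpt of a conversion table; the attached file is not reproduced here.
TRANSLIT = {
    "Ă": "A", "ă": "a", "Î": "I", "î": "i", "Â": "A", "â": "a",
    "Ş": "S", "ş": "s", "Ţ": "T", "ţ": "t",
    "ñ": "n", "ë": "e", "ô": "o", "à": "a", "æ": "ae", "ø": "o",
}


def make_alias(title: str, separator: str = "-") -> str:
    """Transliterate mapped characters, then build a dash-separated alias."""
    # Map characters first, so they are never mistaken for punctuation.
    ascii_title = "".join(TRANSLIT.get(ch, ch) for ch in title)
    # Collapse each run of remaining non-alphanumerics into ONE separator;
    # any character missing from the table ends up in such a run.
    alias = re.sub(r"[^A-Za-z0-9]+", separator, ascii_title)
    return alias.strip(separator).lower()


print(make_alias("Ăă Îî Ââ Şş Ţţ Iñtërnâţiônàlizætiønş"))
# aa-ii-aa-ss-tt-internationalizaetions
```

Doing the substitution before the punctuation cleanup is what avoids the stray dashes inside words; characters missing from the table would still turn into separators, so the table's coverage matters.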
Comment #9
greggles commented
Since mike committed most of this, I'm going to close it in favor of this issue, which discusses new ideas for the transliteration process: http://drupal.org/node/61815