"Þ" is being replaced with "Þ" instead of "Þ".

I reported TinyMCE incorrectly messing up my Icelandic texts some time ago (http://drupal.org/node/21060). At that time I gave up on WYSIWYG editors and stuck to HTML. Now I tried FCKeditor and had the same thing happen.

After many hours of reading through JavaScript code, certain that FCKeditor was at fault, I discovered that everything was fine once I disabled all input filters for the node.

In filter.module there is a line:

Line 997: $chunk = preg_replace('/&([^#])(?![a-z]{1,8};)/', '&amp;$1', $chunk);

This line assumes that entity names never contain uppercase letters. Capital thorn (&THORN;), however, does, and that's why only this SINGLE character got messed up.

I changed it to:

$chunk = preg_replace('/&([^#])(?![a-zA-Z]{1,8};)/', '&amp;$1', $chunk);

Now everything works fine. I'd post a patch but I have no idea how atm.
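For anyone who wants to see the difference between the two patterns, here is a minimal standalone sketch. The function names and the sample string are made up for illustration and are not part of filter.module; only the two regular expressions come from the lines above, and the replacement string is assumed to be '&amp;$1'.

```php
<?php
// Hypothetical wrappers around the two patterns discussed above.

// Original pattern: only a-z is accepted inside an entity name, so an
// all-uppercase name such as THORN is not recognized as an entity.
function escape_lowercase_only(string $chunk): string {
  return preg_replace('/&([^#])(?![a-z]{1,8};)/', '&amp;$1', $chunk);
}

// Changed pattern: A-Z is accepted as well.
function escape_any_case(string $chunk): string {
  return preg_replace('/&([^#])(?![a-zA-Z]{1,8};)/', '&amp;$1', $chunk);
}

$input = 'R&D, &eacute;, &THORN;';

// The bare ampersand is escaped, &eacute; is kept, &THORN; is broken.
echo escape_lowercase_only($input), "\n"; // R&amp;D, &eacute;, &amp;THORN;

// The bare ampersand is still escaped, both entities are kept.
echo escape_any_case($input), "\n";       // R&amp;D, &eacute;, &THORN;
```

The negative lookahead is what decides whether an ampersand starts an entity: the first character after "&" is captured, and the ampersand is left alone only if the following characters form the rest of a name ending in ";".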

Comments

imerlin

Input filter swallowed the first line... I'll try to explain.

* WYSIWYG is translating "Þ" to "&THORN;"
* Drupal's input module is translating "&THORN;" to "&amp;THORN;"

This messes up a lot of Icelandic text. I'm not sure if other countries use this character, but it's in the W3C doc.

Steven

Status: Active » Fixed

This has been fixed in 4.6/HEAD recently.

dries

Status: Fixed » Closed (fixed)