(I posted this to the drupal-support mailing list and was advised to post it here. I don't know whether it belongs as a bug report or as a feature request.)

I am setting up a Spanish-language site using Drupal 4.4.0 and have run into problems when using the Textile filter. If I don't use any filters, content with accents shows up correctly:

"La Asesoría Académica es uno de los centros oficiales de información"

But if I use the Textile filter (latest version, in either of its two modes, Textile 1 or Textile v2b), I get the following undesired result:

"La Asesoría Académica es uno de los centros oficiales de informaciÃ"

I think that UTF-8 encoding has something to do with it. I would like to know whether anyone has run into similar problems and whether there is a solution.
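One way output like this can arise is when UTF-8 bytes are escaped as if they were a single-byte encoding. The following is a minimal sketch of that effect, not code from the Textile module: it uses PHP's standard htmlentities() with an explicit encoding argument, and the variable names are only illustrative.

```php
<?php
// "información" written as explicit UTF-8 bytes: "ó" is the pair 0xC3 0xB3.
$utf8 = "informaci\xC3\xB3n";

// Wrong: the bytes are declared as ISO-8859-1, so each byte is treated as
// its own character (0xC3 = "Ã", 0xB3 = "³") and escaped separately.
$asLatin1 = htmlentities($utf8, ENT_NOQUOTES, 'ISO-8859-1');

// Right: with the real encoding declared, the two bytes are read as one "ó".
$asUtf8 = htmlentities($utf8, ENT_NOQUOTES, 'UTF-8');

echo $asLatin1, "\n"; // informaci&Atilde;&sup3;n
echo $asUtf8, "\n";   // informaci&oacute;n
```

If the filter behaves like the first call, every accented character in a UTF-8 post would come out as two unrelated characters, which matches the "Ã" seen above.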

Another question: is there any way for Drupal to add HTML line breaks automatically when you insert breaks in the editing area?

Thanks in advance,
álvaro

Comments

magnestyuk commented:

Same here, except that I changed all header charsets to ISO-8859-2. Things worked fine until I started using Textile, which deleted every accented character from my posts.

I looked into the Textile PHP files and, after some reading on the topic, modified this code in textile1.php, around line 130, after the "# entify everything" comment:

    if (function_exists('mb_encode_numericentity')) {
        $text = Textile1::encode_high($text);   
    } else { 
        $text = htmlentities($text,ENT_NOQUOTES);
    }

I commented out that piece of code and inserted:

    $convmap = array(0x80, 0xff, 0, 0xff);
    $text = mb_encode_numericentity($text, $convmap, "ISO-8859-1");

So although the function responsible for the "bad" conversion (encode_high) is still in the script, it is no longer invoked. Instead, the conversion above (taken from the php.net manual) is applied, which, as I understand it, leaves alone characters that are not in the ISO-8859-1 character set.

I have limited knowledge of PHP, and this is not the best solution, but it worked for me. Also note that I only use Textile 1, which is enough for me, so I left textile2.php alone.

I hope this helps somewhat.

magnestyuk commented:

In fact, I just discovered that the modification above did not actually leave my accented characters alone but changed them to HTML entities. This is not what I wanted. I would love to see the Textile module respect non-Western character sets and exclude those characters from filtering altogether.
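The entity conversion can be reproduced in isolation. This is a minimal sketch rather than the module's code: it assumes the mbstring extension is loaded, reuses the conversion map quoted earlier, and feeds it hand-written sample bytes.

```php
<?php
// Conversion map: code points 0x80..0xff, offset 0, mask 0xff.
$convmap = array(0x80, 0xff, 0, 0xff);

// "Asesoría" as ISO-8859-1 bytes: "í" is the single byte 0xED (decimal 237).
$latin1 = "Asesor\xEDa";
$encoded = mb_encode_numericentity($latin1, $convmap, 'ISO-8859-1');
echo $encoded, "\n"; // Asesor&#237;a

// The byte 0xF5 is "ő" in ISO-8859-2 but "õ" in ISO-8859-1, so ISO-8859-2
// text passed through the same call is labelled as the wrong character.
$latin2 = "\xF5";
$mislabelled = mb_encode_numericentity($latin2, $convmap, 'ISO-8859-1');
echo $mislabelled, "\n"; // &#245;
```

Every byte from 0x80 to 0xff becomes a numeric entity, so nothing in that range is left untouched, and text that is really ISO-8859-2 is interpreted with ISO-8859-1 code points.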

I'm also on the brink of going public with my Drupal site, but I'm having tons of issues with non-Western encodings, this being one of them.

I would really appreciate it if someone took the time to look into this.

Thanks.

jhriggs commented:

Assigned: Unassigned » jhriggs

Please try again with the latest release of the Textile module. It now uses a completely different Textile engine, a PHP port [1] of Brad Choate's Textile.pm Perl module [2]. It may help with this, or there may be some new fixes we can try now.

[1] http://jimandlissa.com/project/textilephp
[2] http://bradchoate.com/mt-plugins/textile

pz commented:

Priority: Normal » Minor

It didn't work for me with the 4.4.0 release (downloaded 2004-07-11). My workaround was to set the default value of options['char_encoding'] to 0 instead of 1, which seems to work fine for my site.

Gabriel R. commented:

If you are using UTF-8, changing the char_encoding option is recommended, and it works well.

jhriggs commented: