When I pass a Unicode Malayalam string to the drupal_html_to_text() function, the returned value contains extra U+200A characters. To reproduce the bug, we used the following code.

$str1 = "നാ­ല് രാ­ജ്യ­ങ്ങ­ളി­ലെ ജന­ങ്ങള്‍ ഉപ­യോ­ഗി­ക്കു­ന്ന യോര്‍­ദ്ദാന്‍ നദി­യി­ലെ";
$str2 = drupal_html_to_text($str1);
print strlen($str1) ." ". strlen($str2);

We got a length of 199 for $str1 and 204 for $str2, and we found an extra U+200A in $str2. What could be the reason for this?
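When reading these numbers, keep in mind that strlen() counts bytes, not characters, so a single added multi-byte character changes the result by more than one. A minimal sketch of the difference, assuming UTF-8 input; count_code_points() is a hypothetical helper written for this illustration, not a Drupal function:

```php
<?php
// Hypothetical helper: count Unicode code points in a UTF-8 string.
function count_code_points($s) {
  // The /u modifier makes PCRE treat the subject as UTF-8;
  // /s lets "." match newlines as well.
  return preg_match_all('/./us', $s, $matches);
}

// U+200A (HAIR SPACE) written as its three UTF-8 bytes.
$hair_space = "\xE2\x80\x8A";

// strlen() reports bytes, the helper reports code points.
print strlen($hair_space) . " " . count_code_points($hair_space); // prints "3 1"
```

Comparing both counts for $str1 and $str2 shows whether the difference comes from one multi-byte character or from several single-byte ones.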

Thanks.

Comments

rfay

Project: API » Drupal core
Version: 6.x-1.2 » 6.x-dev
Component: Code » base system

I'm certain this bug should not be against API module, but rather Drupal core.

What version of Drupal core? 6 or 7?

unnikrishnan

We are using Drupal core version 6.17.

rfay

This would have to be fixed first in Drupal 7 if it's an issue there. Could you please test in D7 to see if it's an issue?

unnikrishnan

The bug persists in 7.0-alpha6. We ran the following code:

$str1 = "നാ­ല് രാ­ജ്യ­ങ്ങ­ളി­ലെ ജന­ങ്ങള്‍ ഉപ­യോ­ഗി­ക്കു­ന്ന യോര്‍­ദ്ദാന്‍ നദി­യി­ലെ";
$str2 = drupal_html_to_text($str1);
print strlen($str1) ." ". strlen($str2);

The length of $str1 is 199 and the length of $str2 is 203.

rfay

Version: 6.x-dev » 7.x-dev

OK, moving to 7.x. After it's fixed there it can come back to 6.x.

Your best approach on this one is to debug it if you can and post a patch.
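One possible starting point for debugging is to list exactly which code points appear in the output but not in the input. The sketch below assumes valid UTF-8 strings; utf8_ord(), code_point_counts() and extra_code_points() are hypothetical helpers written for this illustration and are not part of Drupal. The demonstration uses a stand-in string pair rather than real drupal_html_to_text() output.

```php
<?php
// Hypothetical helper: decode one UTF-8 encoded character to its code point.
function utf8_ord($c) {
  $b = array_values(unpack('C*', $c));
  $n = count($b);
  if ($n === 1) {
    return $b[0];
  }
  // The lead byte of an n-byte sequence carries (7 - n) payload bits;
  // each continuation byte carries 6 more.
  $cp = $b[0] & (0xFF >> ($n + 1));
  for ($i = 1; $i < $n; $i++) {
    $cp = ($cp << 6) | ($b[$i] & 0x3F);
  }
  return $cp;
}

// Hypothetical helper: count occurrences of each code point, keyed as "U+XXXX".
function code_point_counts($s) {
  $counts = array();
  foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $c) {
    $label = sprintf('U+%04X', utf8_ord($c));
    $counts[$label] = isset($counts[$label]) ? $counts[$label] + 1 : 1;
  }
  return $counts;
}

// Hypothetical helper: code points that occur more often in $after than in $before.
function extra_code_points($before, $after) {
  $extra = array();
  $base = code_point_counts($before);
  foreach (code_point_counts($after) as $label => $n) {
    $delta = $n - (isset($base[$label]) ? $base[$label] : 0);
    if ($delta > 0) {
      $extra[$label] = $delta;
    }
  }
  return $extra;
}

// Stand-in data: the second string has two added line feeds.
print_r(extra_code_points("ab", "a\nb\n"));
```

In a Drupal environment, calling extra_code_points($str1, $str2) with the strings from the report would show which characters were added and how many of each, which should narrow down the part of the function that inserts them.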

Status: Active » Closed (outdated)

Automatically closed because Drupal 7 security and bugfix support has ended as of 5 January 2025. If the issue verifiably applies to later versions, please reopen with details and update the version.