In Drupal, the substr() function is used in many places.
But it does not handle multi-byte strings.
In UTF-8, a character is encoded in 1 to 4 bytes. For example, U+0041 (the letter 'A') is encoded as "0x41", and U+AC00 (가, "ga" in Korean) is encoded as "0xEA 0xB0 0x80".
If you call substr() on the byte sequence "0x41 0x41 0xEA 0xB0 0x80 0x41" with start 0 and length 3, it returns the broken string "0x41 0x41 0xEA", which ends in the middle of a character. The result should be trimmed to "0x41 0x41" instead.
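The failure can be reproduced in a few lines. This is a minimal sketch, assuming a PHP setup where substr() has not been overloaded by mbstring; the variable names are illustrative only.

```php
<?php
// "AA" + U+AC00 (0xEA 0xB0 0x80) + "A": six bytes, four characters.
$str = "AA\xEA\xB0\x80A";

// substr() counts bytes, so a length of 3 cuts into the Korean character.
$cut = substr($str, 0, 3);

// Show the raw bytes of the result.
echo bin2hex($cut), "\n"; // 4141ea

// preg_match() with the /u modifier matches only when the subject is valid UTF-8.
echo preg_match('//u', $cut) === 1 ? "valid UTF-8\n" : "broken UTF-8\n";
```

The trailing 0xEA is a lead byte with no continuation bytes after it, which is why the result is no longer valid UTF-8.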
Comments
Comment #1
moshe weitzman commented
At the bottom of this PHP manual page, a Chinese user proposes a replacement for substr().
I don't know how valid this solution is.
Comment #2
(not verified) commented
The suggested method works only for the EUC encoding.
This bug is not limited to Asian languages. Non-ASCII characters, such as letters with a grave accent in French or an umlaut in German, also cause the problem.
Comment #3
cdpark commented
mb_strcut() is the solution. It is only supported in PHP 4 >= 4.0.6. Because it belongs to an extension module, it may not be available.
http://www.php.net/manual/function.mb-strcut.php
We may need to backport (or reinvent) this routine.
Bug #2230 is also related.
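One way such a routine could look is sketched below. The function name utf8_safe_cut() is hypothetical and not part of Drupal or PHP. It uses mb_strcut() when the mbstring extension is present and otherwise steps back over UTF-8 continuation bytes by hand. It assumes the input is valid UTF-8 and the length is non-negative.

```php
<?php
// Hypothetical helper: cut $str to at most $len bytes without splitting
// a UTF-8 character. Assumes valid UTF-8 input and $len >= 0.
function utf8_safe_cut($str, $len) {
  if (function_exists('mb_strcut')) {
    // mb_strcut() moves the cut point back to a character boundary.
    return mb_strcut($str, 0, $len, 'UTF-8');
  }
  if (strlen($str) <= $len) {
    return $str;
  }
  // Fallback: while the first dropped byte is a continuation byte
  // (bit pattern 10xxxxxx), the cut is inside a character, so step back.
  while ($len > 0 && (ord($str[$len]) & 0xC0) === 0x80) {
    $len--;
  }
  return substr($str, 0, $len);
}

echo bin2hex(utf8_safe_cut("AA\xEA\xB0\x80A", 3)), "\n"; // 4141
```

Cutting at a byte budget rather than a character count mirrors what mb_strcut() does, which seems to fit callers that truncate to a fixed storage size.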
Comment #4
cdpark commented
Instead of
substr($str, 0, $length)
, use this function. It may solve the problem.
Comment #5
cdpark commented
Comment #6
al commented
The proper solution to this problem is to compile PHP with multibyte string support (--enable-mbstring) [see http://www.php.net/manual/en/ref.mbstring.php] and set mbstring.func_overload to 7 (overload all functions) in php.ini and/or .htaccess.
--enable-mbstring is supposed to be enabled by default on PHP 4.3+, but the comment at the bottom of that page seems to imply that it actually isn't.
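Whether a given installation actually has mbstring can be checked at runtime rather than inferred from the PHP version. The following is a small sketch using only extension_loaded() and ini_get(); the variable names are illustrative. Note that ini_get() returns false when a setting does not exist, which is the case when the extension is missing and also on PHP 8.0 and later, where mbstring.func_overload was removed.

```php
<?php
// Report whether the mbstring extension is loaded in this PHP build.
$has_mbstring = extension_loaded('mbstring');

// false means the setting is unknown to this PHP (extension missing,
// or a PHP version that no longer has the option).
$overload = ini_get('mbstring.func_overload');

echo 'mbstring loaded: ', $has_mbstring ? 'yes' : 'no', "\n";
echo 'mbstring.func_overload: ', $overload === false ? 'n/a' : $overload, "\n";
```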
Comment #7
moshe weitzman commented
Al suggests that the fix for this requires no code change in Drupal. Changing the title to reflect that this is a documentation issue.
Comment #8
killes@www.drop.org commented
Fixed by Steven.
Comment #9
(not verified) commented