Closed (fixed)
Project:
Drupal core
Component:
base system
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Reporter:
Anonymous (not verified)
Created:
19 Jul 2003 at 05:18 UTC
Updated:
6 Aug 2004 at 19:20 UTC
In Drupal, the substr() function is used in many places, but it does not account for multi-byte strings.
In UTF-8, a character is encoded in one or more bytes (1 to 3 bytes for anything in the Basic Multilingual Plane). For example, U+0041 (the letter 'A') is encoded as "0x41", and U+AC00 (가, "ga" in Korean) is encoded as "0xEA 0xB0 0x80".
If you call substr() with start 0 and length 3 on the byte sequence "0x41 0x41 0xEA 0xB0 0x80 0x41", it returns the broken string "0x41 0x41 0xEA". The result should be trimmed to "0x41 0x41" or similar.
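The truncation described above can be reproduced at the byte level. The following is a minimal sketch in Python, chosen only for illustration; the variable names are hypothetical, and the byte slice stands in for what a byte-oriented substr() call does.

```python
# 'A', 'A', U+AC00 (가), 'A' encoded as UTF-8 bytes.
data = "AA\uac00A".encode("utf-8")
print(" ".join(f"{b:02x}" for b in data))    # 41 41 ea b0 80 41

# Cutting at 3 bytes splits the 3-byte character after its first byte.
broken = data[:3]
print(" ".join(f"{b:02x}" for b in broken))  # 41 41 ea

# The truncated bytes are no longer valid UTF-8.
try:
    broken.decode("utf-8")
    is_valid = True
except UnicodeDecodeError:
    is_valid = False
print(is_valid)                              # False
```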
Comments
Comment #1
moshe weitzman commented
At the bottom of this PHP manual page, a Chinese user proposes a replacement for substr().
I don't know how valid this solution is.
Comment #2
(not verified) commented
The suggested method works only for the EUC encoding.
This bug is not limited to Asian languages. Non-ASCII characters, such as the grave accent in French or the umlaut in German, also cause the problem.
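As a small illustration of the point about accented characters, here is a sketch in Python (used only for demonstration; the sample word is an assumption, not taken from the report):

```python
# "é" (U+00E9) and "ü" (U+00FC) each take two bytes in UTF-8.
e_acute = "\u00e9".encode("utf-8")
u_umlaut = "\u00fc".encode("utf-8")
print(len(e_acute), len(u_umlaut))   # 2 2

# "café" is 5 bytes long; a 4-byte cut keeps only the first byte of "é".
cafe = "caf\u00e9".encode("utf-8")
cut = cafe[:4]
print(cut)                           # b'caf\xc3'
```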
Comment #3
cdpark commented
mb_strcut() is the solution. It is only available in PHP 4 >= 4.0.6. Because it is part of an extension module, it may not be available on every installation.
http://www.php.net/manual/function.mb-strcut.php
We may need to backport (or reinvent) this routine.
Bug #2230 is also related.
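If the routine had to be reinvented, the core idea is to back up from the cut point until it no longer falls inside a character. The sketch below shows that idea in Python for illustration only. The function name truncate_utf8 is hypothetical, the input is assumed to be valid UTF-8, and this is not the actual mb_strcut() implementation, only something similar in spirit.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Return at most max_bytes bytes of data, never ending inside a character.

    Hypothetical helper; assumes data is valid UTF-8.
    """
    if max_bytes <= 0:
        return b""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
    # dropped byte is one, the cut falls inside a character, so move the cut
    # back to that character's lead byte and drop the whole character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


sample = b"\x41\x41\xea\xb0\x80\x41"   # 'A', 'A', U+AC00, 'A'
print(truncate_utf8(sample, 3))        # b'AA'
```

With a limit of 5 bytes the same call keeps the whole three-byte character, since the cut then lands on a character boundary.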
Comment #4
cdpark commented
Instead of substr($str, 0, $length), use this function. It may solve the problem.
Comment #5
cdpark commented
Comment #6
al commented
The proper solution to this problem is to compile PHP with multibyte string support (--enable-mbstring) [see http://www.php.net/manual/en/ref.mbstring.php] and set mbstring.func_overload to 7 (overload all functions) in php.ini and/or .htaccess.
--enable-mbstring is supposed to be enabled by default in PHP 4.3+, but the comment at the bottom of that page seems to imply that it actually isn't.
Comment #7
moshe weitzman commented
Al suggests that the fix for this requires no code change in Drupal. Changing the title to reflect that this is a documentation issue.
Comment #8
killes@www.drop.org commented
Fixed by Steven.
Comment #9
(not verified) commented