Looking at the CVS logs, this bug seems to affect 4.4.0 and HEAD too.
The problem is that the substr() call used to extract a subject from the comment text is not UTF-8 aware, so it can split a string in the middle of a UTF-8 multibyte sequence. This produces broken output and strange display in some browsers (Mozilla 1.5 just hides the bogus title from me). Since PHP does not provide a generic solution for handling UTF-8 strings (except mbstring, which Drupal should not require, IMHO), I guess we need to add some UTF-8 substring functionality to common.inc, so that other modules can use it too.
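To make the failure concrete, here is a minimal sketch. It is written in Python purely for illustration, with Python `bytes` standing in for PHP's byte-oriented strings; the sample title and the cut length are made up, and this is not code from comment.module.

```python
# Illustration only: Python bytes stand in for PHP's byte-oriented strings.
# The title text and the 3-byte limit are hypothetical.
title = "na\u00efve caf\u00e9"      # contains two 2-byte characters in UTF-8
raw = title.encode("utf-8")

# A byte-based cut at a fixed length, as a byte-oriented substr() would do.
# Byte 3 is the continuation byte of the 2-byte character, so the cut keeps
# the lead byte 0xC3 but drops the byte that completes it.
cut = raw[:3]

# The truncated bytes are no longer valid UTF-8.
try:
    cut.decode("utf-8")
    is_valid = True
except UnicodeDecodeError:
    is_valid = False
```

A browser receiving such a title gets a dangling lead byte, which would explain the broken or hidden output described above.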
I am willing to work on providing a patch, if this approach is acceptable.
Here is an explanation of how UTF-8 multibyte sequences can be detected: http://www.frech.ch/man/man7/utf8.7.html
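The detection rule boils down to this: in UTF-8, every byte of the form 10xxxxxx (0x80 to 0xBF) is a continuation byte, and any other byte starts a new character. A safe truncation can therefore back up from the cut point until it sits on a character start. The sketch below shows the idea in Python, again only as an illustration; the function names are hypothetical, it assumes the input is already valid UTF-8, and it is not the contents of any patch attached to this issue.

```python
def is_continuation_byte(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx, which never start a character."""
    return (byte & 0xC0) == 0x80


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut data to at most max_bytes without splitting a multibyte sequence.

    Assumes data is valid UTF-8 (hypothetical helper, for illustration).
    """
    if max_bytes <= 0:
        return b""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte, the cut falls inside a sequence, so move back
    # until the first dropped byte is the start of a character.
    while end > 0 and is_continuation_byte(data[end]):
        end -= 1
    return data[:end]


sample = "na\u00efve".encode("utf-8")   # bytes: n a C3 AF v e
safe = truncate_utf8(sample, 3)          # backs up past the split character
```

The result may be shorter than the requested length by up to three bytes, which seems an acceptable trade for always producing valid output.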
| Comment | File | Size | Author |
|---|---|---|---|
| #2 | truncate_utf8.patch | 1.12 KB | killes@www.drop.org |
Comments
Comment #1
Steven commented
I whipped up a UTF-8-safe truncator which should work for the problem areas.
It works fine for a couple of test strings here, but I don't have time to whip up a full patch.
Comment #2
killes@www.drop.org commented
Here is a patch.
Comment #3
Steven commented
Committed a modified version of this function to CVS: the problem applied to many more places than just comment.module.
Comment #4
(not verified) commented