When I search for something in Russian, I get warnings on the search page.
Warning: htmlspecialchars() [function.htmlspecialchars]: Invalid multibyte sequence in argument in check_plain() (line 1476 of /var/www/data.astrobl/includes/bootstrap.inc).
I searched for the same error here and found a very similar issue for IIS, but I use the Apache web server under CentOS 5.5. The error is generated on line 1211 of search.module, because the check_plain() function receives a non-UTF-8 string (cp1251 encoding). To avoid the error, I deleted the third parameter of the htmlspecialchars() call on line 1476 of bootstrap.inc.
I suspect this is a hosting-related issue, because I don't get these errors on my Debian and Windows servers. The default character sets of both MySQL and Apache are utf8.
Apache/2.2.17 (CentOS)
MySQL 5.0.92
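A minimal Python sketch of why a cp1251 string fails a UTF-8 validity check, which is what the warning above is complaining about. The keyword `поиск` and the helper `is_valid_utf8` are hypothetical, chosen only for illustration; this is not Drupal or PHP code.

```python
# Hypothetical Russian search keyword; any Cyrillic text behaves the same way.
keyword = "поиск"

utf8_bytes = keyword.encode("utf-8")      # two bytes per Cyrillic letter
cp1251_bytes = keyword.encode("cp1251")   # one byte per Cyrillic letter


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(cp1251_bytes))  # False: cp1251 bytes are not valid UTF-8 here
```

The same bytes that are a perfectly good cp1251 string are rejected as soon as they are interpreted as UTF-8, so the fix belongs wherever the wrong encoding enters, not in the function that reports it.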
Comments
Comment #1
jhodgdon commented
This sounds like it's an issue with check_plain, so moving to base system.
Comment #2
Coornail commented
It works for me with the exact same strings.
PHP 5.3.5-1ubuntu7.2 with Suhosin-Patch (cli) (built: May 2 2011 23:00:17)
Apache/2.2.17 (Ubuntu)
MySQL 5.1.54-1ubuntu4
Comment #3
zoo33 commented
Same issue here, on Drupal 7.2. I have a Swedish single-language site which gives this error intermittently on the search results page:
Warning: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain() (line 1354 of /path/to/includes/bootstrap.inc).
Sometimes the error is shown just once, sometimes 2–3 copies are shown.
The error happens regardless of whether the user is logged in or not.
Specific search queries always give errors while others never do (I think), but it doesn't seem to have anything to do with special characters:
* "barn" – shows error
* "polis" – no error
* "över" – shows error
* "länder" – no error
On my dev box there are no errors (which makes it hard to debug), but there are on the live server:
Ubuntu 10.04.2 LTS
Apache/2.2.14
PHP 5.3.2, mbstring installed
Comment #4
Heine commented
The text retrieved should not be in cp1251 encoding.
If this happens on node bodies, please check the originating table (e.g. node_revisions) with a SHOW CREATE TABLE [tablename], and post the output here.
Comment #5
zoo33 commented
Nothing suspicious about the tables (they're all in utf-8), but then I realized:
I will have to try and convert it to utf-8 as default character set and see if that helps.
Comment #6
Heine commented
If you've used Backup and Migrate to restore the database, see #1100146: Problems with the input of diacritical characters (PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value)
Comment #7
lordzik commented
Hello,
the same bug has existed in Drupal 6 for a few months. It was introduced around Drupal 6.17. It hits many users, so maybe someone will finally fix it...?
At the beginning the FileField module was suspected, but people who don't use that module also report this bug: http://drupal.org/node/837322
This is a temporary solution:
http://info4admins.com/warning-htmlspecialchars-expects-parameter-1-be-s...
Can the core devs do something about that?
Comment #8
Heine commented
Please check the originating tables (e.g. node_revisions) with a SHOW CREATE TABLE [tablename], and post the output here.
Please also state how this database came to be. Did you import it? Did you use backup and migrate to restore?
Comment #9
lordzik commented
mysql> show create table node;
| node | CREATE TABLE `node` (
`nid` int(10) unsigned NOT NULL auto_increment,
`vid` int(10) unsigned NOT NULL default '0',
`type` varchar(32) NOT NULL default '',
`title` varchar(255) NOT NULL default '',
`uid` int(11) NOT NULL default '0',
`status` int(11) NOT NULL default '1',
`created` int(11) NOT NULL default '0',
`changed` int(11) NOT NULL default '0',
`comment` int(11) NOT NULL default '0',
`promote` int(11) NOT NULL default '0',
`moderate` int(11) NOT NULL default '0',
`sticky` int(11) NOT NULL default '0',
`language` varchar(12) NOT NULL default '',
`tnid` int(10) unsigned NOT NULL default '0',
`translate` int(11) NOT NULL default '0',
PRIMARY KEY (`nid`),
UNIQUE KEY `vid` (`vid`),
KEY `node_type` (`type`(4)),
KEY `uid` (`uid`),
KEY `node_moderate` (`moderate`),
KEY `node_promote_status` (`promote`,`status`),
KEY `node_created` (`created`),
KEY `node_changed` (`changed`),
KEY `node_status_type` (`status`,`type`,`nid`),
KEY `nid` (`nid`),
KEY `tnid` (`tnid`),
KEY `translate` (`translate`),
KEY `node_title_type` (`title`,`type`(4))
) ENGINE=MyISAM AUTO_INCREMENT=2393 DEFAULT CHARSET=utf8
I sometimes use mysqldump from the command line to dump the database and
mysql database < dump.sql
to restore it. This has worked since early Drupal 5.
Comment #10
Heine commented
Drupal 6.17 changed how it checks for invalid byte sequences in UTF-8. The new way will generate a warning (goes to log or screen) when an invalid byte sequence is encountered.
This is not a bug, nor did this introduce a bug. The problem is that there's an invalid byte sequence in your database.
This can happen in a number of ways. One is if you dump and import a database on older versions of MySQL/mysqldump without using the --default-character-set=utf8 flag.
If you know the problematic string (and its location in the db), you can use the MySQL HEX() function to check for the problematic byte sequence.
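Once the raw bytes of a suspect value are in hand (for instance, copied from HEX() output), locating the offending bytes can be sketched in Python as below. The helper name `first_invalid_utf8` and the sample value are hypothetical, and the sample assumes a Latin-1 "ö" (byte f6) slipped into otherwise UTF-8 text; the real data may look different.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, hex of offending bytes) for the first invalid
    UTF-8 sequence in data, or None if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end].hex()
    return None


# Hypothetical column value: "länder" is valid UTF-8, but the "ö" in
# "över" is the single Latin-1 byte f6 instead of the UTF-8 pair c3 b6.
sample = b"l\xc3\xa4nder \xf6ver"

print(first_invalid_utf8(sample))                    # (8, 'f6')
print(first_invalid_utf8("länder".encode("utf-8")))  # None
```

The reported offset is a byte offset, so it lines up with positions in a hex dump rather than with character positions in the rendered text.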
Comment #11
lordzik commented
What if that error shows up on newly created content, e.g. during a file upload in IMCE or CCK FileField?
Comment #12
Heine commented
This particular issue was about the search page. If you wish to make this broader, fine, but please state this clearly. It might, however, be a good idea to move a file upload issue to a new issue of its own and provide steps to reproduce.
Comment #13
lordzik commented
It's definitely a broader issue. Did you take a look at this thread http://drupal.org/node/837322 which I mentioned in #7? I first created it as an issue in Drupal core, but there was no answer for a while, so I moved it to FileField (but it looks like it's not a problem in FileField). Should I move that thread back to Drupal core?
About 13000 sites were hit by this so it's not only my issue:
http://www.google.pl/search?q=htmlspecialchars()+Invalid+multibyte+sequence+in+argument+in+bootstrap.inc
Comment #14
Heine commented
If there is already an issue for files, please keep the pertinent discussion there and perhaps move it to the core queue. This is about the search page.
As I explained above, the warning is not the problem, just a symptom that can be caused by different problems (all related to character encoding). The bug is the invalid sequence passed to check_plain.
If it's in your db (as seems to be the case here), it needs to be fixed (or Backup and Migrate needs to be fixed); if it's due to filenames on certain filesystems (NTFS?), we need to fix it in the file-handling code.
Comment #15
droplet commented
#1090290: admin/people/permissions/roles: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain()
Comment #16
Heine commented
This is about search terms, not the roles page. See #10 and #14 about how a similar message doesn't make this have the same root cause.
Comment #17
Heine commented
Changing title
Comment #18
WilliamB commented
Subscribe
Comment #19
droplet commented
Comment #20
jhodgdon commented
tag for backport
Comment #21
Heine commented
Comment #22
Floop commented
After a few hours I have at least found out that this is a search module issue.
Look at #987472: search.module doesn't consistently support multibyte characters. Fixing the search module fixed the error for me.
My site uses Czech Unicode characters, and I edited check_plain() to log the invalid strings. This is an example of what appeared in the log file:
... Jm\xc3\xa9no a p\xc5\x99\xc3\xadjmen\xc3\xad:\xc2 ...
Look at the end. The last unicode character seems to be cut in the middle. I suppose that is the reason why htmlspecialchars throws the error and returns an empty string.
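The truncation described above can be reproduced with a short Python sketch. It assumes the cut character was a non-breaking space (U+00A0, bytes c2 a0), which the log excerpt does not confirm, and the helper `truncate_utf8` is hypothetical; it only illustrates cutting on a character boundary and is not the actual search.module fix.

```python
# The bytes from the log excerpt above, ending in a lone lead byte c2.
logged = b"Jm\xc3\xa9no a p\xc5\x99\xc3\xadjmen\xc3\xad:\xc2"

# Assumed original text: the trailing character is taken to be U+00A0.
full = "Jméno a příjmení:\u00a0".encode("utf-8")

# Cutting by byte count splits the last character in half.
byte_cut = full[:len(full) - 1]
print(byte_cut == logged)  # True: same broken tail as in the log


def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Keep at most `limit` bytes of valid UTF-8 input, dropping a
    trailing partial character instead of leaving it half-cut."""
    return data[:limit].decode("utf-8", errors="ignore").encode("utf-8")


safe_cut = truncate_utf8(full, len(full) - 1)
print(safe_cut.decode("utf-8"))  # Jméno a příjmení:
```

Counting in characters rather than bytes (or trimming the partial tail as here) avoids handing a half character to the escaping function in the first place.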
After editing the search.module file the search result appeared correctly and no error was thrown.
By the way, the error can also be suppressed with PHP settings instead of touching core.
Comment #23
Heine commented
Duplicate of #987472: search.module doesn't consistently support multibyte characters
Comment #24
jhodgdon commented
While I agree that if you fix search.module so that it doesn't feed this data into check_plain, the error goes away, that doesn't change the fact that there is still a problem with check_plain.
Comment #25
Heine commented
What is the problem with check_plain?
Comment #26
jhodgdon commented
Never mind. You are correct -- the real problem is that characters are being passed into check_plain that are not correctly-encoded UTF-8.
See #23, this is a duplicate after all.