When I search for something in Russian, I get warnings on the search page.

Warning: htmlspecialchars() [function.htmlspecialchars]: Invalid multibyte sequence in argument in check_plain() (line 1476 of /var/www/data.astrobl/includes/bootstrap.inc).

I searched for similar errors here and found a nearly identical IIS issue, but I use the Apache web server under CentOS 5.5. The error is triggered on line 1211 of search.module, because the check_plain function receives a non-UTF-8 string (cp1251 encoding). To avoid the error, I deleted the third parameter of the htmlspecialchars call on line 1476 of bootstrap.inc.

I suspect this is a hosting-related issue, because I don't get these errors on my Debian and Windows servers. Both the MySQL and Apache default charsets are utf8.
Apache/2.2.17 (CentOS)
MySQL 5.0.92
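
To illustrate why a cp1251 string trips the check, here is a minimal Python sketch (Python rather than PHP, purely for illustration). The helper name is_valid_utf8 and the sample query are hypothetical; the point is only that the same Russian word is valid UTF-8 in one encoding and an invalid byte sequence in the other.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical search query, encoded two different ways.
query = "поиск"
as_utf8 = query.encode("utf-8")      # what check_plain() expects
as_cp1251 = query.encode("cp1251")   # what it apparently received

utf8_ok = is_valid_utf8(as_utf8)
cp1251_ok = is_valid_utf8(as_cp1251)
```

Here utf8_ok is True and cp1251_ok is False, which matches the kind of input that would make htmlspecialchars complain when told to expect UTF-8.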

Comments

jhodgdon’s picture

Title: Invalid multibyte sequence warning on the search page » Invalid multibyte sequence warning from check_plain
Component: search.module » base system

This sounds like it's an issue with check_plain, so moving to base system.

Coornail’s picture

It works for me with the exact same strings.

PHP 5.3.5-1ubuntu7.2 with Suhosin-Patch (cli) (built: May 2 2011 23:00:17)
Apache/2.2.17 (Ubuntu)
MySQL 5.1.54-1ubuntu4

zoo33’s picture

Version: 7.0 » 7.2

Same issue here, on Drupal 7.2. I have a Swedish single-language site which gives this error intermittently on the search result page:

Warning: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain() (line 1354 of /path/to/includes/bootstrap.inc).

Sometimes the error is shown just once, sometimes 2–3 copies are shown.

The error happens regardless of whether the user is logged in or not.

Specific search queries always give errors while others never do (I think), but it doesn't seem to have anything to do with special characters:

* "barn" – shows error
* "polis" – no error
* "över" – shows error
* "länder" – no error

On my dev box there are no errors (which makes it hard to debug), but there are on the live server, which runs:

Ubuntu 10.04.2 LTS
Apache/2.2.14
PHP 5.3.2, mbstring installed

Heine’s picture

The text retrieved should not be in cp1251 encoding.

If this happens on node bodies, please check the originating table (e.g. node_revisions) with SHOW CREATE TABLE [tablename], and post the output here.

zoo33’s picture

Nothing suspicious about the tables (they're all in utf-8), but then I realized:

# SHOW CREATE DATABASE mydb;

CREATE DATABASE `mydb` /*!40100 DEFAULT CHARACTER SET latin1 */

I will have to try converting it to utf-8 as the default character set and see if that helps.

Heine’s picture

lordzik’s picture

Hello,
the same bug has existed in Drupal 6 for a few months. It was introduced around Drupal 6.17. It hits many users, so maybe someone will finally fix it...?
At the beginning the FileField module was suspected, but people who don't use that module also report this bug: http://drupal.org/node/837322

This is a temporary solution:
http://info4admins.com/warning-htmlspecialchars-expects-parameter-1-be-s...

Can the core devs do something about that?

Heine’s picture

Please check the originating tables (e.g. node_revisions) with SHOW CREATE TABLE [tablename], and post the output here.

Please also state how this database came to be. Did you import it? Did you use backup and migrate to restore?

lordzik’s picture

mysql> show create table node;
| node | CREATE TABLE `node` (
`nid` int(10) unsigned NOT NULL auto_increment,
`vid` int(10) unsigned NOT NULL default '0',
`type` varchar(32) NOT NULL default '',
`title` varchar(255) NOT NULL default '',
`uid` int(11) NOT NULL default '0',
`status` int(11) NOT NULL default '1',
`created` int(11) NOT NULL default '0',
`changed` int(11) NOT NULL default '0',
`comment` int(11) NOT NULL default '0',
`promote` int(11) NOT NULL default '0',
`moderate` int(11) NOT NULL default '0',
`sticky` int(11) NOT NULL default '0',
`language` varchar(12) NOT NULL default '',
`tnid` int(10) unsigned NOT NULL default '0',
`translate` int(11) NOT NULL default '0',
PRIMARY KEY (`nid`),
UNIQUE KEY `vid` (`vid`),
KEY `node_type` (`type`(4)),
KEY `uid` (`uid`),
KEY `node_moderate` (`moderate`),
KEY `node_promote_status` (`promote`,`status`),
KEY `node_created` (`created`),
KEY `node_changed` (`changed`),
KEY `node_status_type` (`status`,`type`,`nid`),
KEY `nid` (`nid`),
KEY `tnid` (`tnid`),
KEY `translate` (`translate`),
KEY `node_title_type` (`title`,`type`(4))
) ENGINE=MyISAM AUTO_INCREMENT=2393 DEFAULT CHARSET=utf8

I sometimes use mysqldump from the command line to dump the database and mysql database < dump.sql to restore it.
This has worked since early Drupal 5.

Heine’s picture

Status: Active » Closed (won't fix)

the same bug has existed in Drupal 6 for a few months. It was introduced around Drupal 6.17. It hits many users, so maybe someone will finally fix it...?

Drupal 6.17 changed how it checks for invalid byte sequences in UTF-8. The new check generates a warning (which goes to the log or the screen) when an invalid byte sequence is encountered.

This is not a bug, nor did this introduce a bug. The problem is that there's an invalid byte sequence in your database.

This can happen in a number of ways. One is if you dump and import a database on older versions of mysql/mysqldump when you do not use the --default-character-set=utf8 flag.

If you know the problematic string (and its location in the db), you can use the MySQL HEX() function to check for the problematic byte sequence.
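
As a rough sketch of what to do with that HEX() output, the Python snippet below takes a hex string and reports the byte offset of the first sequence that is not valid UTF-8. The function name first_invalid_utf8_offset and the sample hex strings are made up for illustration; they are not taken from anyone's database.

```python
from typing import Optional


def first_invalid_utf8_offset(hex_string: str) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    data = bytes.fromhex(hex_string)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


# Hypothetical values as they might be pasted from SELECT HEX(column):
clean = first_invalid_utf8_offset("4A6DC3A96E6F")   # "Jméno" in UTF-8
cut = first_invalid_utf8_offset("4A6DC3A9C2")       # ends on a lone lead byte
legacy = first_invalid_utf8_offset("EFEEE8F1EA")    # cp1251 bytes, not UTF-8
```

With these inputs, clean is None, cut is 4 and legacy is 0, so the offset points straight at the bytes worth inspecting.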

lordzik’s picture

Status: Closed (won't fix) » Active

What if that error shows up on newly created content? E.g. during file upload in IMCE or CCK FileField?

Heine’s picture

This particular issue was about the search page. If you wish to make this broader, fine, but please state this clearly. It might, however, be a good idea to move a file upload issue to a new issue of its own and provide steps to reproduce.

lordzik’s picture

It's definitely a broader issue. Did you take a look at this thread http://drupal.org/node/837322 which I mentioned in #7? I first created it as an issue in Drupal core, but there was no answer for a while, so I moved it to FileField (though it looks like it's not a problem in FileField). Should I move that thread back to Drupal core?
About 13000 sites were hit by this, so it's not only my issue:
http://www.google.pl/search?q=htmlspecialchars()+Invalid+multibyte+sequence+in+argument+in+bootstrap.inc

Heine’s picture

If there is already an issue for files, please keep the pertinent discussion there and perhaps move to the core queue. This is about the search page.

As I explained above, the warning is not the problem, just a symptom that can be caused by different problems (all related to character encoding). The bug is the invalid sequence passed to check_plain.

If it's in your db (as seems to be the case here), it needs to be fixed (or backup & migrate needs to be fixed); if it's due to filenames on certain filesystems (NTFS?), we need to fix it in the file handling code.

droplet’s picture

Heine’s picture

Status: Closed (duplicate) » Active

This is about search terms, not the roles page. See #10 and #14 on why a similar message does not imply the same root cause.

Heine’s picture

Title: Invalid multibyte sequence warning from check_plain » Invalid multibyte sequence warning from check_plain when searching

Changing title

WilliamB’s picture

Subscribe

droplet’s picture

Version: 7.2 » 8.x-dev

jhodgdon’s picture

Issue tags: +Needs backport to D7

tag for backport

Heine’s picture

Status: Active » Postponed (maintainer needs more info)

Floop’s picture

Version: 8.x-dev » 7.8
Component: base system » search.module
Status: Postponed (maintainer needs more info) » Active

After a few hours I have at least found out that this is a search module issue.
Look at #987472: search.module doesn't consistently support multibyte characters. Fixing the search module fixed the error for me.

My site uses Czech Unicode characters, and I edited check_plain() to log the invalid strings. This is an example of what appeared in the log file:
... Jm\xc3\xa9no a p\xc5\x99\xc3\xadjmen\xc3\xad:\xc2 ...
Look at the end. The last unicode character seems to be cut in the middle. I suppose that is the reason why htmlspecialchars throws the error and returns an empty string.
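
A minimal Python sketch of that failure mode, assuming the excerpt was cut at a byte offset rather than on a character boundary (I have not confirmed that this is exactly what search.module does). The sample text mirrors the logged string above with an assumed trailing non-breaking space, and truncate_utf8 is a hypothetical helper, not a Drupal function.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 bytes to at most max_bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # If the first dropped byte is a continuation byte (10xxxxxx), the cut
    # falls inside a character: step back to the start of that character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# Assumed original text: the logged string followed by U+00A0 (bytes C2 A0).
raw = "Jméno a příjmení:\u00a0".encode("utf-8")

naive = raw[: len(raw) - 1]              # byte-based cut, ends in a lone \xc2
safe = truncate_utf8(raw, len(raw) - 1)  # character-aware cut

try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False

safe_text = safe.decode("utf-8")
```

The naive cut ends in the same dangling \xc2 seen in the log and no longer decodes as UTF-8, while the character-aware cut drops the incomplete character and decodes cleanly.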

After editing the search.module file the search result appeared correctly and no error was thrown.

By the way, the error can also be suppressed with PHP settings instead of touching core.

Heine’s picture

Status: Active » Closed (duplicate)

jhodgdon’s picture

Version: 7.8 » 8.x-dev
Component: search.module » base system
Status: Closed (duplicate) » Active

While I agree that if you fix search.module so that it doesn't feed this data into check_plain, the error goes away, that doesn't change the fact that there is still a problem with check_plain.

Heine’s picture

What is the problem with check_plain?

jhodgdon’s picture

Status: Active » Closed (duplicate)

Never mind. You are correct -- the real problem is that characters are being passed into check_plain that are not correctly-encoded UTF-8.

See #23, this is a duplicate after all.