Hello,

I have a problem with typing in special characters in Drupal 7. For the translation of my website to a slawic language I need diacritical characters, e.g. ĕ, ć, ń or ř. If I type in these characters in the body field of a content type and save the node, I get the following error:

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC4\x95, \xC4\x87...' for column 'body_value' at row 1: INSERT INTO {field_data_body} (entity_type, entity_id, revision_id, bundle, delta, language, body_value, body_summary, body_format) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); Array ( [:db_insert_placeholder_0] => node [:db_insert_placeholder_1] => 243 [:db_insert_placeholder_2] => 244 [:db_insert_placeholder_3] => page [:db_insert_placeholder_4] => 0 [:db_insert_placeholder_5] => de [:db_insert_placeholder_6] => ĕ ć ń ř [:db_insert_placeholder_7] => [:db_insert_placeholder_8] => filtered_html ) in field_sql_storage_field_storage_write() (Line 425 of .../modules/field/modules/field_sql_storage/field_sql_storage.module).

If I type in a diacritical character in the title field, I get this error message:

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC5\x99' for column 'title' at row 1: UPDATE {node} SET vid=:db_update_placeholder_0, type=:db_update_placeholder_1, language=:db_update_placeholder_2, title=:db_update_placeholder_3, uid=:db_update_placeholder_4, status=:db_update_placeholder_5, created=:db_update_placeholder_6, changed=:db_update_placeholder_7, comment=:db_update_placeholder_8, promote=:db_update_placeholder_9, sticky=:db_update_placeholder_10, tnid=:db_update_placeholder_11, translate=:db_update_placeholder_12 WHERE (nid = :db_condition_placeholder_0) ; Array ( [:db_update_placeholder_0] => 244 [:db_update_placeholder_1] => page [:db_update_placeholder_2] => de [:db_update_placeholder_3] => ř [:db_update_placeholder_4] => 1 [:db_update_placeholder_5] => 1 [:db_update_placeholder_6] => 1300015239 [:db_update_placeholder_7] => 1300017883 [:db_update_placeholder_8] => 1 [:db_update_placeholder_9] => 0 [:db_update_placeholder_10] => 0 [:db_update_placeholder_11] => 0 [:db_update_placeholder_12] => 0 [:db_condition_placeholder_0] => 243 ) in drupal_write_record() (Line 6776 of .../includes/common.inc).

I also get this error, if I type in the diacritical characters as source code (e.g. ř).

The character set of the database the Drupal-installation based on is set to "utf8_general_ci". The response header of the website shows also the charset "utf8".

Does anybody has an idea, how I can resolve this problem? For any help I would be thankful.

Comments

Heine’s picture

Can you connect to the db via a mysql client, execute the following query and post its results here?

SHOW CREATE TABLE field_data_body;
Damien Tournoud’s picture

I cannot reproduce, and "\xC4\x95" definitely looks correct for ĕ, so this points to some trouble on the MySQL side. Are you sure that the database tables have been created using the correct collation? (collation is per column, so check each affected column)

makli’s picture

Hey, thanks for the fast answers.

With the given db-query in the mysql-client I get the following:

CREATE TABLE `field_data_body` (
`entity_type` varchar(128) NOT NULL default '' COMMENT 'The entity type this data is attached to',
`bundle` varchar(128) NOT NULL default '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
`deleted` tinyint(4) NOT NULL default '0' COMMENT 'A boolean indicating whether this data item has been deleted',
`entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
`revision_id` int(10) unsigned default NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
`language` varchar(32) NOT NULL default '' COMMENT 'The language for this data item.',
`delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
`body_value` longtext,
`body_summary` longtext,
`body_format` varchar(255) default NULL,
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
KEY `entity_type` (`entity_type`),
KEY `bundle` (`bundle`),
KEY `deleted` (`deleted`),
KEY `entity_id` (`entity_id`),
KEY `revision_id` (`revision_id`),
KEY `language` (`language`),
KEY `body_format` (`body_format`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

So your approach seems to be right. Unfortunately I'm unfamiliar with mysql-database management. How can I change the collation (possibly all database tables at once)?

Thanks for your quick help so far!

makli’s picture

I solved the problem of converting the database tables collation to "utf8_general_ci" by using this PHP-script:
http://serversupportforum.de/forum/sql/9279-kollation-von-tabellen-aende...

Now the input of diacritical characters works!

yched’s picture

Status: Needs work » Closed (works as designed)

OK, thanks for reporting back. Closing, then

johnv’s picture

Thanks makli, I've been looking for weeks for this error. For future reference, my error was:
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC4\x81ori"...' for column 'data' at row 1: INSERT INTO {ctools_object_cache} (sid, obj, name, data, updated)
and also in table views_data_export_object_cache.
System was trying to store the language 'Maori' in native Maori...
I did not bother to change all tables via the script. I just changed the 'data' column of the 2 tables via PHPMYADMIN.

Heine’s picture

Status: Closed (works as designed) » Active

How were these tables created? By Drupal? Via an import or upgrade?

Setting to active because there might be a bug lurking somewhere.

johnv’s picture

I have websites on 2 locations, encoded in latin1_swedish_ci (which is a strange choice, being in the Netherlands):
- I installed Drupal 7 on Windows via the Acquia Drupal Installer. This ships with D6, but you can easily create create a fresh D7-instance via the 'import'-method. I just created a new D7-DB and enabled ctools: all tables are created in utf8_general_ci, so that seems OK. Strange thing: the last line of phpmyadmin says another encoding:
variable 28 InnoDB utf8_general_ci 16.0 KiB -
watchdog 54 InnoDB utf8_general_ci 80.0 KiB -
75 table(s) Sum 1,680 InnoDB latin1_swedish_ci 4.8 MiB 0 B
- Your PHP MySQL library version 5.0.51a differs from your MySQL server version 5.1.53.
- MySQL connection collation: utf8_general_ci

I am not sure about the origin of my production database, which is encoded in latin1_swedish_ci. It might be an older version of the Acquia Drupal Stack installer, or the webhoster.

Heine, what should be the right format? Is utf8_general_ci OK, or should we change to utf8_unicode_ci, with all the Russian an Chinese contributors nowadays?

g089h515r806’s picture

subscrbe+

dadderley’s picture

I noticed this error when trying to paste in text content.
At first I thought it just was the ckEditor being weird.
I disabled this and pasted content in as plain text. Same error.
Some really generic character was creating the error.
The original content was produced by the Apple Pages application. I thought this might have been producing weird special characters.
So put this text into TextMate and copied it from there. Same problem.

I did a search and found this thread.
I checked the db collation and sure enough, it was set as "latin1_swedish_ci"

I developed a site on my local machine (MAMP/Snow Leopard).
I exported the DB via the Backup and Migrate module.
The exported db had the same collation as the original (utf8_general_ci).

Then I moved it to a test server. I set up a DB there through phpMyadmin.
I installed Drupal 7 and installed the Backup and Migrate module.
I imported the content using Backup and Migrate module.
Somehow, in the process, the db collation wound up as "latin1_swedish_ci".
I am not sure how this happened.

So, now I had a problem with a DB whose tables all were set as "latin1_swedish_ci".
I did a search and came up with the script below @ http://stackoverflow.com/questions/105572/a-script-to-change-all-tables-... :

<?php
// your connection
mysql_connect("localhost","root","***");
mysql_select_db("db1");

// convert code
$res = mysql_query("SHOW TABLES");
while ($row = mysql_fetch_array($res))
{
    foreach ($row as $key => $table)
    {
        mysql_query("ALTER TABLE " . $table . " CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");
        echo $key . " =&gt; " . $table . " CONVERTED<br />";
    }
}
?>

I ran this script.
It dutifully motored through all my db tables and changed the collation to the correct one and produced a nice report while doing so.

I am no longer getting the error.
Hopefully this information will be useful for someone who is having the same problem.
I still do not know for sure how the db got screwed up, but at least I know now how to fix it.

sheba’s picture

I was havin same problem.
Thanks for help.

Damien Tournoud’s picture

Project: Drupal core » Backup and Migrate
Version: 7.0 » 7.x-2.x-dev
Component: field system » Code
Category: support » bug

This is more likely an issue with Backup and Migrate. I'm reassigning them. Feel free to close this if support for collation is already properly implemented in -dev.

johnv’s picture

That might be the case. I use Backup&Migrate to move the data between computers.

cpelham’s picture

My tables are all in UTF8_unicode_ci but I get this error anyway when trying to save a node with some Japanese text in the body. It is not the case with every Japanese node.

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE7' for column 'alias' at row 1: INSERT INTO {url_alias} (source, alias, language) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2); Array ( [:db_insert_placeholder_0] => node/455 [:db_insert_placeholder_1] => � [:db_insert_placeholder_2] => ja ) in drupal_write_record() (line 6776 of /home/cpelham/public_html/dev.crsny.org/public/includes/common.inc).

ronan’s picture

Status: Active » Closed (won't fix)

Backup and migrate does not alter the collation of your database as this is usually set when the database is created and B&M doesn't create databases for you. If there's a great demand for it I'll consider adding a command to the export to alter the database collation, but this'll be an unexpected and possibly unwanted behavior change so I'm going to leave it as is unless somebody wants to convince me otherwise.

Thanks
Ronan

Damien Tournoud’s picture

Status: Closed (won't fix) » Active

Collation is per-database, per-table and per-column in MySQL. The problem here is that Backup & Migrate doesn't support that and lose this information. This is definitely an issue in the module.

steinmb’s picture

Subscribe

steinmb’s picture

@yched mention that this issue should indirectly fix the issue, http://drupal.org/node/1001060#comment-4589172. Anyway, yes I did a "drush cc all" after applying the patch, also restarted Apache to flush out any other caches.

Cheers

steinmb’s picture

Ffffffffh, wrong issue, doh! Pls ignore #18 :)

ronan’s picture

Status: Active » Closed (duplicate)

Yep, it looks like this is caused by the regression described in: #1128620: Module needs to back table properties up as well (DBTNG uses ANSI, TRADITIONAL sql mode which loses table properties)

I think I have a potential fix for that and will try and post a patch or a new dev as soon as I get a chance.

kuson’s picture

Not related to this module, but just this error: I found the problem happen when I tried to use PHP substr function and must have somehow sliced the the UTF8 data in a wrong way, and so when my module tried to save the string to database (via drupal_write_record / db_merge, etc), it puked; When I took the substr function out, It worked. So I'll try to find my way around fitting the data into the database max characters, but I mention here in case it helps anyone.

Just answering my own problem, its mb_substr to use (mb = multibyte). A little elementary :)

Heine’s picture

It's drupal_substr even.

shanid’s picture

Title: Problems with the input of diacritical characters (PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value) » search with "strange" characters (PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value)

I just solved a similar problem (couldn't search for "ü" etc.) by adding 'charset' => 'utf8', to my database connection in settings.php. Hope this helps?

Nice greetings,
Dinah

shanid’s picture

Title: search with "strange" characters (PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value) » Problems with the input of diacritical characters (PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value)
ursula’s picture

I posted a detailed fix for the character set problem in the Feeds issue queue. Might work for this issue as well:
http://drupal.org/node/1140194#comment-4858112

huhwatnou’s picture

In my case the problem was caused by the database's collation that was set to latin1_swedish_ci - changed it to utf8_general_ci. Then altered existing tables with the wrong collation by a "ALTER TABLE `x` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;". (And worked around some issues in existing keys because the new charset may take up to 3 bytes instead of only 1.)

So take care that the database has the correct charset/collation before installation (as stated in the install guide)!
PS Maybe this can be tried to be set by the installer before installation. If setting the charset fails, the user can be informed in the install dialog and the install must stop.

Anonymous’s picture

Hi there!
It happened to me, and I've used some auto-install service provided by my webhost.
I've found very elegant solution to perform directly in database, without uploading any php scripts.
http://muzso.hu/2008/04/09/how-to-change-the-character-set-and-or-collat...
NOTE: It confused me at the first moment, when you perform this query it won't alter anything, it will only print a list of queries you can simply copy+paste next.

holtzermann17’s picture

I found that in addition to changing the settings of the database (which is mentioned on this thread and many other places), I also had to change the settings for the connection (a point that is noted in rather fewer discussions, but cf. the second answer to this question on stackoverflow).

The method for doing this in D7 is explained in settings.php -- here's what I'm using now:

$databases = array (
  'default' => 
  array (
    'default' => 
    array (
      'database' => 'DBNAME',
      'username' => 'DBUSER',
      'password' => 'DBPASS',
      'host' => 'localhost',
      'port' => '',
      'driver' => 'mysql',
      'prefix' => '',
      'pdo' => array(
                     PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8",
                     )
    ),
  ),
);

Another alternative was to use utf8_encode(...) on the relevant content, before writing it to the database, but I think the above works more reliably.

holtzermann17’s picture

Note that to use more "fancy" unicode characters, some further configuration steps are needed!

I documented my experience here: http://drupal.stackexchange.com/questions/50868/configuring-drupal-to-use-unicode-characters/50979

purabdk’s picture

Is anything we can do in db_insert query instead of changing the table Collation or character set.

Checdedon’s picture

Hello, please am having exactly the same issue you once had of creating a new node and getting this error

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC2\x97We p...' for column 'body_value' at row 1: INSERT INTO {field_data_body} (entity_type, entity_id, revision_id, bundle, delta, language, body_value, body_summary, body_format) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); Array ( [:db_insert_placeholder_0] => node [:db_insert_placeholder_1] => 14 [:db_insert_placeholder_2] => 14 [:db_insert_placeholder_3] => page [:db_insert_placeholder_4] => 0 [:db_insert_placeholder_5] => und [:db_insert_placeholder_6] =>

           RAJRAB Ltd is a 100% Indigenous Pharmaceutical Manufacturing Company in active production since 1982.

[:db_insert_placeholder_7] => [:db_insert_placeholder_8] => filtered_html ) in field_sql_storage_field_storage_write() (line 448 of /home/rajrabor/public_html/modules/field/modules/field_sql_storage/field_sql_storage.module).

But please dadderley also posted a script

// your connection
mysql_connect("localhost","root","***");
mysql_select_db("db1");
// convert code
$res = mysql_query("SHOW TABLES");
while ($row = mysql_fetch_array($res))
{
    foreach ($row as $key => $table)
    {
        mysql_query("ALTER TABLE " . $table . " CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");
        echo $key . " =&gt; " . $table . " CONVERTED<br />";
    }
}

but I dont know how to run a script please help me out
I have checked the "server connection collation: and it is utf8mb4_general_ci but if I should open the database, all tables are with collation Latin1_general_ci . I tried changing the collation to utf8_general_ci but it always automatically set it self back to utf8mb4_general_ci

Please I need serious help

jaesperanza’s picture

Before doing all suggested, do utf8 conversion using notepad++ did the trick, sheesh! this saves you all the trouble and reading! :)

jelo’s picture

I have the same issue. Error: PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x98\x82\xF0\x9F...' for column 'comment_body_value' at row 1: INSERT INTO {field_data_comment_body}

I used http://ddecode.com/hexdecoder/ to find out which strings are causing the problem. Apparently, some of my users in advanced forum are trying to use smilies in their forum replies (comments) which leads to this error...

I used B&M as well to move sites between environments and am just now discovering that the collation is messed up.
collation_connection = utf8_general_ci
collation_database = latin1_swedish_ci
collation_server = latin1_swedish_ci

jelo’s picture

I think we should move this to a core issue. I just caused d.o to throw a 500 error by trying to enter this character: &#120 584; here in the comments (I had to put a space in to stop the site from crashing).
If you go to http://www.html-entities.org/, decode it and copy it, try to post a comment with the character and Drupal gives up. I don't think that is desired behaviour, is it?

leendertdb’s picture

Hi all, if anyone else encounters this problem on one of their sites you can use this small update hook in a custom module in the .install file. After you have done that run update.php and your problems should go away :-). This update hook changes the collation of all tables in the database to utf8_unicode_ci which should work with all types of special (accented) characters. This update hook should also make deploying easy.

Note, change "hook_update_n()" to match the name of your custom module and correct version. If your custom module is called "my_module" and this is the first update hook, name the function: "my_module_update_7001()".

/**
 * Change collation of all fields in all tables to utf8_unicode_ci to allow special chars in node title and body etc
 */
function hook_update_n() {
  // NOTE: we use utf8_unicode_ci instead of utf8_general_ci since the latter is not accurate in some languages
  // See: http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
  $report = '';
  $res = db_query("SHOW TABLES");
  foreach ($res as $row)
  {
      foreach ($row as $key => $table)
      {
          db_query('ALTER TABLE ' . $table . ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci');
          $report .= $key . ' =&gt; ' . $table . ' CONVERTED<br />';
      }
  }
  return $report;
}
jelo’s picture

FYI, I came across this post which probably describes the underlying issue here:
http://www.midwesternmac.com/blogs/jeff-geerling/solving-emoji-problem-d...

It links to these issues on d.o:
https://www.drupal.org/node/2488180
https://www.drupal.org/node/1314214

sanjayk’s picture

RAWDESK’s picture

#35 solved it also for Spanish diacritical characters like í in Villamartín.

jelo’s picture

This should likely have addressed this issue
https://www.drupal.org/node/2754539