Hello,
I have a problem with typing in special characters in Drupal 7. For the translation of my website to a slawic language I need diacritical characters, e.g. ĕ, ć, ń or ř. If I type in these characters in the body field of a content type and save the node, I get the following error:
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC4\x95, \xC4\x87...' for column 'body_value' at row 1: INSERT INTO {field_data_body} (entity_type, entity_id, revision_id, bundle, delta, language, body_value, body_summary, body_format) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); Array ( [:db_insert_placeholder_0] => node [:db_insert_placeholder_1] => 243 [:db_insert_placeholder_2] => 244 [:db_insert_placeholder_3] => page [:db_insert_placeholder_4] => 0 [:db_insert_placeholder_5] => de [:db_insert_placeholder_6] => ĕ ć ń ř [:db_insert_placeholder_7] => [:db_insert_placeholder_8] => filtered_html ) in field_sql_storage_field_storage_write() (Line 425 of .../modules/field/modules/field_sql_storage/field_sql_storage.module).
If I type in a diacritical character in the title field, I get this error message:
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC5\x99' for column 'title' at row 1: UPDATE {node} SET vid=:db_update_placeholder_0, type=:db_update_placeholder_1, language=:db_update_placeholder_2, title=:db_update_placeholder_3, uid=:db_update_placeholder_4, status=:db_update_placeholder_5, created=:db_update_placeholder_6, changed=:db_update_placeholder_7, comment=:db_update_placeholder_8, promote=:db_update_placeholder_9, sticky=:db_update_placeholder_10, tnid=:db_update_placeholder_11, translate=:db_update_placeholder_12 WHERE (nid = :db_condition_placeholder_0) ; Array ( [:db_update_placeholder_0] => 244 [:db_update_placeholder_1] => page [:db_update_placeholder_2] => de [:db_update_placeholder_3] => ř [:db_update_placeholder_4] => 1 [:db_update_placeholder_5] => 1 [:db_update_placeholder_6] => 1300015239 [:db_update_placeholder_7] => 1300017883 [:db_update_placeholder_8] => 1 [:db_update_placeholder_9] => 0 [:db_update_placeholder_10] => 0 [:db_update_placeholder_11] => 0 [:db_update_placeholder_12] => 0 [:db_condition_placeholder_0] => 243 ) in drupal_write_record() (Line 6776 of .../includes/common.inc).
I also get this error, if I type in the diacritical characters as source code (e.g. ř).
The character set of the database the Drupal-installation based on is set to "utf8_general_ci". The response header of the website shows also the charset "utf8".
Does anybody has an idea, how I can resolve this problem? For any help I would be thankful.
Comments
Comment #1
Heine CreditAttribution: Heine commentedCan you connect to the db via a mysql client, execute the following query and post its results here?
Comment #2
Damien Tournoud CreditAttribution: Damien Tournoud commentedI cannot reproduce, and "\xC4\x95" definitely looks correct for ĕ, so this points to some trouble on the MySQL side. Are you sure that the database tables have been created using the correct collation? (collation is per column, so check each affected column)
Comment #3
makli CreditAttribution: makli commentedHey, thanks for the fast answers.
With the given db-query in the mysql-client I get the following:
CREATE TABLE `field_data_body` (
`entity_type` varchar(128) NOT NULL default '' COMMENT 'The entity type this data is attached to',
`bundle` varchar(128) NOT NULL default '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
`deleted` tinyint(4) NOT NULL default '0' COMMENT 'A boolean indicating whether this data item has been deleted',
`entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
`revision_id` int(10) unsigned default NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
`language` varchar(32) NOT NULL default '' COMMENT 'The language for this data item.',
`delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
`body_value` longtext,
`body_summary` longtext,
`body_format` varchar(255) default NULL,
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
KEY `entity_type` (`entity_type`),
KEY `bundle` (`bundle`),
KEY `deleted` (`deleted`),
KEY `entity_id` (`entity_id`),
KEY `revision_id` (`revision_id`),
KEY `language` (`language`),
KEY `body_format` (`body_format`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
So your approach seems to be right. Unfortunately I'm unfamiliar with mysql-database management. How can I change the collation (possibly all database tables at once)?
Thanks for your quick help so far!
Comment #4
makli CreditAttribution: makli commentedI solved the problem of converting the database tables collation to "utf8_general_ci" by using this PHP-script:
http://serversupportforum.de/forum/sql/9279-kollation-von-tabellen-aende...
Now the input of diacritical characters works!
Comment #5
yched CreditAttribution: yched commentedOK, thanks for reporting back. Closing, then
Comment #6
johnvThanks makli, I've been looking for weeks for this error. For future reference, my error was:
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC4\x81ori"...' for column 'data' at row 1: INSERT INTO {ctools_object_cache} (sid, obj, name, data, updated)
and also in table views_data_export_object_cache.
System was trying to store the language 'Maori' in native Maori...
I did not bother to change all tables via the script. I just changed the 'data' column of the 2 tables via PHPMYADMIN.
Comment #7
Heine CreditAttribution: Heine commentedHow were these tables created? By Drupal? Via an import or upgrade?
Setting to active because there might be a bug lurking somewhere.
Comment #8
johnvI have websites on 2 locations, encoded in latin1_swedish_ci (which is a strange choice, being in the Netherlands):
- I installed Drupal 7 on Windows via the Acquia Drupal Installer. This ships with D6, but you can easily create create a fresh D7-instance via the 'import'-method. I just created a new D7-DB and enabled ctools: all tables are created in utf8_general_ci, so that seems OK. Strange thing: the last line of phpmyadmin says another encoding:
variable 28 InnoDB utf8_general_ci 16.0 KiB -
watchdog 54 InnoDB utf8_general_ci 80.0 KiB -
75 table(s) Sum 1,680 InnoDB latin1_swedish_ci 4.8 MiB 0 B
- Your PHP MySQL library version 5.0.51a differs from your MySQL server version 5.1.53.
- MySQL connection collation: utf8_general_ci
I am not sure about the origin of my production database, which is encoded in latin1_swedish_ci. It might be an older version of the Acquia Drupal Stack installer, or the webhoster.
Heine, what should be the right format? Is utf8_general_ci OK, or should we change to utf8_unicode_ci, with all the Russian an Chinese contributors nowadays?
Comment #9
g089h515r806 CreditAttribution: g089h515r806 commentedsubscrbe+
Comment #10
dadderley CreditAttribution: dadderley commentedI noticed this error when trying to paste in text content.
At first I thought it just was the ckEditor being weird.
I disabled this and pasted content in as plain text. Same error.
Some really generic character was creating the error.
The original content was produced by the Apple Pages application. I thought this might have been producing weird special characters.
So put this text into TextMate and copied it from there. Same problem.
I did a search and found this thread.
I checked the db collation and sure enough, it was set as "latin1_swedish_ci"
I developed a site on my local machine (MAMP/Snow Leopard).
I exported the DB via the Backup and Migrate module.
The exported db had the same collation as the original (utf8_general_ci).
Then I moved it to a test server. I set up a DB there through phpMyadmin.
I installed Drupal 7 and installed the Backup and Migrate module.
I imported the content using Backup and Migrate module.
Somehow, in the process, the db collation wound up as "latin1_swedish_ci".
I am not sure how this happened.
So, now I had a problem with a DB whose tables all were set as "latin1_swedish_ci".
I did a search and came up with the script below @ http://stackoverflow.com/questions/105572/a-script-to-change-all-tables-... :
I ran this script.
It dutifully motored through all my db tables and changed the collation to the correct one and produced a nice report while doing so.
I am no longer getting the error.
Hopefully this information will be useful for someone who is having the same problem.
I still do not know for sure how the db got screwed up, but at least I know now how to fix it.
Comment #11
sheba CreditAttribution: sheba commentedI was havin same problem.
Thanks for help.
Comment #12
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis is more likely an issue with Backup and Migrate. I'm reassigning them. Feel free to close this if support for collation is already properly implemented in -dev.
Comment #13
johnvThat might be the case. I use Backup&Migrate to move the data between computers.
Comment #14
cpelham CreditAttribution: cpelham commentedMy tables are all in UTF8_unicode_ci but I get this error anyway when trying to save a node with some Japanese text in the body. It is not the case with every Japanese node.
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE7' for column 'alias' at row 1: INSERT INTO {url_alias} (source, alias, language) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2); Array ( [:db_insert_placeholder_0] => node/455 [:db_insert_placeholder_1] => � [:db_insert_placeholder_2] => ja ) in drupal_write_record() (line 6776 of /home/cpelham/public_html/dev.crsny.org/public/includes/common.inc).
Comment #15
ronan CreditAttribution: ronan commentedBackup and migrate does not alter the collation of your database as this is usually set when the database is created and B&M doesn't create databases for you. If there's a great demand for it I'll consider adding a command to the export to alter the database collation, but this'll be an unexpected and possibly unwanted behavior change so I'm going to leave it as is unless somebody wants to convince me otherwise.
Thanks
Ronan
Comment #16
Damien Tournoud CreditAttribution: Damien Tournoud commentedCollation is per-database, per-table and per-column in MySQL. The problem here is that Backup & Migrate doesn't support that and lose this information. This is definitely an issue in the module.
Comment #17
steinmb CreditAttribution: steinmb commentedSubscribe
Comment #18
steinmb CreditAttribution: steinmb commented@yched mention that this issue should indirectly fix the issue, http://drupal.org/node/1001060#comment-4589172. Anyway, yes I did a "drush cc all" after applying the patch, also restarted Apache to flush out any other caches.
Cheers
Comment #19
steinmb CreditAttribution: steinmb commentedFfffffffh, wrong issue, doh! Pls ignore #18 :)
Comment #20
ronan CreditAttribution: ronan commentedYep, it looks like this is caused by the regression described in: #1128620: Module needs to back table properties up as well (DBTNG uses ANSI, TRADITIONAL sql mode which loses table properties)
I think I have a potential fix for that and will try and post a patch or a new dev as soon as I get a chance.
Comment #21
kuson CreditAttribution: kuson commentedNot related to this module, but just this error: I found the problem happen when I tried to use PHP substr function and must have somehow sliced the the UTF8 data in a wrong way, and so when my module tried to save the string to database (via drupal_write_record / db_merge, etc), it puked; When I took the substr function out, It worked. So I'll try to find my way around fitting the data into the database max characters, but I mention here in case it helps anyone.
Just answering my own problem, its mb_substr to use (mb = multibyte). A little elementary :)
Comment #22
Heine CreditAttribution: Heine commentedIt's drupal_substr even.
Comment #23
shanid CreditAttribution: shanid commentedI just solved a similar problem (couldn't search for "ü" etc.) by adding
'charset' => 'utf8',
to my database connection in settings.php. Hope this helps?Nice greetings,
Dinah
Comment #24
shanid CreditAttribution: shanid commentedComment #25
ursula CreditAttribution: ursula commentedI posted a detailed fix for the character set problem in the Feeds issue queue. Might work for this issue as well:
http://drupal.org/node/1140194#comment-4858112
Comment #26
huhwatnou CreditAttribution: huhwatnou commentedIn my case the problem was caused by the database's collation that was set to latin1_swedish_ci - changed it to utf8_general_ci. Then altered existing tables with the wrong collation by a "ALTER TABLE `x` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;". (And worked around some issues in existing keys because the new charset may take up to 3 bytes instead of only 1.)
So take care that the database has the correct charset/collation before installation (as stated in the install guide)!
PS Maybe this can be tried to be set by the installer before installation. If setting the charset fails, the user can be informed in the install dialog and the install must stop.
Comment #27
Anonymous (not verified) CreditAttribution: Anonymous commentedHi there!
It happened to me, and I've used some auto-install service provided by my webhost.
I've found very elegant solution to perform directly in database, without uploading any php scripts.
http://muzso.hu/2008/04/09/how-to-change-the-character-set-and-or-collat...
NOTE: It confused me at the first moment, when you perform this query it won't alter anything, it will only print a list of queries you can simply copy+paste next.
Comment #28
holtzermann17 CreditAttribution: holtzermann17 commentedI found that in addition to changing the settings of the database (which is mentioned on this thread and many other places), I also had to change the settings for the connection (a point that is noted in rather fewer discussions, but cf. the second answer to this question on stackoverflow).
The method for doing this in D7 is explained in settings.php -- here's what I'm using now:
Another alternative was to use utf8_encode(...) on the relevant content, before writing it to the database, but I think the above works more reliably.
Comment #29
holtzermann17 CreditAttribution: holtzermann17 commentedNote that to use more "fancy" unicode characters, some further configuration steps are needed!
I documented my experience here: http://drupal.stackexchange.com/questions/50868/configuring-drupal-to-use-unicode-characters/50979
Comment #30
purabdk CreditAttribution: purabdk commentedIs anything we can do in db_insert query instead of changing the table Collation or character set.
Comment #31
Checdedon CreditAttribution: Checdedon commentedHello, please am having exactly the same issue you once had of creating a new node and getting this error
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC2\x97We p...' for column 'body_value' at row 1: INSERT INTO {field_data_body} (entity_type, entity_id, revision_id, bundle, delta, language, body_value, body_summary, body_format) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); Array ( [:db_insert_placeholder_0] => node [:db_insert_placeholder_1] => 14 [:db_insert_placeholder_2] => 14 [:db_insert_placeholder_3] => page [:db_insert_placeholder_4] => 0 [:db_insert_placeholder_5] => und [:db_insert_placeholder_6] =>
RAJRAB Ltd is a 100% Indigenous Pharmaceutical Manufacturing Company in active production since 1982.
[:db_insert_placeholder_7] => [:db_insert_placeholder_8] => filtered_html ) in field_sql_storage_field_storage_write() (line 448 of /home/rajrabor/public_html/modules/field/modules/field_sql_storage/field_sql_storage.module).
But please dadderley also posted a script
but I dont know how to run a script please help me out
I have checked the "server connection collation: and it is utf8mb4_general_ci but if I should open the database, all tables are with collation Latin1_general_ci . I tried changing the collation to utf8_general_ci but it always automatically set it self back to utf8mb4_general_ci
Please I need serious help
Comment #32
jaesperanza CreditAttribution: jaesperanza commentedBefore doing all suggested, do utf8 conversion using notepad++ did the trick, sheesh! this saves you all the trouble and reading! :)
Comment #33
jelo CreditAttribution: jelo commentedI have the same issue. Error: PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x98\x82\xF0\x9F...' for column 'comment_body_value' at row 1: INSERT INTO {field_data_comment_body}
I used http://ddecode.com/hexdecoder/ to find out which strings are causing the problem. Apparently, some of my users in advanced forum are trying to use smilies in their forum replies (comments) which leads to this error...
I used B&M as well to move sites between environments and am just now discovering that the collation is messed up.
collation_connection = utf8_general_ci
collation_database = latin1_swedish_ci
collation_server = latin1_swedish_ci
Comment #34
jelo CreditAttribution: jelo commentedI think we should move this to a core issue. I just caused d.o to throw a 500 error by trying to enter this character: x 584; here in the comments (I had to put a space in to stop the site from crashing).
If you go to http://www.html-entities.org/, decode it and copy it, try to post a comment with the character and Drupal gives up. I don't think that is desired behaviour, is it?
Comment #35
leendertdb CreditAttribution: leendertdb commentedHi all, if anyone else encounters this problem on one of their sites you can use this small update hook in a custom module in the .install file. After you have done that run update.php and your problems should go away :-). This update hook changes the collation of all tables in the database to utf8_unicode_ci which should work with all types of special (accented) characters. This update hook should also make deploying easy.
Note, change "hook_update_n()" to match the name of your custom module and correct version. If your custom module is called "my_module" and this is the first update hook, name the function: "my_module_update_7001()".
Comment #36
jelo CreditAttribution: jelo commentedFYI, I came across this post which probably describes the underlying issue here:
http://www.midwesternmac.com/blogs/jeff-geerling/solving-emoji-problem-d...
It links to these issues on d.o:
https://www.drupal.org/node/2488180
https://www.drupal.org/node/1314214
Comment #37
sanjayk CreditAttribution: sanjayk commentedPlease check https://www.drupal.org/node/1824506#comment-12016280
Comment #38
RAWDESK CreditAttribution: RAWDESK for Colruyt Group Services commented#35 solved it also for Spanish diacritical characters like í in Villamartín.
Comment #39
jelo CreditAttribution: jelo commentedThis should likely have addressed this issue
https://www.drupal.org/node/2754539