Posted by Andrei Toutoukine on September 8, 2006 at 8:41am
Dear friends,
I've done an upgrade:
1. MySQL 4.0->5.0
2. PHP 4.3 -> 5.1
3. Drupal 4.6.8 -> 4.7.3
The problem looks simple: all content in Russian is no longer readable!
All the content is now stored in tables whose default encoding is utf8.
A query "SELECT nid, title FROM node_revisions WHERE nid=13" run through the client program 'mysql' returns what is expected: a utf8-encoded title.
Now let's look at node 13 in a browser: magically everything gets garbled. It looks very much as if the data are treated as latin1 or something similar and encoded to utf8 a second time by PHP or Drupal.
Where does this dramatic second encoding take place?
WBR, Andrei.
Comments
Thinking a bit more
Both PHP 5 and MySQL 5 were installed before Drupal's update. With 4.6.8 everything worked fine.
That means it is Drupal that does the second encoding conversion.
The trouble is caught!
Dear All,
In my case the trouble began when the MySQL server was silently upgraded to version 5.0.
In fact, the data in the tables can end up in this state after the upgrade only under a wrong MySQL configuration, which is the default for the 'mysql' client program and for Drupal 4.6.5, namely:
mysql> show variables like "char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
This means that the client (Drupal) is taken to send latin1-encoded data, which the server has no need to convert. The database is latin1 by default. The server is latin1 by default. Replies are encoded in latin1 too! So the utf8 bytes sent by Drupal were stored unchanged, but labelled as latin1.
This situation changed in Drupal 4.7. It sets all these variables properly, and the server sends what it keeps in the tables, converting the supposed latin1 text to utf8. That is why I saw "twice encoded data".
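To make the "twice encoded" effect concrete, here is a minimal Python sketch. The sample title is made up for illustration, and Python's latin-1 codec only stands in for MySQL's latin1, which treats a few bytes differently, so the exact garbage may differ from what the browser showed.

```python
# Sketch of the "twice encoded" effect. The title is a made-up sample,
# and Python's latin-1 codec is only an approximation of MySQL's latin1.
title = "Привет"

# The old latin1 connection stored the UTF-8 bytes as they came,
# but the column is labelled latin1.
stored_bytes = title.encode("utf-8")

# A utf8 connection asks the server to convert "latin1" text to utf8:
# every stored byte is read as one latin1 character and encoded again.
double_encoded = stored_bytes.decode("latin-1").encode("utf-8")

# The browser decodes the reply as UTF-8 and shows garbage, not Cyrillic.
shown = double_encoded.decode("utf-8")

# Reading the same data back through a latin1 connection undoes the extra step.
recovered = shown.encode("latin-1").decode("utf-8")

print(repr(shown))
print(recovered)
```

The last step is the idea behind dumping with an intentionally "wrong" latin1 character set: the server then hands back the stored bytes without the second conversion.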
Well, the solution I found follows.
Take the wrongly encoded tables and dump them the wrong way:
host: user $ mysqldump --opt -v --default-character-set=latin1 -u dbuser -p drupal >drupal.sql
Note the --default-character-set=latin1 option, which is intentionally wrong.
Check whether you have got utf8 data in drupal.sql, for example by using
host: user $ cat drupal.sql |iconv -f utf8 -t koi8-r -
Now replace all occurrences of the string 'latin1' with 'utf8' in drupal.sql, using a text editor of your choice, and feed the data back to the server:
mysql> drop database drupal; create database drupal default character set utf8;
host: user $ mysql -u dbuser -p drupal <drupal.sql
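The check and the 'latin1' to 'utf8' replacement described above, both done before feeding the dump back to the server, could also be scripted instead of using iconv and a text editor. This is only a sketch: the helper names and the output file name drupal.utf8.sql are hypothetical, and the replacement is as blunt as the manual one, so it also changes any 'latin1' that happens to occur inside the content itself.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def relabel_charset(dump: bytes) -> bytes:
    """Replace every occurrence of 'latin1' with 'utf8', as done by hand in the post."""
    return dump.replace(b"latin1", b"utf8")


if __name__ == "__main__":
    source = Path("drupal.sql")          # the dump made with mysqldump
    target = Path("drupal.utf8.sql")     # hypothetical output name
    if source.exists():
        data = source.read_bytes()
        if is_valid_utf8(data):
            target.write_bytes(relabel_charset(data))
            print("wrote", target)
        else:
            print("drupal.sql is not valid UTF-8, leaving it untouched")
```

Writing to a separate file keeps the original dump intact in case the result needs to be inspected before it is loaded.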
Now you have correctly encoded tables.
You can check this, using mysql, for example:
host: user $ mysql -u dbuser -p drupal
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1083 to server version: 5.0.22-Debian_3-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> set character_set_client="koi8r";
mysql> set character_set_connection="utf8";
mysql> set character_set_database="utf8";
mysql> set character_set_results="koi8r";
mysql> set character_set_server="utf8";
mysql> use drupal;
mysql> select title from node where nid=8;
If your console fonts are koi8-r, then you should see correct characters.
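Why character_set_results is set to koi8r for a koi8-r console can be illustrated with a small Python sketch. The sample title is made up, and Python's codecs stand in for the server's conversion here.

```python
# Made-up sample title; Python codecs stand in for the server's conversion.
title = "Привет"

# What a correctly encoded utf8 table holds.
stored = title.encode("utf-8")

# With character_set_results=koi8r the server converts utf8 to koi8-r
# before sending the result to the client.
sent = stored.decode("utf-8").encode("koi8_r")

# A koi8-r console interprets the received bytes as koi8-r: readable text.
on_console = sent.decode("koi8_r")

# Without the conversion the console would interpret raw utf8 bytes
# as koi8-r and show garbage.
garbled = stored.decode("koi8_r", errors="replace")

print(on_console)
print(garbled)
```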
If you are using Drupal 4.6, your pages will now appear incorrect. Add the lines
<?php
mysql_query('SET NAMES "utf8"', $connection);
?>
to the file includes/database.mysql.inc, in the function db_connect(), just before the return statement.
Now it should work.