Dear friends,

I've done an upgrade:
1. MySQL 4.0 -> 5.0
2. PHP 4.3 -> 5.1
3. Drupal 4.6.8 -> 4.7.3

The problem looks simple: all content in Russian is no longer readable!

All the content is now stored in tables whose default encoding is utf8.
The query "SELECT nid, title FROM node_revisions WHERE nid=13", run through the 'mysql' client program, returns what is expected: a utf8-encoded title.
Now let's look at node 13 in a browser: magically everything gets garbled. It looks very much as if the data are treated as latin1 (or something similar) and encoded to utf8 a second time by PHP or Drupal.

Where does this dramatic second encoding take place?

WBR, Andrei.

Comments

Andrei Toutoukine

Both PHP 5 and MySQL 5 were installed before the Drupal update. With 4.6.8 everything worked fine.

That means it is Drupal that performs the second encoding conversion.

Andrei Toutoukine

Dear All,

In my case the trouble began with a silent upgrade of the MySQL server to version 5.0.
In fact, after this upgrade the data in the tables display correctly only under the wrong MySQL configuration, which is the default for the 'mysql' client program and for Drupal 4.6.5, namely:

mysql> show variables like "char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | latin1                     | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+

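This means that the client (Drupal) is assumed to send latin1-encoded data, which the server has no need to convert. The database is latin1 by default. The server is latin1 by default. Replies are sent in latin1 as well! So the utf8 bytes Drupal wrote went into the latin1 tables untouched and came back untouched.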

This changed in Drupal 4.7. It sets the connection variables properly (to utf8), so the server converts what it keeps in the latin1 tables to utf8 before sending it. Since those bytes were already utf8, this is why I saw "twice-encoded" data.
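
The effect can be reproduced outside MySQL. What follows is only a minimal Python sketch, not Drupal or MySQL code, and the function names are hypothetical. It assumes that the server's latin1-to-utf8 conversion behaves like Python's "latin-1" codec; MySQL's latin1 is actually based on cp1252, which differs for some bytes in the 0x80-0x9F range, so treat this as an approximation.

```python
def double_encode(text: str) -> bytes:
    """Simulate utf8 bytes being misread as latin1 and converted to utf8 again."""
    stored = text.encode("utf-8")        # bytes an old client wrote into a latin1 table
    misread = stored.decode("latin-1")   # each byte is taken as one latin1 character
    return misread.encode("utf-8")       # those characters are converted to utf8


def undo_double_encode(garbled: bytes) -> str:
    """Reverse the simulation above (assumes the same latin-1 approximation)."""
    return garbled.decode("utf-8").encode("latin-1").decode("utf-8")


title = "Привет"
garbled = double_encode(title)
# Every byte of the Cyrillic utf8 text is >= 0x80, so each one grows to two bytes.
print(len(title.encode("utf-8")), len(garbled))
print(undo_double_encode(garbled) == title)
```

Plain ASCII survives this round trip unchanged, which is why English content looked fine while Russian content did not.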

Well, here is the solution I found.

Take the wrongly encoded tables and dump them the "wrong" way:

host: user $ mysqldump --opt -v --default-character-set=latin1 -u dbuser -p drupal >drupal.sql

Note the --default-character-set=latin1 option, which is intentionally wrong: it keeps the server from converting the data on the way out.

Check that drupal.sql really contains utf8 data, for example with:

host: user $ cat drupal.sql |iconv -f utf8 -t koi8-r -
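
The same check can be made without depending on the terminal encoding. This is a minimal Python sketch with a hypothetical helper name. Since a pure-ASCII file is also valid utf8, it counts Cyrillic letters rather than only testing validity.

```python
import re


def utf8_cyrillic_count(data: bytes) -> int:
    """Return the number of Cyrillic letters if data is valid utf8, else -1."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return -1
    # U+0400..U+04FF is the basic Cyrillic block.
    return len(re.findall(r"[\u0400-\u04FF]", text))
```

Calling it on the bytes of drupal.sql (read with open("drupal.sql", "rb")) should give a large positive number for a Russian site. A result of 0 suggests the dump is twice-encoded or contains no Russian text, and -1 means the file is not utf8 at all.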

Now replace all occurrences of the string 'latin1' with 'utf8' in drupal.sql, using a text editor of your choice, and feed the data back to the server:

mysql> drop database drupal; create database drupal default character set utf8;
host: user $ mysql -u dbuser -p drupal <drupal.sql
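
The search-and-replace step that precedes the restore can also be scripted instead of done in an editor. This is a minimal Python sketch; the function name and the output file name are hypothetical. It works on raw bytes, so the utf8 text in the dump is never decoded or re-encoded. Like the editor approach, it replaces every occurrence, including any 'latin1' that happens to appear inside node content.

```python
def retarget_dump(src_path: str, dst_path: str) -> int:
    """Copy a dump, replacing b'latin1' with b'utf8'; return the replacement count."""
    replaced = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:                       # binary mode: lines split on b"\n"
            replaced += line.count(b"latin1")
            dst.write(line.replace(b"latin1", b"utf8"))
    return replaced
```

For example, retarget_dump("drupal.sql", "drupal.utf8.sql") writes a new file that can then be fed to mysql in place of drupal.sql.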

Now you have correctly encoded tables.

You can check this with the mysql client, for example:

host: user $  mysql -u dbuser -p drupal
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1083 to server version: 5.0.22-Debian_3-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.


mysql> set character_set_client="koi8r";
mysql> set character_set_connection="utf8";
mysql> set character_set_database="utf8";
mysql> set character_set_results="koi8r";
mysql> set character_set_server="utf8";
mysql> use drupal;
mysql> select title from node where nid=8;

If your console uses koi8-r, you should see the correct characters.

If you are still using Drupal 4.6, your pages will now appear incorrect. Add the line

     mysql_query('SET NAMES "utf8"', $connection);

to the function db_connect() in the file includes/database.mysql.inc, just before the return statement.

Now it should work.