Drupal isn't setting the connection character set when creating and using a MySQL 4.1 database. Because all of Drupal is UTF-8, it should issue "SET CHARACTER SET utf8" at the beginning of each MySQL connection whenever it detects that it is connecting to a MySQL 4.1 server. We also need to instruct users to create the database with CREATE DATABASE drupal CHARACTER SET utf8.
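As a rough sketch of the two statements proposed above (the database name "drupal" comes from the report; the SET NAMES line is an alternative I am adding for comparison, not something the report asks for):

```sql
-- Create the database with utf8 as its default character set.
CREATE DATABASE drupal CHARACTER SET utf8;

-- Run once at the start of every connection to a 4.1 server.
-- Sets character_set_client and character_set_results to utf8.
SET CHARACTER SET utf8;

-- Alternative (an assumption, not part of the report): SET NAMES also
-- sets character_set_connection to utf8 instead of leaving it at the
-- database default.
SET NAMES utf8;
```

Servers older than 4.1 do not understand these character set clauses, which is why the statements would have to be guarded by a version check.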

Comments

jvandyk

This still applies to CVS. Neither INSTALL.txt nor database.mysql.inc reflects these changes.

damien tournoud

Until all the modules use utf8-compliant functions, we need to use "binary" collation on several columns, at least in the "cache" and "search_*" tables. That's because, on a utf8 connection, not every string a module produces is valid UTF-8.

Until this is fixed, it's probably better to use a latin1 charset connection, and to modify the database creation script to use "latin1" collation.

morbus iff

Priority: Normal » Critical

damz: are those two the only tables that "matter"?

Besides the connection statement that anonymous mentions (which I know little about), we can convert the default encoding of the database and its tables using the following commands in MySQL 4.1.

ALTER DATABASE database CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE {each table name} CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
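One way to check what the conversion actually did, as a sketch only: the table name "node" is used as an example of a Drupal table, and all three statements are read-only.

```sql
-- Default character set of the database currently in use.
SHOW VARIABLES LIKE 'character_set_database';

-- One row per table; the Collation column shows each table's default.
SHOW TABLE STATUS;

-- Per-column collation for a single table ("node" is just an example).
SHOW FULL COLUMNS FROM node;
```

Note that ALTER DATABASE only changes the default for tables created afterwards, while CONVERT TO rewrites the existing columns, so it is worth taking a dump before running it.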

More talk about this issue here: http://www.cesspit.net/drupal/node/897.

Since this particular bug can corrupt backups, I'm setting it to critical. A broken backup is very very bad.

morbus iff

Priority: Critical » Normal

Changing back to normal. Further testing is needed.

jozef

Version: » 4.6.3

Here are the results of my experiment. I am a Drupal newbie, so correct me if I am wrong.
1. My ISP runs MySQL 4.1 with these default values:
collation_database=cp1250_general_ci
collation_server=cp1250_general_ci
character_set_database=cp1250
character_set_server=cp1250
2. Drupal 4.6.3 works with the Slovak language, but
3. cron.php completes successfully while the search_ tables stay empty, so the search module does not work.
4. trip_search works well, including with Slovak characters.
5. ALTER DATABASE database CHARACTER SET utf8 COLLATE utf8_general_ci;
solved the cron problem, but
6. it is no longer possible to enter Slovak characters into content (page, story, ...).
7. Summary: I will use an 8-bit encoding and trip_search with MySQL 4.1.
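For anyone who wants to compare their own host against the values in step 1, a minimal sketch of how such settings can be listed on a MySQL 4.1 server (read-only, run from any client connection):

```sql
-- Lists character_set_client, character_set_connection,
-- character_set_database, character_set_results, character_set_server, ...
SHOW VARIABLES LIKE 'character_set%';

-- Lists collation_connection, collation_database, collation_server.
SHOW VARIABLES LIKE 'collation%';
```

The client, connection and results values are per connection, so they reflect whatever the current client negotiated rather than what Drupal's own connection uses.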

magico

Status: Active » Closed (fixed)

I agree with #4, and because nobody else has complained about this particular problem, I'm closing it.
It's critical to have a usable and recoverable database, but this seems to have been a single person's case.