The steps in INSTALL.mysql.txt result in the creation of a latin1 database (MySQL's default). I believe they are supposed to create a UTF-8 database.

Comment | File | Size | Author
#4 | mysql_install_instructions_utf8.txt | 1.33 KB | AmrMostafa

Comments

mfb’s picture

I don't know if it's possible to set the default character set when creating a database using mysqladmin. The mysqladmin option --default-character-set=utf8 applies to the client connection, not to the newly created database.

The mysql shell, of course, works fine: CREATE DATABASE `databasename` DEFAULT CHARACTER SET utf8;

jhodgdon’s picture

Yeah... I just tried:
mysqladmin --default-character-set=utf8 create databasename
and I ended up with a database with latin1 collation.

Maybe we should recommend that people use the mysql program instead? If you do:

mysql -u username -p
> create database databasename character set = utf8;
> quit

then you do get a UTF8 database.
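A quick way to confirm what either approach actually produced, as a minimal sketch: the queries below assume the placeholder name databasename used above and a MySQL 5 server, where information_schema is available.

```sql
-- Show the default character set and collation recorded for one database.
-- 'databasename' is the placeholder name used in this thread.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'databasename';

-- Alternative: print the CREATE statement, which includes the default charset.
SHOW CREATE DATABASE `databasename`;
```

Running this after the mysqladmin command and again after the mysql shell command should make the difference between the two visible.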

Questions: (a) Do we need the database to have UTF8 character set? Because you can also define the character set at the table level, I think? (b) And do we want to recommend to people that they use UTF8 in the first place?

jhodgdon’s picture

Actually, I just noticed that the GRANT stuff farther down in INSTALL.mysql.txt has them logging in to the mysql program anyway. So why not just do the whole thing there?

So my question remains: What character set (if any) do we want to recommend they set up for the database, or should we tell them to choose one based on what language their site will use?

AmrMostafa’s picture

Status: new, File size: 1.33 KB

I think we definitely need the database to be UTF-8.

Patch attached.

jhodgdon’s picture

Those commands worked fine for me in MySQL 5, although the MySQL manual says to put an = in: DEFAULT CHARSET = utf8 etc.

Is that collation the best one to suggest? Just asking, I am not the DB expert here, obviously. :)

mfb’s picture

No, it doesn't really matter what the default charset on the database is, because Drupal sets the default charset for each table as it's created. But it does somehow "feel wrong" to have a latin1 database and utf8 tables.. :p
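A minimal sketch of that behaviour, meant for a scratch MySQL server: the names charset_demo and example are hypothetical, and the table definition is illustrative only, not Drupal's actual schema. It shows that a table-level charset clause takes precedence over the database default.

```sql
-- Hypothetical scratch database whose default charset is latin1.
CREATE DATABASE `charset_demo` DEFAULT CHARACTER SET latin1;

-- A table that declares its own default charset, as an installer might.
CREATE TABLE `charset_demo`.`example` (
  `id` INT NOT NULL,
  `body` TEXT
) DEFAULT CHARACTER SET utf8;

-- The output should report the table's own charset, not the database default.
SHOW CREATE TABLE `charset_demo`.`example`;

-- Clean up the scratch database.
DROP DATABASE `charset_demo`;
```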

jhodgdon’s picture

mfb: Are you saying we do not care whether the default charset and collation on the database are set or not, so that this issue should be set to "won't fix"?

And does Drupal set the default charset for each table, even for contrib modules when creating tables?

Third question: Does this apply equally well for Drupal 6 -- if not, maybe we should at least change the install doc for D6?

jhodgdon’s picture

Status: Active » Needs review

mfb’s picture

Yes, yes, and yes. It doesn't need to be fixed because each table is utf8 (see the first few lines of http://api.drupal.org/api/function/db_create_table_sql/6). A "fix" would be for the purely aesthetic reason of not having a latin1 database with utf8 tables.

jhodgdon’s picture

Status: Needs review » Closed (won't fix)

You've convinced me. It sure doesn't look like this needs to be fixed.

DeeZone’s picture

Yet another cryptic item in Drupal that leaves new Drupal developers wondering. Perhaps this is purely aesthetic, but why the heck would you leave it as is if it creates confusion?!? The simple existence of this thread reflects what many experience when first looking at this issue. I find resistance to change at this level odd. Perhaps someone can enlighten me??

jhodgdon’s picture

The point is that whether the *database* is marked as UTF-8, or any other character set, is actually not relevant (only the character encoding of tables is relevant, not of the database as a whole). So there doesn't seem to be much point in making the instructions more complex than necessary.
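Since the tables are what matter, one way to check them directly is sketched below. It assumes the placeholder schema name databasename and a MySQL 5 server with information_schema.

```sql
-- List tables in one schema whose collation is not a utf8 variant.
-- 'databasename' is a placeholder; an empty result means every table
-- that reports a collation is using utf8 (or utf8mb4).
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'databasename'
  AND TABLE_COLLATION NOT LIKE 'utf8%';
```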

j0rd’s picture

Status: Closed (won't fix) » Active

I've got a Drupal 7 site in which all the tables are CHARSET latin1.

I'm running into problems when saving nodes with non-latin1 characters.

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE2\x80\x8B&nb...' for column 'body_value' at row 1
...
in field_sql_storage_field_storage_write() (line 424 of /home/www/powerengineercentral.ca/htdocs/modules/field/modules/field_sql_storage/field_sql_storage.module).

Ideally, Drupal would fail upon installation if, for whatever reason, it is unable to create its tables as UTF8. This would allow the user to resolve the issue before building a site, rather than facing problems converting the tables to UTF8 later.

I do believe this is a problem, which should be looked at.

This is a site which I've set up on a standard Ubuntu 10.04 LTS with MySQL 5.1. Nothing wonky in my configs.

Here are the MySQL variables which may be causing this issue:


root@li361-6:# mysqladmin  variables | grep -i char
| character_set_client            | latin1 
| character_set_connection        | latin1
| character_set_database          | latin1
| character_set_filesystem        | binary
| character_set_results           | latin1
| character_set_server            | latin1
| character_set_system            | utf8 
| character_sets_dir              | /usr/share/mysql/charsets/

damien tournoud’s picture

Status: Active » Closed (won't fix)

As already mentioned, you probably exported/reimported your database using broken tools. The default charset configuration of MySQL doesn't prevent Drupal from properly creating UTF-8 tables.
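For a site that has already ended up with latin1 tables, a possible repair is sketched below. The table name field_data_body is an assumption (the error above only names the column body_value), and this should only be tried on a backed-up copy: if the stored bytes were mis-encoded during an earlier import, converting the table may not restore them.

```sql
-- Inspect the current definition first; the table name is an assumption.
SHOW CREATE TABLE `field_data_body`;

-- Convert the table's default charset and its existing text columns to utf8.
ALTER TABLE `field_data_body`
  CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```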