Hello all,
I've been using Drupal 4.6 for a personal website and I've always been able to write stories in Japanese. I guess it just works out-of-the-box since Drupal is using UTF-8 for encoding, right?
But now I've been trying Drupal 4.7-b6, and the first thing I did was write a story in Japanese. And... it failed. After posting the story, I got question marks (???) instead of the Japanese characters.
I tried experimenting with clean installs of Drupal 4.6.6 and 4.7-b6, on a local machine, and as expected, I could immediately start writing Japanese in 4.6.6 but not in 4.7-b6.
Now my question is: Can others confirm this? Is this a bug and should I report it as an issue, or did some things change along the way and do I need to 'enable' UTF-8 support in 4.7-b6 somehow? I haven't seen any mention of this in the issue tracker...
Thanks,
Tim
Comments
Database encoding
The only thing that changed is that 4.7 explicitly sets the database table encoding and database connection to UTF-8. Before, we assumed this was true. If you are using MySQL 4.0 or earlier, nothing changes, because it does not support UTF-8. For MySQL 4.1 or later, the upgrade should convert your tables. A clean install should work immediately.
You can easily tell if it is a database problem by previewing a node (because then, the database is not involved). If the preview is messed up too, then it is either PHP or your browser messing things up. If it only corrupts when you save the node, then it is a database issue.
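To make the "???" symptom concrete: a question mark is typically what you get when text is converted into a character set that cannot represent it. Here is a minimal Python sketch of that effect. It is only an illustration under that assumption, not what Drupal or MySQL actually execute, and `simulate_lossy_store` is a made-up name.

```python
def simulate_lossy_store(text: str, charset: str = "latin-1") -> str:
    """Round-trip text through a charset, as a stand-in for saving it in a
    table whose character set cannot represent every character.

    Characters the charset cannot encode are replaced with '?'.
    """
    stored = text.encode(charset, errors="replace")
    return stored.decode(charset)


if __name__ == "__main__":
    # Japanese has no representation in latin-1, so every character is lost.
    print(simulate_lossy_store("日本語"))
    # Plain ASCII survives unchanged.
    print(simulate_lossy_store("abc"))
    # With a UTF-8 "table" nothing is lost.
    print(simulate_lossy_store("日本語", "utf-8"))
```

Once the characters have been replaced like this, the original text is gone from the database, which is why a correct preview followed by question marks after saving points at the storage side.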
--
If you have a problem, please search before posting a question.
another test
Thanks for your reply!
I've just done another test with Drupal 4.7-b6. I've entered some Japanese text and after pressing the "Preview" button, it shows the correct text. After submitting, the text has changed to question marks. So you are saying it is a database problem?
I'm using MySQL 5.0.18 and a clean install of 4.7-b6, so it's not an upgrade of my Drupal 4.6 installation. I'll try now to upgrade my 4.6 installation to 4.7-b6 and see what happens...
Thanks,
Tim
Assuming that your database
Assuming that your database actually contains UTF-8 data, an easy solution is to (1) dump the database using mysqldump, (2) open the dump file and do a quick search and replace, changing the phrase "CHARSET=latin1" to "CHARSET=utf8", and (3) re-import the modified dump file.
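If the dump is too big to edit comfortably by hand, step (2) can be scripted. A rough Python sketch, assuming the dump declares its tables with the literal text "CHARSET=latin1" (the function name and the sample statement are made up for illustration):

```python
def retag_dump_charset(dump_sql: str, old: str = "latin1", new: str = "utf8") -> str:
    """Replace every 'CHARSET=<old>' table option in a dump with 'CHARSET=<new>'.

    This only changes the declared charset. It does not convert any data,
    so it is only appropriate when the stored bytes are already UTF-8.
    """
    return dump_sql.replace(f"CHARSET={old}", f"CHARSET={new}")


if __name__ == "__main__":
    # Hypothetical line as it might appear in a dump file.
    sample = "CREATE TABLE node (nid int) ENGINE=MyISAM DEFAULT CHARSET=latin1;"
    print(retag_dump_charset(sample))
```

Keep the untouched dump as a backup before re-importing, in case the assumption about the data turns out to be wrong.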
Fixed the problem for me in under a minute.
Is it a problem with 4.7-b6 version of database.mysql?
Okay, I've updated my 4.6.6 installation to 4.7-b6, and all my Japanese UTF-8 postings survived the update, and I'm able to post new stories in UTF-8 without any problems!
So now I only wonder why it didn't work directly with a clean installation of 4.7-b6. Could it be that there is a bug in the database.mysql of 4.7-b6, which makes it differ from a 4.6.6 database converted to 4.7-b6?
might be phpmyadmin related
Did you create the database with phpMyAdmin? I noticed that phpMyAdmin sometimes defaults to strange language settings... some of my databases now have a Swedish default collation (latin1_swedish_ci, I believe)... very impractical.
Yes, it is related!
I created my DB with phpMyAdmin. I checked, and there is an option to create the DB with a collation of utf8_general_ci. I recreated the DB with that collation and installed a clean version of 4.7-b6; this time the tables were created with the correct collation, and I could use UTF-8 without a problem. So that problem is solved.
I looked in database.mysql and saw that the parts dealing with utf8 were commented out. That could explain why 4.7-b6 doesn't create utf8 tables by default. Was this done on purpose so that older MySQL versions aren't confused by it?
Anyway, thanks for your help!
Older versions of MySQL
The character set statements are not really commented out. They use a special comment syntax supported by MySQL: the enclosed text is executed only by MySQL 4.1 and higher, and treated as a comment by older versions. It's possible phpMyAdmin does not support it.
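For anyone unfamiliar with these version-specific comments, they look like /*!40100 ... */, where the number encodes the minimum server version (40100 means 4.1.0). Below is a small Python sketch that imitates how a server would read such a statement. It is a simplified emulation for illustration, not MySQL's actual parser, and the CREATE TABLE statement is a made-up example rather than a line copied from database.mysql.

```python
import re

# Matches /*!NNNNN ... */ where NNNNN is a five-digit minimum version.
_VERSION_COMMENT = re.compile(r"/\*!(\d{5})\s*(.*?)\*/", re.DOTALL)


def expand_version_comments(sql: str, server_version: int) -> str:
    """Keep the body of each version comment if the server is new enough,
    otherwise drop it as an ordinary comment would be dropped."""

    def _expand(match: re.Match) -> str:
        required = int(match.group(1))
        body = match.group(2).strip()
        return body if server_version >= required else ""

    return _VERSION_COMMENT.sub(_expand, sql)


if __name__ == "__main__":
    # Hypothetical statement in the style described above.
    statement = "CREATE TABLE example (id int) /*!40100 DEFAULT CHARACTER SET utf8 */;"
    print(expand_version_comments(statement, 40118))  # MySQL 4.1.18
    print(expand_version_comments(statement, 40023))  # MySQL 4.0.23
```

A tool that strips everything between /* and */ before sending the SQL to the server would lose the character set clause, which would fit the behaviour seen with the phpMyAdmin import, though that is only a guess.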
My Latest CVS Upgrade Changed The Content
Hi, I run a 4.7-based Drupal site in Spanish. I have been creating content during the last few weeks and didn't have any problems with accents or the letter "ñ", which is used in Spanish.
Yesterday I did a CVS update and now my content is showing strange characters. I can edit each node, type the affected characters again, and the problem is fixed. However, I have too much content to do this manually. My questions are: how can I correct my content in a more automatic way, and what has changed in the code that caused this problem?
I am running Red Hat Linux Enterprise 3.0 with PHP 4.4.0 and MySQL 4.1.18.
Regards!
Alexis Bellido - Ventanazul web solutions
session variables, the key
Hi alexis,
I found the same problem days ago in a migration from 4.6.6 to 4.7.0-rc1.
My config was:
* mysql global variables set to latin1 (my.cnf)
* a utf8 database (and tables)
(With this odd setup, content is displayed properly on the web but is stored with bad characters. It was my mistake not to check the MySQL config when I set up the server.)
=> Drupal < 4.7.x ends up using MySQL's global variables as its session variables (it doesn't set the session variables itself, so the globals are used by default, as with any connection to MySQL).
Drupal 4.7 does "SET NAMES UTF8" in database_connect(), so connections to the database now use a UTF-8 session (previously it was latin1, the MySQL global default).
So... you need a 100% utf8 database for a clean migration. That is, you should have been running MySQL with its global variables set to utf8.
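To show what "stored with bad characters" means here, this is a small Python sketch of the effect: UTF-8 bytes that get interpreted as latin1 turn each accented letter into two odd characters, and the reverse step restores them. It is only a model of the mix-up, not a conversion tool for a real database, and the function names are made up. As far as I know, MySQL's latin1 is closer to Windows-1252 than to ISO 8859-1, so real data may need "cp1252" in place of "latin-1".

```python
def double_encode(text: str) -> str:
    """Model UTF-8 bytes being misread as latin-1 on their way into storage."""
    return text.encode("utf-8").decode("latin-1")


def repair_double_encoded(text: str) -> str:
    """Undo double_encode: turn the characters back into bytes and read
    those bytes as the UTF-8 they originally were."""
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    broken = double_encode("año")
    print(broken)                         # what a UTF-8 session now shows
    print(repair_double_encoded(broken))  # the original text again
```

While the connection was latin1, the two wrong steps cancelled each other out on the way back, which is why the pages looked fine before the update.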
I don't have instructions, only some random notes on how to convert it to 100% utf8. Perhaps I'll work on it in the next few days (I'm not in a hurry). Drop me a message if you're interested in the solution.
Sorry, I express myself better in Spanish :)
I'd definitely be interested in this
I'd definitely be interested in this. Hook me up! :)
Clean install with Fantastico (cPanel, MySQL 4.1.18) is broken
The database does not have a single UTF-8 collation, so writing UTF-8 characters is pointless.
Is it a Fantastico problem or a Drupal 4.7.0 bug?
--
chios sightseeings
------
GiorgosK
Web Development