I have to replace an existing server with a new one. The old one is a Fedora Core 3 based server with the latest update of PHP available for it (version 4.3) and the latest 4.1.14 version of MySQL (from the www.mysql.com site; the default character set of the database and tables is UTF-8, the default collation is utf8_general_ci). phpinfo() shows that PHP communicates with MySQL through the old 3.23 client library (shared-compat installed). The Drupal version is 4.6.3. The site is in Slovenian, with accented letters (č, š and ž). On the new server, serialize() errors are reported and almost no content is displayed. I've found out that the cause must be character handling.

To be precise, the old system stored everything in the database as pure UTF-8 (as displayed by phpMyAdmin or MySQL Query Browser). Now, on the new system, when these strings are handled by Drupal, something goes terribly wrong.

I've also installed a new Drupal 4.6.3 test site. When I entered our accented letters there, there was a problem with the small letter c-caron (č). All the others were displayed correctly, but that one was garbled. This problem was resolved by replacing the original MySQL 4.1.12 (which comes with Fedora 4) with MySQL 4.1.14 (from www.mysql.com, glibc 2.3 version - exactly the same as on the old system...).

Now all letters on the new server are displayed OK, but this does not resolve the basic problem. Looking at the underlying MySQL database, I see that our accented characters are not stored as UTF-8 (everything, the database itself and the tables, has the default character set UTF-8 and the collation utf8_general_ci, exactly as on the old server), but as 2 weird characters each (most probably the 2 bytes of the UTF-8 character - Č is stored as 'ÄŒ', Š as 'Å ', Ž as 'Å½', č as 'Ä' plus an unprintable character, š as 'Å¡' and ž as 'Å¾').
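To make the suspected mechanism concrete, here is a minimal sketch in plain PHP. It assumes the mbstring extension is loaded, and it only mimics the effect of treating UTF-8 bytes as ISO-8859-1 characters; it is not what Drupal or the MySQL client library literally executes.

```php
<?php
// Illustration only: reproduce the "2 weird characters" effect for 'š'.

$original = "\xC5\xA1"; // 'š' (U+0161) as its 2 UTF-8 bytes

// Treat each byte as a separate ISO-8859-1 character and re-encode it as UTF-8.
$stored = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n"; // c5a1
echo bin2hex($stored), "\n";   // c385c2a1, which displays as 'Å¡'
echo strlen($original), ' -> ', strlen($stored), " bytes\n"; // 2 -> 4 bytes
```

The 4 resulting bytes are exactly the 'Å¡' pair that shows up in the table for š.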

So, obviously, somewhere on the way from my browser to the database each 2-byte UTF-8 character is split and each byte is stored separately. On the way back, the reverse process occurs and everything is displayed OK. But:
1. What about the whole site I already have in pure UTF-8?
2. Why complicate things with this byte juggling, and how will this be handled in the future? I mean, even if I manually copy about 100 nodes from the old to the new system, what if this changes in the future - will I have to do the reverse process again?
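If it ever comes to that, the reverse process I have in mind could be sketched roughly like this. The function name undo_double_encoding() is my own invention, the mbstring extension is assumed, and it must only be applied to strings that really are double encoded: any character that does not fit into ISO-8859-1 is replaced by a substitute character and lost.

```php
<?php
// Hypothetical helper (the name is made up): undo one level of double encoding.
function undo_double_encoding($stored) {
  // Map every UTF-8 character back to the single ISO-8859-1 byte it came from.
  return mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');
}

$stored = "\xC3\x85\xC2\xA1";          // 'Å¡' as seen in the new database
$fixed  = undo_double_encoding($stored);

echo bin2hex($fixed), "\n";            // c5a1, i.e. 'š' as plain UTF-8
```

This works on single strings only; running something like it over a whole database would be a separate job and would need a backup first.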

With PostgreSQL, the bytes are stored OK (unicode database). But I would like to stay with MySQL, as I am not sure about PostgreSQL support in the other modules (I have quite a lot of them installed and in use, in particular for taxonomy and privileges...) - mainly, I don't see any SQL script that generates PostgreSQL tables for them, just MySQL ones. Or can Drupal handle it 100% OK if I create such tables and enter the data manually?

Since MySQL and Drupal are exactly the same on the old and the new system, I suspect that PHP (5.0.4 or 5.0.5 on the new system, 4.3 on the old one) is the troublemaker. Is there something that has to be set differently on the new system than before? php.ini and the Apache configuration are practically the same on both systems, and phpinfo() doesn't show any differences except for the communication with MySQL (now 4.1, 3.23 on the old one) and, of course, PHP being 5 instead of 4. The SAPI module is also not listed anymore, as it should be built in now according to the PHP site - there is no RPM with it anymore... I mean, is this a problem of some PHP (or Drupal) settings, or does something maybe have to be compiled differently? I've also tried yesterday's CVS version of Drupal, with no difference regarding this issue...

I don't know whether iconv has something to do with it. I see in phpinfo() that all its encodings are ISO-8859-1 (but they are ISO-8859-1 on the old system as well!). And how (where) can I change them, in case that could help to resolve the problem?

Is there something to change in the mbstring settings? Right now the settings are exactly the same as on the old system: input and output are set to pass (no changes), internal_encoding is not set, encoding is Off (as required by Drupal), and mbstring.language is neutral. I've played around a bit, trying to set some of the variables to UTF-8, but with no success, or with complaints from Drupal.

Help, please!

Comments

I've temporarily resolved the character set chaos by adding 2 lines:

mysql_query("SET NAMES 'utf8'", $connection);
mysql_query("SET CHARACTER SET utf8", $connection);

right after the connection to the database is made:

$connection = mysql_connect($url['host'], $url['user'], $url['pass'], TRUE) or die(mysql_error());

The unserialize() errors have now disappeared and the site is almost accessible (almost - now I am fighting with image_assist, and who knows what problem comes next...).
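For anyone wondering why a character set problem shows up as unserialize() errors, here is a minimal sketch. It assumes the mbstring extension, and the conversion to ISO-8859-1 only simulates what I suspect the latin1 connection did to the stored data; it is not Drupal code.

```php
<?php
// serialize() records the byte length of every string it stores.
$value  = "\xC4\x8D";              // 'č' as 2 UTF-8 bytes
$packed = serialize($value);
echo $packed, "\n";                // s:2:"č";

// Simulate a charset conversion between the database and PHP: 'č' does not
// exist in ISO-8859-1, so the 2 bytes collapse into a substitute character.
$converted = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');

// The recorded length still says 2, but the payload is now shorter.
$mangled = 's:2:"' . $converted . '";';

var_dump(@unserialize($packed) === $value); // bool(true)
var_dump(@unserialize($mangled));           // bool(false)
```

As soon as the byte count of a string changes after it was serialized, the recorded length no longer matches and unserialize() gives up on the whole value.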

But I am still curious why this worked very well on the old system, while on the new system PHP and MySQL insist on the latin1 character set. I've even made a simple test script in which mysql_client_encoding($connection) still returns latin1, even after running those 2 commands!? But at least things work...

In the MySQL config file, the client character set is already specified as UTF-8, so I presume the problem is in the mysql module of PHP.

So, does anyone have an idea where to adjust something to get rid of latin1?!