Hi,
My ISP's default collation for MySQL tables is 'cp1250_general_ci'. I'm moving to a new ISP that uses 'latin2_general_ci'.
Now, when I export from the first one into the second one, it obviously does not work, because the character encoding differs. Converting the export from cp1250 to latin2 does not help because, as I found out, Drupal stores the content as Unicode anyway.
Yes, Drupal stores Unicode content in the DB whatever character encoding the DB uses.
Here is an example:
Real correct unicode character as hex: c5 a1
is stored in latin2 database as: c4 b9 c4 84
and in cp1250 database as: c4 b9 cb 87
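One way to read these bytes, as a minimal sketch: the UTF-8 bytes c5 a1 are treated as if they were text in the column's declared charset, and the result is then written out as UTF-8 again (for instance by an export tool). The function names `as_exported` and `recover` are made up for this illustration, and the explanation itself is an assumption about what happens, not something confirmed for every setup.

```python
def as_exported(text: str, column_charset: str) -> bytes:
    """Simulate the double encoding (assumed mechanism, see above)."""
    raw = text.encode("utf-8")             # what Drupal sends: c5 a1 for "š"
    misread = raw.decode(column_charset)   # bytes misread as cp1250/latin2 text
    return misread.encode("utf-8")         # misread text written out as UTF-8


def recover(damaged: bytes, column_charset: str) -> str:
    """Reverse the steps above to get the original text back."""
    return damaged.decode("utf-8").encode(column_charset).decode("utf-8")


# "iso8859_2" is Python's name for what MySQL calls latin2.
print(as_exported("š", "iso8859_2").hex(" "))  # c4 b9 c4 84
print(as_exported("š", "cp1250").hex(" "))     # c4 b9 cb 87
print(recover(bytes.fromhex("c4b9cb87"), "cp1250"))  # š
```

Both results match the hex values listed above, which suggests the data is double-encoded rather than randomly corrupted.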
Has anyone dealt with this kind of mishmash in the past?
Do you have any idea how I can import the export from the cp1250 DB into the latin2 DB?
If your recommendation is to use UTF-8 as my default DB collation, then sorry: I tried it and it does not work. The only way to get my local characters working was to accept my ISP's defaults (perhaps someone knows why, and how to solve this with UTF-8).
Your help is more than greatly appreciated,
--Josef
Comments
SOLVED!!!
Problem description, very briefly: it happened during a Drupal DB backup or transfer. Backup: three letters (see below) damaged. Transfer between two DBs, each in a different encoding: Czech totally damaged.
Solution: the character damage is caused by the many different character encoding conversions along the way. phpMyAdmin makes its exports in the encoding of its UI, which is mostly UTF-8. The key is to get the data onto your local machine in the format in which it is stored on the server (no conversion on the way). This solves the first 'Backup' problem.
Next, when you want to transfer this export to another DB with a different encoding (say src=cp1250 and dest=latin2), do not try to convert the export file! Only replace the cp1250 string in the DB table definitions with latin2. That is all!
Drupal stores the data in UTF-8 regardless of the DB/table/column encoding. So, if you manage to get the data out of the DB without any conversion, then regardless of the DB encoding the data will already be in UTF-8. Do not screw it up; just load it back into whatever encoding the target DB uses. Do not forget to change the table definition encoding to that of your target DB, and also to set the proper file encoding in phpMyAdmin during file upload/import.
Final conclusion: all the trouble is caused by phpMyAdmin and its "smart" character encoding conversions on the way.
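The "replace the charset name, but do not convert the file" step could be sketched roughly like this. It works on raw bytes so that nothing gets re-encoded, and it only touches charset names that follow a CHARSET, CHARACTER SET or COLLATE keyword. The function name `retarget_charset` and the sample dump are hypothetical; a real dump may declare charsets in other ways, and row data that happens to contain such a phrase would be rewritten too, so check the result by hand.

```python
import re


def retarget_charset(dump: bytes, src: bytes = b"cp1250",
                     dest: bytes = b"latin2") -> bytes:
    """Rename the charset in table definitions; data bytes are not converted."""
    pattern = re.compile(
        rb"(CHARSET\s*=?\s*|CHARACTER\s+SET\s+|COLLATE\s*=?\s*)" + re.escape(src),
        re.IGNORECASE,
    )
    # Keep the keyword as written and swap only the charset name after it.
    return pattern.sub(lambda m: m.group(1) + dest, dump)


# Made-up two-line dump: one table definition, one row holding UTF-8 "š".
sample = (
    b"CREATE TABLE node (title varchar(128) collate cp1250_general_ci) "
    b"DEFAULT CHARSET=cp1250 COLLATE=cp1250_general_ci;\n"
    b"INSERT INTO node VALUES ('\xc5\xa1 cp1250');\n"
)
print(retarget_charset(sample).decode("utf-8"))
```

With a real file, read and write it in binary mode (`open(path, "rb")` and `open(path, "wb")`) so that Python does not add a conversion of its own.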
Thanks
Thanks for posting your solution! I am struggling with this, and this makes things a lot clearer.
Now all I need to figure out is:
When the DB was exported from phpMyAdmin using the wrong encoding, then imported into a new DB that was created as UTF-8, and then a bunch of content was created, how do I deal with the multiple encodings? If I go back to phpMyAdmin and re-export the DB using the original encoding (latin), all of the new content will be messed up.
:\