Hi - I am unable to copy Arabic characters into my site.

1. I am copying Arabic symbols from documents and the web and pasting them into the FCK editor
2. They look fine in the preview but appear as ????? when saved

I searched (a lot) and came to the conclusion that my database needed converting from latin1 to utf8. I downloaded it, converted it, and then uploaded the converted database into a new database created with utf8 support.

However it still does not work. Looking in the database, the new Arabic symbols are still being stored as ???? even though all tables (as well as the database) are now utf8.

Viewing the header of the Drupal pages shows:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

It's really frustrating. What can I try next :) ?

Thanks.

Comments

cog.rusty

This can be very complicated because there are many factors.

- What happens now to new content which you copy?
- What is your MySQL version? And is it the same version you were using before?
- Are your individual tables also set to utf8, or only the database default?
- Do you have an old backup? What was the charset of the individual tables before? Especially the nodes and revisions tables.

A few "lucky tries", because as I said there are too many factors:

Try to export the database forcing it to be treated as latin1:

mysqldump --default-character-set=latin1 -u [your-username] -p[your-pass] [database-name] > dump.sql

Edit it with a Unicode-enabled text editor (such as http://www.flos-freeware.ch/notepad2.html for Windows) to remove any charset/collation information (if there is any) and then load it back to the utf8 database.

If this doesn't work, edit it again and try a conversion to utf8 in the text editor, before loading it back.
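If the dump is too large for a text editor to save, the same re-encoding can be done with a short script on your own machine. This is only a sketch: the function name and file names are placeholders, and it assumes the dump really is latin1 text that should become utf8. It reads the file in chunks, so the size of the dump should not matter much.

```python
import shutil


def reencode(src_path, dst_path, src_encoding="latin-1", dst_encoding="utf-8"):
    """Copy src_path to dst_path, decoding and re-encoding in 1 MB chunks.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        shutil.copyfileobj(src, dst, 1024 * 1024)


# Hypothetical usage; both file names are placeholders:
# reencode("dump.sql", "dump-utf8.sql")
```

Keep the original dump around, since a wrong guess about the source encoding produces garbage that is hard to undo.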

If this doesn't work either, tell us about anything you noticed while you were doing this.

gavin_s

Thanks Cog. Here are some answers.

- What happens now to new content which you copy? - shows as ??????
- What is your MySQL version? And is it the same version you were using before? - MySQL client version: 4.1.22, phpMyAdmin - 2.11.4
- Are your individual tables also set to utf8, or only the database default? - Both. First I created a fresh DB in utf8, then converted all the old tables from latin1 to utf8 using: ALTER TABLE table_name CHARSET utf8; (in phpMyAdmin) - I then exported and loaded into the new DB.
- Do you have an old backup? What was the charset of the individual tables before? Especially the nodes and revisions tables. - I do yes. The charset was a mix of latin1 and utf8. The nodes (most) and revisions tables were latin1.

I can't follow all of your tips, such as that mysqldump command, because I only have access to phpMyAdmin at the moment.

Also, when I tried the manual conversion in a text editor I ran into timeouts, either at the save stage or during the import.

I'm pulling my hair out :)

S

cog.rusty

I found the following handbook page. Although the title seems irrelevant and your MySQL version is OK, it covers many cases and contains a lot of relevant information about what may be wrong.

http://drupal.org/node/198184

If you don't mind losing the existing Arabic text and you mostly want it to work correctly in the future, also go to the individual tables (for example the node_revisions table) and make sure that the individual columns are not latin1.

Your ALTER TABLE table_name CHARACTER SET utf8 command only changes the table's default, that is, the charset used for columns created in the future. Your existing columns may still be latin1.
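To illustrate why the column charset matters: latin1 has no way to represent Arabic letters, so each one ends up as a literal question mark and the original text cannot be recovered afterwards. The sketch below shows that loss in Python and then builds ALTER TABLE ... CONVERT TO CHARACTER SET statements, which convert existing columns rather than just the default. The helper name and the table list are only examples, and you should try this on a backup first, because CONVERT TO assumes the stored bytes really are in the charset the column declares.

```python
def convert_statements(tables, charset="utf8"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    return [
        "ALTER TABLE `{0}` CONVERT TO CHARACTER SET {1};".format(name, charset)
        for name in tables
    ]


# Five Arabic letters, written as escapes to keep the source ASCII-only.
arabic = "\u0645\u0631\u062d\u0628\u0627"

# latin1 cannot hold these characters, so each is replaced by "?".
stored = arabic.encode("latin-1", errors="replace")
print(stored)  # b'?????'

# Example table names only; list your own tables here.
for statement in convert_statements(["node", "node_revisions"]):
    print(statement)
```

The printed statements can be pasted into the SQL tab of phpMyAdmin, which helps when no command line is available.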

You may not need to do any text conversion after all, but I should add that the timeouts during uploading could be caused by a huge converted database dump. That can happen if the editor saved the file not as UTF-8 (without a BOM signature) but as a wider Unicode encoding such as UTF-16, which would bloat the English text as well. It also doesn't hurt to empty the cache, session, and especially the watchdog tables to reduce the size.
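A quick way to check for this kind of bloat is to look at the first bytes of the saved dump and to compare encoded sizes. The following is a rough sketch, assuming the "general Unicode" option of the editor means UTF-16; the helper name sniff_bom and the sample line are made up for illustration.

```python
import codecs


def sniff_bom(first_bytes):
    """Guess an encoding from a byte-order mark, or return None if there is none."""
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    # Note: a UTF-32 little-endian BOM also starts with these two bytes.
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


# Plain English text takes twice as many bytes in UTF-16 as in UTF-8.
sample = "INSERT INTO node VALUES (1, 'Hello');"
utf8_size = len(sample.encode("utf-8"))
utf16_size = len(sample.encode("utf-16-le"))
print(utf8_size, utf16_size)

# Hypothetical usage on a real file; the file name is a placeholder:
# with open("dump.sql", "rb") as dump:
#     print(sniff_bom(dump.read(4)))
```

If the check reports utf-16, or utf-8 with a BOM, re-save the dump as UTF-8 without a signature before importing it.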