Hi,

I'm porting some text content (HTML into the node Body field) from an older CMS to Drupal 7. Most of the content is English, with some other languages such as Chinese mixed in. I've tried moving the content into Drupal with scripts (custom module code), with copy/paste, and with SQL, all with no success. When I view the resulting Drupal 7 page in a browser I see garbled characters.

When I view the content in phpMyAdmin I see the same garbled characters. I made a simple test page that queries the database directly and sends some of the content to the browser, and there the characters (Chinese in this case) displayed correctly! If I copy and paste the Chinese characters into the Drupal node editor OR the phpMyAdmin editor, they save successfully and I see Chinese when viewing the resulting Drupal page in a browser.

Things I've tried with no success:
1) The MySQL database was originally using latin1_swedish_ci collation. I tried converting it to utf8_general_ci and also re-importing it as utf8_general_ci - no dice.
2) Forcing the Content-Type header to "text/html; charset=utf-8" - no dice.
3) I compared the HTTP response headers from my simple test page (which works) and from the Drupal page (which doesn't) and noticed one difference: the Drupal page sends "Transfer-Encoding: chunked". The Apache docs say this is needed because Drupal pages are sent without a "Content-Length" header. I looked through some of the common.inc code, and it looks like D7 has its own custom way of sending content to the browser.
4) Tried running the imported content through utf8_encode(). It changed the content but seemed to make it worse.
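A rough model of why items 1) and 4) can make things worse rather than better: if the old latin1 column actually held UTF-8 bytes, converting that column to utf8 reads each byte as a separate latin1 character and re-encodes it, and PHP's utf8_encode() (which converts ISO-8859-1 to UTF-8) does the same thing again. This is a sketch under stated assumptions, not the poster's actual pipeline: `reinterpret_as_latin1` is a hypothetical helper, and Python's ISO-8859-1 codec stands in for MySQL's latin1 (which is really closer to cp1252, so the garbage you see in phpMyAdmin may differ slightly).

```python
# Assumption: Python's "latin-1" (ISO-8859-1) codec models MySQL's latin1.
# MySQL latin1 is closer to cp1252, so real mojibake can differ slightly.

def reinterpret_as_latin1(data: bytes) -> bytes:
    """Read every byte as one ISO-8859-1 character and re-encode as UTF-8.

    Models a latin1 -> utf8 column conversion applied to bytes that were
    already UTF-8, and also what PHP's utf8_encode() does to a string."""
    return data.decode("latin-1").encode("utf-8")

original = "中文"
correct = original.encode("utf-8")       # 6 bytes: what should be stored
double = reinterpret_as_latin1(correct)  # one extra layer: 12 bytes
triple = reinterpret_as_latin1(double)   # utf8_encode() on top: 24 bytes

for label, data in [("correct", correct), ("double", double), ("triple", triple)]:
    # Each extra layer doubles the size here because every byte is >= 0x80.
    print(label, len(data), repr(data.decode("utf-8")))
```

Each pass adds a layer instead of removing one, which fits the observation that utf8_encode() "made it worse".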

So I'm stuck, and I'm hoping I don't have to go through and manually copy and paste from my test page's browser output back into phpMyAdmin or the Drupal editor. Any advice or leads on what's going on would be appreciated.

Comments

codesmith

A little more sleuthing turned up a solution.

This page http://mysql.rjweb.org/doc.php/charcoll is pure gold. It turns out my problem was "double-encoding", and working through the "Example of Double-encoding" exercise with my own data solved it.
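The linked page works through the repair in SQL. As a rough illustration of the same idea only, here is how undoing one layer of double-encoding looks in Python: take the garbled text, turn it back into the bytes it came from by encoding it as latin1, then decode those bytes as UTF-8. Assumptions: the values have already been read into Python strings through a UTF-8 connection, Python's ISO-8859-1 codec stands in for MySQL's latin1, and `undo_one_layer` is a hypothetical helper name, not part of Drupal or MySQL.

```python
def undo_one_layer(text: str) -> str:
    """Reverse one latin1 -> utf8 double-encoding layer (heuristic).

    If the text can be written as ISO-8859-1 and those bytes form valid
    UTF-8, assume it was double-encoded and return the decoded result;
    otherwise return the text unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already-correct text: Chinese has no latin-1 form, and genuine
        # accented Latin text such as "café" is not valid UTF-8 as bytes.
        return text

# Simulated double-encoded value, as it would come back over a utf8 connection.
garbled = "中文".encode("utf-8").decode("latin-1")

print(undo_one_layer(garbled))  # 中文
print(undo_one_layer("中文"))    # 中文 (left alone)
print(undo_one_layer("café"))   # café (left alone)
```

This is only a heuristic: genuine Latin-1 text that happens to look like UTF-8 byte sequences would be altered, so spot-check the results on a copy of the data before writing anything back.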