I have set up my Drupal site and it has been running fine. The primary language on its pages is Korean, and the Korean text displays correctly on every page. I never explicitly set an encoding (apart from adding a <meta http-equiv='Content-type' content='text/html; charset=UTF-8'> tag to my page.tpl.php file), so I'm assuming the pages are encoded as UTF-8. At least, the characters only display correctly in my Opera browser when it's set to UTF-8 encoding.
Here's the problem, though: my Drupal search box doesn't work. Searching for English text works fine, but whenever I type in some Korean text and hit the search button, I get this error:
"You must include at least one positive keyword with 3 characters or more."
and I only see a bunch of question marks (???) in the search box below the error message. The funny thing is that when I switch my browser's encoding from UTF-8 to EUC-KR (the most commonly used charset in Korea), those ??? magically turn back into the original text, while the rest of the page turns into garbage. So I'm guessing the basic encoding is set correctly to UTF-8, but whenever I submit something through the search box the charsets get mixed up somehow.
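For anyone who wants to see this effect in isolation, here is a minimal Python sketch of what the symptom suggests. It assumes the bytes in the search box are EUC-KR while the page is being read as UTF-8; the sample text and variable names are only for illustration and have nothing to do with Drupal's own code.

```python
# Illustration only: simulate EUC-KR bytes being displayed on a UTF-8 page.
original = "가나다"  # sample Korean text (an assumption, not the actual search term)

# The bytes as they would look if something had converted the text to EUC-KR.
raw = original.encode("euc-kr")

# Reading those bytes as UTF-8 fails, so invalid bytes become U+FFFD
# replacement characters (many browsers draw these as question marks).
garbled = raw.decode("utf-8", errors="replace")

# Reading the very same bytes as EUC-KR recovers the original text,
# which matches what happens when the browser is switched to EUC-KR.
recovered = raw.decode("euc-kr")

print(repr(garbled))
print(recovered)
```

The same bytes are either garbage or readable text depending only on which charset is used to interpret them, which is why switching the browser encoding "fixes" the search box and breaks everything else.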
Has anyone had the same problem as this? Any suggestions would be appreciated. Thanks.
Comments
Server setup broken
I've seen this once before. Something is taking the UTF-8 that your browser sends and converting it to EUC-KR before it arrives in the Drupal script. Make sure you don't have mbstring input encoding conversion or an Apache module messing things up.
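One rough way to check which encoding a submitted keyword actually arrived in is to look at its percent-encoded form (as it appears in a URL or a server log) and test which charset the bytes are valid in. The sketch below is only that, a sketch: guess_encoding is a hypothetical helper, not part of Drupal, PHP, or Apache, and it only distinguishes UTF-8 from EUC-KR.

```python
from urllib.parse import quote, unquote_to_bytes


def guess_encoding(percent_encoded: str) -> str:
    """Guess whether a percent-encoded keyword holds UTF-8 or EUC-KR bytes.

    Hypothetical helper for illustration. UTF-8 is tried first because
    its validity rules are strict; EUC-KR is only tried if that fails.
    """
    raw = unquote_to_bytes(percent_encoded)
    for encoding in ("utf-8", "euc-kr"):
        try:
            raw.decode(encoding)  # strict decoding raises on invalid bytes
            return encoding
        except UnicodeDecodeError:
            continue
    return "unknown"


# What a UTF-8 browser would send for a sample keyword...
sent_by_browser = quote("검색")
# ...and what it would look like after an unwanted conversion to EUC-KR.
after_conversion = quote("검색", encoding="euc-kr")

print(sent_by_browser, "->", guess_encoding(sent_by_browser))
print(after_conversion, "->", guess_encoding(after_conversion))
```

If the keyword reaching the site looks like the second form even though the browser sent the first, the conversion is happening somewhere between the browser and the script.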
--
If you have a problem, please search before posting a question.
I still haven't been able to resolve this problem, but here's what I tried this morning.
I backed up my MySQL database and then dropped the whole thing. With the database sparkling clean, I recreated each Drupal table by running the database/database.4.0.mysql file. Upon connecting to my website I was prompted to create an admin account, which I did. I then enabled the search module, ran cron.php once, did a Korean search and voila! It works. But as soon as I reimport my old database, the problem crops up again.
So my suspicion is that it has nothing to do with my Apache server settings, since the search worked fine under the same (unaltered) server settings once I had cleared the DB. I'm wondering if there are any other Drupal settings concerning languages. I've already tried disabling my locale module, and that had no effect.
Any other suggestions?
I stumbled upon the following info while aimlessly wandering around the Drupal forum:
http://drupal.org/node/45904
which happens to describe exactly the same symptoms that my own website is exhibiting. Seeing that you've dealt with this problem before, you probably know far more about it than I do. Could you walk me through some of the things you mentioned above, if possible? I'm what you might call a newbie in this whole web development and Drupal business.
For instance, I'm not sure how I would go about finding out whether "mbstring input encoding conversion or an Apache module" are "messing things up." Where would I look for them? (And what is mbstring input encoding in the first place?) Also, how would I debug my website with LiveHTTPHeaders? Is it some kind of Mozilla plugin? Do I need the Mozilla web browser to use it?
I've been stumped for the last few days and I feel really disoriented. I have only a rudimentary knowledge of HTML, PHP, and the web in general.
I would greatly appreciate any help or input. Thanks.
Resolved
Okay, after much hubbub, I was finally able to solve this conundrum in a rather anticlimactic way. I simply told my web host what was going on, and apparently they just flipped a little switch, so to speak, and now everything's back to normal. I guess the default encoding for my web host account was set to EUC-KR and the server somehow messed with my URLs. Now everything's working as it should. Well, almost. There is still that whole CJK search indexing issue (search keywords are tokenized only at word boundaries), but I guess I'll have to deal with that another day.
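To show what that remaining issue looks like, here is a small Python sketch comparing word-boundary tokenizing with character bigrams, a common workaround for CJK text. This is not how Drupal's search module works internally; word_tokens and char_ngrams are hypothetical names, and the bigram approach is just one option I'm assuming for illustration.

```python
def word_tokens(text: str) -> list[str]:
    """Split on whitespace only, i.e. tokenize at word boundaries."""
    return text.split()


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split each whitespace-separated word into overlapping n-character pieces.

    Words no longer than n characters are kept whole.
    """
    tokens = []
    for word in text.split():
        if len(word) <= n:
            tokens.append(word)
        else:
            tokens.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return tokens


sample = "한국어 검색"  # sample phrase, chosen only for illustration

print(word_tokens(sample))
print(char_ngrams(sample))
```

With word-boundary tokens, a search for part of a longer word (here "국어" inside "한국어") finds nothing, whereas the bigram index contains that piece. The trade-off is a larger index and more loosely related matches.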