As discussed in the other thread at http://drupal.org/node/view/2092, I tried a workaround adapted from phpBB. It works for my situation, though it might not be very efficient.

Because I am using a multibyte charset anyway, I search against the node table directly (in the title, teaser and body fields), using the following query:

$sql = "SELECT n.nid AS lno, n.title AS title, n.created AS created, u.uid AS uid, u.name AS name, 1 AS count
FROM node n LEFT JOIN users u ON n.uid = u.uid
WHERE
( n.title LIKE BINARY '%$word_to_match%'
OR n.teaser LIKE BINARY '%$word_to_match%'
OR n.body LIKE BINARY '%$word_to_match%')
AND n.status = 1";

I am not sure whether MySQL can return the number of occurrences while doing a string match, so for now I am just setting the count to 1. Alternatively, we could simply order the results by title or date for the moment.

If you are going to put a fix into the CVS version, I would suggest checking the charset first and then choosing between this search (for a multibyte charset) and the regular search (for a single-byte charset).
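
One possible way to get a real occurrence count without depending on MySQL is to count the matches in PHP after the rows are fetched. The following is only a minimal sketch: it assumes the query is extended to also select n.teaser and n.body, and the helper name count_occurrences and the sample row are made up for illustration.

```php
<?php
// Hypothetical helper: counts how often $word occurs in a node's text.
// substr_count() compares raw bytes, much like LIKE BINARY does.
function count_occurrences($row, $word) {
  if ($word === '') {
    return 0; // substr_count() does not accept an empty needle
  }
  $count = 0;
  foreach (array('title', 'teaser', 'body') as $field) {
    if (isset($row->$field)) {
      $count += substr_count($row->$field, $word);
    }
  }
  return $count;
}

// Made-up row standing in for one fetched result.
$row = new stdClass();
$row->title = '搜索测试';
$row->teaser = '测试';
$row->body = '这是一个测试,再测试一次';

echo count_occurrences($row, '测试'); // prints 4
```

The count could then replace the constant 1 and be used to sort the results in PHP, at the cost of fetching the full body of every matching node.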

Comments

Anonymous’s picture

I don't think this is a Drupal bug. This is a database issue. If you want to use multibyte character sets, you must use a database that supports them (PostgreSQL, MS-SQL, MySQL v4.1+). IMO, this should be marked WONTFIX.

jb605’s picture

This is not a database issue. My MySQL database is configured to support the multibyte charset I use. But without this search syntax I cannot search at all, whether through the current search mechanism (which won't work for multibyte charsets anyway because of the way it builds its index) or by searching directly against the node body content. With the search syntax mentioned above (body LIKE BINARY $word_to_match), searching in a multibyte charset is possible.

Of course, if you don't intend Drupal to be used with multibyte charsets, you don't need to do anything.

connermo’s picture

There's another workaround. Adding '*' wildcards to the keywords by default can solve the multibyte search problem.

In search.module,
change

$keys = str_replace("*", "%", $keys);

to

$keys = '%'.str_replace("*", "%", $keys).'%';

and that's it.
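
To make the effect of that one-line change concrete, here is a small sketch comparing the two versions of the line. It assumes that $keys is later used as an SQL LIKE pattern; the wrapper function names keys_original and keys_patched are hypothetical and exist only for this comparison.

```php
<?php
// Original line from search.module: only an explicit '*' becomes a wildcard.
function keys_original($keys) {
  return str_replace("*", "%", $keys);
}

// Patched line: the keyword is always wrapped in '%' wildcards.
function keys_patched($keys) {
  return '%' . str_replace("*", "%", $keys) . '%';
}

echo keys_original('测试'), "\n"; // prints 测试
echo keys_patched('测试'), "\n";  // prints %测试%
echo keys_patched('drup*'), "\n"; // prints %drup%%
```

Note that with the patch every search becomes a substring match, so in languages that do use spaces a keyword such as "art" would also match "party".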

inertia@drupal.org’s picture

Where can I put this code?

Steven’s picture

Title: search on multibyte charset » Search in Chinese / Japanese

This fix has nothing to do with multibyte vs. single-byte charsets, but with languages that don't use spaces (e.g. Chinese and Japanese). Drupal's use of UTF-8 means that practically every language becomes multibyte.

Steven’s picture

Assigned: Unassigned » Steven

I'm working on search improvements, including the possibility of using external text-splitting utilities for search. This can make indexing of Japanese / Chinese possible.

chx’s picture

Since then, Steven has reworked the search system, which solves this.

yagami’s picture

Version: » 4.6.0

I can't search Chinese words, even in Drupal 4.6.0.
Why was this bug closed?
I have read the discussion between jb605 and Steven (http://drupal.org/node/2092) many times (up to 10 times), but I cannot figure out a solution; maybe my English is too poor to understand it.
Today is May 30, 2005; this bug was marked fixed on March 14, 2005 - 14:10.
I don't think this bug has REALLY been FIXED.

My runtime environment:
Windows XP SP2 CHS + PHP 5 + MySQL 4.1.12
CentOS 4 running in Microsoft Virtual Server 2005 SP1

Neither Windows nor Linux can search Chinese words. :(

kzeng’s picture

Could you describe your problem? I am using Chinese and search.module works very well on my site, so it's not a problem with Drupal. It may be caused by your settings (e.g. did you configure your cron job?). If you like, we could discuss this in Chinese on my site. Thanks.

Steven’s picture

Search.module has an indexing preprocessing hook. For Chinese/Japanese searching, this is where you need to hook in a word splitter with an extra module.

Due to the way search works (by indexing words), Chinese / Japanese searching will never work out of the box.