Drupal search does not work out of the box for Unicode text. There may or may not be tweaks for this; I do not know. But Drupal as it is, on common hosting servers and setups, generally does not work for Unicode text such as Devanagari and other Indic scripts.
There are several bug reports about this that have gone unattended for a long time, such as http://drupal.org/node/604002
However, chx has asked me to post a fresh bug report (http://drupal.org/node/671566#comment-2426384) with certain information.
Here are the details:
Drupal 6.15, usual LAMP stack (I tried this on three or four popular web hosts)
Unicode library: PHP Mbstring extension
PCRE library version: 7.8 2008-09-05
I have just tried again with a fresh install of Drupal using the above Unicode library and PCRE versions.
I pasted the following Unicode text into a node:
सुदृढ आणि सुजाण बाळाची चाहूल सुदृढ, सशक्त व हुशार मुले ही ज्याप्रमाणे आई वडिलांचा तसेच समाजाला आधार असतात, त्याचप्रमाणे देशाची खरी संपत्ती असतात अशी मुले ही घडवावी लागतात ornage
लठ्ठपणा घालविण्याचे सोपे उपाय डाएटिंग सुरु केल्यानंतर वजन कमी होण्याची गती अपेक्षाकृत जलद असते. नंतर मात्र ही गती मंदावते. त्यामुळे निराश होऊ नये. त्यानंतर मात्र वजन कमी होऊ लागते orange
I indexed my site after making sure my search settings were correct.
Search can find the word orange but cannot find बाळाची.
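To make the failing query concrete, here is a short Python sketch (Python is used only for illustration; it is not part of Drupal) that lists the code points in बाळाची with their Unicode general categories. The helper `describe` is hypothetical. Three of the six characters are dependent vowel signs in category Mc (spacing combining mark) rather than letters, so any word splitter that counts only letters and digits as word characters would break this word apart, while a plain ASCII word like orange contains nothing of the kind.

```python
import unicodedata

def describe(word):
    """Return (code point, general category, name) for each character in word."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
            for ch in word]

# The query word from the test node, बाळाची, written with escapes for clarity.
word = "\u092c\u093e\u0933\u093e\u091a\u0940"

for code, category, name in describe(word):
    print(code, category, name)
```

On a standard Python 3 build this reports the categories Lo, Mc, Lo, Mc, Lo, Mc: consonants alternating with vowel signs.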
Please let me know if you have success with the above steps; a working demo link would be greatly appreciated. If a demo works, it will mean something is wrong in our setup, which we should correct.
Comments
Comment #1
dave reid commented: Duplicate of #604002: Poor search support of some Unicode scripts
Comment #2
kaakuu commented: @Dave Reid: Thanks. chx asked me to "File a bug report" even though #604002 had already been pointed out, hence this post.
Comment #3
dave reid commented: @kaakuu: You should have just pointed chx to #604002 again. He knows there's no need for duplicate reports, and he probably didn't see #604002.
Comment #4
kaakuu commented: Thanks, Dave. It was pointed out to chx, as is evident from the fact that chx himself referred to #604002 and then asked me to "File a bug report". Please read the link. Thanks again for your concern; I hope we get a solution soon now that both chx and you are looking into this issue.
Comment #5
damien tournoud commented: I guess this is Devanagari, right?
We need a clear description of what's failing here. I guess the word splitter is to blame, but it might be useful to check the rest of the character manipulation.
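The word-splitter guess can be illustrated outside Drupal with a minimal Python sketch. It assumes, without having checked Drupal 6's own search code, that the splitter decides word boundaries by Unicode general category; `split_words` and the category sets passed to it are hypothetical. The only difference between the two calls is whether combining marks (category M) count as word characters.

```python
import unicodedata

def split_words(text, word_categories):
    """Split text into words. A character belongs to a word if the first
    letter of its Unicode general category is in word_categories
    (e.g. "LN" = letters and digits, "LMN" = letters, marks and digits)."""
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in word_categories:
            current.append(ch)
        elif current:
            # Any other character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

sample = "\u092c\u093e\u0933\u093e\u091a\u0940 orange"  # "बाळाची orange"

print(split_words(sample, "LN"))   # marks treated as separators
print(split_words(sample, "LMN"))  # marks kept inside words
```

With marks treated as separators, the Devanagari word degrades into the single-consonant fragments ब, ळ and च, which would never equal the full query word and are short enough to be dropped by a minimum word length setting, while orange is tokenized identically either way. That matches the reported symptom, but confirming it would mean checking the character class Drupal's splitter actually uses.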
Comment #6
chx commented