Chinese Word Splitter(中文分词)

zealy - March 14, 2006 - 06:50

Supports the _search_preprocess interface. This module splits Chinese text into words separated by spaces, so that the search module adds correct Chinese words to its index table. You need to re-index your site after activating this module.
The module works with a user-defined dictionary, so with a suitable dictionary it can in fact split other languages as well.
The module currently provides two matching algorithms.

When using this module with 4.7 or 5.x, you should disable the "simple Chinese/Japanese/Korean tokenizer" option in the search.module settings.

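This module supports the _search_preprocess interface and segments Chinese text into words, so that the search module gets correct Chinese results during pre-indexing and searching, and avoids the huge number of search entries produced by the simple CJK handling. After installing this module you need to rebuild the search index; an index word length of 1 or 2 is recommended.
The module uses a user-defined dictionary, so with a suitable dictionary it can in fact support other languages.
Two algorithms are currently provided: forward maximum matching and reverse maximum matching.
When using the module on 4.7, you need to turn off the "simple CJK (Chinese/Japanese/Korean character) handling" option under administer -> settings -> search.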
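
The two algorithms named above, forward and reverse maximum matching, can be illustrated with a short sketch. This is not the module's own code (the module is written for Drupal); it is a minimal Python illustration, and the function names, the `max_len` parameter and the tiny sample dictionary are all assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until it is in the
        # dictionary; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def split_with_spaces(text, dictionary, match=forward_max_match):
    """Join the segmented words with spaces, as a search preprocessor would."""
    return " ".join(match(text, dictionary))


# Hypothetical sample dictionary, for illustration only.
SAMPLE_DICT = {"研究", "研究生", "生命", "起源"}
```

With this sample dictionary the two directions disagree on the same input: `forward_max_match("研究生命起源", SAMPLE_DICT)` gives `['研究生', '命', '起源']`, while `reverse_max_match` gives `['研究', '生命', '起源']`. Cases like this are one reason a splitter might offer both algorithms.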

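Note: the dictionary file is in UTF-8 format (with a BOM header). On some systems you may need to remove the BOM before the module can read the dictionary and match words correctly; otherwise word splitting may fail.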
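
To show why the BOM matters, here is a small Python sketch of reading a dictionary while tolerating a leading BOM. The one-word-per-line layout and the function names are assumptions for illustration; the module's actual dictionary format is not described on this page.

```python
import codecs


def parse_dictionary(data):
    """Decode UTF-8 bytes, dropping a leading BOM if present, into a set of words.

    Assumes one word per line (a hypothetical layout for this example).
    """
    # The "utf-8-sig" codec decodes UTF-8 and silently skips an initial BOM.
    text = data.decode("utf-8-sig")
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_dictionary(path):
    """Read a dictionary file from disk and parse it."""
    with open(path, "rb") as f:
        return parse_dictionary(f.read())


# Without BOM handling, the first word would keep an invisible U+FEFF prefix
# and would never match the text being segmented.
raw = codecs.BOM_UTF8 + "中文\n分词\n".encode("utf-8")
naive_first_word = raw.decode("utf-8").splitlines()[0]
```

Here `naive_first_word` is two characters of visible text plus the hidden BOM character, so it does not equal `"中文"`, while `parse_dictionary(raw)` contains the clean word.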

Now supports 4.7 and 5.x.

Releases

Official releases
Release        Date          Size      Status
6.x-1.0        2008-Apr-21   3.95 MB   Recommended for 6.x
5.x-1.0        2008-Apr-21   2.2 MB    Recommended for 5.x

Development snapshots
Release        Date          Size      Status
5.x-1.x-dev    2007-Jun-19   2.2 MB    Development snapshot
4.7.x-1.x-dev  2006-Nov-13   2.16 MB   Development snapshot

Development snapshots are automatically regenerated and their contents can frequently change, so they are not recommended for production use.
 
 
