Chinese Word Splitter (中文分词)
Supports the _search_preprocess interface. This module splits Chinese text into words separated by spaces, so the search module adds correct Chinese words to its index table. You need to re-index your site after activating this module.
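To show what "splitting with spaces" gives the indexer, here is a minimal Python sketch. It is not the module's code, because the module itself is a Drupal PHP module. The function name `preprocess`, the CJK character range, and the `segment` callable are assumptions made for illustration. The fixed-width chunking lambda in the usage line is only a placeholder for a real dictionary matcher.

```python
import re

# Runs of CJK Unified Ideographs (basic block only; an assumption for this sketch).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def preprocess(text, segment):
    """Insert spaces between the words of every CJK run in `text`.

    `segment` is a hypothetical callable that takes a CJK string and
    returns a list of words. Non-CJK text is left as it is.
    """
    # Surround each segmented run with spaces so it is separated from
    # neighbouring Latin text, then join its words with single spaces.
    spaced = CJK_RUN.sub(lambda m: " " + " ".join(segment(m.group())) + " ", text)
    # Collapse the extra whitespace introduced around each run.
    return " ".join(spaced.split())


# Placeholder segmenter: cut the run into 2-character chunks.
print(preprocess("Drupal中文分词模块", lambda s: [s[i:i + 2] for i in range(0, len(s), 2)]))
```

Once the text is spaced this way, a whitespace-based indexer sees `中文`, `分词`, and `模块` as separate terms instead of one long run.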
The module works with a user-defined dictionary, so with a suitable dictionary it can also split text in other languages.
The module currently provides two matching algorithms: forward maximum matching and backward maximum matching.
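Here is a minimal Python sketch of the two strategies. It assumes the dictionary is a plain set of words and that any character not covered by a dictionary word becomes a single-character token. The function names are hypothetical. The sketch shows the general technique and is not the module's implementation.

```python
def forward_max_match(text, dictionary, max_len):
    """Forward maximum matching: scan left to right and, at each position,
    take the longest dictionary word that starts there."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len):
    """Backward maximum matching: scan right to left and, at each position,
    take the longest dictionary word that ends there."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    # Words were collected from the end, so restore reading order.
    words.reverse()
    return words


toy_dict = {"研究", "研究生", "生命", "起源"}
longest = max(len(w) for w in toy_dict)
print(forward_max_match("研究生命起源", toy_dict, longest))
print(backward_max_match("研究生命起源", toy_dict, longest))
```

With this toy dictionary, forward matching produces 研究生 / 命 / 起源, while backward matching produces 研究 / 生命 / 起源. Ambiguous text like this is why offering both algorithms can be useful.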
When using the module with 4.7 or 5.x, disable the "simple Chinese/Japanese/Korean tokenizer" option in the search module settings.
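This module implements the _search_preprocess interface and segments Chinese text. As a result, the search module returns correct Chinese results during pre-indexing and searching, and it avoids the huge number of index entries that simple CJK handling produces. After installing the module, rebuild the search index. An index word length of 1 or 2 is recommended.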
The module uses a user-defined dictionary, so with a suitable dictionary it can in practice support other languages as well.
Two algorithms are currently provided: forward maximum matching and backward (reverse) maximum matching.
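When using the module under 4.7, turn off the "Simple CJK (Chinese/Japanese/Korean characters) handling" option under Administer -> Settings -> Search.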
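Note: the dictionary file is encoded as UTF-8 with a BOM (byte-order mark). On some systems you may need to remove the BOM so the module can read the dictionary and match words correctly. Otherwise, segmentation may fail.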
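If the dictionary is not read correctly on your system, the simplest fix is to strip the BOM from the file once. Here is a minimal Python sketch. All function names are hypothetical, and the one-word-per-line format assumed by `load_dictionary` is a guess about the dictionary layout, not something this page specifies.

```python
import codecs


def strip_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte-order mark (EF BB BF), if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def remove_bom_in_place(path) -> bool:
    """Rewrite the file at `path` without its BOM.

    Returns True if a BOM was removed and False if the file had none.
    """
    with open(path, "rb") as f:
        raw = f.read()
    cleaned = strip_bom(raw)
    if len(cleaned) == len(raw):
        return False  # no BOM, so leave the file untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True


def load_dictionary(path) -> set:
    """Read a dictionary as a set of words, ignoring a BOM if one is present.

    Assumes one word per line (hypothetical format); blank lines are skipped.
    """
    with open(path, "rb") as f:
        text = strip_bom(f.read()).decode("utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}
```

Reading through `strip_bom` lets the loader accept files with or without a BOM. `remove_bom_in_place` handles the case where another tool must read the file as-is.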
Now supports 4.7 and 5.x.
Releases
| Official releases | Date | Size | Status |
|---|---|---|---|
| 6.x-1.0 | 2008-Apr-21 | 3.95 MB | Recommended for 6.x |
| 5.x-1.0 | 2008-Apr-21 | 2.2 MB | Recommended for 5.x |

| Development snapshots | Date | Size | Status |
|---|---|---|---|
| 5.x-1.x-dev | 2007-Jun-19 | 2.2 MB | Development snapshot |
| 4.7.x-1.x-dev | 2006-Nov-13 | 2.16 MB | Development snapshot |
