I just tried to use Simple Extractor to tag Chinese text. That can't work in the first place, because Chinese doesn't mark word boundaries the way English does. Before realizing that, I also saw that the organization and lookup of terms by first character breaks for Chinese, and likely for other languages too (which ones?).
Patch coming.
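To illustrate the word-boundary point, here is a minimal Python sketch (not Extractor's actual code, which is not shown in this issue). It assumes a tagger that finds candidate terms by splitting on whitespace; `naive_tokens` and the sample strings are hypothetical.

```python
def naive_tokens(text):
    """Split on whitespace, as an English-oriented tagger might (assumption)."""
    return text.split()

english = "Drupal extracts terms from text"
chinese = "我喜欢使用开源软件"  # hypothetical sample sentence, written without spaces

# English yields one token per word.
print(naive_tokens(english))

# Chinese yields the whole sentence as a single token, so no
# individual term can be matched against a vocabulary.
print(naive_tokens(chinese))
```

Under that assumption, Chinese text would need a separate word segmentation step before any term lookup could succeed.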
| Comment | File | Size | Author |
|---|---|---|---|
| #1 | 726080-1_chinese.patch | 1.93 KB | alex_b |
Comments
Comment #1
alex_b commented: Does not make Simple Extractor work with Chinese, but fixes the lookup-by-first-character problem. Contains debug output to the error log.
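The thread does not say why the first-character lookup breaks. One plausible cause, assumed here, is taking the first byte of a UTF-8 string rather than the first character. The Python sketch below shows how the two differ; the function names and sample terms are hypothetical and this is not the attached patch.

```python
from collections import defaultdict

def index_by_first_byte(terms):
    """Group terms by the first byte of their UTF-8 encoding (the assumed bug)."""
    index = defaultdict(list)
    for term in terms:
        index[term.encode("utf-8")[:1]].append(term)
    return dict(index)

def index_by_first_char(terms):
    """Group terms by their first Unicode character (the assumed fix)."""
    index = defaultdict(list)
    for term in terms:
        index[term[:1]].append(term)
    return dict(index)

terms = ["中国", "丰富", "apple"]  # hypothetical vocabulary

# Both Chinese terms start with the same lead byte (0xE4), so they
# share one bucket under a key that is not a valid character.
by_byte = index_by_first_byte(terms)

# Indexing by character gives each distinct first character its own bucket.
by_char = index_by_first_char(terms)

print(sorted(by_byte.keys()))
print(sorted(by_char.keys()))
```

If this is the cause, any language whose script is encoded with multiple bytes per character in UTF-8 would be affected, not only Chinese.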
Comment #2
charlesc commented: Is it possible to use the Yahoo! Taiwan API?
http://tw.developer.yahoo.com/cas/
* ws: Word Segmentation and Part-of-Speech Tagging
* ke: Article Keyword Extraction
Comment #3
alex_b commented: Extractor's Yahoo Placemaker option just works (tm) with Chinese. This issue only concerns the Simple Extractor mode.