I just tried to use Simple Extractor to tag Chinese text. It won't work in the first place, because Chinese doesn't mark word boundaries the way English does. Before realizing that, I also saw that the organization and look-up of terms by first character breaks for Chinese, and likely for other languages (which ones?).

Patch coming.

Comment  File                    Size     Author
#1       726080-1_chinese.patch  1.93 KB  alex_b

Comments

alex_b:

Status: Active » Needs work
Status  FileSize
new     1.93 KB

This patch does not make Simple Extractor work with Chinese, but it fixes the look-up by first character problem. It contains debug output to the error log.
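
To illustrate one way a look-up by first character can break for Chinese, here is a minimal Python sketch. It assumes the cause is byte-wise indexing of UTF-8 strings, which this thread does not confirm, and it is not taken from the patch. The function names and sample terms are hypothetical.

```python
from collections import defaultdict


def index_by_first_byte(terms):
    """Group terms by the first byte of their UTF-8 encoding.

    Assumed failure mode: multi-byte characters are cut apart, so
    different Chinese characters can end up under the same key.
    """
    index = defaultdict(list)
    for term in terms:
        index[term.encode("utf-8")[:1]].append(term)
    return dict(index)


def index_by_first_char(terms):
    """Group terms by their first Unicode character (code point)."""
    index = defaultdict(list)
    for term in terms:
        index[term[:1]].append(term)
    return dict(index)


# Hypothetical vocabulary terms, chosen only for illustration.
terms = ["北京", "南京", "Berlin"]

by_byte = index_by_first_byte(terms)
by_char = index_by_first_char(terms)

# "北" and "南" share the same leading UTF-8 byte (0xE5), so the
# byte-wise index puts both terms into one bucket.
print(by_byte)
# The character-wise index keeps them apart.
print(by_char)
```

Indexing by whole characters keeps each bucket meaningful regardless of how many bytes a character takes.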

charlesc:

Is it possible to use the Yahoo! Taiwan API?
http://tw.developer.yahoo.com/cas/

* ws: word segmentation and part-of-speech tagging (Word Segmentation)
* ke: article keyword extraction (Keyword Extraction)
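
As a rough illustration of why word segmentation matters here, the Python sketch below shows that whitespace splitting returns a Chinese sentence as a single token, while a greedy longest-match scan against a known vocabulary can still find terms. This is not the Yahoo! API and not how Simple Extractor works. The function names, sample sentence, and vocabulary are assumptions made for the example.

```python
def whitespace_tokens(text):
    """Split on whitespace, as an English-oriented tokenizer might."""
    return text.split()


def find_known_terms(text, vocabulary):
    """Greedy longest-match scan for vocabulary terms.

    At each position, try the longest possible substring first and
    shrink until a vocabulary term matches; otherwise advance by one
    character.
    """
    max_len = max((len(term) for term in vocabulary), default=0)
    found = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                found.append(candidate)
                i += size
                break
        else:
            # No term starts at this position.
            i += 1
    return found


# Hypothetical input and vocabulary, for illustration only.
text = "我住在北京市"
vocabulary = {"北京", "北京市"}

tokens = whitespace_tokens(text)
matches = find_known_terms(text, vocabulary)

print(tokens)   # one token: the whole sentence
print(matches)  # the longest matching term at each position
```

Longest-match is only one simple strategy. A dedicated segmentation service like the one listed above would handle ambiguity that this sketch ignores.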

alex_b:

Title: Tagging Chinese texts » Tagging Chinese texts (simple extractor)

Extractor's Yahoo Placemaker option just works (tm) with Chinese. This issue only concerns the Simple Extractor mode.