Hi,
I have modified schema.xml so that I can use CJKAnalyzer / CJKTokenizer to index and query Chinese content:
<fieldType name="text" class="solr.TextField">
  <!-- An <analyzer> takes either a class attribute or a tokenizer/filter chain, not both.
       With both, older Solr versions silently ignore the chain and newer ones reject the schema,
       so the chain is used on its own here, with the tokenizer given as its factory class. -->
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>
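For comparison: from Solr 3.6 on, CJKTokenizer is deprecated, and the example schema shipped with Solr 4.x builds CJK bigrams from the standard tokenizer instead, roughly as sketched below. A single `<analyzer>` without a `type` attribute is applied at both index and query time. The field type name `text_cjk` comes from that example schema and is only illustrative; which field type your Drupal fields actually reference depends on the schema.xml that ships with your module version.

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emits Chinese text one ideograph per token; Latin text is split into words as usual -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalizes full-width Latin characters and half-width Katakana -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Lowercases non-CJK text, so mixed English/Chinese fields still match English queries -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Joins adjacent CJK tokens into overlapping bigrams -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```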
I can see that my words are split correctly, both at index time and at query time. But after I create a view with an exposed filter form, a query in CJK text returns no results. If a field contains both English and CJK text and my query is in English, the results show up correctly.
I'd much appreciate hearing from anyone who has experience working with CJK text in the Solr module.
Thanks
File | Size | Author |
---|---|---|
Untitled.png | 36.03 KB | wingsss |
Comments
Comment #1
wingsss: It turned out to be a Tomcat problem: by default, Tomcat does not decode UTF-8 in URLs.
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
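The linked wiki page describes the fix: add `URIEncoding="UTF-8"` to the `<Connector>` in Tomcat's `conf/server.xml` that receives the Solr requests, so that percent-encoded UTF-8 query strings (such as a Chinese search term in the `q` parameter) are decoded correctly. A minimal sketch follows; the port, protocol, timeout and redirect values are the stock Tomcat defaults and are assumptions here, so keep whatever your own server.xml already has and only add the attribute. Newer Tomcat releases (8.5 onward) already default to UTF-8.

```xml
<!-- Only URIEncoding is the fix; the other attributes are Tomcat's stock values -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```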
Thanks and please close the ticket.
Comment #2
travismark: This method works for full-text fields. I'm also trying to index a string-type field, such as Chinese taxonomy terms. Any ideas?
Comment #3
drunken monkey: What problem do you run into when you try this?
Comment #4
OanaIlea (bio.logis Genetic Information Management GmbH): This issue was closed due to lack of activity over a long period of time. If the issue is still acute for you, feel free to reopen it and describe the current state.