Currently, the standard Snowball stemmer for Dutch is used, but it comes with a lot of issues.
Much better stemming is achieved with the Kraaij-Pohlmann stemming algorithm (language="Kp"). The simplest improvement is to make this algorithm the default for Dutch stemming.
See @Sutharsan's comment on GitHub.
The Solr wiki mentions Kraaij-Pohlmann as an alternative stemmer for Dutch.
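At the schema level the switch is small. The sketch below assumes a schema.xml-style fieldType whose stemmer currently uses language="Dutch". The module itself keeps its field types in YAML config files, and the name text_nl and the exact filter chain here are illustrative assumptions. The function changes the language attribute of solr.SnowballPorterFilterFactory from "Dutch" to "Kp" in both the index and the query analyzer, so documents and search terms are stemmed the same way.

```python
import xml.etree.ElementTree as ET

# Illustrative fieldType; the name "text_nl" and this filter chain are
# assumptions, not the module's shipped configuration.
FIELD_TYPE = """
<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>
"""


def use_kraaij_pohlmann(field_type_xml: str) -> str:
    """Point every Dutch Snowball stemmer in the fieldType at Kraaij-Pohlmann ("Kp")."""
    root = ET.fromstring(field_type_xml.strip())
    # iter() walks the filters of both the index and the query analyzer.
    for flt in root.iter("filter"):
        if (flt.get("class") == "solr.SnowballPorterFilterFactory"
                and flt.get("language") == "Dutch"):
            flt.set("language", "Kp")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(use_kraaij_pohlmann(FIELD_TYPE))
```

Only the stemmer's language attribute changes; the tokenizer and the other filters are left as they were.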
When needed, Kraaij-Pohlmann can be fine-tuned further:
- KeywordMarkerFilterFactory → exclude certain keywords from stemming;
- StemmerOverrideFilterFactory → custom stems for certain keywords that KP does not handle correctly;
- HunspellStemFilterFactory → only allow stems from the dictionary.
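How the first two options interact with the stemmer can be illustrated with a toy model. Lucene's stemmers skip tokens flagged as keywords; KeywordMarkerFilterFactory sets that flag for protected words, and StemmerOverrideFilterFactory replaces a word with a fixed stem and then sets the flag as well, so both have to come before the stemmer in the analyzer chain. In this sketch, toy_stemmer is a hypothetical placeholder, not Kraaij-Pohlmann, and the protected word and override entry are made-up examples. The Hunspell option is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Token:
    text: str
    keyword: bool = False  # mirrors Lucene's keyword flag: stemmers skip these tokens


def keyword_marker(tokens: List[Token], protected: Set[str]) -> List[Token]:
    """Like KeywordMarkerFilterFactory: flag protected words so they are not stemmed."""
    for tok in tokens:
        if tok.text in protected:
            tok.keyword = True
    return tokens


def stemmer_override(tokens: List[Token], overrides: Dict[str, str]) -> List[Token]:
    """Like StemmerOverrideFilterFactory: apply a fixed stem, then flag it as a keyword."""
    for tok in tokens:
        if not tok.keyword and tok.text in overrides:
            tok.text = overrides[tok.text]
            tok.keyword = True
    return tokens


def stem(tokens: List[Token], stemmer: Callable[[str], str]) -> List[Token]:
    """Stand-in for the Kp stemmer step: only unflagged tokens are stemmed."""
    for tok in tokens:
        if not tok.keyword:
            tok.text = stemmer(tok.text)
    return tokens


def toy_stemmer(word: str) -> str:
    # Hypothetical placeholder, NOT Kraaij-Pohlmann: naively strips a plural "-en".
    return word[:-2] if word.endswith("en") and len(word) > 4 else word


def analyze(text: str) -> List[str]:
    tokens = [Token(w) for w in text.lower().split()]
    tokens = keyword_marker(tokens, protected={"leuven"})            # hypothetical protected word
    tokens = stemmer_override(tokens, overrides={"steden": "stad"})  # hypothetical override entry
    tokens = stem(tokens, toy_stemmer)
    return [t.text for t in tokens]


print(analyze("Leuven steden boeken"))
```

With these made-up inputs the call prints ['leuven', 'stad', 'boek']: "leuven" is protected from the placeholder stemmer (which would otherwise cut it to "leuv"), and "steden" gets its override stem "stad" instead of "sted".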
More information and examples can be found in the blog post "Going Dutch: stemming in Apache Solr" by Peter J Lord.
Comment | File | Size | Author |
---|---|---|---|
#13 | 3023603_12.patch | 3.92 KB | mpp |
Comments
Comment #2
mpp (at District09 for District09) commented.
Comment #3
mkalkbrenner commented:
Patch?
Comment #4
mpp (at District09 for District09) commented:
The attached patch should make Dutch stemming use the Kraaij-Pohlmann algorithm.
Comment #5
mpp (at District09 for District09) commented.
Comment #6
mpp (at District09 for District09) commented:
Apply Kp stemming to both the index and the query analyzer.
Comment #7
Fernly (at Dropsolid for District09) commented:
This patch looks good to me.
Comment #8
mpp (at District09 for District09) commented:
@mkalkbrenner, should we implement an upgrade path for existing sites?
SynonymFilterFactory has been deprecated in favor of SynonymGraphFilterFactory. #2916407 made that change in the config/optional YAML files in code, but not in our sync folder.
Comment #9
rami_h commented:
KP stemming indeed produces better results, and SynonymGraphFilterFactory works fine for both single-word and multi-word synonyms.
Comment #10
mkalkbrenner commented:
The patch is OK for me, but we need an upgrade path that modifies the existing configs.
And yes, we missed that upgrade path in #2916407: solr.SynonymFilterFactory is deprecated in Solr 7 :-(
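For illustration only (this is not the update hook that was eventually committed, and it works on schema.xml-style XML rather than the module's YAML config), the synonym part of such an upgrade path could look roughly like this. The Solr Reference Guide requires a FlattenGraphFilterFactory after SynonymGraphFilterFactory in index-time analyzers, so the sketch adds one there; analyzers without type="index" are left without it. The function name upgrade_field_type, the field type name and the synonyms file are hypothetical.

```python
import xml.etree.ElementTree as ET

OLD_SYNONYM = "solr.SynonymFilterFactory"
NEW_SYNONYM = "solr.SynonymGraphFilterFactory"
FLATTEN = "solr.FlattenGraphFilterFactory"


def upgrade_field_type(field_type_xml: str) -> str:
    """Hypothetical upgrade step for an already-installed fieldType.

    Swaps the deprecated synonym filter for its graph variant and, in index
    analyzers only, adds FlattenGraphFilterFactory right after it. A second run
    finds no deprecated filter left, so it adds nothing.
    """
    root = ET.fromstring(field_type_xml.strip())
    # Snapshot analyzers and children so the tree is not mutated while iterating it.
    for analyzer in list(root.iter("analyzer")):
        for child in list(analyzer):
            if child.tag == "filter" and child.get("class") == OLD_SYNONYM:
                child.set("class", NEW_SYNONYM)
                if analyzer.get("type") == "index":
                    pos = list(analyzer).index(child)
                    analyzer.insert(pos + 1, ET.Element("filter", {"class": FLATTEN}))
    return ET.tostring(root, encoding="unicode")


# Made-up example config; names are assumptions.
EXAMPLE = """
<fieldType name="text_nl" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_nl.txt" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_nl.txt" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""

if __name__ == "__main__":
    print(upgrade_field_type(EXAMPLE))
```

Keeping the step idempotent matters for upgrade paths, since the same config may be processed more than once.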
Comment #11
mkalkbrenner commented.
Comment #12
mpp (at District09 for District09) commented.
Comment #13
mpp (at District09 for District09) commented:
See https://github.com/mkalkbrenner/search_api_solr/pull/28
Comment #15
mkalkbrenner commented:
I adjusted the update hooks before committing it.