Currently the Snowball stemmer is used for Dutch, but it comes with a lot of issues.

Much better stemming is achieved with the Kraaij-Pohlmann stemming algorithm (language="Kp"). The simplest improvement is to use this algorithm as the default for Dutch stemming.
See @Sutharsan's comment on GitHub.
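
As a rough sketch of what that change amounts to in a Solr schema, assuming the Dutch field type currently stems with solr.SnowballPorterFilterFactory and language="Dutch" (the rest of the analyzer chain is omitted here), only the language attribute would need to change:

```xml
<!-- Before: default Snowball stemmer for Dutch (assumed current setup) -->
<filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>

<!-- After: Kraaij-Pohlmann stemmer, selected through the same factory -->
<filter class="solr.SnowballPorterFilterFactory" language="Kp"/>
```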

The Solr wiki mentions Kraaij-Pohlmann as an alternative stemmer for Dutch.

When needed, Kraaij-Pohlmann can be fine-tuned further:
- KeywordMarkerFilterFactory → exclude certain keywords from stemming;
- StemmerOverrideFilterFactory → custom stemming for certain keywords that are not handled correctly by KP;
- HunspellStemFilterFactory → only allow stems from the dictionary.
More information and examples can be found in the blog post "Going Dutch: stemming in Apache Solr" by Peter J Lord.
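
The filters listed above could be combined roughly as follows. This is only a sketch: the file names (protwords_nl.txt, stemdict_nl.txt, nl_NL.dic, nl_NL.aff) and the choice of tokenizer are assumptions for illustration, not taken from the module. The keyword marker and the override filter have to come before the stemmer, because they flag tokens that the stemmer should then leave alone.

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Terms listed in protwords_nl.txt (hypothetical file) are marked as
       keywords, so the stemmer further down skips them -->
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nl.txt"/>
  <!-- stemdict_nl.txt (hypothetical file) holds one tab-separated
       "word stem" pair per line; matched words get that stem and are
       protected from further stemming -->
  <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_nl.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="Kp"/>
</analyzer>

<!-- Dictionary-based stemming with Hunspell; the dictionary and affix
     file names are assumptions -->
<filter class="solr.HunspellStemFilterFactory"
        dictionary="nl_NL.dic" affix="nl_NL.aff" ignoreCase="true"/>
```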

Comment | File | Size | Author
#13 | 3023603_12.patch | 3.92 KB | mpp
#6 | 3023603_6.patch | 893 bytes | mpp
#5 | 3023603_4.patch | 607 bytes | mpp

Comments

mpp created an issue. See original summary.

mpp:

Issue summary updated.

mkalkbrenner:

Patch?

mpp:

Status: Active » Needs review

Attached patch should use the Kraaij-Pohlmann algorithm for Dutch.

mpp:

File: 3023603_4.patch (607 bytes)

mpp:

File: 3023603_6.patch (893 bytes)

Apply Kp stemming to index & query.
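
A minimal sketch of what stemming on both sides might look like in a Solr schema. The field type name text_nl, the tokenizer, and the lowercase filter are assumptions, and the module's real field type has more filters. The point is that the same stemmer appears in both analyzers, so indexed terms and query terms reduce to the same stems.

```xml
<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Kp"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same stemmer as at index time, otherwise stems will not match -->
    <filter class="solr.SnowballPorterFilterFactory" language="Kp"/>
  </analyzer>
</fieldType>
```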

Fernly:

This patch looks good to me.

mpp:

@mkalkbrenner, should we implement an upgrade path for existing sites?

In #2916407, the deprecated SynonymFilterFactory was replaced by SynonymGraphFilterFactory in the config/optional YAML files in code, but not in our sync folder.

rami_h:

Status: Needs review » Reviewed & tested by the community

KP stemming indeed produces better results, and SynonymGraphFilterFactory works fine for both single-word and multi-word synonyms.
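
For reference, a sketch of how SynonymGraphFilterFactory is typically wired into a Solr analyzer pair. The synonyms file name synonyms_nl.txt and the tokenizer are assumptions. At index time the graph filter should be followed by FlattenGraphFilterFactory, since the indexer cannot consume a token graph directly.

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymGraphFilterFactory"
          synonyms="synonyms_nl.txt" ignoreCase="true" expand="true"/>
  <!-- Needed at index time only: flattens the token graph for the indexer -->
  <filter class="solr.FlattenGraphFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymGraphFilterFactory"
          synonyms="synonyms_nl.txt" ignoreCase="true" expand="true"/>
</analyzer>
```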

mkalkbrenner:

Status: Reviewed & tested by the community » Needs work

The patch is ok for me. But we need an upgrade path to modify the existing configs.
And yes, we missed that upgrade path in #2916407: solr.SynonymFilterFactory is deprecated in Solr 7 :-(

mkalkbrenner:

Version: 8.x-2.x-dev » 8.x-3.x-dev

mpp:

Assigned: Unassigned » mpp

mpp:

Assigned: mpp » Unassigned
Status: Needs work » Needs review
File: 3023603_12.patch (3.92 KB)

  • mkalkbrenner committed 6ec7e7f on 8.x-3.x authored by mpp
    Issue #3023603 by mpp, mkalkbrenner: Improve stemming for Dutch language
    
mkalkbrenner:

Status: Needs review » Fixed

I adjusted the update hooks before committing it.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.