Project: Sphinx search
Version: 6.x-1.x-dev
Component: Sphinx search indexer
Category: support request
Priority: normal
Assigned: Unassigned
Status: active

Issue Summary

Hi,

I was wondering if anyone has successfully got morphology and wordforms working using the auto-generation utility with a dictionary file.

I activate morphology and wordforms in sphinx.conf (snippet below):

morphology = stem_en
wordforms = z:\environment\usr\local\sphinx\var\data\wordforms.txt

1. Scenario: create one article with ninjas in the body field. When I reindex (rotate), I'd expect a keyword search for either ninja OR ninjas to return the article in the scenario. What actually happens is that, with morphology on, it returns a result for ninjas but not for ninja! Any ideas?

2. You create a wordforms file by running the Sphinx spelldump.exe "Z:\environment\usr\local\sphinx\var\data\dictionary\en_GB\en_GB.dic" command line. It generates a file which has entries as follows:-

Abe > Abe
Abel > Abel

which doesn't seem much use, as they are just one-to-one mappings of a word to itself! I'd expect them to be of the form:-

walks > walk
walked > walk
walking > walk

Any ideas? (Note: wordforms do work when written in the walk format above, but you need to reindex and restart the Sphinx service.)
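
For anyone who wants to write the file by hand rather than through spelldump, here is a minimal Python sketch that emits lines in the "variant > base" form shown in the walk example above. The function name build_wordforms and the groups dictionary are hypothetical and only for illustration; they are not part of Sphinx. It also drops the one-to-one entries (such as Abe > Abe) that add nothing.

```python
def build_wordforms(groups):
    """Return wordforms lines of the form 'variant > base'.

    `groups` maps a base word to a list of its variants. This structure
    is an assumption made for the example, not a Sphinx format.
    """
    lines = []
    for base, variants in groups.items():
        for variant in variants:
            # Skip mappings of a word to itself; they change nothing.
            if variant != base:
                lines.append(f"{variant} > {base}")
    return lines


# Hypothetical sample data based on the words used in this issue.
groups = {
    "walk": ["walks", "walked", "walking"],
    "ninja": ["ninjas"],
}
wordforms_lines = build_wordforms(groups)

if __name__ == "__main__":
    # Redirect this output to wordforms.txt, then reindex.
    print("\n".join(wordforms_lines))
```

The output can be saved as the file that the wordforms setting in sphinx.conf points to.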

Comments

#1

I haven't used spelldump, so all I can do here is search in the forums of the Sphinx search site itself.

Here's what I've been able to find:

http://www.sphinxsearch.com/forum/view.html?id=2462
http://www.sphinxsearch.com/forum/view.html?id=1699

As I understand it, wordforms and morphology can be used together, but the stemmer is not invoked for words found in wordforms.txt, which allows you to create your own exceptions to the stemmer rules. In the end, I guess what you need is good results for word variations, and that could also be approached using min_word_len and similar options of the indexes defined in your sphinx.conf.
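
To illustrate the lookup order described above, here is a toy Python model, not Sphinx's actual code. The toy_stem function is a hypothetical stand-in that only strips a trailing "s"; it is not the stem_en algorithm. The normalize function and the sample mapping are likewise assumptions made for the example.

```python
def toy_stem(word):
    # Hypothetical stand-in for a stemmer: strips one trailing "s".
    # This is NOT the stem_en algorithm, just enough to show the order.
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def normalize(word, wordforms):
    """Map a word to its indexed form: wordforms first, stemmer second."""
    word = word.lower()
    if word in wordforms:
        # A wordforms entry wins, and the stemmer is skipped entirely.
        return wordforms[word]
    return toy_stem(word)


# Sample mapping, as if parsed from "walked > walk" style lines.
wordforms = {"walked": "walk", "mice": "mouse"}

if __name__ == "__main__":
    for w in ["mice", "walked", "ninjas", "ninja"]:
        print(w, "->", normalize(w, wordforms))
```

In this model a document containing ninjas and a query for ninja both normalize to the same token, which is the behaviour the original scenario expected from morphology.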