stem on search terms instead of stemming in index
malc_b - June 16, 2009 - 08:01
| Project: | Porter-Stemmer |
| Version: | 6.x-1.0 |
| Component: | Miscellaneous |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | duplicate |
Description
There seem to be problems with stemming. The excerpt fails because the stem doesn't match the actual word. Wouldn't a better approach be to turn it on its head: instead of reducing a search term to its stem, expand it into a search for the singular or plural? So a search for orange becomes orange OR oranges. The downside is speed.
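To make the proposal concrete, here is a minimal sketch in Python rather than Drupal's PHP. It assumes an unstemmed index that maps each exact word to a set of page ids. The names `expand_term` and `search_expanded` and the pluralization rule are hypothetical illustrations, not part of the Search module or Porter-Stemmer.

```python
def expand_term(term):
    """Return the term plus a naive singular/plural variant (hypothetical rule)."""
    term = term.lower()
    variants = {term}
    if term.endswith("s") and len(term) > 3:
        variants.add(term[:-1])   # oranges -> orange
    else:
        variants.add(term + "s")  # orange -> oranges
    return sorted(variants)


def search_expanded(index, query):
    """OR together the variants of each term, then AND across terms.

    `index` is assumed to be a dict mapping exact (unstemmed) words
    to sets of page ids.
    """
    results = None
    for term in query.split():
        hits = set()
        for variant in expand_term(term):
            hits |= index.get(variant, set())
        results = hits if results is None else results & hits
    return results or set()
```

Each search term costs one index lookup per variant, which is where the speed concern comes from. A rule this naive also misses irregular forms, so it is only meant to show the shape of the idea.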

#1
Better title. I don't have a strong feeling about this.
#2
See also #437084: Excerpt fails to find stemmed keyword and #493270: search_excerpt() doesn't work well with stemming
I think Porter Stemmer is doing the right thing, indexing the root words into the search index, and then reducing search terms to root words when searching. The Search module calls the preprocessing hook when indexing and when searching, so you have to do the same thing both times.
If you did it the other way, the search index would be many times larger. E.g. a page containing "walk" would have to be indexed under "walk", "walks", and "walking".
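A rough sketch of that symmetry, in Python rather than Drupal's PHP: the same preprocessing function runs on page text at index time and on the query at search time. `toy_stem` is a hypothetical stand-in that strips a few suffixes; it is not the Porter algorithm, and `build_index` and `search` are illustrative names, not Search module APIs.

```python
from collections import defaultdict

SUFFIXES = ("ing", "es", "s")  # toy rule set, NOT the Porter algorithm


def toy_stem(word):
    """Hypothetical stand-in stemmer: strip the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(pages):
    """Index every page under the stem of each of its words."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            index[toy_stem(word)].add(page_id)
    return index


def search(index, query):
    """Stem each query term the same way, then AND the matches."""
    results = None
    for term in query.split():
        hits = index.get(toy_stem(term), set())
        results = hits if results is None else results & hits
    return results or set()
```

Because both sides pass through `toy_stem`, a query for "walking" finds a page that only contains "walks", and each page is stored under a single key per word rather than one key per derived form.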
#3
Also there is no published algorithm for finding all possible derived forms of a word -- the Porter Stemmer module is stemming in the standard way (i.e. standard in the Information Retrieval industry), at least it seems so.
So I am going to go ahead and mark this as "duplicate", since the main issue reported here is that excerpts are not being found, which is covered in #493270: search_excerpt() doesn't work well with stemming.