Closed (outdated)
Project:
Porter Algorithm Search Stemmer
Version:
6.x-2.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
7 Dec 2010 at 03:41 UTC
Updated:
28 Jul 2022 at 16:12 UTC
Comments
Comment #1
jhodgdon
If you don't want full linguistic stemming (which is what Porter Stemmer does), what you probably want to do is take the Porter Stemmer module and cut some parts out of its stemming algorithm. I think that making exceptions for individual words won't really solve the problem you are seeing. For instance, you could put in an exception saying "don't stem designers down to design", but you would still have other -er words being stemmed to their root.
I guess I also don't see why matching designers and design is a problem?
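To make the limitation of per-word exceptions concrete, here is a minimal Python sketch. It is not the module's code: `toy_stem` is a crude suffix stripper standing in for the real Porter algorithm, and `EXCEPTIONS` and `stem_with_exceptions` are hypothetical names used only for illustration.

```python
# Hypothetical exception list: words that should bypass stemming entirely.
EXCEPTIONS = {"designers"}


def toy_stem(word):
    """Crude suffix stripper; a stand-in for Porter, NOT the real algorithm."""
    for suffix in ("ers", "ing", "er", "s"):
        # Only strip if at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_with_exceptions(word):
    """Return the word unchanged if it is listed, otherwise stem it."""
    word = word.lower()
    if word in EXCEPTIONS:
        return word
    return toy_stem(word)


print(stem_with_exceptions("designers"))  # designers (bypassed)
print(stem_with_exceptions("designer"))   # design (not listed, still stemmed)
print(stem_with_exceptions("managers"))   # manag (not listed, still stemmed)
```

Only the exact listed form is protected here; every other -er word would need its own entry.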
Comment #2
dafletcha commented
I understand the need here. In the content I'm working with, searching for, say, "manager" implies you want information on that role specifically, vs. information on management, managing, etc. It would be nice to be able to define words that get stemmed but that are also preserved in full, to give them a sort of precedence.
So, for example, if I can define "manager" as such a word, during indexing the word is submitted to the index in its stemmed form and in its whole form. And when searching, "manager" is searched for as both a stem and in its whole form. Because nodes that do not contain "manager" will have been indexed with only the stem ("manag"?), they will fail the search for "manager" and be ranked lower than nodes that match both the whole word and the stem. Does this make sense?
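A rough Python sketch of this proposal, under stated assumptions: `toy_stem` is a crude stand-in for the Porter algorithm, `PRESERVED` is a hypothetical configured word list, and ranking is simplified to counting how many query terms a node matches. The real Drupal search index and scoring work differently.

```python
# Hypothetical list of words to index both stemmed and in full.
PRESERVED = {"manager"}


def toy_stem(word):
    """Crude suffix stripper; a stand-in for Porter, NOT the real algorithm."""
    for suffix in ("ers", "ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(word):
    """Return the stem, plus the whole word if it is a preserved word."""
    word = word.lower()
    terms = [toy_stem(word)]
    if word in PRESERVED:
        terms.append(word)
    return terms


def index_terms(text):
    """Set of terms that would be stored for a piece of text."""
    return {term for word in text.split() for term in expand(word)}


def score(query, text):
    """Simplified ranking: number of query terms found in the text's terms."""
    indexed = index_terms(text)
    return sum(1 for term in index_terms(query) if term in indexed)


print(score("manager", "The manager likes it"))  # 2: stem and whole word match
print(score("manager", "Managing the site"))     # 1: only the stem matches
```

Under this counting scheme, a node containing "manager" outranks one that only contains "managing", which is the precedence described above.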
Comment #3
jhodgdon
I'm not sure about having two versions of a word getting into the index. It could cause some problems with phrase searching, because the pre-processed text is what is used when searching for a phrase. So in this case, if you had a phrase like "The web site manager likes" in your text, it would go into the index as "the web site manag manager lik" (assuming manager->manag and likes->lik in stemming)... well maybe that would be OK, because if you did a search on the phrase "manager likes", it would be pre-processed the same way.
But that would also mean that if you searched for "manager", you would really be doing an AND search that requires both "manager" and "manag" to be in the search index, which would probably screw up the relevance rankings if nothing else.
So I think we could do bypasses, but I think having it both ways is kind of a problem.
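The pre-processing behavior described in this comment can be sketched in Python. The assumptions are loud ones: `toy_stem` is a crude stand-in for Porter (it yields "like" for "likes", not "lik"), `PRESERVED` is a hypothetical word list, and `phrase_match` and `and_match` are simplified illustrations, not Drupal's search implementation.

```python
# Hypothetical list of words emitted both stemmed and in full.
PRESERVED = {"manager"}


def toy_stem(word):
    """Crude suffix stripper; a stand-in for Porter, NOT the real algorithm."""
    for suffix in ("ers", "ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Stem every word; preserved words are emitted as stem then whole word."""
    out = []
    for word in text.lower().split():
        out.append(toy_stem(word))
        if word in PRESERVED:
            out.append(word)
    return " ".join(out)


def phrase_match(phrase, text):
    """Phrase search compares pre-processed phrase to pre-processed text."""
    return f" {preprocess(phrase)} " in f" {preprocess(text)} "


def and_match(query, text):
    """AND search: every pre-processed query term must be in the index."""
    indexed = set(preprocess(text).split())
    return all(term in indexed for term in preprocess(query).split())


text = "The web site manager likes"
print(preprocess(text))                     # the web site manag manager like
print(phrase_match("manager likes", text))  # True: both sides expand alike
print(and_match("manager", "Managing the web site"))  # False: "manager" absent
```

The phrase search still works because query and text are expanded identically, but the AND search for "manager" now excludes nodes that contain only other forms of the stem.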
Comment #4
mark_fullmer
Given there has been no activity on this in 11 years and Drupal 6 is end-of-life, I'm closing this as outdated.