Porter Stemmer should respect Drupal minimum word size setting
| Project: | Porter-Stemmer |
| Version: | 6.x-2.0 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | jhodgdon |
| Status: | closed |
I noticed that the 2.0 CVS log says "minimum word size 3 characters".
I have a site where I've set the "Minimum word length to index" at /admin/settings/search to 2 characters, which means that 2-digit numbers get indexed. This is quite useful.
From looking at #511930: Upgrade to Porter 2 algorithm and #219335: If term gets stemmed to fewer than 3 characters, form validation fails, I'm not sure whether this simply means that the stemmer won't produce stems shorter than 3 characters, or whether my 2-character minimum word size setting will go haywire. Should the minimum stem size be taken from the search settings screen rather than being hard-coded? The value of 3 characters is simply the Drupal default.

#1
That is a good point.
The Porter algorithm itself has a minimum word size of 2 letters, so we could not go below that, but you are correct that the Porter Stemmer Drupal module should probably set its minimum word size to the Drupal setting rather than hard-wiring 3 characters minimum (as it does now). I will look into it.
#2
Oh, and just as a comment: The Porter Stemmer module currently leaves unchanged any word shorter than 3 characters, and it will not stem a word down to fewer than 3 characters.
It should not prevent Drupal's core Search module from indexing words shorter than that, though.
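To make the behaviour discussed above concrete, here is a minimal sketch in Python rather than the module's PHP. It is not the module's actual code. The names `PORTER_FLOOR`, `effective_min_size`, `toy_stem` and `stem_with_min_size` are hypothetical, `toy_stem` is a stand-in suffix stripper and not the Porter algorithm, and falling back to the original word when the stem would be too short is an assumption about how the limit is enforced.

```python
# Per the thread: the Porter algorithm itself has a 2-letter minimum.
PORTER_FLOOR = 2


def effective_min_size(configured_min):
    """Use the site's configured minimum, but never go below the Porter floor."""
    return max(PORTER_FLOOR, configured_min)


def toy_stem(word):
    """Stand-in stemmer (NOT Porter): strips one common English suffix."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def stem_with_min_size(word, configured_min=3, stem_fn=toy_stem):
    """Stem a word while respecting a configurable minimum word size."""
    min_size = effective_min_size(configured_min)

    # Words already shorter than the minimum are left unchanged.
    if len(word) < min_size:
        return word

    stem = stem_fn(word)

    # Assumption: if stemming would go below the minimum, keep the original word.
    if len(stem) < min_size:
        return word
    return stem


if __name__ == "__main__":
    for minimum in (3, 2):
        print(minimum, [stem_with_min_size(w, minimum) for w in ("cats", "ups", "as", "42")])
```

With a minimum of 3, "ups" stays as it is because its stem would be only 2 characters long; with a minimum of 2, it becomes "up". Two-digit tokens such as "42" pass through untouched in both cases.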
#3
Many thanks, this is very encouraging :)
#4
#5
I have committed this to the Development version of Porter Stemmer (branch 6.x-2.x). I'll make this available from the module home page shortly (available now from CVS -- commit http://drupal.org/cvs?commit=256440), and it will go into the next regular release.
gpx, if you could give it a try in the meantime, that would be helpful.
#6
#7
Strange, 6.x-2.x-dev is still showing last update on 5 Aug?? I guess that release is tracking HEAD not the 6--2 branch or something?
#8
No, it is tracking 6--2. I don't know why it hasn't updated. Must be a delay or bug in the packaging script. I'll file an issue... Sorry, my mistake. It was tracking HEAD. Fixed now; there should be a new release shortly.
#9
This is now fixed and released (or should be within a few minutes) in version 6.x-2.1 of Porter Stemmer.
#10
Automatically closed -- issue fixed for 2 weeks with no activity.