Porter Stemmer should respect Drupal minimum word size setting

gpk - August 25, 2009 - 11:27
Project:Porter-Stemmer
Version:6.x-2.0
Component:Code
Category:bug report
Priority:normal
Assigned:jhodgdon
Status:closed
Description

I noticed that the 2.0 CVS log says "minimum word size 3 characters".

I have a site where I've set the "Minimum word length to index" at /admin/settings/search to 2 characters, which means that 2-digit numbers get indexed. This is quite useful.

From looking at #511930: Upgrade to Porter 2 algorithm and #219335: If term gets stemmed to fewer than 3 characters, form validation fails I'm not sure if this simply means that the stemmer won't produce stems less than 3 characters, or if my 2 character minimum word size setting will go haywire. In point of fact, should the minimum stem size be taken from the search settings screen rather than being hard-coded? The value of 3 characters is simply the Drupal default.

#1

jhodgdon - August 25, 2009 - 14:21
Assigned to:Anonymous» jhodgdon

That is a good point.

The Porter algorithm itself has a minimum word size of 2 letters, so we could not go below that, but you are correct that the Porter Stemmer Drupal module should probably set its minimum word size to the Drupal setting rather than hard-wiring 3 characters minimum (as it does now). I will look into it.

#2

jhodgdon - August 25, 2009 - 14:23

Oh, and just as a comment: The Porter Stemmer module currently leaves unchanged anything that is smaller than 3 characters, and doesn't stem a word past 3 characters in length.

It should not prevent Drupal's core Search module from indexing words smaller than that, though.
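
For illustration only, the behavior described above (leave short words alone, never stem below the minimum, and take that minimum from the site setting instead of a hard-coded 3) could be sketched roughly as follows. This is not the module's code. The settings key `minimum_word_size`, the helper names, and the toy suffix stripper are all assumptions made up for the example, and falling back to the original word when a stem comes out too short is just one plausible reading of "doesn't stem a word past 3 characters".

```python
# Hypothetical sketch, not the Porter Stemmer module's implementation.
DRUPAL_DEFAULT_MIN_WORD_SIZE = 3  # Drupal's default, per the issue description
PORTER_FLOOR = 2                  # the algorithm's own minimum, per comment #1


def effective_min_size(settings):
    """Read the configured minimum, but never go below the Porter floor."""
    configured = settings.get("minimum_word_size", DRUPAL_DEFAULT_MIN_WORD_SIZE)
    return max(PORTER_FLOOR, int(configured))


def toy_stem(word):
    """Toy stand-in for a real stemmer: strips one common suffix."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def guarded_stem(word, stem_fn, settings):
    """Stem a word while respecting the configured minimum word size."""
    min_size = effective_min_size(settings)
    if len(word) < min_size:
        return word  # too short to stem: leave unchanged
    stem = stem_fn(word)
    if len(stem) < min_size:
        return word  # assumption: keep the original if the stem is too short
    return stem
```

With the default of 3, `guarded_stem("ties", toy_stem, {})` keeps "ties" because the toy stem "ti" would be too short, while with `{"minimum_word_size": 2}` the stem "ti" is allowed. A 2-digit number such as "42" passes through unchanged either way, so the stemmer does not get in the way of indexing it.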

#3

gpk - August 25, 2009 - 14:27

Many thanks, this is very encouraging :)

#4

jhodgdon - August 25, 2009 - 15:10
Title:Does the "minimum word size 3 characters" restriction prevent indexing of 2-character words?» Porter Stemmer should respect Drupal minimum word size setting
Category:support request» bug report

#5

jhodgdon - August 26, 2009 - 19:25

I have committed this to the Development version of Porter Stemmer (branch 6.x-2.x). I'll make this available from the module home page shortly (available now from CVS -- commit http://drupal.org/cvs?commit=256440), and it will go into the next regular release.

gpk, if you could give it a try in the meantime, that would be helpful.

#6

jhodgdon - August 26, 2009 - 19:29
Status:active» needs review

#7

gpk - August 28, 2009 - 08:32

Strange, 6.x-2.x-dev is still showing last update on 5 Aug?? I guess that release is tracking HEAD not the 6--2 branch or something?

#8

jhodgdon - August 28, 2009 - 13:10

No, it is tracking 6--2. I don't know why it hasn't updated. Must be a delay or bug in the packaging script. I'll file an issue...

Sorry, my mistake. It was tracking HEAD. Fixed now, should be a new release shortly.

#9

jhodgdon - September 9, 2009 - 22:30
Status:needs review» fixed

This is now fixed and released (or should be within a few minutes) in version 6.x-2.1 of Porter Stemmer.

#10

System Message - September 23, 2009 - 22:40
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 
