Porter-Stemmer

jhodgdon - December 17, 2005 - 22:43

This module implements the Porter stemming algorithm to improve American English-language searching with the Drupal built-in Search module.

The process of stemming reduces each word in the search index to its basic root or stem (e.g. 'blogging' to 'blog') so that variations on a word ('blogs', 'blogger', 'blogging', 'blog') are considered equivalent when searching. This generally results in more relevant search results.
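As a rough illustration of the idea, the sketch below indexes pages by word stem, so that a search for one form of a word finds pages containing another. It assumes a toy suffix-stripping function, `toy_stem`, which is hypothetical and far simpler than the Porter algorithm; `build_index`, `search`, and the sample pages are likewise invented for this example and are not part of the module.

```python
from collections import defaultdict


def toy_stem(word):
    """Strip a few common suffixes. Illustration only, NOT the Porter algorithm."""
    word = word.lower()
    for suffix in ("ing", "er", "ed", "s"):
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            # Undouble a trailing consonant pair, e.g. "blogg" -> "blog".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def build_index(pages):
    """Map each stem to the set of page ids whose text contains a word with that stem."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            index[toy_stem(word)].add(page_id)
    return index


def search(index, term):
    """Look up pages by the stem of the search term, not the literal term."""
    return index.get(toy_stem(term), set())


# Hypothetical sample content.
pages = {1: "blogging daily", 2: "two blogs", 3: "a blogger writes", 4: "cooking"}
index = build_index(pages)
print(sorted(search(index, "blog")))  # [1, 2, 3]
```

Because both the indexed words and the search term pass through the same stemming function, the exact rules matter less than applying them consistently on both sides.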

Porter Stemmer version 6.x-2.0 and later versions use Version 2 of the Porter stemming algorithm, which is the version that Porter currently recommends using for live applications. Older versions of Porter Stemmer (including all 5.x versions) use the original Porter stemming algorithm.

Note that the Porter stemming algorithm is specific to American English, so some British spellings will not be stemmed correctly.

Installation Note

After installing and enabling this module (in the usual way), you will need to rebuild the search index. To do this:

  1. Visit Administer > Site configuration > Search settings, and click on "Re-index site".
  2. Ensure that cron has run enough times that the Search settings page shows the site as 100% indexed. You can run cron manually by visiting Reports > Status report and clicking on the "Run cron manually" link.

Limitations and Notes

  • The Porter stemming algorithm is specific to American English, so some British spellings will not be stemmed correctly, and non-English content will not be stemmed correctly.
  • The core Search module does not currently provide a way for a stemming module (such as Porter Stemmer) to know the language of content or search terms during searching or search indexing. So, if you have a multi-lingual site and enable the Porter Stemmer module, it will unfortunately try to apply its stemming algorithm to all the content on your site, regardless of language. See this issue for details: #363336: Porter-stemmer should only stem english or language neutral content for a multi-language site.
  • The Porter stemming algorithm attempts to reduce words to their linguistic root words; it does not do general substring matching. So, for instance, it should make "walk", "walking", "walked", and "walks" all match in searching, but it will not make "walking" a match for "king".
  • There is currently an issue with excerpts in Porter Stemmer (see: #437084: Excerpt fails to find stemmed keyword). For example, if a page contains the word "walking" and someone searches for "walk", that page will be included in the search results, but the search excerpt will not display the portion of text containing "walking" (it will probably just display the first paragraph of text on that page).
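The last two points can be sketched in a few lines. The example below contrasts stem matching with substring matching, and shows one plausible way an excerpt lookup can miss a stemmed keyword. It assumes the excerpt code searches for the literal search term as a whole word, which is a guess about the cause and not a description of the module's code. The `STEMS` table is a hypothetical stand-in for a real stemmer's output, and all function names are invented for illustration.

```python
import re

# Hypothetical stem table standing in for a real stemmer's output.
STEMS = {
    "walk": "walk",
    "walks": "walk",
    "walked": "walk",
    "walking": "walk",
    "king": "king",
}


def stem(word):
    """Return the stem from the table, or the lowercased word if it is unknown."""
    return STEMS.get(word.lower(), word.lower())


def stem_match(term, word):
    """True when both words reduce to the same stem."""
    return stem(term) == stem(word)


def substring_match(term, word):
    """General substring matching, which stemming does NOT do."""
    return term.lower() in word.lower()


def excerpt_literal(text, term, context=10):
    """Find the literal term as a whole word (assumed behavior of the failing excerpt)."""
    match = re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
    if match is None:
        return None
    return text[max(0, match.start() - context): match.end() + context]


def excerpt_stemmed(text, term, context=10):
    """Find the first word in the text whose stem equals the term's stem."""
    for match in re.finditer(r"\w+", text):
        if stem_match(term, match.group()):
            return text[max(0, match.start() - context): match.end() + context]
    return None


text = "She enjoys walking in the park."
print(stem_match("walk", "walking"))       # True
print(stem_match("king", "walking"))       # False
print(substring_match("king", "walking"))  # True
print(excerpt_literal(text, "walk"))       # None
print(excerpt_stemmed(text, "walk"))
```

Under that assumption, the page is found through the stemmed index, but the excerpt step compares unstemmed text against the search term and so finds nothing to highlight.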

Maintainers

The Porter Stemmer module is co-maintained by jhodgdon and greggles. If you have questions or comments about this module, please communicate with the maintainers by posting an issue (see box in left sidebar of this page). That way, others can benefit from the answers as well.

Downloads

Recommended releases

Version      Downloads             Date         Links
6.x-2.4      Download (163.19 KB)  2009-Nov-19  Notes
5.x-1.0      Download (8.65 KB)    2008-Feb-21  Notes

Development releases

Version      Downloads             Date         Links
6.x-2.x-dev  Download (163.2 KB)   2009-Nov-20  Notes
5.x-1.x-dev  Download (8.73 KB)    2009-Jul-08  Notes

Drupal is a registered trademark of Dries Buytaert.