"ize" and "ise"

carneeki - November 16, 2008 - 17:07
Project: Porter-Stemmer
Version: 6.x-1.0
Component: Documentation
Category: feature request
Priority: normal
Assigned: Unassigned
Status: closed
Description

In Australia (and I suppose many English-speaking parts of the world) we sometimes spell words with the "ise" suffix instead of the "ize" suffix.

I have attached a unified diff, which I hope can be tested by someone who knows more about stemming than I do. (My limited time only allowed me to test "caramelise", "caramelised", "caramelize" and "caramelized", all of which resolve back to the word "caramel" in a node.) All I can say is "it works for me". :)

Attachment: porterstemmer.module.diff (1.57 KB)
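
For readers without the attachment, here is a minimal sketch of one way the idea could work: rewrite "-ise" style endings to their "-ize" forms before the word reaches the stemmer. This is not the code from the attached diff. The function name `normalise_ise`, the list of endings, and the `min_stem_len` guard are all assumptions made for illustration, and the sketch is in Python rather than the module's PHP.

```python
import re

# Hypothetical helper, not taken from the attached diff: rewrite British
# "-ise" family endings to the "-ize" forms before stemming.
_ISE_ENDINGS = re.compile(r"is(e|ed|es|ing|ation|ations|er|ers)$")


def normalise_ise(word: str, min_stem_len: int = 4) -> str:
    """Return word with an "-ise" style ending rewritten to "-ize".

    Words whose remaining stem would be shorter than min_stem_len
    characters are returned unchanged (an assumed guard, not part of
    any published stemming algorithm).
    """
    match = _ISE_ENDINGS.search(word)
    if match is None or match.start() < min_stem_len:
        return word
    # Keep the stem, swap "is" for "iz", and re-attach the ending.
    return word[:match.start()] + "iz" + match.group(1)


for w in ["caramelise", "caramelised", "caramelize", "caramelized", "rise"]:
    print(w, "->", normalise_ise(w))
```

The length guard leaves short words such as "rise" and "wise" alone. Whether the normalised words then stem to "caramel" depends on the stemmer they are passed to.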

#1

greggles - November 16, 2008 - 20:45

That's an interesting idea. This module is largely based on an external porter-stemmer codebase. Maybe you could check that code to see if it has this capability? Perhaps it's time to update this code with a refresh of their latest version.

#2

jhodgdon - July 6, 2009 - 22:16
Component: Code » Documentation

The published Porter Stemmer algorithm is apparently only for American English (this is also true of the Porter 2 algorithm). I think we should just update the documentation to state this clearly, rather than trying to modify the algorithm so that it might also work for non-American English. My reasoning is that the algorithm's decision process is quite complex, and I'm concerned that any modifications we made would likely screw up the stemming of other words.
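
To make the concern concrete, here is a small sketch of the kind of side effect a suffix tweak can have. The rule below is a deliberately naive, hypothetical one (it is neither the Porter algorithm nor the submitted diff), and the word list is only an example.

```python
import re


def naive_ise_to_ize(word: str) -> str:
    # Hypothetical rule for illustration only: blindly rewrite a trailing "ise".
    return re.sub(r"ise$", "ize", word)


def changed_words(words):
    """Return (original, rewritten) pairs for every word the naive rule alters."""
    return [(w, naive_ise_to_ize(w)) for w in words if naive_ise_to_ize(w) != w]


# These words end in "ise" but are not "-ize" verbs in American English,
# so rewriting them changes what the stemmer would see.
print(changed_words(["promise", "exercise", "precise", "caramel"]))
```

Every word in the list except "caramel" is altered, which is the sort of collateral change that would need checking against a large vocabulary before touching the algorithm.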

Places to fix documentation:
- Project page - http://drupal.org/project/porterstemmer
- README.txt file

Thoughts? Any other places to fix?

#3

greggles - July 6, 2009 - 23:03

That seems like a good solution to me.

Thanks!

#4

jhodgdon - July 6, 2009 - 23:22

I fixed the project page. Here's a patch for the README. Which branch(es) should we commit it to, if any?

Attachment: 335030.patch (564 bytes)

#5

jhodgdon - July 6, 2009 - 23:23

Missing newline. Try this patch.

Attachment: 335030.patch (536 bytes)

#6

jhodgdon - July 6, 2009 - 23:23
Status: active » needs review

#7

greggles - July 6, 2009 - 23:59

Looks great to me. I guess commit to the 5.x and 6.x branches, which are DRUPAL-5 and DRUPAL-6--1.

#8

greggles - July 7, 2009 - 00:01

I should add, if you want to commit things to HEAD as well, please do. Otherwise we can just merge everything from DRUPAL-6--1 into HEAD whenever we start working on 7.x compatibility.

#9

jhodgdon - July 7, 2009 - 14:52
Status: needs review » fixed

Done.

#10

System Message - July 21, 2009 - 15:00
Status: fixed » closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 
