I have a test site with the Porter Stemmer module installed and working correctly, in that if I search for "come", I find the page including the phrase "here comes the sun". So far, so good (thanks for the module!).

However, the search excerpt shown in the search results doesn't show me that portion of the page. If I search for "comes", I see "here comes the sun" with the word "comes" highlighted in bold. But if I search for "come", I just see some random part of the page (or the top?), with nothing highlighted.

I'm not sure this can be fixed within Porter Stemmer, actually, but I thought I'd report it anyway.

The problem is happening within functions such as node_search($op = 'search'), which call the core function search_excerpt() to find the excerpt of the content to display. It doesn't look like there is any easy hook-based way to modify that function, but maybe the Porter Stemmer module could supply a replacement function of some sort? Not sure what the right approach would be...

Attached files:
Comment #9: accessory-match.png (24.68 KB), uploaded by cpliakas

Comments

jhodgdon’s picture

I have filed a related issue against the core Search module -- I think that would need to be fixed before a stemmer module could do anything to fix it. See #493270: search_excerpt() doesn't work well with stemming

malc_b’s picture

I've added a comment to the above with a fix, see http://drupal.org/node/493270#comment-1714308

greggles’s picture

Status: Active » Closed (duplicate)

Ok, then this can be a duplicate. Thanks, all.

jhodgdon’s picture

If you think this issue is important, please visit #493270: search_excerpt() doesn't work well with stemming and leave a comment. Otherwise, it is possible that no one will think it is important to get into Drupal 7 (much less Drupal 6). The code freeze for Drupal 7 is coming up on September 1st.

jhodgdon’s picture

Status: Closed (duplicate) » Postponed

I'm going to reopen this issue, because if the core Search issue is fixed so that the search_excerpt() function has a new hook in it, we'll also need to modify Porter Stemmer accordingly. And because others are almost certainly having the same problem, so having an open issue they can see immediately in the issue list will help them find it.

But I'll mark this issue "postponed" because we can't really do anything until there is action on that core Search issue.

cpliakas’s picture

Just a note, highlighting does work when using this module with the 2.0 version of Search Lucene API.

jhodgdon’s picture

The Lucene project has its own search excerpt function, and doesn't use the core search_excerpt() function.

It looks to me as though the main difference is that search_excerpt() is looking for the keywords as complete words, whereas luceneapi_excerpt() looks for the keyword as a substring anywhere in the word. So this will *usually* find the stem output of Porter Stemmer, but it might not in every case, because sometimes the stems from Porter Stemmer may not actually be substrings of the word that is in the text.

For instance, try using Porter Stemmer to search for the words "accessory" and "accessories" in text containing one or the other, and I think you may find that Lucene doesn't highlight the match in some cases. (The Porter Stemmer stemming output for both of those is "accessori", at least in the current version of Porter Stemmer using the Porter 2 algorithm.)
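
To make the distinction concrete, here is a minimal Python sketch of the two matching strategies described above. It is not the code of search_excerpt() or luceneapi_excerpt(); the function names are hypothetical, and the only stem value used ("accessori") is the one quoted in this comment.

```python
import re


def whole_word_match(keyword, text):
    """Hypothetical helper: True if keyword occurs in text as a complete word."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def substring_match(keyword, text):
    """Hypothetical helper: True if keyword occurs anywhere inside text."""
    return keyword.lower() in text.lower()


# Searching for "come" when the page says "comes":
# the whole-word test misses it, the substring test finds it.
print(whole_word_match("come", "here comes the sun"))
print(substring_match("come", "here comes the sun"))

# The stem "accessori" is a substring of "accessories" but not of "accessory",
# so even substring matching on the stem misses the singular form.
print(substring_match("accessori", "buy some accessories"))
print(substring_match("accessori", "buy an accessory"))
```

Under these assumptions, substring matching on the stem covers "accessories" but still misses "accessory", which is the kind of gap suggested above.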

cpliakas’s picture

Attached file: accessory-match.png (24.68 KB)

Thanks for your reply and a great module, but the point you mentioned above is actually not true (although I completely understand why you thought so). Search Lucene API highlighting is based on the position of the matched word in the document and not the substring like you suggested. By the time it does the pattern matching for highlighting, it looks for both "accessory" and "accessories" because it knows that those were the words that were matched. I just tried it out, and it worked as expected. See the attached screenshot. I guess my point is that I haven't come across a case where a matched word wasn't highlighted because it knows the position of the matched word. Maybe this technique could be applied to the core search somehow, but I am not sure if it is possible due to the SQL backend of the search. I know this isn't really the place for this discussion, so I apologize for the tangent :-). My fault. Again, thanks for the great module.
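
As a rough illustration of the position-based idea described above, the following Python sketch highlights every word in the text whose stem equals the stem of the query, using the word's position rather than a substring search. This is an assumption-laden toy, not the Search Lucene API implementation: toy_stem() and its lookup table are hypothetical stand-ins for a real stemmer, and the stem value is the one quoted earlier in this thread.

```python
import re

# Hypothetical stand-in for a real stemmer; the stem value comes from this thread.
STEMS = {"accessory": "accessori", "accessories": "accessori"}


def toy_stem(word):
    """Return the known stem of word, or the lowercased word if unknown."""
    lowered = word.lower()
    return STEMS.get(lowered, lowered)


def highlight_by_position(query, text):
    """Wrap each word whose stem matches the query's stem in <strong> tags."""
    target = toy_stem(query)
    pieces = []
    last_end = 0
    for match in re.finditer(r"\w+", text):
        if toy_stem(match.group()) == target:
            # Copy the untouched text before this word, then the highlighted word.
            pieces.append(text[last_end:match.start()])
            pieces.append("<strong>" + match.group() + "</strong>")
            last_end = match.end()
    pieces.append(text[last_end:])
    return "".join(pieces)


print(highlight_by_position("accessory", "Shop accessories or one accessory."))
```

Because the comparison happens on stems of whole words at known positions, both "accessory" and "accessories" are highlighted regardless of which form was typed in the query.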

jhodgdon’s picture

OK, my mistake. I didn't delve too deeply into the code, obviously.

jhodgdon’s picture

I don't think the Search Lucene API module actually uses stemming modules at all -- I think it does its own stemming, by the way.

jhodgdon’s picture

Version: 6.x-1.0 » 6.x-2.x-dev

An update on the status of this issue:

Apparently, no one cared about this issue enough to review my proposed change to Drupal Core (#493270: search_excerpt() doesn't work well with stemming), which would have modified the core search_excerpt() function to allow Porter Stemmer to work correctly with search excerpts. Drupal 7 is now in a "code freeze", so it is now (I believe) too late to get this into Drupal 7, and we can only hope for Drupal 8 at this point. Until this gets into Drupal core, there is no hope of having search excerpts working correctly with core search and Porter Stemmer. There's nothing else I can really do about it at this point. I will leave this issue as "postponed" until this fix goes into Drupal core.

So, as a work-around, I have implemented a similar change in the Search by Page module (http://drupal.org/project/search_by_page), which I'm the maintainer of. This means that if you use the Search by Page module in place of the core Search module, your search excerpts will work correctly with Porter Stemmer.

This is currently only checked into the development versions of both Porter Stemmer (6.x-2.x) and Search by Page (6.x-1.x) (make sure the build date reads September 10 or later). Assuming you were previously up to date on the Porter Stemmer module (version 6.x-2.1), you shouldn't need to clear your search index or run cron when installing this new version to see it working, since this change only affects search results display, not the search index.

jhodgdon’s picture

This is now released in Porter Stemmer 6.x-2.2 and Search by Page 6.x-1.4 -- you can use Search by Page if you want better search excerpts.

It is still not fixed in Drupal core though, so I'll leave this issue as Postponed. Indefinitely.

jhodgdon’s picture

As a further note, the core Search issue #493270: search_excerpt() doesn't work well with stemming has now been postponed to Drupal 8. At least.

This was very frustrating. What happened was that I proposed a change for Drupal 7, but no one cared enough about the issue to comment or review my change, and now it is too late.

If we want to get this into Drupal 8, we're talking about a year or more down the road, and still there would have to be some interest (besides mine) from a Drupal developer to get it to actually happen.

minnur’s picture

The word "bankruptcy" is indexed as "bankruptci", which is not correct.

jhodgdon’s picture

Actually, it is correct, since it would also match "bankruptcies", which stems to the same thing. The stemmed keywords are not necessarily English words.

minnur’s picture

Hi Jennifer,

I agree, it would match "bankruptcies", but searching for the word "Bankruptcy" returns nothing.
How could I know that I need to type "Bankruptci" instead of "Bankruptcy" to get any result?

I think there is an issue with the filter.
1. I searched for "Bankruptcy"
2. Nothing found
3. I checked the table in database, I see that it has only the word "Bankruptci", which will never be found by searching the word "Bankruptcy"

Thanks

jhodgdon’s picture

You must be doing something special in your searching then. If you use the core Search for both indexing and searching, Porter Stemmer should be applied to both. So when you search using the word "bankruptcy", the stemmer will shorten it to "bankruptci", and it will match what is in the database.
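
The following Python sketch illustrates why stemming has to be applied on both sides. It is a toy model, not Drupal's search code: toy_stem(), build_index(), and search() are hypothetical names, and the lookup table only contains the stem values mentioned in this thread.

```python
# Hypothetical stand-in for a real stemmer; stem values come from this thread.
STEMS = {"bankruptcy": "bankruptci", "bankruptcies": "bankruptci"}


def toy_stem(word):
    """Return the known stem of word, or the lowercased word if unknown."""
    lowered = word.lower()
    return STEMS.get(lowered, lowered)


def build_index(docs):
    """Map each stemmed word to the set of document IDs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def search(index, query, stem_query=True):
    """Look up a single-word query, optionally stemming it first."""
    key = toy_stem(query) if stem_query else query.lower()
    return index.get(key, set())


index = build_index({1: "Filing for bankruptcy"})
print(sorted(index))                                   # stored keys are stems
print(search(index, "Bankruptcy"))                     # query stemmed: match
print(search(index, "Bankruptcy", stem_query=False))   # query not stemmed: no match
```

In this model, a search that returns nothing while the index holds "bankruptci" corresponds to the stem_query=False case, that is, the query being looked up without passing through the stemmer.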

jhodgdon’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev

jhodgdon’s picture

Status: Postponed » Closed (won't fix)

The Core issue was fixed in Drupal 8, but not in Drupal 7. We don't need to make changes to the Drupal 8 version of Porter Stemmer to support Core (the fix took care of that). It's not possible to fix in 7 without a core fix. Therefore, I am going to close this as Won't Fix.