The search_simplify function apparently splits on apostrophes, although I cannot find the apostrophe's code point (hex 0027, decimal 39) in the PREG_CLASS_SEARCH_EXCLUDE constant.

I can understand this to some extent for stemming purposes, but it seems that stemming should be handled differently. Splitting does not improve results for terms such as "Parkinson's" or, in particular, for names such as "O'Shea".
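To illustrate the effect on those two terms, here is a minimal Python sketch. It is a hypothetical model, not Drupal's actual search_simplify code. It assumes that apostrophes are treated as word separators and that words shorter than three characters are dropped (the minimum mentioned later in this thread). The function name `simplify_split_apostrophe` and the constant `MIN_WORD_SIZE` are illustrative only.

```python
import re

# Assumption: three-character minimum word length, as described in this thread.
MIN_WORD_SIZE = 3


def simplify_split_apostrophe(text):
    """Hypothetical model: lowercase, treat apostrophes as separators, drop short words."""
    words = re.sub(r"'", " ", text.lower()).split()
    return [w for w in words if len(w) >= MIN_WORD_SIZE]


print(simplify_split_apostrophe("O'Shea"))       # ['shea']
print(simplify_split_apostrophe("Parkinson's"))  # ['parkinson']
```

Under these assumptions the "O" in "O'Shea" vanishes entirely. A search for the name then matches any document containing "Shea".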

Does anyone know additional reasons for the logic behind the current approach?

Thanks.

benjamin, Agaric Design Collective

Comments

mlncn

Category: support » bug
Priority: Minor » Normal

We've looked into this further and consider it a bug. Like splitting on a hyphen when the resulting words are then dropped for being shorter than three characters, it makes searching for some names impossible.
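The following Python sketch contrasts the splitting behavior with one possible alternative. It is not the patch mentioned below and not Drupal's implementation. The alternative deletes hyphens and apostrophes that sit between word characters, so a name is indexed as a single word. Both functions, the `MIN_WORD_SIZE` constant, and the example name "Ka-Ho" are hypothetical.

```python
import re

MIN_WORD_SIZE = 3  # assumption: minimum indexed word length


def simplify_split(text):
    """Hypothetical model of splitting: hyphens and apostrophes become spaces."""
    words = re.sub(r"['\-]", " ", text.lower()).split()
    return [w for w in words if len(w) >= MIN_WORD_SIZE]


def simplify_join(text):
    """Hypothetical alternative: delete in-word hyphens/apostrophes ("o'shea" -> "oshea")."""
    joined = re.sub(r"(?<=\w)['\-](?=\w)", "", text.lower())
    # Any remaining (leading/trailing) hyphens or apostrophes still act as separators.
    words = re.sub(r"['\-]", " ", joined).split()
    return [w for w in words if len(w) >= MIN_WORD_SIZE]


for name in ["Ka-Ho", "O'Shea"]:
    print(name, simplify_split(name), simplify_join(name))
```

With splitting, "Ka-Ho" produces no indexable words at all. Joining keeps it as "kaho". The trade-off is the stemming concern from the original post: joining turns "Parkinson's" into "parkinsons", which no longer matches "Parkinson" unless stemming handles it. Whichever rule is chosen has to be applied identically at indexing time and at query time.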

A patch is forthcoming from Stefan Freudenberg of Agaric, but feedback on the underlying approach is welcome, either before or after.

figaro

Please make sure that enough characters are handled, not just the apostrophe. I posted an issue on this topic: http://drupal.org/node/138284
So the two points arising from this issue are:
1. Include more characters than just the apostrophe.
2. Ensure that search terms of three characters lead to meaningful results.

jhodgdon

Bump: this still needs to be fixed.

jhodgdon

Status: Active » Closed (duplicate)

I'm adding this to a related issue on hyphens and underscores, since it's exactly the same problem.
#108100: Need smarter search splitting on underscores, hyphens, apostrophes and other characters