searching doesn't break on hyphens
fworsley - January 2, 2008 - 18:08
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | search.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
When words are joined with a hyphen, the search module does not pick up the individual word parts. For example, create the following story:
Title: Workflow-based Systems
Body: This is really new age. Let's make workflow-based systems.
Searching for "systems" or "age" will correctly return the story as a search result.
Searching for "workflow", however, does not return any results. I would expect the search module to treat hyphens the same as spaces and break on them, so it should return the story when searching for "workflow".

#1
Confirmed.
#2
The word that gets indexed is workflowbased. It seems unlikely that this will be useful in many cases. It's possible that workflowbased and work and flow need to be indexed (in the absence of some other partial word matching).
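To illustrate why "workflow" finds nothing, here is a rough Python sketch (not Drupal's actual PHP code). It assumes the indexer simply deletes hyphens before splitting the text into words; the function name is made up for the example.

```python
import re

def index_words_stripping_hyphens(text):
    # Hypothetical stand-in for the indexer, assumed behaviour only:
    # lowercase the text, delete hyphens, then collect runs of letters/digits.
    without_hyphens = text.lower().replace("-", "")
    return re.findall(r"[a-z0-9]+", without_hyphens)

words = index_words_stripping_hyphens("Workflow-based Systems")
print(words)                # ['workflowbased', 'systems']
print("workflow" in words)  # False
```

Under that assumption only "workflowbased" and "systems" end up in the index, so a search for "workflow" has nothing to match.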
#3
The easy "fix" to the problem is to consider the dash a word boundary. We have to be mindful of the consequences, however. This will now index workflow and based as two separate words. If there are words that should have dashes in them, they will then break.
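A minimal Python sketch of the "dash is a word boundary" idea, just to show the consequence. This is an illustration with a made-up function name, not a patch against search.module.

```python
import re

def index_words_splitting_on_dash(text):
    # Assumption for this sketch: any run of characters that is not a
    # letter or digit (including "-") counts as a word boundary.
    return [word for word in re.split(r"[^a-z0-9]+", text.lower()) if word]

print(index_words_splitting_on_dash("Workflow-based Systems"))
# ['workflow', 'based', 'systems']
print(index_words_splitting_on_dash("e-mail"))
# ['e', 'mail']
```

"workflow" becomes searchable, but a word that is meant to keep its dash is split into pieces as well.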
#4
#5
So this means that "e-mail" becomes "e" "mail". That's not good ;) Doubling up the indexing so we have "workflow" "based" and "workflowbased" as suggested in #3 sounds like it might work. That'd mean nodes with "e-mail" would show up in searches for "mail", which doesn't seem like such a bad thing.
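The doubled-up indexing could look roughly like this in Python. It is only a sketch of the idea discussed here, with a hypothetical function name and a simplified notion of what a word is (ASCII letters and digits), not what Drupal does.

```python
import re

def index_words_doubled(text):
    # For each hyphenated token, index every part AND the joined form,
    # so "workflow-based" yields "workflow", "based" and "workflowbased".
    words = []
    for token in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()):
        parts = token.split("-")
        words.extend(parts)
        if len(parts) > 1:
            words.append("".join(parts))
    return words

print(index_words_doubled("Send e-mail about workflow-based systems"))
# ['send', 'e', 'mail', 'email', 'about', 'workflow', 'based', 'workflowbased', 'systems']
```

With this, searches for "workflow", "mail" and "email" would all match. The cost is a larger index and a stray one-letter word "e", which an indexer with a minimum word length would presumably drop.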
#6
This will be fixed with a new patch that makes use of input filters to process text for search and indexing. This would allow anyone with specific language problems to disable the default input filters provided by search and assign their own.
http://drupal.org/node/257007
#7
Replacing "-" with a different search character wouldn't help in this case, if I understand correctly.
e-mail
Workflow-based
Replacing the dash with a space gives:
e mail
workflow based
(although I heard e-mail is deprecated in the English language and has now become email ;-)
Possible solutions:
as workflow - based and display as search result as workflow-based. Something like that anyway.
P.S. I set the status to active, because I'm not sure what happens to a post if it is left with the previous setting.