searching doesn't break on hyphens

fworsley - January 2, 2008 - 18:08
Project:Drupal
Version:7.x-dev
Component:search.module
Category:bug report
Priority:normal
Assigned:Unassigned
Status:active
Description

When words are joined with a hyphen, the search module does not pick up on the individual parts. For example, create the following story:

Title: Workflow-based Systems
Body: This is really new age. Let's make workflow-based systems.

Searching for "systems" or "age" will correctly return the story as a search result.

Searching for "workflow" however will not return any results. I would expect the search module to treat hyphens the same as spaces and correctly break on them. It should return the story when searching for "workflow".

#1

robertDouglass - April 15, 2008 - 10:31
Version:5.3» 7.x-dev

Confirmed.

#2

robertDouglass - April 15, 2008 - 10:32

The word that gets indexed is workflowbased. It seems unlikely that this will be useful in many cases. It's possible that workflowbased and work and flow need to be indexed (in the absence of some other partial word matching).
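For illustration, here is a small Python model of the behavior described above. It assumes that the indexer simply deletes dashes before splitting text into words. It is not Drupal's PHP code, and `tokenize_joined` is a hypothetical name.

```python
import re

def tokenize_joined(text: str) -> list[str]:
    """Toy model (an assumption, not search.module itself): dashes are
    deleted before tokenizing, which glues the hyphenated parts together."""
    text = re.sub(r"-+", "", text.lower())  # "workflow-based" -> "workflowbased"
    return re.findall(r"\w+", text)         # split on any remaining non-word chars

print(tokenize_joined("Workflow-based Systems"))  # ['workflowbased', 'systems']
```

Under this model a search for "workflow" has nothing to match, because only "workflowbased" ever reaches the index.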

#3

robertDouglass - April 15, 2008 - 10:42

The easy "fix" for the problem is to treat the dash as a word boundary. We have to be mindful of the consequences, however: "workflow" and "based" will now be indexed as two separate words, and any word that genuinely needs its dash will be split apart.
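The attached patch is not reproduced here. This hypothetical Python function only models its effect, assuming "dash as a word boundary" means treating the dash like whitespace. The second call shows the downside mentioned above.

```python
import re

def tokenize_dash_boundary(text: str) -> list[str]:
    # Treat the dash like whitespace, so each hyphenated part becomes its own word.
    text = text.lower().replace("-", " ")
    return re.findall(r"\w+", text)

print(tokenize_dash_boundary("Workflow-based Systems"))  # ['workflow', 'based', 'systems']
print(tokenize_dash_boundary("e-mail"))                  # ['e', 'mail']
```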

Attachment: search-dash-boundary.patch (752 bytes)

#4

robertDouglass - April 15, 2008 - 10:42
Status:active» needs work

#5

catch - April 15, 2008 - 10:51

So this means that "e-mail" becomes "e" and "mail". That's not good ;) Doubling up the indexing so that we have "workflow", "based" and "workflowbased", as suggested in #3, sounds like it might work. That would mean nodes with "e-mail" show up in searches for "mail", which doesn't seem like such a bad thing.
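A minimal sketch of the "doubling up" idea, written as a toy Python tokenizer (the name `tokenize_both` is hypothetical, and this is not a proposed patch). Each hyphenated word is emitted once in its joined form and once per part, so both "workflowbased" and "workflow" end up in the index.

```python
import re

def tokenize_both(text: str) -> list[str]:
    """Index each hyphenated word twice: once joined, once split into parts."""
    tokens = []
    # \w+(?:-\w+)* keeps "workflow-based" together as a single match.
    for word in re.findall(r"\w+(?:-\w+)*", text.lower()):
        parts = word.split("-")
        if len(parts) > 1:
            tokens.append("".join(parts))  # joined form, e.g. "workflowbased"
            tokens.extend(parts)           # the parts, e.g. "workflow", "based"
        else:
            tokens.append(word)
    return tokens

print(tokenize_both("Workflow-based Systems"))  # ['workflowbased', 'workflow', 'based', 'systems']
print(tokenize_both("e-mail"))                  # ['email', 'e', 'mail']
```

The cost is a somewhat larger index, since every hyphenated word contributes more than one entry.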

#6

BlakeLucchesi - May 10, 2008 - 19:40
Status:needs work» won't fix

This will be fixed with a new patch that uses input filters to process text for searching and indexing. This would allow anyone with language-specific problems to disable the default input filters provided by search and assign their own.

http://drupal.org/node/257007

#7

design_dolphin - May 10, 2008 - 22:40
Status:won't fix» active

Replacing "-" with a different search character wouldn't help in this case, if I understand correctly.

e-mail
Workflow-based

Replace the dash with a space:
e mail
workflow based

(although I've heard that "e-mail" is deprecated in English and has now become "email" ;-)

Possible solutions:

  1. Accept the technical limitations of the search function (either way).
  2. Add support for a native-language dictionary that controls the spelling (and cache this for all content). I don't know whether there is an open source dictionary for every language, but maybe we could work with a project like OpenOffice.org on this (do they do that?), and possibly with commercial third parties.
  3. Maintain a list manually (but I don't see this happening as part of a workflow or on a large scale).
  4. Split the words when indexing, but combine them again in the results. So index on cron as "workflow" and "based", and display the search result as "workflow-based". Something like that, anyway.
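Option 4 can be sketched as a tiny inverted index: the indexer stores the split words, while the results show the stored original text, so a match on "workflow" still displays "Workflow-based Systems". The names `build_index` and `search` are hypothetical, and this is a toy Python model, not how search.module actually stores its index.

```python
import re
from collections import defaultdict

def build_index(nodes: dict[int, str]) -> dict[str, set[int]]:
    """Map each split word (dash treated as a space) to the ids of nodes containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for nid, text in nodes.items():
        for word in re.findall(r"\w+", text.lower().replace("-", " ")):
            index[word].add(nid)
    return index

def search(index: dict[str, set[int]], nodes: dict[int, str], word: str) -> list[str]:
    """Look up the split word, but return the original, unmodified text."""
    return [nodes[nid] for nid in sorted(index.get(word.lower(), set()))]

nodes = {1: "Workflow-based Systems", 2: "Plain systems"}
index = build_index(nodes)
print(search(index, nodes, "workflow"))  # ['Workflow-based Systems']
print(search(index, nodes, "systems"))   # ['Workflow-based Systems', 'Plain systems']
```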

P.S. I set the status to active because I'm not sure what happens to a post that is left with the previous setting.
