In some cases the search module fails to index words separately, for instance when only tags separate them. In that case the words are indexed together as one string:

This is part of a real node text from one of my web pages (in Catalan):

1732/1735<br\><b>Instrumentació:</b>

this got indexed like this in the search_index table:

17321735instrumentació        169        1

which means I couldn't get any search results for 'instrumentació'.

I fixed that by making the tag-stripping code in the search.module file replace each tag with a space instead of an empty string:

original file (lines 253-254):

      // Strip heaps of stuff out of it.
      $wordlist = preg_replace("'<[\/\!]*?[^<>]*?>'si", '', $wordlist);

fixed file (lines 253-254):

      // Strip heaps of stuff out of it.
      $wordlist = preg_replace("'<[\/\!]*?[^<>]*?>'si", ' ', $wordlist);
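
For anyone who wants to see the difference outside Drupal, here is a minimal standalone sketch of the two variants. The names `SKETCH_TAG_PATTERN`, `sketch_strip_fused()` and `sketch_strip_spaced()` are made up for illustration; only the regular expression and the two replacement strings come from search.module as quoted above. The later steps of the 4.5 indexer are not reproduced, so the output is not the final index entry.

```php
<?php
// The pattern quoted from search.module (4.5.x): matches any HTML-like tag.
const SKETCH_TAG_PATTERN = "'<[\/\!]*?[^<>]*?>'si";

// Original behaviour: tags are deleted, so the words around them fuse.
function sketch_strip_fused($wordlist) {
  return preg_replace(SKETCH_TAG_PATTERN, '', $wordlist);
}

// Patched behaviour: each tag becomes a space, so word boundaries survive.
function sketch_strip_spaced($wordlist) {
  return preg_replace(SKETCH_TAG_PATTERN, ' ', $wordlist);
}

// Sample text from the report above.
$sample = '1732/1735<br\><b>Instrumentació:</b>';

$fused  = sketch_strip_fused($sample);
$spaced = sketch_strip_spaced($sample);

// Split on whitespace to see which separate words an indexer could find.
print_r(preg_split('/\s+/', trim($fused)));
print_r(preg_split('/\s+/', trim($spaced)));
```

With the empty replacement the sample comes out as a single string with no whitespace in it, while the patched version can still be split into two separate words.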

Comment  File                   Size       Author
#5       search.module_0.patch  607 bytes  robertgarrigos

Comments

benshell:

Have you tried this on 4.6.x? I read this issue because I'm also having search indexing problems, but this particular problem looks like it has been fixed in 4.6.1. At line 344 of search.module, I see this:

  // Strip off all ignored tags to speed up processing, but insert space before/after
  // them to keep word boundaries.
  $text = str_replace(array('<', '>'), array(' <', '> '), $text);
  $text = strip_tags($text, '<'. implode('><', array_keys($tags)) .'>');
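
To illustrate why this keeps word boundaries, here is a small standalone sketch of those two lines wrapped in a helper. The function name `sketch_strip_ignored_tags()` and the contents of `$tags` are assumptions for the example (the real tag list in search.module may differ), and the input is a well-formed variant of the sample text from the original report.

```php
<?php
// Sketch of the 4.6.x approach quoted above, wrapped in a hypothetical helper.
function sketch_strip_ignored_tags($text, array $tags) {
  // Pad every angle bracket with a space so removed tags leave a gap behind.
  $text = str_replace(array('<', '>'), array(' <', '> '), $text);
  // Keep only the tags named as keys of $tags; strip everything else.
  return strip_tags($text, '<' . implode('><', array_keys($tags)) . '>');
}

// Hypothetical tag => weight map; the real one in search.module may differ.
$tags = array('b' => 3, 'h1' => 21);

// Well-formed variant of the sample text from the original report.
$out = sketch_strip_ignored_tags('1732/1735<br /><b>Instrumentació:</b>', $tags);

// Drop the remaining (kept) tags and split on whitespace to list the words.
print_r(preg_split('/\s+/', trim(strip_tags($out))));
```

Because the spaces are inserted before any tag is removed, the stripped `<br />` still leaves a gap between the numbers and the following word.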

robertgarrigos:

Title: wrong search indexing in some cases » only 4.5.3

No, I haven't. The site where I have this problem is on a shared server running PHP 4, so there is no way to run Drupal 4.6 on it.

robertgarrigos:

Title: only 4.5.3 » wrong search indexing in some cases

robertgarrigos:

Version: 4.5.3 » 4.5.5

This is not yet fixed in 4.5.5. Apparently there is no problem with 4.6.x versions.

robertgarrigos:

Assigned: Unassigned » robertgarrigos
Status: Active » Reviewed & tested by the community

Status  File                   Size
new     search.module_0.patch  607 bytes

I enclose a patch for this.

Please forgive me if this is not the right way to do it. It's the first time I've used CVS on Mac OS X, and also the first time I've used diff to create a patch file, so take it as a simple "hello world" patch file, which should work and fix the problem anyway.

dries:

Robert: your patch looks OK, but I'll let Steven (UnConeD) review it.

Steven:

This patch fixes the described issue, but do we want to be bothered maintaining 4.5 search? It's pretty darn crappy.

robertgarrigos:

You might be right. I was keeping one of my sites on 4.5.5 because I misunderstood the system requirements: Drupal 4.6.x can in fact run on PHP 4 (!).

However, if you keep updating 4.5.x to fix security holes for people running PHP < 4.3.3, I think it pays to debug that crappy search module ;-)

dries:

Status: Reviewed & tested by the community » Fixed

Committed to DRUPAL-4-5. Thanks.

Anonymous:

Status: Fixed » Closed (fixed)