This code in search_index() multiplies the score of words found inside certain HTML elements:

  // Multipliers for scores of words inside certain HTML tags.
  // Note: 'a' must be included for link ranking to work.
  $tags = array('h1' => 25,
                'h2' => 18,
                'h3' => 15,
                'h4' => 12,
                'h5' => 9,
                'h6' => 6,
                'u' => 3,
                'b' => 3,
                'i' => 3,
                'strong' => 3,
                'em' => 3,
                'a' => 10);

The result is that text in an h1, such as the title, is roughly 8x more important than text that is boldfaced (25 vs. 3). This might be fine, but the decision is arbitrary, and administrators should be able to tweak it. The simplest solution would be to replace each of these values with a variable_get(). That way search admins could override the values in settings.php. A user interface could be built later, for example on admin/settings/search.
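A minimal sketch of that idea, one variable per tag. The variable names (search_tag_weight_h1 and so on) and the function example_search_tag_weights() are made up for illustration and are not taken from any patch. The variable_get() stand-in is only there so the snippet runs outside Drupal; inside Drupal the real function already exists.

```php
<?php
// Stand-in for Drupal's variable_get() so the sketch runs on its own.
// It only looks at the global $conf array, which settings.php can fill.
if (!function_exists('variable_get')) {
  function variable_get($name, $default) {
    return isset($GLOBALS['conf'][$name]) ? $GLOBALS['conf'][$name] : $default;
  }
}

// Hypothetical helper: builds the $tags array, letting each default be
// overridden by a variable named "search_tag_weight_<tag>" (assumed name).
function example_search_tag_weights() {
  $defaults = array('h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12,
                    'h5' => 9, 'h6' => 6, 'u' => 3, 'b' => 3, 'i' => 3,
                    'strong' => 3, 'em' => 3, 'a' => 10);
  $tags = array();
  foreach ($defaults as $tag => $weight) {
    $tags[$tag] = variable_get('search_tag_weight_' . $tag, $weight);
  }
  return $tags;
}

// In settings.php this would be written as: $conf['search_tag_weight_b'] = 5;
$GLOBALS['conf']['search_tag_weight_b'] = 5;

$tags = example_search_tag_weights();
```

With this override only the weight for b changes; every other tag keeps its default.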

The catch is that these values are applied at index time. If some content is already indexed and you then change the values, newly indexed content gets the new weights while the old content stays indexed with the old ones. This won't cripple search, but if you want the entire index to be uniform you need to reindex. Therefore a UI that lets admins tweak the weights should either trigger a reindex automatically or advise that one be triggered manually.
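A toy illustration of why a reindex is needed. This is not the search module's code: example_index_score() is a hypothetical helper that stores a tag's multiplier at index time, and the fallback of 1 for unlisted tags is an assumption.

```php
<?php
// Hypothetical helper: the score stored for one word found inside $tag.
function example_index_score($tag, array $weights) {
  return isset($weights[$tag]) ? $weights[$tag] : 1;
}

$weights = array('b' => 3);

// A node indexed before the admin changes anything.
$old_score = example_index_score('b', $weights);

// The admin raises the weight for b; nothing already stored changes.
$weights['b'] = 6;

// A node with identical markup, indexed afterwards.
$new_score = example_index_score('b', $weights);

// Only after the first node is indexed again do the two scores match.
$reindexed_score = example_index_score('b', $weights);
```

The first node keeps its old stored score until it is indexed again, which is the non-uniformity described above.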

Comments

robertdouglass

Assigned: Unassigned » robertdouglass
Status: Active » Needs review
Attachment: new, 1.7 KB

Step 1: place them in variables. This way they can be overridden in settings.php or by contrib modules. Step 2 is to decide what interface, if any, is needed to support this. I'm perfectly happy with this patch going in and then opening a separate issue for step 2.

robertdouglass

Attachment: new, 5.07 KB

I now think the whole array should be one variable, because that way I can change the set of boosted tags itself. For example, if I have a lot of content in HTML tables, I can add a boost for <td> elements. Or I can decide that <em> elements don't get any boost.
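A sketch of the single-variable approach. The variable name search_tag_weights and the weight of 5 for td are assumptions chosen for illustration. As before, the variable_get() stand-in exists only so the snippet runs outside Drupal.

```php
<?php
// Stand-in for Drupal's variable_get(); the real one is used inside Drupal.
if (!function_exists('variable_get')) {
  function variable_get($name, $default) {
    return isset($GLOBALS['conf'][$name]) ? $GLOBALS['conf'][$name] : $default;
  }
}

// In settings.php this would be written as: $conf['search_tag_weights'] = array(...);
// The override replaces the whole array, so tags can be added or dropped.
$GLOBALS['conf']['search_tag_weights'] = array(
  'h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12, 'h5' => 9, 'h6' => 6,
  'u' => 3, 'b' => 3, 'i' => 3, 'strong' => 3,
  'a' => 10,  // still needed for link ranking
  'td' => 5,  // illustrative boost for table cells
  // 'em' is left out on purpose, so it gets no boost
);

// Multipliers for scores of words inside certain HTML tags.
$tags = variable_get('search_tag_weights', array(
  'h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12, 'h5' => 9, 'h6' => 6,
  'u' => 3, 'b' => 3, 'i' => 3, 'strong' => 3, 'em' => 3, 'a' => 10,
));
```

Because the override replaces the defaults wholesale, it has to repeat every tag it wants to keep, including 'a'.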

robertdouglass

Attachment: new, 1.02 KB

Wrong patch. Here is the correct one.

David Lesieur

Attachment: new, 1.23 KB

Nice!

The indentation looks a bit arbitrary... What about this patch? ;)

dries

Status: Needs review » Fixed

This looks trivial to me. Committed.

Anonymous

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.