This code in search_index() boosts words inside various HTML elements by different amounts:
```php
// Multipliers for scores of words inside certain HTML tags.
// Note: 'a' must be included for link ranking to work.
$tags = array('h1' => 25,
              'h2' => 18,
              'h3' => 15,
              'h4' => 12,
              'h5' => 9,
              'h6' => 6,
              'u' => 3,
              'b' => 3,
              'i' => 3,
              'strong' => 3,
              'em' => 3,
              'a' => 10);
```
The result is that text in the title is about 7x more important than text that is boldfaced. This might be fine, but the decision is arbitrary, and administrators should be able to tweak it. The simplest solution would be to replace each of these values with a variable_get() call, so that search administrators could tweak the values in settings.php. A user interface could be added later, for example on admin/settings/search.
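A minimal sketch of the per-value variable_get() approach, assuming Drupal 6's convention that variable_get($name, $default) returns the override from the global $conf array (which settings.php populates) and otherwise the default. The stand-in variable_get(), the helper search_example_tag_weights() and per-tag variable names such as search_tag_weight_h1 are hypothetical, chosen only so the snippet runs on its own.

```php
<?php
// Stand-in for Drupal's variable_get() so this sketch runs outside Drupal:
// return the override from $conf if one is set, otherwise the default.
if (!function_exists('variable_get')) {
  function variable_get($name, $default = NULL) {
    global $conf;
    return isset($conf[$name]) ? $conf[$name] : $default;
  }
}

// Hypothetical helper: one variable per tag, with today's hard-coded
// weights as the defaults.
function search_example_tag_weights() {
  $defaults = array(
    'h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12, 'h5' => 9, 'h6' => 6,
    'u' => 3, 'b' => 3, 'i' => 3, 'strong' => 3, 'em' => 3,
    // 'a' must stay in the list for link ranking to work.
    'a' => 10,
  );
  $tags = array();
  foreach ($defaults as $tag => $weight) {
    $tags[$tag] = variable_get('search_tag_weight_' . $tag, $weight);
  }
  return $tags;
}

// An override as it might appear in settings.php (hypothetical name).
$conf['search_tag_weight_h1'] = 20;

$tags = search_example_tag_weights();
print $tags['h1'] . ' ' . $tags['b'] . "\n"; // prints "20 3"
```

Only the overridden weight changes; every other tag keeps its default.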
The catch is that these weights are applied at index time. If you change the values after some content has been indexed, newly indexed content gets the new weights, but existing entries keep the old ones. This won't cripple search, but if you want the entire index to be uniform you need to re-index. Therefore, a UI that lets admins tweak the weights should either trigger a re-index automatically or advise them to trigger one manually.
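To make the index-time point concrete, here is a toy sketch (not Drupal's search_index() code) in which each word's score is computed once, when the item is indexed, as a base of 1 plus the weight of its enclosing tag. The function example_index_item() and this scoring rule are simplifications assumed for illustration.

```php
<?php
// Toy indexer: scores are computed from the weights passed in at index
// time and then stored, so later weight changes do not touch them.
// Each item is a list of array(word, tag) pairs.
function example_index_item(array &$index, $id, array $words, array $tags) {
  $index[$id] = array();
  foreach ($words as $pair) {
    list($word, $tag) = $pair;
    $weight = isset($tags[$tag]) ? $tags[$tag] : 0;
    $previous = isset($index[$id][$word]) ? $index[$id][$word] : 0;
    $index[$id][$word] = $previous + 1 + $weight;
  }
}

$index = array();
$old_weights = array('h1' => 25, 'b' => 3);
$new_weights = array('h1' => 10, 'b' => 3);

// Node 1 is indexed before the weights change, node 2 after.
example_index_item($index, 1, array(array('drupal', 'h1')), $old_weights);
example_index_item($index, 2, array(array('drupal', 'h1')), $new_weights);
print $index[1]['drupal'] . ' vs ' . $index[2]['drupal'] . "\n"; // 26 vs 11

// Re-indexing node 1 with the current weights makes the index uniform.
example_index_item($index, 1, array(array('drupal', 'h1')), $new_weights);
print $index[1]['drupal'] . ' vs ' . $index[2]['drupal'] . "\n"; // 11 vs 11
```

The same heading text scores differently depending on when it was indexed, until the older item is indexed again.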
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | 237754_search_tag_weights.patch | 1.23 KB | David Lesieur |
| #3 | search_tag_weights.patch | 1.02 KB | robertdouglass |
| #2 | search-input-formats.patch | 5.07 KB | robertdouglass |
| #1 | search-tag-weights.patch | 1.7 KB | robertdouglass |
Comments
Comment #1
robertdouglass commented: Step 1: place them in variables. This way they can be overridden in settings.php or by contrib modules. Step 2 is to decide what, if any, interface is needed to support this. I'm perfectly happy with this patch going in and then opening a separate issue for step 2.
Comment #2
robertdouglass commented: I now think the whole array should be one variable, because that way I can change the set of boosted tags itself. For example, if I have a lot of content in HTML tables, I can add a boost for <td> elements. Or I can decide that <em> elements don't get any boost.
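A sketch of the single-variable approach, assuming the whole array is read with one variable_get() call under the name search_tag_weights (an assumption based on the patch filenames) and overridden from settings.php via $conf. The stand-in variable_get(), the helper example_search_tag_weights() and the td weight of 2 are hypothetical.

```php
<?php
// Stand-in for Drupal's variable_get() so this sketch runs outside Drupal.
if (!function_exists('variable_get')) {
  function variable_get($name, $default = NULL) {
    global $conf;
    return isset($conf[$name]) ? $conf[$name] : $default;
  }
}

// Hypothetical helper: the whole tag => weight map is one variable, so an
// override can add or drop tags, not just change the numbers.
function example_search_tag_weights() {
  return variable_get('search_tag_weights', array(
    'h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12, 'h5' => 9, 'h6' => 6,
    'u' => 3, 'b' => 3, 'i' => 3, 'strong' => 3, 'em' => 3, 'a' => 10,
  ));
}

// settings.php: boost table cells and drop <em>. The override replaces
// the whole array, so it must still include 'a' for link ranking.
$conf['search_tag_weights'] = array(
  'h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12, 'h5' => 9, 'h6' => 6,
  'u' => 3, 'b' => 3, 'i' => 3, 'strong' => 3,
  'td' => 2,
  'a' => 10,
);

$tags = example_search_tag_weights();
print (isset($tags['td']) ? 'td: ' . $tags['td'] : 'no td') . "\n"; // td: 2
print (isset($tags['em']) ? 'em: ' . $tags['em'] : 'no em') . "\n"; // no em
```

The trade-off of a single variable is that an override must restate every tag it wants to keep, including 'a'.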
Comment #3
robertdouglass commented: Wrong patch.
Comment #4
David Lesieur commented: Nice!
The indentation looks a bit arbitrary... What about this patch? ;)
Comment #5
dries commented: This looks trivial to me. Committed.
Comment #6
Anonymous (not verified) commented: Automatically closed; issue fixed for two weeks with no activity.