
Scoring factors not normalized to (0,1) properly

Project: Drupal core
Version: 7.x-dev
Component: search.module
Category: bug report
Priority: normal
Assigned: Unassigned
Status: closed (fixed)

Issue Summary

While reviewing the recent patch that adds a new hook_ranking, I noticed that some of the equations used to normalize certain ranking factors do not behave as expected. Robert Douglass posted a blog entry about scoring factors not functioning as expected: http://acquia.com/blog/drupals-search-compared-google-and-yahoo.

Statistics and Comments
In the statistics scoring algorithm (currently residing in node.module) we can see:

Statistics:
'score' => '2.0 - 2.0 / (1.0 + node_counter.totalcount * %f)',
'arguments' => array(variable_get('node_cron_views_scale', 0)),

Comments:
'score' => '2.0 - 2.0 / (1.0 + node_comment_statistics.comment_count * %f)',
'arguments' => array(variable_get('node_cron_comments_scale', 0)),

The %f is the setting that the site admin applies to the score on the search settings page (1 - 10). As the attached screenshot of the graphed function shows, the range of this function is (0,2), not (0,1).
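
The (0,2) range can be checked by hand. This is a short derivation, assuming %f is a fixed positive constant f that does not depend on the data, and writing x for the non-negative count (totalcount or comment_count). The symbols s, f, and x are introduced here for illustration only:

```latex
\begin{align*}
s(x) &= 2 - \frac{2}{1 + f x}, \qquad x \ge 0,\ f > 0 \\
s(0) &= 2 - \frac{2}{1} = 0 \\
\lim_{x \to \infty} s(x) &= 2 - 0 = 2
\end{align*}
```

Since s increases with x, its values under this assumption lie in [0, 2).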

Normalizing these score factors to (0,1) matters because the other score factors ARE normalized properly, so these two factors carry more weight than the others when set to the same weight in the search settings. Can anyone else verify this?

Attachment | Size | Status | Test result | Operations
Picture 4.png | 25.07 KB | Ignored: Check issue status. | None | None

Comments

#1

Agreed. Node promotion and stickiness are automatically normalized on a 0 - 1 scale; comments and statistics should be too. This can probably be changed simply to:

Statistics:
'score' => '1.0 - 1.0 / (1.0 + node_counter.totalcount * %f)',
'arguments' => array(variable_get('node_cron_views_scale', 0)),

Comments:
'score' => '1.0 - 1.0 / (1.0 + node_comment_statistics.comment_count * %f)',
'arguments' => array(variable_get('node_cron_comments_scale', 0)),

#2

Status: active » fixed

False alarm: the algorithms are fine. %f is the reciprocal of the maximum number of comments on a single node, so the maximum possible score is 1 and the minimum possible is 0. Both the comment and statistics rankings are therefore normalized properly.
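
A worked check of this argument, assuming (as the variable names node_cron_comments_scale and node_cron_views_scale suggest) that %f is set to 1/M, where M ≥ 1 is the largest count found on any single node. The symbols s, x, and M are introduced here for illustration only:

```latex
\begin{align*}
s(x) &= 2 - \frac{2}{1 + x/M}, \qquad 0 \le x \le M \\
s(0) &= 2 - \frac{2}{1} = 0 \\
s(M) &= 2 - \frac{2}{1 + 1} = 1
\end{align*}
```

Because s increases with x and x cannot exceed M, the score stays within [0, 1]. Under the same assumption, the variant proposed in #1 would reach at most 1 - 1/2 = 0.5.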

#3

Status: fixed » closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.
