Greetings! I just re-indexed my site with around 50k nodes and set my search_cron_limit value to 500. The behavior was a bit strange, it seemed to do more than 500 at the beginning and I had to run cron.php several times to index the last 100 nodes. Almost as if the value of search_cron_limit was a percentage of the total node count not a whole number.

I did some brief digging to see if I could locate the source, but haven't found anything yet. If anyone has more experience with the search indexing, it might be worth looking at.

Drupal 5.5, PHP 3.3, MySQL 5.0

Thanks!!

Kevin

CommentFileSizeAuthor
#2 node.search-update-20080729.patch1.49 KBekes

Comments

robertdouglass’s picture

Status: Active » Postponed (maintainer needs more info)

Can anyone reproduce this?

@kevinhankens you surely meant something other than PHP 3.3?

ekes’s picture

Version: 5.5 » 5.x-dev
StatusFileSize
new1.49 KB

I can report something similar I believe I have the reason too:-

The node.module function node_update_index() pulls the nodes to index with

  $result = db_query_range('SELECT GREATEST(IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp), n.changed) as last_change, n.nid FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND ((GREATEST(n.changed, c.last_comment_timestamp) = %d AND n.nid > %d) OR (n.changed > %d OR c.last_comment_timestamp > %d)) ORDER BY GREATEST(n.changed, c.last_comment_timestamp) ASC, n.nid ASC', $last, $last_nid, $last, $last, $last, 0, $limit);

The ORDER BY GREATEST means NULL is preferred over a date or zero so in the case of anything not in the node_comment_statistics table gets indexed earlier thus jumping up the stored node_cron_last_nid used as $last_nid. The IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp in the select part takes note of this, but not with the ORDER. Adding the IF NULL into the ORDER BY changing the line to:

 $result = db_query_range('SELECT GREATEST(IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp), n.changed) as last_change, n.nid FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND ((GREATEST(n.changed, c.last_comment_timestamp) = %d AND n.nid > %d) OR (n.changed > %d OR c.last_comment_timestamp > %d)) ORDER BY GREATEST(n.changed, IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp)) ASC, n.nid ASC', $last, $last_nid, $last, $last, $last, 0, $limit);

solved the problem for me. Hence re-indexing my site correctly.

Now I've not looked at the SQL compatibility on this (it was a site with MySQL 5.0.45), and as it's not been reported so much elsewhere I'm a little reticent to change the title and component. Nor have I had time to work through the other user of GREATEST, sorry, but... I've put a patch attached if someone with a bigger picture on this can look, I'm happy to investigate further if it's in the correct direction :-) and if the changes from D6 that change the whole way this works aren't coming back to D5 instead of course.

robertdouglass’s picture

Status: Postponed (maintainer needs more info) » Needs review

Thanks for the patch.

robertdouglass’s picture

Title: search_cron_limit behaves like a percentage, not a number » ORDER BY GREATEST produces wrong results in indexing of nodes
drumm’s picture

Issue tags: +Newbie

I have also seen the coalesce() SQL function used in this type of situation.

jhodgdon’s picture

Status: Needs review » Closed (won't fix)

7.0 is out tomorrow, 5.x is obsolete, won't fix, sorry.