Indexing can be permanently broken

Gerhard Killesreiter - February 4, 2009 - 14:17
Project: xapian
Version: 6.x-1.x-dev
Component: Code
Category: bug report
Priority: normal
Assigned: Jeremy
Status: closed
Description

When I have to re-index a site from scratch, I experiment with the number of nodes indexed per cron run. If I set this number too high, cron dies with an out-of-memory (OOM) error. When I then lower the setting again, cron still dies, because the limit is first set to cover the nodes that weren't flushed to disk last time. That number of nodes can still be too high, so the only way to fix the indexing is to start from scratch with the lower setting.

I think that

  if ($not_flushed > 1) {
    // Re-index content that wasn't flushed to disk last time, minus the node
    // which failed last time.
    $sql = 'SELECT nid FROM {xapian_index_queue} WHERE status > 0 ORDER BY COALESCE(priority, 0) DESC, added ASC';
    $limit = $not_flushed - 1;
  }

should also take the variable_get('xapian_indexing_throttle', 100) into account.
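
A minimal sketch of the idea, not the committed patch: cap the retry limit at the configured throttle. The helper name xapian_example_retry_limit() is hypothetical and is not part of the xapian module. The throttle is passed in as a parameter so the snippet runs outside Drupal.

```php
<?php
/**
 * Hypothetical helper (not in the xapian module): how many queued nodes
 * to retry on this cron run.
 *
 * $not_flushed: number of nodes left unflushed by the previous run.
 * $throttle:    the configured "Items to index per cron run" value.
 */
function xapian_example_retry_limit($not_flushed, $throttle) {
  // Drop one node from the count, as the existing code does, on the
  // assumption that the last node is the one that failed.
  $retry = max(0, $not_flushed - 1);
  // Never exceed the administrator's current setting.
  return min($retry, $throttle);
}
```

Inside the module, the call would presumably look like $limit = xapian_example_retry_limit($not_flushed, variable_get('xapian_indexing_throttle', 100)); so that lowering the setting after an OOM takes effect on the next cron run.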

#1

Jeremy - February 4, 2009 - 16:22

Agreed, this is a bug. However, doesn't it solve itself by dropping the limit by 1? If, for example, it tried to index 100 nodes and failed on the 75th, the next time it only tries to index 74. Why doesn't that work? And if that fails, the next time it should try 73, and so on.

Regardless, agreed that it should take into account the xapian_indexing_throttle as a maximum. Patches welcome! :)

BTW: Is this OOM directly related to this issue, or does it still happen with your patch applied?

#2

Gerhard Killesreiter - February 5, 2009 - 00:51

Still happens with that patch applied if you choose too many nodes.

#3

Jeremy - February 5, 2009 - 23:50
Assigned to: Anonymous » Jeremy
Status: active » fixed

Committed to 6.x-1.x-dev. If you lower "Items to index per cron run" after indexing fails due to OOM, Xapian will now respect your setting and process the lesser of the two values.

#4

System Message - February 20, 2009 - 00:00
Status: fixed » closed

Automatically closed -- issue fixed for 2 weeks with no activity.
