Indexing can be permanently broken
| Project: | xapian |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Jeremy |
| Status: | closed |
When I have to re-index a site from scratch, I experiment with the number of nodes indexed per cron run. If I set this number too high, cron dies with an out-of-memory (OOM) error. If I then lower the setting, cron still dies, because the limit is overridden so that the nodes that weren't flushed to disk last time are handled first. That number of nodes can still be too high, so the only way to fix the indexing is to start from scratch with the lower setting.
I think that
  if ($not_flushed > 1) {
    // Re-index content that wasn't flushed to disk last time, minus the node
    // which failed last time.
    $sql = 'SELECT nid FROM {xapian_index_queue} WHERE status > 0 ORDER BY COALESCE(priority, 0) DESC, added ASC';
    $limit = $not_flushed - 1;
  }
should also take the variable_get('xapian_indexing_throttle', 100) into account.
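To illustrate the suggestion, here is a minimal sketch of how the limit could be capped. The function name example_xapian_index_limit() and its parameters are hypothetical and not part of the xapian module. The fallback to the plain throttle when nothing is left unflushed is also an assumption, since the surrounding module code is not shown here.

```php
<?php

/**
 * Hypothetical helper (not part of the xapian module): picks how many
 * queued nodes to index in one cron run.
 *
 * @param int $not_flushed
 *   Number of nodes that were not flushed to disk on the previous run.
 * @param int $throttle
 *   The configured "Items to index per cron run" value.
 *
 * @return int
 *   The number of nodes to attempt on this run.
 */
function example_xapian_index_limit($not_flushed, $throttle) {
  if ($not_flushed > 1) {
    // Retry the unflushed nodes minus the one that failed, but never
    // exceed the configured throttle.
    return min($not_flushed - 1, $throttle);
  }
  // Assumption: with nothing left over, the throttle alone applies.
  return $throttle;
}
```

Inside the module, the second argument would presumably be variable_get('xapian_indexing_throttle', 100), so lowering the setting after an OOM failure takes effect on the next cron run.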

#1
Agreed, this is a bug. However, doesn't it solve itself by dropping the limit by 1? If, for example, it tried to index 100 nodes and failed on the 75th, the next time it only tries to index 74 -- why doesn't that work? And if that fails, the next time it should try 73, and so on.
Regardless, agreed that it should take into account the xapian_indexing_throttle as a maximum. Patches welcome! :)
BTW: Is this OOM directly related to this issue, or does it still happen with your patch applied?
#2
Still happens with that patch applied if you choose too many nodes.
#3
Committed to 6.x-1.x-dev. If you lower "Items to index per cron run" after indexing fails due to OOM, Xapian will now respect your setting and process the lesser of the two values.
#4
Automatically closed -- issue fixed for 2 weeks with no activity.