In testing this module on scratch, we found that when indexing nodes that have a drupal_goto() in them, the cron process is redirected to the destination of the goto. Indexing therefore stalls on the first node with a goto.
Comments
Comment #1
jeremy commented
I have committed a less than optimal "fix" to this problem, but will leave this bug open.
The problem is quite simple: when we build nodes to index them, we call node_build_content(). This in turn calls node_prepare(), which calls check_markup() in filter.module. In check_markup(), we apply all filters, including PHP filters. If a node has a drupal_goto() in it and a PHP filter applied, the drupal_goto() is executed.
This affects the core search module the same as the xapian module. However, it's a more serious problem for the xapian module, because the goto results in the cache not being flushed to disk. Thus, any nodes we've indexed in this cron run prior to reaching the drupal_goto() are lost.
My "fix" is to flush a node from the index_queue before we actually index it. This prevents us from trying to reindex the same node over and over and over forever, but it doesn't solve the problem of indexed content never being flushed to disk.
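To make the trade-off concrete, here is a minimal, language-neutral sketch in Python. It is a model only, not the module's code: Redirect, render(), make_queue() and cron_run() are hypothetical names, the exception stands in for drupal_goto() ending the request, and the pending list stands in for the cache that never reaches disk.

```python
class Redirect(Exception):
    """Stands in for drupal_goto(), which ends the current request."""


def render(node):
    # Models building the node: a PHP-filtered body containing a goto aborts the run.
    if node["goto"]:
        raise Redirect(node["nid"])
    return node["body"]


def make_queue():
    # Three hypothetical nodes; node 2 contains the goto.
    return [
        {"nid": 1, "goto": False, "body": "first"},
        {"nid": 2, "goto": True, "body": "redirects"},
        {"nid": 3, "goto": False, "body": "third"},
    ]


def cron_run(queue, index, dequeue_first):
    """One cron run. Returns False if a redirect cut the run short."""
    pending = []  # models the cache that is only flushed at the end of the run
    try:
        while queue:
            node = queue[0]
            if dequeue_first:
                queue.pop(0)  # the committed "fix": dequeue before indexing
            pending.append((node["nid"], render(node)))
            if not dequeue_first:
                queue.pop(0)  # earlier behaviour assumed here: dequeue after indexing
    except Redirect:
        return False  # everything in pending is lost
    index.update(pending)  # models the flush to disk
    return True


# Dequeue after indexing: every run stalls on node 2 again.
old_queue, old_index = make_queue(), {}
for _ in range(3):
    cron_run(old_queue, old_index, dequeue_first=False)

# Dequeue before indexing: node 2 is skipped, but node 1's work is still lost.
new_queue, new_index = make_queue(), {}
for _ in range(3):
    cron_run(new_queue, new_index, dequeue_first=True)
```

In this model the old ordering never gets past node 2, while the new ordering eventually indexes node 3 but silently drops node 1, which is the remaining problem described above.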
I still consider this bug to be critical, and I'm leaving it open until we come up with an optimal fix.
Comment #2
jeremy commented
My current idea on how to fix this bug is to make indexing transactional:
Comment #3
jeremy commented
I essentially implemented what I've described above, and verified that we successfully index all content except for nodes that are not possible to index (i.e., those with a drupal_goto()).
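The details of the implemented change are not preserved in this thread, so the following is only one possible reading of "transactional" indexing, sketched in Python as a language-neutral model. It assumes each node is removed from the queue first and its result is committed immediately after it is indexed, so that an aborted run loses nothing already done. Redirect, render() and cron_run_transactional() are hypothetical names, not functions from the module.

```python
class Redirect(Exception):
    """Stands in for drupal_goto(), which ends the current request."""


def render(node):
    # Models building the node; a goto in a PHP-filtered body aborts the run.
    if node["goto"]:
        raise Redirect(node["nid"])
    return node["body"]


def cron_run_transactional(queue, index):
    """One cron run with a per-node commit. Returns False if cut short."""
    try:
        while queue:
            node = queue.pop(0)        # dequeue first so a bad node is never retried
            text = render(node)        # may abort the run
            index[node["nid"]] = text  # assumed per-node commit (models flush to disk)
    except Redirect:
        return False
    return True


queue = [
    {"nid": 1, "goto": False, "body": "first"},
    {"nid": 2, "goto": True, "body": "redirects"},
    {"nid": 3, "goto": False, "body": "third"},
]
index = {}
first_run = cron_run_transactional(queue, index)   # indexes node 1, aborts on node 2
second_run = cron_run_transactional(queue, index)  # picks up node 3
```

Under these assumptions the model ends with every node indexed except the one containing the goto, which matches the outcome reported in this comment.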
Comment #4
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.