The node_update_index function indexes all pages, including those with format = 2 (PHP code). As a result, when search_cron runs and reaches a PHP page, the page's code executes. Depending on that code, it may abort the rest of the indexing, and executing it during cron may have other unpredictable consequences.
I encountered this bug with a PHP page that just calls drupal_goto to redirect to another page. When cron is called, it aborts in the middle of execution and redirects to the destination given to drupal_goto.
I solved this problem temporarily by excluding all nodes with the PHP input format from indexing, by adding " AND n.format <> 2" to the indexing query. I believe this is a valid solution, but in my humble opinion a more general approach would be a setting in the administer > input formats section to include or exclude each format from search indexing. This would scale when the user adds more input formats.
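A minimal sketch of how the more general approach could build the same kind of condition from a list of excluded formats. The function name format_exclusion_sql and the idea of passing the excluded format IDs as an array are assumptions for illustration; neither is part of Drupal.

```php
<?php
// Hypothetical helper (not a Drupal API): builds the extra WHERE fragment
// that keeps nodes with the given input formats out of the index query.
// With array(2) it has the same effect as the " AND n.format <> 2" workaround.
function format_exclusion_sql($excluded_formats) {
  if (empty($excluded_formats)) {
    // Nothing excluded: leave the query unchanged.
    return '';
  }
  // Cast every ID to an integer so the fragment is safe to concatenate.
  $ids = array_map('intval', $excluded_formats);
  return ' AND n.format NOT IN (' . implode(', ', $ids) . ')';
}

// Example: exclude only the PHP code format (ID 2 on the site described above).
$condition = format_exclusion_sql(array(2));
```

The list of excluded formats would come from whatever per-format setting the input formats page stored, so adding a new format would not require touching the query again.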
Comments
Comment #1
weam commented
Another solution might be at the node level, i.e. to be able to exclude a specific node from search by adding a field to the node named "searchable" or so.
Comment #2
Steven commented
If the search indexing is aborted because of a PHP page, it'll continue with the next node later.
Comment #3
weam commented
Yes. But all the nodes created or modified since the previous cron run will be excluded from search.
The reason is that the abort happens before exiting the foreach module_list() as $module loop at the beginning of search_cron(), so execution never reaches the foreach search_dirty() as $word => $dummy loop that UPDATEs the {search_total} table from the dirty words, while the node_cron_last variable has already been updated by the node_update_index() code.
Since the {search_total} rows are necessary for the pager_query call in do_search to succeed, because of the INNER JOIN, an out-of-date {search_total} will hide all the nodes that have changed since the last cron run. This is permanent, since node_cron_last has already moved past the affected nodes.
Comment #4
weam commented
Changed status back to active.
Comment #5
weam commented
Comment #6
dopry commented
weam, can you tell me how to duplicate this?
Comment #7
Zen commented
Comment #8
mfredrickson commented
Here's how to duplicate:
Create a page node.
Use the php filter:
Make sure this node will be indexed and run mysite/cron.php
You should be redirected to example.com and site indexing should stop.
Here's a method to fix it: Wrap the goto in:
I would also like to see a cron user created that PHP pages can check for:
http://drupal.org/node/5380
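One way such a guard around the goto might look, as a sketch only. It assumes the page can tell it is being run from cron by looking at the script name; the helper is_cron_request is hypothetical and not part of Drupal, and this is not necessarily the check that was originally posted.

```php
<?php
// Hypothetical helper (not a Drupal API): TRUE when the request was made to
// cron.php, judged by the script name alone.
function is_cron_request($script_name) {
  return basename($script_name) === 'cron.php';
}

// Body of the PHP-format page: only redirect for ordinary page views, so
// that indexing the node during cron does not trigger the redirect.
$script = isset($_SERVER['SCRIPT_NAME']) ? $_SERVER['SCRIPT_NAME'] : '';
if (!is_cron_request($script) && function_exists('drupal_goto')) {
  // drupal_goto() only exists inside a Drupal request; the function_exists()
  // check lets this sketch also run standalone without errors.
  drupal_goto('http://example.com');
}
```

A check against a dedicated cron user, as suggested above, would replace is_cron_request with a test on the current user instead of the script name.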
Comment #9
Steven commented
4.6.x search does do the redirect, but will continue indexing at the next node the next time cron is run. Only the search_total table will not be updated correctly.
In 4.7.x this was addressed. The search still does the redirect, but recovers gracefully. No search data is lost, and the next time cron is run it will continue at the next node. The only side effect is that one cron run will index slightly fewer nodes.
Comment #10
(not verified) commented