Cron Search Executes PHP pages - Node Range Lost Permanently in Search [#34768]

The node_update_index function indexes all pages, including those with format = 2 (PHP code). As a result, when search_cron runs and reaches a PHP page, the page executes. Depending on its code, it might abort the rest of the indexing, and executing the code may have other unpredictable consequences.

I encountered this bug with a PHP page that just calls drupal_goto to redirect to another page. When cron is called, it aborts in the middle of execution and redirects to the destination given to drupal_goto.

I worked around the problem by excluding all nodes with the PHP input format from indexing, by adding " AND n.format <> 2" to the indexing query. I believe this is a valid solution, but in my humble opinion a more general approach would be a setting in the administer > input formats section to include or exclude each format from search indexing. This would scale when the user adds more input formats.
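As a rough sketch of the more general approach, the hard-coded " AND n.format <> 2" could be built from a list of excluded format IDs. The helper name format_exclusion_sql and the idea of a per-format "exclude from search" setting are assumptions made for illustration; neither exists in Drupal.

```php
<?php
// Hypothetical helper: builds the extra WHERE fragment for the index query.
// $excluded_formats would come from a per-format setting that is assumed
// here for illustration only.
function format_exclusion_sql(array $excluded_formats): string {
  // Cast to integers so the fragment is safe to append to a query string.
  $ids = array_unique(array_map('intval', $excluded_formats));
  if (!$ids) {
    return '';  // nothing excluded: leave the query unchanged
  }
  return ' AND n.format NOT IN (' . implode(', ', $ids) . ')';
}
```

Passing array(2) gives the same effect as the original workaround, and additional input formats only need to be added to the list.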

Comments

Comment #1

weam commented 21 October 2005 at 12:33

Another solution might be at the node level, i.e. to be able to exclude a specific node from search by adding a field to the node named "searchable" or similar.

Comment #2

Steven commented 21 October 2005 at 13:05

Status: Active » Closed (works as designed)

If the search index is aborted because of a php page, it'll continue with the next node later.

Comment #3

weam commented 21 October 2005 at 13:23

Yes. But all the nodes created or modified between the previous cron run and this one will be excluded from the search.

The reason is that the abort happens inside the foreach (module_list() as $module) loop at the beginning of search_cron(), before execution reaches the foreach (search_dirty() as $word => $dummy) loop that UPDATEs the {search_total} table for the dirty words, while the node_cron_last variable has already been updated by the node_update_index() code.

Since the {search_total} rows are necessary for the pager_query call in do_search to succeed, because of the INNER JOIN, an out-of-date {search_total} hides all the nodes that have changed since the last cron run. This is permanent, since node_cron_last has already moved past the affected nodes.
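The sequence described above can be modelled in a few lines. This is only a simplified sketch: run_cron, search, and the $state array are hypothetical stand-ins, the early return stands in for drupal_goto ending the request, and the totals value is a placeholder rather than what {search_total} really stores.

```php
<?php
// Simplified model of the cron ordering described in this comment.
// Every name here is a hypothetical stand-in, not Drupal's actual code.

function run_cron(array &$state, array $nodes): void {
  $dirty = [];  // per-request list of touched words, like search_dirty()
  foreach ($nodes as $nid => $node) {
    if ($node['changed'] <= $state['node_cron_last']) {
      continue;  // the bookmark has already passed this node
    }
    // The bookmark moves before the node body is rendered.
    $state['node_cron_last'] = $node['changed'];
    if ($node['redirects']) {
      return;  // models drupal_goto() ending the request here
    }
    foreach ($node['words'] as $word) {
      $state['index'][$word][] = $nid;
      $dirty[$word] = true;
    }
  }
  // Only reached when no node ended the request early.
  foreach (array_keys($dirty) as $word) {
    $state['search_total'][$word] = count($state['index'][$word]);  // placeholder value
  }
}

// Models the INNER JOIN: a word matches only if it has a totals row.
function search(array $state, string $word): array {
  if (!isset($state['search_total'][$word])) {
    return [];
  }
  return $state['index'][$word] ?? [];
}

$nodes = [
  1 => ['changed' => 10, 'redirects' => false, 'words' => ['apple']],
  2 => ['changed' => 20, 'redirects' => true,  'words' => []],
  3 => ['changed' => 30, 'redirects' => false, 'words' => ['pear']],
];
$state = ['node_cron_last' => 0, 'index' => [], 'search_total' => []];

run_cron($state, $nodes);  // stops at node 2, totals step skipped
run_cron($state, $nodes);  // resumes at node 3
```

In this model node 1 is indexed in the first run but never receives a totals row, and the second run skips it because the bookmark is already past it, so a search for "apple" finds nothing while "pear" is found.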

Comment #4

weam commented 21 October 2005 at 14:18

Status: Closed (works as designed) » Active

Changed status back to active.

Comment #5

weam commented 21 October 2005 at 15:44

Title: Cron Search Executes PHP pages » Cron Search Executes PHP pages - Node Range Lost Permanently in Search

Comment #6

dopry commented 21 January 2006 at 07:03

weam, can you tell me how to duplicate this?

Comment #7

Zen commented 8 March 2006 at 14:49

Priority: Critical » Normal

Comment #8

mfredrickson commented 12 June 2006 at 20:42

Here's how to duplicate:

Create a page node.
Use the php filter:

<?php
drupal_goto("http://www.example.com");
?>

Make sure this node will be indexed, then run mysite/cron.php.

You should be redirected to example.com and site indexing should stop.

Here's a method to fix it: Wrap the goto in:

if ($_SERVER['REQUEST_URI'] != '/cron.php') {
... code ...
}
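One caveat with comparing REQUEST_URI to '/cron.php' is that it will not match if the site lives in a subdirectory or the URL carries a query string. A slightly more tolerant sketch compares only the script file name. The helper name is_cron_request is hypothetical, and the function_exists check is there only so the snippet also runs outside Drupal.

```php
<?php
// Hypothetical helper: reports whether the current request is a cron run.
// It compares the script file name rather than the full URI, so a
// subdirectory install (/drupal/cron.php) or a query string still matches.
function is_cron_request(array $server): bool {
  $script = $server['SCRIPT_NAME'] ?? '';
  return basename($script) === 'cron.php';
}

// Usage inside the PHP-format node: redirect only for normal page views.
if (!is_cron_request($_SERVER) && function_exists('drupal_goto')) {
  drupal_goto('http://www.example.com');
}
```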

I would also like to see a cron user created that PHP pages can check for:

http://drupal.org/node/5380

Comment #9

Steven commented 12 June 2006 at 20:54

Status: Active » Fixed

4.6.x search does do the redirect, but will continue indexing at the next node, the next time cron is run. Only the search_totals table will not be updated correctly.

In 4.7.x this was addressed. The search also does the redirect, but recovers gracefully. No search data is lost; the next time cron is run it will continue at the next node. The only side effect is that one cron run will index slightly fewer nodes.

Comment #10

(not verified) commented 26 June 2006 at 21:02

Status: Fixed » Closed (fixed)
