Pages are cached without content when cron is stuck

Frank Steiner - April 30, 2009 - 05:47
Project:Protected node
Version:6.x-1.5
Component:Code
Category:bug report
Priority:critical
Assigned:Unassigned
Status:needs review
Description

Hi,
the test

        // If we have been accessed from cron.php (f.e. search indexing)
        if (variable_get( 'cron_semaphore', FALSE )) {

does not work reliably when a node is viewed. If a cron run takes a long time, or gets stuck or crashes, cron_semaphore remains set, and every request for a protected_node page returns an empty page. Worse, these empty pages are cached for anonymous users because the page cache is not disabled in this branch, so anonymous users keep getting the empty page.

First, page caching should be disabled in the cron case as well, so that such empty pages can never end up in the cache.
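A minimal sketch of switching off the page cache for the current request only, assuming Drupal 6, where drupal_page_footer() writes the page cache only when variable_get('cache', CACHE_DISABLED) is not CACHE_DISABLED. The helper name protected_node_disable_page_cache() is hypothetical, and variable_get() and CACHE_DISABLED are stubbed at the top only so the snippet runs outside Drupal.

```php
<?php
// Stubs so the sketch runs standalone; Drupal 6 already defines both.
if (!defined('CACHE_DISABLED')) {
  define('CACHE_DISABLED', 0);
}
if (!function_exists('variable_get')) {
  function variable_get($name, $default) {
    global $conf;
    return isset($conf[$name]) ? $conf[$name] : $default;
  }
}

/**
 * Hypothetical helper: disable the page cache for this request only.
 *
 * variable_get() reads from the global $conf array, so overriding the
 * 'cache' entry here affects only the current request; nothing is
 * written to the variable table.
 */
function protected_node_disable_page_cache() {
  $GLOBALS['conf']['cache'] = CACHE_DISABLED;
}

// Assume the site has normal page caching enabled (CACHE_NORMAL = 1).
$GLOBALS['conf']['cache'] = 1;
// Call this on every path that returns a stripped-down node,
// including the cron_semaphore branch.
protected_node_disable_page_cache();
var_dump(variable_get('cache', CACHE_DISABLED) == CACHE_DISABLED); // bool(true)
```

This only prevents new empty pages from being cached; copies that are already in the page cache stay there until the cache is cleared.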

To avoid showing an empty page to the user, the if-clause must be changed. However, I'm not aware of any reliable way to check whether the page load was initiated by cron. Something like
  if (request_uri() == '/cron.php') {
would exclude false positives and catch the regular cron runs, but it would miss manual cron runs and any other call to drupal_cron_run() from a user function or page.
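A slightly more tolerant variant of the request_uri() check, as a sketch only: it compares the basename of the executing script instead of the full URI, so it also matches a Drupal install in a subdirectory or a cron.php request with a query string. The helper name protected_node_is_cron_request() is hypothetical, and it shares the limitation described above: cron runs started from other pages via drupal_cron_run() are not detected.

```php
<?php
/**
 * Hypothetical helper: TRUE when the current request is cron.php itself.
 *
 * Takes the server array as a parameter (pass $_SERVER in Drupal) so the
 * check can be exercised without a web server. SCRIPT_NAME never contains
 * the query string, so "cron.php?foo=bar" still matches.
 */
function protected_node_is_cron_request(array $server) {
  if (empty($server['SCRIPT_NAME'])) {
    return FALSE;
  }
  return basename($server['SCRIPT_NAME']) === 'cron.php';
}

// Usage with a site installed under /drupal/.
var_dump(protected_node_is_cron_request(array('SCRIPT_NAME' => '/drupal/cron.php'))); // bool(true)
```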

I'm not sure what the consequences of missing a cron run would be. My understanding was that the page should not be returned but redirected, so that the content is never indexed for search. Is that correct? I guess it wouldn't be too bad to miss some cron runs, as long as that doesn't lead to protected content being indexed for search. But I'm not sure.

Showing cached empty pages to users instead of asking for a password is a serious problem on our site, so I have set this to critical.

#1

Frank Steiner - April 30, 2009 - 14:19
Status:active» needs review

I might have found a solution. It depends on whether you intend to keep protected pages away from cron jobs in general or only out of the search index. The comment in the module seems to imply the latter. If that's right, please test the patch.

Just before _node_index_node() builds the node body, it sets $node->build_mode to NODE_BUILD_SEARCH_INDEX, so we can check for that value in nodeapi('view') and empty the node.

This way we no longer depend on cron_semaphore, and we won't produce empty pages, cached or not, even when cron is stuck.
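A minimal sketch of the build_mode check, assuming the Drupal 6 hook_nodeapi() signature; it is not the attached patch. The property $node->is_protected is an assumed name for however protected_node marks a protected node, and NODE_BUILD_SEARCH_INDEX is stubbed (Drupal 6 defines it as 2 in node.module) only so the snippet runs standalone. Because _node_index_node() sets build_mode before node_build_content() fires the 'view' op, the branch runs only while the search index is being built, regardless of how cron was started.

```php
<?php
// Stub so the sketch runs standalone; Drupal 6 defines this in node.module.
if (!defined('NODE_BUILD_SEARCH_INDEX')) {
  define('NODE_BUILD_SEARCH_INDEX', 2);
}

/**
 * Sketch of protected_node_nodeapi() (Drupal 6 hook_nodeapi signature).
 *
 * $a3 is the teaser flag and $a4 the page flag for the 'view' op.
 */
function protected_node_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($op != 'view') {
    return;
  }
  // Assumed property name for the protection flag.
  $is_protected = !empty($node->is_protected);
  if ($is_protected && isset($node->build_mode)
      && $node->build_mode == NODE_BUILD_SEARCH_INDEX) {
    // Drop all rendered parts so the protected body never reaches the index.
    $node->content = array();
  }
}

// Usage: simulate what _node_index_node() passes in.
$node = new stdClass();
$node->is_protected = TRUE;
$node->build_mode = NODE_BUILD_SEARCH_INDEX;
$node->content = array('body' => array('#value' => 'secret text'));
protected_node_nodeapi($node, 'view', FALSE, FALSE);
var_dump(count($node->content)); // int(0)
```

Normal page views carry a different build_mode, so they skip this branch and fall through to the module's usual password handling (not shown). Modules whose nodeapi hook runs after this one could still add to $node->content, so module weight may matter.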

Attachment: protected_node_prevent_indexing.patch (891 bytes)