Pages are cached without content when cron is stuck

Project: Protected node
Version: 6.x-1.5
Component: Code
Category: bug report
Priority: critical
Assigned: Unassigned
Status: closed (fixed)

Issue Summary

Hi,
the test

        // If we have been accessed from cron.php (f.e. search indexing)
        if (variable_get( 'cron_semaphore', FALSE )) {

when viewing a node does not work reliably. When a cron run takes a long time, or is stuck or has crashed, any access to a protected_node page results in an empty page. Worse, these empty pages are cached for anonymous users, because the cache is not disabled in this case.
As a result, those pages stay empty for anonymous users.

First of all, caching should be disabled even in the cron run case to rule this out.
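
A minimal sketch of what disabling the cache for a single request could look like in Drupal 6. The function name example_disable_page_cache() is made up for illustration and is not part of protected_node; the sketch assumes the page cache is only written when the 'cache' setting is not CACHE_DISABLED, and it defines the constant itself only so that it also runs outside Drupal.

```php
<?php
// Stand-in for the constant that Drupal 6 defines in bootstrap.inc, so the
// sketch can run outside Drupal. Inside Drupal it is already defined.
if (!defined('CACHE_DISABLED')) {
  define('CACHE_DISABLED', 0);
}

/**
 * Hypothetical helper: switch off page caching for the current request only.
 */
function example_disable_page_cache() {
  // Override the in-memory setting. The stored 'cache' variable is left
  // untouched, so other requests keep using the configured cache mode.
  $GLOBALS['conf']['cache'] = CACHE_DISABLED;
}
```

Calling such a helper before any early return in the node view path would mean that an empty page, if one is still produced, is at least not stored for anonymous users.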

To avoid showing an empty page to the user, the if-clause must be changed. However, I'm not aware of any reliable method to check whether the page load was initiated by cron. Something like
  if (request_uri() == '/cron.php')  {
would exclude false positives and catch the regular cron runs. But it would miss manual cron runs and any other call to drupal_cron_run from a user function or page.

I'm not sure what happens when we miss a cron run: I thought that the page shouldn't be returned but redirected, so that the content is never indexed for search. Is that correct? I guess it wouldn't be too bad to miss some cron runs as long as that doesn't lead to indexing protected content for search. But I'm not sure.

Showing cached empty pages to the users instead of asking for a password is quite a problem at our site, so I set this to critical.

Comments

#1

Status: active » needs review

I might have found a solution. It depends on whether you intend to keep protected pages away from cron jobs in general or only from being indexed for search. The comment in the module seems to imply the latter. If that's right, please test the patch.

Just before _node_index_node builds the node body, it sets $node->build_mode to NODE_BUILD_SEARCH_INDEX, so we can check for it in nodeapi('view') and empty the node.

This way we don't have to care about cron_semaphore etc. and won't run into empty pages or cached pages even when cron is stuck.
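
A rough sketch of the idea, not the attached patch itself. The function name example_protected_nodeapi() and the flag $node->example_is_protected are placeholders invented for illustration (the real module decides by its own means whether a node is protected), and the constant is defined here only so that the sketch also runs outside Drupal.

```php
<?php
// Stand-in for the constant that Drupal 6 defines in node.module, so the
// sketch can run outside Drupal. Inside Drupal it is already defined.
if (!defined('NODE_BUILD_SEARCH_INDEX')) {
  define('NODE_BUILD_SEARCH_INDEX', 2);
}

/**
 * Sketch of a hook_nodeapi() implementation (Drupal 6 signature).
 */
function example_protected_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($op != 'view') {
    return;
  }
  // Placeholder check: only act on nodes flagged as protected.
  if (empty($node->example_is_protected)) {
    return;
  }
  // The search indexer marks the node before building it, so there is no
  // need to look at cron_semaphore at all.
  if (isset($node->build_mode) && $node->build_mode == NODE_BUILD_SEARCH_INDEX) {
    // Empty the node so that nothing of the protected text gets indexed.
    $node->teaser = '';
    $node->body = '';
    $node->content = array();
  }
}
```

Because the check depends on the node being built for the search index rather than on global cron state, a normal page view during a long or stuck cron run is not affected.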

Attachment: protected_node_prevent_indexing.patch (891 bytes)

#2

I tried out the patch and it looks fine to me.

#3

Frank Steiner, Cyberwolf,

Does that actually work with boost?

I would imagine that we'd get the same problem with any protected node if it were cached earlier with boost. There is a hook in boost to avoid that, but if this patch already does what is necessary, we should not add unnecessary code.

Thank you.
Alexis

#4

Assigned to: Anonymous » tolmi

#5

I got my answer in regard to Boost. It does not work. Boost still caches everything. I guess that's because the test is very specific to the search and/or CRON and not an anonymous user visiting a page.

I posted a patch for boost here: #829994: Boost caches content once an anonymous user sees it

Thank you.
Alexis

#6

Assigned to: tolmi » Anonymous
Status: needs review » fixed

Hi Frank,

This is fixed. Boost was a different problem altogether.

Thank you for the patch, that's what is checked in 8-)
Alexis Wilke

See: http://drupalcode.org/project/protected_node.git/commit/e94ae30

P.S. Actually, that CRON test could be TRUE any time CRON was running, not just when it is stuck! So that was a much bigger problem ("easily" discovered when CRON was getting stuck, obviously.)

#7

Status: fixed » closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.