Due to the late page cache, having a spurious entry in the page cache for 'cron.php' can prevent cron from working.

If for any reason you rename cron.php temporarily, and something hits it as anonymous while it is not there, it will cause a page cache entry for 'cron.php' to be stored due to the 404 handling procedure.

After moving it back, cron.php will continue to return 404s during bootstrap due to the late page cache.

This probabaly also affects xmlrpc.php and any third party handlers.

(If a way was discovered to cause this entry to be stored without having to move the file first, this would be a pretty serious issue. Leaving at critical until this can be investigated more thoroughly...)

[11:40] <cornernote> so it is crapping itself at phase 5
[11:40] <Bdragon> 404's usually come while running the menu handler, but that shouldn't happen for cron.php
[11:41] <Bdragon> Because index.php is responsible for it
[11:41] <cornernote> DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE
[11:41] <cornernote> thats where it gets too...
[11:42] <Bdragon> Hmm...
[11:42] <Bdragon> Did you remove cron.php and then try to visit it as anonymous?
[11:43] <Bdragon> There could be a spurious cache entry
[11:43] <Bdragon> Try truncating the page cache
[11:44] <Bdragon> If this fixes it, I know exactly what's happening...
[11:44] <cornernote> it just magically fixed itself
[11:44] <Bdragon> yeah
[11:45] <cornernote> but i got it down to 2 lines.. either this:
[11:45] <cornernote>       $conf = variable_init(isset($conf) ? $conf : array());
[11:45] <cornernote> or this:       _drupal_cache_init($phase);
[11:45] <Bdragon> It would be #2.
[11:45] <Bdragon> Here's what's happening:
[11:46] <Bdragon> At some point, you attempted to visit cron.php as anonymous when it had been renamed or moved.
[11:47] <Bdragon> When a file is not found, the webserver tries again by rewriting it to index.php?q=(requested file..)
[11:47] <Bdragon> This causes a normal drupal startup.
[11:47] <Bdragon> Drupal attempts to find a menu handler for cron.php
[11:47] <Bdragon> This doesn't exist.
[11:48] <Bdragon> Drupal returns a 404 and adds a page cache entry to match.
[11:48] <Bdragon> NOW, you move cron.php BACK.
[11:48] <Bdragon> Next time you try to call cron.php, it actually runs cron.php
[11:48] <Bdragon> BUT, cron.php does a full bootstrap
[11:49] <Bdragon> A full bootstrap involves the late page cache check.
[11:49] <cornernote> i am on dreamhost, and they have a nasty habbit of killing scripts... usually most issues i blame on that... rekon that could have caused it ?
[11:49] <Bdragon> The cache happens to have a stored entry for cron.php
[11:49] <Bdragon> The system returns this result and aborts the bootstrap
[11:50] <Bdragon> So, the result is, despite not even using the menu system, the bootstrap returns 404
[11:50] <cornernote> i think DH tinker with sites when servers we under load.. i think they did it
[11:50] <Bdragon> Possibly
[11:50] <Bdragon> So, this is a bug in Drupal.
[11:50] <cornernote> so... now i fully understand the problem
[11:51] <Bdragon> Cron.php should not be bootstrapping that far
[11:51] <cornernote> now, to prevent this from recuring... can i just delete that page from cache once per day with an external php script ?
[11:51] <Bdragon> No, change cron.php to not DRUPAL_BOOTSTRAP_FULL
[11:52] <Bdragon> Should go to DRUPAL_BOOTSTRAP_SESSION, I reckon
[11:53] <Bdragon> The further steps are only relevant for index.php invoked bootstraps
[11:54] <cornernote> that gives me: Call to undefined function drupal_cron_run()
[11:57] <Bdragon> ahh, there's the problem...
[11:57] <Bdragon> Yeah, it isn't pulling in common.inc until the last part...
[11:57] <Bdragon> Damn...
[11:58] <cornernote> shouldnt the cache handler check for things like caching the cron.php page ?
[11:58] <Bdragon> Yeah... That might be a good idea.
[11:58] <Bdragon> page_get_cache?
[11:58] <Druplicon> page_get_cache: Retrieve the current page from the cache. => page_get_cache() => http://api.drupal.org/api/5/function/page_get_cache
[11:58] <Bdragon> Should be ensuring that the handler is index.php....
[12:00] <Bdragon> I'll bugreport it
Files: 
CommentFileSizeAuthor
#28 177790-dont-use-cache-on-technical-pages.patch3.71 KBbdragon
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 177790-dont-use-cache-on-technical-pages_0.patch.
[ View ]
#26 177790-dont-use-cache-on-technical-pages.patch3.74 KBDamien Tournoud
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 177790-dont-use-cache-on-technical-pages.patch.
[ View ]

Comments

Priority:Critical» Normal
Status:Active» Postponed (maintainer needs more info)

I couldn't reproduce this...

I'm marking active needs more info to ask for the list of steps necessary to make this happen.

My current list:

1. Install Drupal6
2. wget http://localhost/cron.php returns 200 and runs cron
3. mv cron.php cron.php2
4. wget http://localhost/cron.php returns 404 and fails to run cron
5. mv cron.php2 cron.php
6. wget http://localhost/cron.php

Expected results:
wget in step 6 returns a 404.

I tried this without page caching, with normal page caching, and with aggressive page caching.

Downgrading to normal until we get steps to successfully repeat (it may even be normal then...)

Version:6.x-dev» 5.6

I have just had this issue on Drupal 5.6, for some reason the http://www.example.com/cron.php ended up in the table example_cache_page and therefore the cron did not run anymore because if was taken from the cache during boot_strap. I have patched function page_get_cache() so that if the uri equals /cron.php it won't load it from the cache table.

It's a dirty fix but it works.

jurgenhaas - can you provide some "steps to repeat" for this issue? That should be the first step of a bug report.

Oh sorry, I forgot. Well, in fact I'm not sure how it happened, I only realised that suddenly the cron wasn't running anymore. It was called and finished without an error message but it didn't do anything. So I started to investigate and finally found out that the cron.php was in the cache_page table which caused the system to call it from there and then exiting.

I have no idea when and how this happened but when I found this issues on the Drupal repository it sounded very similar to my problem.

What could have happened is that the Windows scheduler was calling the cron job while I was updating my Drupal system and for a short period of time, either the cron.php wasn't available or that the .htaccess wasn't available and therefore prevented access to the maybe available cron.php.

Hope that helps, but please let me know if you need more detail.

Version:5.6» 6.11

I am able to reproduce this on Drupal 6.11 with normal caching.

First, cron.php works normally. If I move cron.php, then try to load it, it ends up in the database cache. If I move it back, I receive the cached 404. If I delete the cache row, it works normally again.

I stumbled upon this bug report after the cron daemon on a dreamhsot server started emailing me the text of the drupal 404 page. Confused, I checked and saw that cron.php was indeed there. However, the 404 was in the page_cache table. I wish I knew what dreamhost was doing that caused a 404 in the first place.

I am experiencing this problem on Dreamhost, as well. Drupal 6.12. It starts with an event like this:

Apache error.log:
[Wed May 27 05:01:08 2009] [error] [client nnn.nnn.nnn.nnn] Premature end of script headers: cron.php
===
Drupal RECENT LOG ENTRIES page:
page not found  May 27 2009 - 5:01am internal_error.html Anonymous
cron            May 27 2009 - 4:00am Cron run completed. Anonymous
cron            May 27 2009 - 3:00am Cron run completed. Anonymous
===
PAGE NOT FOUND ENTRY from log entries page:
Type      page not found
Date      Wednesday, May 27, 2009 - 05:01
User      Anonymous
Location  http://www.[...].com/cron.php
Referrer
Message   internal_error.html
Severity  warning
Hostname  nnn.nnn.nnn.nnn

Subsequent GETs to cron.php result in a 404 error.

Based on logging statements I've added to the code, the if ($cache) { [...] exit; } path in case DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE is being used. I didn't confirm there was a 404 entry for cron.php in the cache, but presume there was based on the HTTP 404 response to the GET.

When an anonymous user is experiencing the problem, an authenticated user doesn't.

Clearing cache brings things back to normal.

I'm currently trying to track down what's causing the "Premature end of script headers" problem, but that probably isn't relevant to this discussion. Based on my read of the initial report, cron.php should never get cached in the first place.

My workaround was to add the following code to the top of page_set_cache() in includes/common.inc

if (preg_match('/cron.php$/', request_uri())) { return; }

I still receive the 404 emails from Dreamhost's cron daemon, but at least they aren't cached. The problem has occurred ten times in the last week. Interestingly, it only happens at 3AM, 9AM, 3PM, and 9PM eastern time.

The server I was on had a serious melt-down and Dreamhost (told me they) redistributed clients. Performance has been much better. I'm still seeing regular cron failures, but not getting 404's on cron.php.

I had some logging statements in cron.php and a number of .inc files for a while. When a failure occurred, execution of cron.php was completing, in as much as the execution reached the end of the cron.php file where the log statement was. So it was as if the error was occurring when either the file was closed or resources were being released. I don't have time to do further debugging so haven't tried following this up.

When the performance was severely impaired (read totally f****d), premature end of script header errors were happening all over the place and many other pages, including the front page via "/", returned 404's after the server returned to normal. Clearing cache resolved the 404's. I think there is an issue with caching/error handling, but it obviously is an edge case and not worth aggressively pursuing. I have Drupal 5 instances on the same server and they didn't suffer from the 404 problem, so it appears specific to Drupal 6.

So in my opinion, two problems. The one described in the previous paragraph, and something weird with Dreamhost (that triggers the Drupal problem).

So how is everybody doing on this issue. I was hoping for the problem to solve itself before I start hacking core files, but I'm still getting the 404's, and they are still being cached. Did Dreamhost support solve it for others?

Version:6.11» 6.14

Yes, saw this same problem on dreamhost today. Looked in cache_page and there was the entry for cron.php. Renamed the cache key and immediately it works.

I am unable to demonstrate it in a test environment, so I'm not sure what even causes the 404 to be cached.

The contents of the cached page were the 404.

Status:Postponed (maintainer needs more info)» Active

Update: I am able to recreate this in a test environment.

1. Turn on page caching.
2. Run cron.php from a browser. It should work.
3. mv cron.php cron.php.bak
4. Run cron.php from a not-logged-in browser.
5. Confirm that you now have a cached 404 page for cron.php in the cache_page table.
6. Put cron back. mv cron.php.bak cron.php
7. Hit cron.php with the not-logged-in browser. You'll see the 404.

A theory: Dreamhost has some kind of Apache file caching going on that occasionally makes a mistake about whether cron.php exists or not. Perhaps the cache expiration and the cache table are not maintained in sync with each other, so somehow Apache (without looking at the filesystem) declares cron.php not to be there. However, index.php is there, so drupal bootstraps.

Status:Active» Postponed (maintainer needs more info)

I have now seen this problem, but I still don't know how to reproduce it.

It seems to happen when the servers are heavily loaded or for some of my installations that have a lot of complex stuff that they're doing on cron (i.e. TattlerApp doing all the news aggregation and content tagging).

Status:Postponed (maintainer needs more info)» Active

sorry, crosspost.

I created an issue with Dreamhost about this. I think we should fix it in Drupal, but it would be nice if the cron didn't 404 on people.

Again: This is a Drupal bug in addition to a Dreamhost bug.

The Dreamhost part of the story: Dreamhost has requested that any customer who encounters this file a ticket with them with as much information as possible. Please give a link to this issue, mention that it's been widely reported, and give them the best possible details of what happened to you.

This sql command will show that the issue has in fact happened and give you some relevant time information about when cron.php was missing.

SELECT cid, from_unixtime(created) FROM cache_page where cid like "%cron.php";

Here's Dreamhost's response:

I see from the Drupal issue page the age of the issue. It's pretty
frustrating to see such an issue has persisted so long without some kind
of update. I really want it to get looked into, but I'll truly need an
example that can be repeated, since I'm not seeing wide reports of it,
even in internal communications. I know that may be tough since it may
not happen through a set of fixed steps, but at least a time and domain
may help. I'm sorry for the inconvenience.

Thanks!
Ryan C

If you have the ability to run Drush that seems to reduce the frequency of this problem (and may eliminate it entirely).

I use

/home/username/bin/drush-HEAD/drush -q -r /home/username/drupalroot/ -l example.com cron

And it seems to work well. You may also need to do:

export TERM=xterm-color

If you are running multiples of these inside a shell script.

#16 is a much better approach if you're on Dreamhost because it prevents a cached page from ever happening.

The core issue may be that Dreamhost has a process killer waiting to kill any long-running or high-resource process. So this issue may have to do with the point at which Dreamhost actually kills the php process.

So far, with drush running under cron, I've seen the process killer kill it anyway, but it doesn't result in the cached page (because no page is needed). It may still result in the 'cron_semaphore' variable being left inappropriately in there.

One other tip. To get drush running you may need an entry more like:

/usr/local/php5/bin/php /home/yourusername/bin/drush-HEAD/drush.php -q -r /home/yourusername/drupal_6/ -l example.com cron

Note the explicit path to php at the beginning which is necessary because the default command line interface is php4 (at least on some machines). I didn't notice this when I setup #16 because I had some stuff in my .profile which was taking care of that but I ran into it after updating the .profile

Note also that I created a directory called "bin" in my home directory and have downloaded the latest version of Drush into that directory. So, those are required steps as well.

subscribe

Title:[DoS?] Late cache hit can keep cron.php from working in certain circumstancesLate cache hit can keep cron.php from working in certain circumstances

What was the DoS part about?

I'm seeing a new, related problem on Dreamhost.

A cached 404 will suddenly be in place. An authorized user will have the page just fine, but anon gets a cached 404. If you clear the cache all is well.

Again, I think we have two problems:
1. Drupal caches things it shouldn't, like cron.php and the (front page as a 404)?
2. Dreamhost seems to kill processes at very unexpected times, provoking behavior #1.

Next time this happens I'll grab the database and study it more closely.

I just ran into the same problem after moving to a new host. The old host left its cron job in place and it was creating some unnecessary load. They weren't cooperative, so I blocked their access to cron.php with a file directive in htaccess. Unfortunately, that yielded a 404, which got cached, which subsequently gave my own cron job a 404 when it tried to hit cron.php. I blocked the old host at the firewall instead and removed the cron.php entry from cache_page and all seems well now.

I think #76824: Drupal should not handle 404 for certain files could fix this possibly (for D7). See #236

Version:7.x-dev» 6.14
Status:Needs work» Active

This is affecting me too (in Druapl 6.19), but here's what seems to be happening in my case:

I have a .htaccess containing

<Files "cron.php">
  Order Deny,Allow
  Deny from all
  Allow from localhost
  Allow from ...
  ...
</Files>

If I clear the page cache and hit cron.php from an allowed IP, everything is fine.
If i then hit cron.php from a disallowed IP, the request returns 404.
This 404 response seems to be cached, and subsequent hits from an allowed IP then also return 404 until the cache is cleared again (or presumably the cache entry expires).

So, my theory is that web crawlers are hitting cron.php, the 404 is getting cached, and subsequent legit hits from the cron jobs get 404.

Update 23 Feb 2011:
I think I might have fixed this by adding

  ErrorDocument 403 "Access denied

inside the <Files> section. It seems that without ErrorDocument 403, the main ErrorDocument 404 /index.php was being used, and I'm guessing that is where the cache entry was generated.

subscribe

Version:6.14» 7.x-dev
Status:Active» Needs review
StatusFileSize
new3.74 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 177790-dont-use-cache-on-technical-pages.patch.
[ View ]

Regardless of why the page was cached in the first place, Drupal should not try to load the page from the cache on technical callbacks. Those include:

  • cron.php
  • update.php
  • install.php
  • authorize.php
  • modules/simpletest/tests/http.php and modules/simpletest/tests/https.php
  • scripts/generate-d6-content.sh
  • and scripts/run-tests.sh

Status:Needs review» Needs work

The last submitted patch, 177790-dont-use-cache-on-technical-pages.patch, failed testing.

Version:6.14» 7.x-dev
Status:Active» Needs review
StatusFileSize
new3.71 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 177790-dont-use-cache-on-technical-pages_0.patch.
[ View ]

ninja p0

Status:Needs review» Reviewed & tested by the community

this fixes the problem imprefectly but is good enough.

Version:7.x-dev» 8.x-dev
Issue tags:+needs backport to D7

Status:Reviewed & tested by the community» Needs work

The last submitted patch, 177790-dont-use-cache-on-technical-pages.patch, failed testing.