Due to the late page cache, having a spurious entry in the page cache for 'cron.php' can prevent cron from working.
If for any reason you rename cron.php temporarily, and something hits it as anonymous while it is not there, it will cause a page cache entry for 'cron.php' to be stored due to the 404 handling procedure.
After moving it back, cron.php will continue to return 404s during bootstrap due to the late page cache.
This probabaly also affects xmlrpc.php and any third party handlers.
(If a way was discovered to cause this entry to be stored without having to move the file first, this would be a pretty serious issue. Leaving at critical until this can be investigated more thoroughly...)
[11:40] <cornernote> so it is crapping itself at phase 5
[11:40] <Bdragon> 404's usually come while running the menu handler, but that shouldn't happen for cron.php
[11:41] <Bdragon> Because index.php is responsible for it
[11:41] <cornernote> DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE
[11:41] <cornernote> thats where it gets too...
[11:42] <Bdragon> Hmm...
[11:42] <Bdragon> Did you remove cron.php and then try to visit it as anonymous?
[11:43] <Bdragon> There could be a spurious cache entry
[11:43] <Bdragon> Try truncating the page cache
[11:44] <Bdragon> If this fixes it, I know exactly what's happening...
[11:44] <cornernote> it just magically fixed itself
[11:44] <Bdragon> yeah
[11:45] <cornernote> but i got it down to 2 lines.. either this:
[11:45] <cornernote> $conf = variable_init(isset($conf) ? $conf : array());
[11:45] <cornernote> or this: _drupal_cache_init($phase);
[11:45] <Bdragon> It would be #2.
[11:45] <Bdragon> Here's what's happening:
[11:46] <Bdragon> At some point, you attempted to visit cron.php as anonymous when it had been renamed or moved.
[11:47] <Bdragon> When a file is not found, the webserver tries again by rewriting it to index.php?q=(requested file..)
[11:47] <Bdragon> This causes a normal drupal startup.
[11:47] <Bdragon> Drupal attempts to find a menu handler for cron.php
[11:47] <Bdragon> This doesn't exist.
[11:48] <Bdragon> Drupal returns a 404 and adds a page cache entry to match.
[11:48] <Bdragon> NOW, you move cron.php BACK.
[11:48] <Bdragon> Next time you try to call cron.php, it actually runs cron.php
[11:48] <Bdragon> BUT, cron.php does a full bootstrap
[11:49] <Bdragon> A full bootstrap involves the late page cache check.
[11:49] <cornernote> i am on dreamhost, and they have a nasty habbit of killing scripts... usually most issues i blame on that... rekon that could have caused it ?
[11:49] <Bdragon> The cache happens to have a stored entry for cron.php
[11:49] <Bdragon> The system returns this result and aborts the bootstrap
[11:50] <Bdragon> So, the result is, despite not even using the menu system, the bootstrap returns 404
[11:50] <cornernote> i think DH tinker with sites when servers we under load.. i think they did it
[11:50] <Bdragon> Possibly
[11:50] <Bdragon> So, this is a bug in Drupal.
[11:50] <cornernote> so... now i fully understand the problem
[11:51] <Bdragon> Cron.php should not be bootstrapping that far
[11:51] <cornernote> now, to prevent this from recuring... can i just delete that page from cache once per day with an external php script ?
[11:51] <Bdragon> No, change cron.php to not DRUPAL_BOOTSTRAP_FULL
[11:52] <Bdragon> Should go to DRUPAL_BOOTSTRAP_SESSION, I reckon
[11:53] <Bdragon> The further steps are only relevant for index.php invoked bootstraps
[11:54] <cornernote> that gives me: Call to undefined function drupal_cron_run()
[11:57] <Bdragon> ahh, there's the problem...
[11:57] <Bdragon> Yeah, it isn't pulling in common.inc until the last part...
[11:57] <Bdragon> Damn...
[11:58] <cornernote> shouldnt the cache handler check for things like caching the cron.php page ?
[11:58] <Bdragon> Yeah... That might be a good idea.
[11:58] <Bdragon> page_get_cache?
[11:58] <Druplicon> page_get_cache: Retrieve the current page from the cache. => page_get_cache() => http://api.drupal.org/api/5/function/page_get_cache
[11:58] <Bdragon> Should be ensuring that the handler is index.php....
[12:00] <Bdragon> I'll bugreport it
Comment | File | Size | Author |
---|---|---|---|
#40 | 177790-40-dont-use-cache-on-technical-pages.patch | 3.62 KB | ksenzee |
Comments
Comment #1
gregglesI couldn't reproduce this...
I'm marking active needs more info to ask for the list of steps necessary to make this happen.
My current list:
1. Install Drupal6
2. wget http://localhost/cron.php returns 200 and runs cron
3. mv cron.php cron.php2
4. wget http://localhost/cron.php returns 404 and fails to run cron
5. mv cron.php2 cron.php
6. wget http://localhost/cron.php
Expected results:
wget in step 6 returns a 404.
I tried this without page caching, with normal page caching, and with aggressive page caching.
Downgrading to normal until we get steps to successfully repeat (it may even be normal then...)
Comment #2
jurgenhaasI have just had this issue on Drupal 5.6, for some reason the http://www.example.com/cron.php ended up in the table example_cache_page and therefore the cron did not run anymore because if was taken from the cache during boot_strap. I have patched
function page_get_cache()
so that if the uri equals/cron.php
it won't load it from the cache table.It's a dirty fix but it works.
Comment #3
gregglesjurgenhaas - can you provide some "steps to repeat" for this issue? That should be the first step of a bug report.
Comment #4
jurgenhaasOh sorry, I forgot. Well, in fact I'm not sure how it happened, I only realised that suddenly the cron wasn't running anymore. It was called and finished without an error message but it didn't do anything. So I started to investigate and finally found out that the cron.php was in the cache_page table which caused the system to call it from there and then exiting.
I have no idea when and how this happened but when I found this issues on the Drupal repository it sounded very similar to my problem.
What could have happened is that the Windows scheduler was calling the cron job while I was updating my Drupal system and for a short period of time, either the cron.php wasn't available or that the .htaccess wasn't available and therefore prevented access to the maybe available cron.php.
Hope that helps, but please let me know if you need more detail.
Comment #5
sciurus CreditAttribution: sciurus commentedI am able to reproduce this on Drupal 6.11 with normal caching.
First, cron.php works normally. If I move cron.php, then try to load it, it ends up in the database cache. If I move it back, I receive the cached 404. If I delete the cache row, it works normally again.
I stumbled upon this bug report after the cron daemon on a dreamhsot server started emailing me the text of the drupal 404 page. Confused, I checked and saw that cron.php was indeed there. However, the 404 was in the page_cache table. I wish I knew what dreamhost was doing that caused a 404 in the first place.
Comment #6
dale42I am experiencing this problem on Dreamhost, as well. Drupal 6.12. It starts with an event like this:
Subsequent GETs to cron.php result in a 404 error.
Based on logging statements I've added to the code, the
if ($cache) { [...] exit; }
path incase DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE
is being used. I didn't confirm there was a 404 entry for cron.php in the cache, but presume there was based on the HTTP 404 response to the GET.When an anonymous user is experiencing the problem, an authenticated user doesn't.
Clearing cache brings things back to normal.
I'm currently trying to track down what's causing the "Premature end of script headers" problem, but that probably isn't relevant to this discussion. Based on my read of the initial report, cron.php should never get cached in the first place.
Comment #7
sciurus CreditAttribution: sciurus commentedMy workaround was to add the following code to the top of page_set_cache() in includes/common.inc
if (preg_match('/cron.php$/', request_uri())) { return; }
I still receive the 404 emails from Dreamhost's cron daemon, but at least they aren't cached. The problem has occurred ten times in the last week. Interestingly, it only happens at 3AM, 9AM, 3PM, and 9PM eastern time.
Comment #8
dale42The server I was on had a serious melt-down and Dreamhost (told me they) redistributed clients. Performance has been much better. I'm still seeing regular cron failures, but not getting 404's on cron.php.
I had some logging statements in cron.php and a number of .inc files for a while. When a failure occurred, execution of cron.php was completing, in as much as the execution reached the end of the cron.php file where the log statement was. So it was as if the error was occurring when either the file was closed or resources were being released. I don't have time to do further debugging so haven't tried following this up.
When the performance was severely impaired (read totally f****d), premature end of script header errors were happening all over the place and many other pages, including the front page via "/", returned 404's after the server returned to normal. Clearing cache resolved the 404's. I think there is an issue with caching/error handling, but it obviously is an edge case and not worth aggressively pursuing. I have Drupal 5 instances on the same server and they didn't suffer from the 404 problem, so it appears specific to Drupal 6.
So in my opinion, two problems. The one described in the previous paragraph, and something weird with Dreamhost (that triggers the Drupal problem).
Comment #9
Tobi2000 CreditAttribution: Tobi2000 commentedSo how is everybody doing on this issue. I was hoping for the problem to solve itself before I start hacking core files, but I'm still getting the 404's, and they are still being cached. Did Dreamhost support solve it for others?
Comment #10
rfayYes, saw this same problem on dreamhost today. Looked in cache_page and there was the entry for cron.php. Renamed the cache key and immediately it works.
I am unable to demonstrate it in a test environment, so I'm not sure what even causes the 404 to be cached.
The contents of the cached page were the 404.
Comment #11
rfayUpdate: I am able to recreate this in a test environment.
1. Turn on page caching.
2. Run cron.php from a browser. It should work.
3. mv cron.php cron.php.bak
4. Run cron.php from a not-logged-in browser.
5. Confirm that you now have a cached 404 page for cron.php in the cache_page table.
6. Put cron back. mv cron.php.bak cron.php
7. Hit cron.php with the not-logged-in browser. You'll see the 404.
A theory: Dreamhost has some kind of Apache file caching going on that occasionally makes a mistake about whether cron.php exists or not. Perhaps the cache expiration and the cache table are not maintained in sync with each other, so somehow Apache (without looking at the filesystem) declares cron.php not to be there. However, index.php is there, so drupal bootstraps.
Comment #12
gregglesI have now seen this problem, but I still don't know how to reproduce it.
It seems to happen when the servers are heavily loaded or for some of my installations that have a lot of complex stuff that they're doing on cron (i.e. TattlerApp doing all the news aggregation and content tagging).
Comment #13
gregglessorry, crosspost.
Comment #14
rfayI created an issue with Dreamhost about this. I think we should fix it in Drupal, but it would be nice if the cron didn't 404 on people.
Comment #15
rfayAgain: This is a Drupal bug in addition to a Dreamhost bug.
The Dreamhost part of the story: Dreamhost has requested that any customer who encounters this file a ticket with them with as much information as possible. Please give a link to this issue, mention that it's been widely reported, and give them the best possible details of what happened to you.
This sql command will show that the issue has in fact happened and give you some relevant time information about when cron.php was missing.
Here's Dreamhost's response:
Comment #16
gregglesIf you have the ability to run Drush that seems to reduce the frequency of this problem (and may eliminate it entirely).
I use
/home/username/bin/drush-HEAD/drush -q -r /home/username/drupalroot/ -l example.com cron
And it seems to work well. You may also need to do:
export TERM=xterm-color
If you are running multiples of these inside a shell script.
Comment #17
rfay#16 is a much better approach if you're on Dreamhost because it prevents a cached page from ever happening.
The core issue may be that Dreamhost has a process killer waiting to kill any long-running or high-resource process. So this issue may have to do with the point at which Dreamhost actually kills the php process.
So far, with drush running under cron, I've seen the process killer kill it anyway, but it doesn't result in the cached page (because no page is needed). It may still result in the 'cron_semaphore' variable being left inappropriately in there.
Comment #18
gregglesOne other tip. To get drush running you may need an entry more like:
/usr/local/php5/bin/php /home/yourusername/bin/drush-HEAD/drush.php -q -r /home/yourusername/drupal_6/ -l example.com cron
Note the explicit path to php at the beginning which is necessary because the default command line interface is php4 (at least on some machines). I didn't notice this when I setup #16 because I had some stuff in my .profile which was taking care of that but I ran into it after updating the .profile
Note also that I created a directory called "bin" in my home directory and have downloaded the latest version of Drush into that directory. So, those are required steps as well.
Comment #19
jennifer.chang CreditAttribution: jennifer.chang commentedsubscribe
Comment #20
jennifer.chang CreditAttribution: jennifer.chang commentedWhat was the DoS part about?
Comment #21
rfayI'm seeing a new, related problem on Dreamhost.
A cached 404 will suddenly be in place. An authorized user will have the page just fine, but anon gets a cached 404. If you clear the cache all is well.
Again, I think we have two problems:
1. Drupal caches things it shouldn't, like cron.php and the (front page as a 404)?
2. Dreamhost seems to kill processes at very unexpected times, provoking behavior #1.
Next time this happens I'll grab the database and study it more closely.
Comment #22
rclemings CreditAttribution: rclemings commentedI just ran into the same problem after moving to a new host. The old host left its cron job in place and it was creating some unnecessary load. They weren't cooperative, so I blocked their access to cron.php with a file directive in htaccess. Unfortunately, that yielded a 404, which got cached, which subsequently gave my own cron job a 404 when it tried to hit cron.php. I blocked the old host at the firewall instead and removed the cron.php entry from cache_page and all seems well now.
Comment #23
rfayI think #76824: Drupal should not handle 404 for certain files could fix this possibly (for D7). See #236
Comment #24
david.houlder CreditAttribution: david.houlder commentedThis is affecting me too (in Druapl 6.19), but here's what seems to be happening in my case:
I have a .htaccess containing
If I clear the page cache and hit cron.php from an allowed IP, everything is fine.
If i then hit cron.php from a disallowed IP, the request returns 404.
This 404 response seems to be cached, and subsequent hits from an allowed IP then also return 404 until the cache is cleared again (or presumably the cache entry expires).
So, my theory is that web crawlers are hitting cron.php, the 404 is getting cached, and subsequent legit hits from the cron jobs get 404.
Update 23 Feb 2011:
I think I might have fixed this by adding
inside the
<Files>
section. It seems that withoutErrorDocument 403
, the mainErrorDocument 404 /index.php
was being used, and I'm guessing that is where the cache entry was generated.Comment #25
ardr CreditAttribution: ardr commentedsubscribe
Comment #26
Damien Tournoud CreditAttribution: Damien Tournoud commentedRegardless of why the page was cached in the first place, Drupal should not try to load the page from the cache on technical callbacks. Those include:
Comment #28
bdragon CreditAttribution: bdragon commentedninja p0
Comment #29
moshe weitzman CreditAttribution: moshe weitzman commentedthis fixes the problem imprefectly but is good enough.
Comment #31
catchComment #32
catch#28: 177790-dont-use-cache-on-technical-pages.patch queued for re-testing.
Comment #34
fabriceg CreditAttribution: fabriceg commentedI have the same problem with D7. I now have to deactivate the cache completely so that cron runs correctly.
Comment #38
fietserwinIn D8, cron is accessed via a route, so this problem should not occur in D8. Back to 7.x-dev.
Comment #39
fietserwinPatch does not apply to 7.x branch, changing to NW.
Comment #40
ksenzeeReroll.
Comment #41
fietserwin@ksenzee: thanks for joining.
Based on:
- Me still getting this error on various (up to date) D7 sites: the issue mentioned in #23 does not solve this problem.
- Me reviewing the patch. Note: function drupal_page_is_cacheable() is part of bootstrap.inc (and stays in bootstrap .inc). As all added calls are just before an already existing call to function drupal_bootstrap() the script won't fail on function drupal_page_is_cacheable() not being available (because bootstrap.inc is not loaded at that place in the script).
- #29: the current patch being a reroll of #28, so the RTBC back then is valid for this patch as well.
- AFAIK: there are no knew "technical pages" introduced since.
- #31: @catch moving it to 8.x. Currently (I don't know how it was 6 years ago), 8.x uses a route to access cron (example.com/cron?... instead of example.com/cron.php?...). So even if 8.x does cache some things it should not cache, I consider that a bug different from this one. So in my opinion all comments as of #31 up to the rerolled patch no longer to be taken into account.
I conclude: back to RTBC.
Comment #42
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedThe patch isn't specific to cron.php though. So given what it's currently trying to solve, I think it would need to be addressed in Drupal 8 too (which also has plenty of standalone .php scripts).
I reviewed and tested the patch, and the solution is a bit ugly, but mostly works. I say "mostly" because it only prevents reading from the cache (not writing to it) and therefore won't help at all if a 404-returning request for that page has already been cached outside of Drupal (e.g. in Varnish or something like that). A more robust and not-necessarily-uglier fix would be for Drupal to simply maintain a list of .php files that it knows it never wants to write to the cache (either hardcoded or configurable in settings.php) and then ensure that even on 404 requests those never get cached.
I think it's also fair to ask whether this issue should just be closed as "won't fix". If you've been deleting and restoring files on the filesystem, you really need to clear caches after doing that. And if that's happening on its own because something is wrong with your hosting company... maybe you need a new hosting company :) Drupal can try to help here, and maybe it should, but this seems somewhat outside of Drupal's scope.
Some smaller issues with the patch specifically:
Comment #47
catchThese are no longer separate front controllers, even update.php is a route now.