hook_exit() not invoked for some cached requests
| Project: | Drupal |
| Version: | 5.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Arguably this is a feature/by design but it also causes an inconsistency: when normal page cacheing is enabled and an anonymous user views a page for the 2nd or subsequent times, and an up-to-date copy is in their local browser cache, Drupal returns a 304 not modifed header and exits without invoking exit hooks.
In this situation, for an anonymous user only those page views that actually lead to the page HTML being returned to the browser score a hit, so multiple page views will *usually* not register multiple hits in {accesslog}, whereas multiple page views by an authenticated user *are* logged.
Some discussion here: http://drupal.org/node/307419.
Advantages of current approach:
- marginally better performance when normal cacheing is enabled because hook_exit() is invoked less often
Disadvantages of current approach:
- inconsistency in handling of repeat views of the same page by anonymous users compared with authenticated users
- it is not guaranteed that we are only recording unique page views for anonymous users
- to get consistent results you need to turn off all page cacheing which makes performance much worse
Note that statistics_exit() is the only exit hook currently in core. I needed to record all page views (including all non-unique page views) so ended up turning off cacheing.
For my use case I'd either need to patch session.inc or supply an alternative session library (specified in settings.php via static variable 'session_inc') in order to be able to record all page views and use cacheing at the same time.
We could argue whether non-unique page views should be counted or not (note that analytics packages give you analysis of both these metrics) but the real question here I guess is whether hook_exit() should be invoked after a 304 header response. IMO it should, but possibly http://api.drupal.org/api/function/hook_exit/7 needs an additional argument to indicate whether it is being invoked after a 304 header response, a full but cached page response, or full and dynamically generated page response.

#1
The main difference between normal and "aggressive" caching is that hook_boot() and hook_exit() are called in normal caching and not in aggressive caching. Given that, I don't understand the issue you are facing.
#2
>hook_boot() and hook_exit() are called in normal caching
Well yes, the performance settings page says
but this is not strictly accurate since hook_exit() is skipped in the situation I described...
In http://api.drupal.org/api/function/drupal_page_cache_header/7, after the 304 and Etag headers are set we immediately hit an exit() ... no invocation of hook_exit()...
#3
#4
Ok, understood. Here is a patch (candidate for the trivial patch of the month).
#5
This needs to be changed in drupal_page_cache_header. The situation being described is hitting the following logic in drupal_page_cache_header:
<?phpif ($if_modified_since && $if_none_match
&& $if_none_match == $etag // etag must match
&& $if_modified_since == $last_modified) { // if-modified-since must match
header($_SERVER['SERVER_PROTOCOL'] . ' 304 Not Modified');
// All 304 responses must send an etag if the 200 response for the same object contained an etag
header("Etag: $etag");
exit();
}
?>
Note that the exit() call will not allow the hook_exit call to be made from _drupal_bootstrap:
<?phpcase DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE:
...
// If the skipping of the bootstrap hooks is not enforced, call hook_boot.
if ($cache_mode != CACHE_AGGRESSIVE) {
bootstrap_invoke_all('boot');
}
// If there is a cached page, display it.
if ($cache) {
drupal_page_cache_header($cache);
// If the skipping of the bootstrap hooks is not enforced, call hook_exit.
if ($cache_mode != CACHE_AGGRESSIVE) {
bootstrap_invoke_all('exit');
}
// We are done.
exit;
}
?>
Attached patch changes the exit in drupal_page_cache_header to return, which will allow processing to return to _drupal_bootstrap and fire the exit hook.
#6
Patch in #5 is right. I changed the wrong exit() (oups) in #4 :)
#7
This bug will need to be backported to D6 and D5 as well, so I've made patches for those versions.
#8
Well yes I think Damien was going for the record with a 7-minute patch!
Also I think an exit hook (at least in 7.x+ since this would be an API change) needs to be able to find out if it has been invoked after a cached page or header reponse, or after full page generation. e.g. poormanscron.module -> #231724: Incompatible with page caching (incomplete cron runs, possible PHP fatal errors) and see #5 (as you can see I was struggling to work this out a way round this question - the problem being that poormanscron_exit wants to run cron and that really needs a full bootstrap so you either need to simulate a full bootstrap regardless or else be certain whether it has taken place). Any thoughts?
#9
@gpk: This has nothing to do with this issue. hook_exit() have to call drupal_bootstrap() themselves if they need a fully bootstrap environment (it's harmless to run it in an already bootstrapped environment, so the fix to #231724 seems trivial.
#10
>harmless to run it in an already bootstrapped environment
I tried that and got a PHP error - the problem being that http://api.drupal.org/api/function/_drupal_bootstrap_full/7 sets a header, which is not a particularly clever thing to do in an exit hook. Agreed, this discussion belongs elsewhere.
#11
poormanscron_exit should be able to run this check to see if the page is being served cached:
<?phpglobal $user;
if ($user->uid || variable_get('cache', CACHE_DISABLED) == CACHE_DISABLED || drupal_set_header()) {
// Run!
}
?>
#12
@gpk: _drupal_full_bootstrap() (or drupal_set_header) should learn to be more clever. Open a new issue so that we can improve them :)
#13
@Dave: Thanks, I think only drupal_set_header() is needed, very smart :D but maybe the following will do the trick:
@Damien: OK I had a go, not sure I've got it quite right, please improve/clarify as needed ;) #318306: _drupal_full_bootstrap() (or drupal_set_header()) should learn to be more clever
[update] That issue supercecded by #318310: Fatal PHP error with _drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL) on cached request.
#14
Tiny tiny patch. Not sure if this needs tests, but it's ready. Marking as RTBC.
#15
Shortening title.
#16
A more accurate title.
#17
And the executive summary:
When a request for a cached page can be answered with a
304 Not modifiedresponse code,drupal_page_cache_header()incorrectly stops the execution of the script, preventing hook_exit() to be fired.Should be backported to D5.
#18
I'm going to assume that this can be marked as fixed as I just saw this core commit. You can shoot me if I'm wrong.
#19
Ah, shoot. Yeah, I did commit this. Drupal.org's been a little funky tonight. Thanks. :)
However, this should actually be moved down to 6.x.
#20
This is going to have a massive effect on statistics module. It will start counting cached visits which is good for reporting but not so good for performance. Statistics itself is not a good idea for high traffic sites so i am not too worried about this change.
#21
Agreed with Moshe in #20, so looks appropriate for committing to Drupal 6. Moving to 5.x.
#22
Committed to 5.x
#23
Automatically closed -- issue fixed for two weeks with no activity.