Arguably this is a feature/by design but it also causes an inconsistency: when normal page cacheing is enabled and an anonymous user views a page for the 2nd or subsequent times, and an up-to-date copy is in their local browser cache, Drupal returns a 304 not modifed header and exits without invoking exit hooks.
In this situation, for an anonymous user only those page views that actually lead to the page HTML being returned to the browser score a hit, so multiple page views will *usually* not register multiple hits in {accesslog}, whereas multiple page views by an authenticated user *are* logged.
Some discussion here: http://drupal.org/node/307419.
Advantages of current approach:
- marginally better performance when normal cacheing is enabled because hook_exit() is invoked less often
Disadvantages of current approach:
- inconsistency in handling of repeat views of the same page by anonymous users compared with authenticated users
- it is not guaranteed that we are only recording unique page views for anonymous users
- to get consistent results you need to turn off all page cacheing which makes performance much worse
Note that statistics_exit() is the only exit hook currently in core. I needed to record all page views (including all non-unique page views) so ended up turning off cacheing.
For my use case I'd either need to patch session.inc or supply an alternative session library (specified in settings.php via static variable 'session_inc') in order to be able to record all page views and use cacheing at the same time.
We could argue whether non-unique page views should be counted or not (note that analytics packages give you analysis of both these metrics) but the real question here I guess is whether hook_exit() should be invoked after a 304 header response. IMO it should, but possibly http://api.drupal.org/api/function/hook_exit/7 needs an additional argument to indicate whether it is being invoked after a 304 header response, a full but cached page response, or full and dynamically generated page response.
Comment | File | Size | Author |
---|---|---|---|
#7 | cache-exit-318102-D6.patch | 635 bytes | Dave Reid |
#7 | cache-exit-318102-D5.patch | 664 bytes | Dave Reid |
#5 | cache-exit-318102-5-D7.patch | 644 bytes | Dave Reid |
#4 | 318102-hook-exit-not-called.patch | 577 bytes | Damien Tournoud |
Comments
Comment #1
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe main difference between normal and "aggressive" caching is that hook_boot() and hook_exit() are called in normal caching and not in aggressive caching. Given that, I don't understand the issue you are facing.
Comment #2
gpk CreditAttribution: gpk commented>hook_boot() and hook_exit() are called in normal caching
Well yes, the performance settings page says
but this is not strictly accurate since hook_exit() is skipped in the situation I described...
In http://api.drupal.org/api/function/drupal_page_cache_header/7, after the 304 and Etag headers are set we immediately hit an exit() ... no invocation of hook_exit()...
Comment #3
gpk CreditAttribution: gpk commentedComment #4
Damien Tournoud CreditAttribution: Damien Tournoud commentedOk, understood. Here is a patch (candidate for the trivial patch of the month).
Comment #5
Dave ReidThis needs to be changed in drupal_page_cache_header. The situation being described is hitting the following logic in drupal_page_cache_header:
Note that the exit() call will not allow the hook_exit call to be made from _drupal_bootstrap:
Attached patch changes the exit in drupal_page_cache_header to return, which will allow processing to return to _drupal_bootstrap and fire the exit hook.
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commentedPatch in #5 is right. I changed the wrong exit() (oups) in #4 :)
Comment #7
Dave ReidThis bug will need to be backported to D6 and D5 as well, so I've made patches for those versions.
Comment #8
gpk CreditAttribution: gpk commentedWell yes I think Damien was going for the record with a 7-minute patch!
Also I think an exit hook (at least in 7.x+ since this would be an API change) needs to be able to find out if it has been invoked after a cached page or header reponse, or after full page generation. e.g. poormanscron.module -> #231724: Incompatible with page caching (incomplete cron runs, possible PHP fatal errors) and see #5 (as you can see I was struggling to work this out a way round this question - the problem being that poormanscron_exit wants to run cron and that really needs a full bootstrap so you either need to simulate a full bootstrap regardless or else be certain whether it has taken place). Any thoughts?
Comment #9
Damien Tournoud CreditAttribution: Damien Tournoud commented@gpk: This has nothing to do with this issue. hook_exit() have to call drupal_bootstrap() themselves if they need a fully bootstrap environment (it's harmless to run it in an already bootstrapped environment, so the fix to #231724 seems trivial.
Comment #10
gpk CreditAttribution: gpk commented>harmless to run it in an already bootstrapped environment
I tried that and got a PHP error - the problem being that http://api.drupal.org/api/function/_drupal_bootstrap_full/7 sets a header, which is not a particularly clever thing to do in an exit hook. Agreed, this discussion belongs elsewhere.
Comment #11
Dave Reidpoormanscron_exit should be able to run this check to see if the page is being served cached:
Comment #12
Damien Tournoud CreditAttribution: Damien Tournoud commented@gpk: _drupal_full_bootstrap() (or drupal_set_header) should learn to be more clever. Open a new issue so that we can improve them :)
Comment #13
gpk CreditAttribution: gpk commented@Dave: Thanks, I think only drupal_set_header() is needed, very smart :D but maybe the following will do the trick:
@Damien: OK I had a go, not sure I've got it quite right, please improve/clarify as needed ;) #318306: _drupal_full_bootstrap() (or drupal_set_header()) should learn to be more clever
[update] That issue supercecded by #318310: Fatal PHP error with _drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL) on cached request.
Comment #14
Dave ReidTiny tiny patch. Not sure if this needs tests, but it's ready. Marking as RTBC.
Comment #15
Dave ReidShortening title.
Comment #16
Damien Tournoud CreditAttribution: Damien Tournoud commentedA more accurate title.
Comment #17
Damien Tournoud CreditAttribution: Damien Tournoud commentedAnd the executive summary:
When a request for a cached page can be answered with a
304 Not modified
response code,drupal_page_cache_header()
incorrectly stops the execution of the script, preventing hook_exit() to be fired.Should be backported to D5.
Comment #18
Dave ReidI'm going to assume that this can be marked as fixed as I just saw this core commit. You can shoot me if I'm wrong.
Comment #19
webchickAh, shoot. Yeah, I did commit this. Drupal.org's been a little funky tonight. Thanks. :)
However, this should actually be moved down to 6.x.
Comment #20
moshe weitzman CreditAttribution: moshe weitzman commentedThis is going to have a massive effect on statistics module. It will start counting cached visits which is good for reporting but not so good for performance. Statistics itself is not a good idea for high traffic sites so i am not too worried about this change.
Comment #21
Gábor HojtsyAgreed with Moshe in #20, so looks appropriate for committing to Drupal 6. Moving to 5.x.
Comment #22
drummCommitted to 5.x
Comment #23
Anonymous (not verified) CreditAttribution: Anonymous commentedAutomatically closed -- issue fixed for two weeks with no activity.