Arguably this is a feature/by design but it also causes an inconsistency: when normal page cacheing is enabled and an anonymous user views a page for the 2nd or subsequent times, and an up-to-date copy is in their local browser cache, Drupal returns a 304 not modifed header and exits without invoking exit hooks.

In this situation, for an anonymous user only those page views that actually lead to the page HTML being returned to the browser score a hit, so multiple page views will *usually* not register multiple hits in {accesslog}, whereas multiple page views by an authenticated user *are* logged.

Some discussion here: http://drupal.org/node/307419.

Advantages of current approach:
- marginally better performance when normal cacheing is enabled because hook_exit() is invoked less often
Disadvantages of current approach:
- inconsistency in handling of repeat views of the same page by anonymous users compared with authenticated users
- it is not guaranteed that we are only recording unique page views for anonymous users
- to get consistent results you need to turn off all page cacheing which makes performance much worse

Note that statistics_exit() is the only exit hook currently in core. I needed to record all page views (including all non-unique page views) so ended up turning off cacheing.

For my use case I'd either need to patch session.inc or supply an alternative session library (specified in settings.php via static variable 'session_inc') in order to be able to record all page views and use cacheing at the same time.

We could argue whether non-unique page views should be counted or not (note that analytics packages give you analysis of both these metrics) but the real question here I guess is whether hook_exit() should be invoked after a 304 header response. IMO it should, but possibly http://api.drupal.org/api/function/hook_exit/7 needs an additional argument to indicate whether it is being invoked after a 304 header response, a full but cached page response, or full and dynamically generated page response.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Status: Active » Postponed (maintainer needs more info)

The main difference between normal and "aggressive" caching is that hook_boot() and hook_exit() are called in normal caching and not in aggressive caching. Given that, I don't understand the issue you are facing.

gpk’s picture

>hook_boot() and hook_exit() are called in normal caching
Well yes, the performance settings page says

The normal cache mode is suitable for most sites and does not cause any side effects.

but this is not strictly accurate since hook_exit() is skipped in the situation I described...

In http://api.drupal.org/api/function/drupal_page_cache_header/7, after the 304 and Etag headers are set we immediately hit an exit() ... no invocation of hook_exit()...

gpk’s picture

Status: Postponed (maintainer needs more info) » Active
Damien Tournoud’s picture

Status: Active » Needs review
FileSize
577 bytes

Ok, understood. Here is a patch (candidate for the trivial patch of the month).

Dave Reid’s picture

FileSize
644 bytes

This needs to be changed in drupal_page_cache_header. The situation being described is hitting the following logic in drupal_page_cache_header:

  if ($if_modified_since && $if_none_match
      && $if_none_match == $etag // etag must match
      && $if_modified_since == $last_modified) {  // if-modified-since must match
    header($_SERVER['SERVER_PROTOCOL'] . ' 304 Not Modified');
    // All 304 responses must send an etag if the 200 response for the same object contained an etag
    header("Etag: $etag");
    exit();
  }

Note that the exit() call will not allow the hook_exit call to be made from _drupal_bootstrap:

    case DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE:
      ...
      // If the skipping of the bootstrap hooks is not enforced, call hook_boot.
      if ($cache_mode != CACHE_AGGRESSIVE) {
        bootstrap_invoke_all('boot');
      }
      // If there is a cached page, display it.
      if ($cache) {
        drupal_page_cache_header($cache);
        // If the skipping of the bootstrap hooks is not enforced, call hook_exit.
        if ($cache_mode != CACHE_AGGRESSIVE) {
          bootstrap_invoke_all('exit');
        }
        // We are done.
        exit;
      }

Attached patch changes the exit in drupal_page_cache_header to return, which will allow processing to return to _drupal_bootstrap and fire the exit hook.

Damien Tournoud’s picture

Patch in #5 is right. I changed the wrong exit() (oups) in #4 :)

Dave Reid’s picture

This bug will need to be backported to D6 and D5 as well, so I've made patches for those versions.

gpk’s picture

Well yes I think Damien was going for the record with a 7-minute patch!

Also I think an exit hook (at least in 7.x+ since this would be an API change) needs to be able to find out if it has been invoked after a cached page or header reponse, or after full page generation. e.g. poormanscron.module -> #231724: Incompatible with page caching (incomplete cron runs, possible PHP fatal errors) and see #5 (as you can see I was struggling to work this out a way round this question - the problem being that poormanscron_exit wants to run cron and that really needs a full bootstrap so you either need to simulate a full bootstrap regardless or else be certain whether it has taken place). Any thoughts?

Damien Tournoud’s picture

Also I think an exit hook (at least in 7.x+ since this would be an API change) needs to be able to find out if it has been invoked after a cached page or header response, or after full page generation.

@gpk: This has nothing to do with this issue. hook_exit() have to call drupal_bootstrap() themselves if they need a fully bootstrap environment (it's harmless to run it in an already bootstrapped environment, so the fix to #231724 seems trivial.

gpk’s picture

>harmless to run it in an already bootstrapped environment
I tried that and got a PHP error - the problem being that http://api.drupal.org/api/function/_drupal_bootstrap_full/7 sets a header, which is not a particularly clever thing to do in an exit hook. Agreed, this discussion belongs elsewhere.

Dave Reid’s picture

poormanscron_exit should be able to run this check to see if the page is being served cached:

global $user;
if ($user->uid || variable_get('cache', CACHE_DISABLED) == CACHE_DISABLED || drupal_set_header()) {
  // Run!
}
Damien Tournoud’s picture

@gpk: _drupal_full_bootstrap() (or drupal_set_header) should learn to be more clever. Open a new issue so that we can improve them :)

gpk’s picture

Title: hook_exit() not invoked for some cached requests » hook_exit() is not invoked when a 304 (not modifed) header is returned to browser (causes inconsistency in page access logging)
Status: Reviewed & tested by the community » Needs review

@Dave: Thanks, I think only drupal_set_header() is needed, very smart :D but maybe the following will do the trick:

@Damien: OK I had a go, not sure I've got it quite right, please improve/clarify as needed ;) #318306: _drupal_full_bootstrap() (or drupal_set_header()) should learn to be more clever

[update] That issue supercecded by #318310: Fatal PHP error with _drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL) on cached request.

Dave Reid’s picture

Status: Needs review » Reviewed & tested by the community

Tiny tiny patch. Not sure if this needs tests, but it's ready. Marking as RTBC.

Dave Reid’s picture

Title: hook_exit() is not invoked when a 304 (not modifed) header is returned to browser (causes inconsistency in page access logging) » hook_exit() not invoked for cached requests

Shortening title.

Damien Tournoud’s picture

Title: hook_exit() not invoked for cached requests » hook_exit() not invoked for some cached requests

A more accurate title.

Damien Tournoud’s picture

And the executive summary:

When a request for a cached page can be answered with a 304 Not modified response code, drupal_page_cache_header() incorrectly stops the execution of the script, preventing hook_exit() to be fired.

Should be backported to D5.

Dave Reid’s picture

Title: hook_exit() is not invoked when a 304 (not modifed) header is returned to browser (causes inconsistency in page access logging) » hook_exit() not invoked for some cached requests
Status: Needs review » Fixed

I'm going to assume that this can be marked as fixed as I just saw this core commit. You can shoot me if I'm wrong.

webchick’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Reviewed & tested by the community

Ah, shoot. Yeah, I did commit this. Drupal.org's been a little funky tonight. Thanks. :)

However, this should actually be moved down to 6.x.

moshe weitzman’s picture

This is going to have a massive effect on statistics module. It will start counting cached visits which is good for reporting but not so good for performance. Statistics itself is not a good idea for high traffic sites so i am not too worried about this change.

Gábor Hojtsy’s picture

Version: 6.x-dev » 5.x-dev

Agreed with Moshe in #20, so looks appropriate for committing to Drupal 6. Moving to 5.x.

drumm’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 5.x

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.