If I've correctly understood developments in the cache and session handling this year:
#201122: Drupal should support disabling anonymous sessions
#147310: Implement better cache headers for reverse proxies
#477944: Fix and streamline page cache and session handling
#370454: Simplify page caching
the main benefit (for larger sites) is the possibility of easily handling caching outside PHP/Drupal.

The patch at #201122-3: Drupal should support disabling anonymous sessions was I think the first to disable the page cache for anonymous users with non-empty $_SESSION. From #5 in that issue:

I would assume that if a module puts something in $_SESSION for an anonymous user, then the page output of the following pages is not necessarily identical to that of a completely anonymous user with an empty $_SESSION and thus shouldn't be cached

and from #9:

The replacement of count(drupal_set_message()) == 0 with empty($_SESSION) is not essential to the patch and can be omitted without problems. It was just an attempt to generalize the check, i.e. check not only $_SESSION['messages'] but all of $_SESSION.

How does Drupal know that a page displaying the contents of an anonymous user's shopping cart should not be saved to the cache and displayed to other anonymous users?

To answer the last question - a module that modifies the page output depending on $_SESSION should use drupal_page_is_cacheable(FALSE) to stop the current page from being saved into the page cache. Note that http://api.drupal.org/api/function/drupal_set_message/7 already does this. I don't think Drupal core should unilaterally assume that a non-empty $_SESSION is going to affect the current page output.

Benefit: Improved performance (server and browser) for anonymous users with non-empty $_SESSION.

Patch to follow.

Comments

gpk’s picture

Status: Active » Needs review
StatusFileSize
new5.69 KB

This includes some minor cleanup of comments.

Not quite convinced it can be this simple...!

#322104: Allow hook_boot() to disable page cache would also be a useful further extension since then, for example, something like a shopping cart could be displayed to anonymous users who have something in it, but other anonymous users could continue to see cached pages.

moshe weitzman’s picture

Makes sense to me. Would be good to get the authors of those linked issues to chime in here.

gpk’s picture

Title: Page cache could/should work for anonymous users with non-empty $_SESSION » Page cache should work for anonymous users with non-empty $_SESSION
StatusFileSize
new5.71 KB

Thanks moshe, should I try to track 'em down in IRC????

Slightly modified patch.

I think this could be useful for things like http://drupal.org/project/affiliate, which stores referral information in the session. Also for a requirement we have for anonymous sessions #591348: How to enable anonymous sessions.

Actually #322104: Allow hook_boot() to disable page cache would also be *required* if the current behavior introduced in 7.x (where users with data stored in $_SESSION never get a cached response) is ever needed. A contrib module could then easily examine $_SESSION in hook_boot() and decide whether to force a non-cached response.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

OK, lets move this along ... If this gets committed, needs backport into pressflow.

damien tournoud’s picture

Status: Reviewed & tested by the community » Needs work

Make sense. But:

-    if (!empty($user->uid) || !empty($_SESSION)) {
+    // Disable caching if user is logged in or if there are messages to
+    // display from a previous page request.
+    if (!empty($user->uid) || count(drupal_get_messages(NULL, FALSE)) > 0) {
       drupal_page_is_cacheable(FALSE);
     }
   }

Let's move the cache invalidation related to status message to theme_status_messages(), please.

gpk’s picture

Status: Needs work » Needs review

Umm... unless I've misunderstood the suggestion, this cache invalidation is being done during bootstrap *before* trying to get the page from the cache with http://api.drupal.org/api/function/drupal_page_get_cache/7 ...

Also this is cache invalidation of the "don't serve this request from the cache" variety, rather than the "don't save this page into the cache" variety, drupal_page_is_cacheable() seems to be already used for both.

damien tournoud’s picture

Status: Needs review » Needs work

True. Which basically proves that this patch is a no-go. There are no way for modules to prevent a page from being served from the cache as soon as another version of the same page has been cached.

Taking the shopping cart example from above, that would mean that a shopping cart block would have to unconditionally call drupal_page_is_cacheable(FALSE) on every page it is displayed on, even if the cart is actually empty.

gpk’s picture

>prevent a page from being served from the cache as soon as another version of the same page has been cached
#322104: Allow hook_boot() to disable page cache, which would allow the cart module to revert Drupal to the behavior currently in core by doing this in hook_boot():

if (isset($_COOKIE[session_name()])) {
  drupal_page_is_cacheable(FALSE)
}

Or it could boot up to DRUPAL_BOOTSTRAP_SESSION phase and actually examine $_SESSION to determine if the cart is empty or not.

Let's combine the patch at #322104-18: Allow hook_boot() to disable page cache into this one and then repurpose the other issue to look at adding additional parameters to the page cache key as you suggested.

gpk’s picture

Status: Needs work » Needs review
StatusFileSize
new6.9 KB

New patch that also allows modules to use hook_boot() prevent the current page request from being served from the cache. Seems to do what it says on the tin with respect to hook_boot().

The only sub-optimal thing I can find is that if you have an anonymous cookie which then gets removed and subsequently you get a cache hit for a page you viewed while you had the cookie you get a 200 cached response rather than a 304 cached response, because of Vary: Cookie, but that was not introduced by this patch.

c960657’s picture

I think this looks good and is RTBC with these two tiny nits fixed:

+ * This happens, for example, when the user's $_SESSION contains status
+ * messages from a form submission that should not be cached
+ * and displayed to other users.

The second line wraps too soon.

+          // Make sure there is a user object because it's timestamp will be

Should be “its” instead of “it's”.

gpk’s picture

StatusFileSize
new6.9 KB

Dear me, standards slipping ;)

Thanks for dropping by, c9..., and checking this out.

dries’s picture

New patch that also allows modules to use hook_boot() prevent the current page request from being served from the cache. Seems to do what it says on the tin with respect to hook_boot().

Can you clarify the second sentence? Not sure I understand what you mean by that.

gpk’s picture

Sorry, you're right, not making myself clear.

What I meant to say was that I tested calling drupal_page_is_cacheable(FALSE) from within a module's hook_boot() and confirmed that this disabled the page cache for the current request (in answer to Damien's #7). I tried a couple of variants - unconditionally calling drupal_page_is_cacheable(FALSE), and conditionally calling it depending on the local path and also depending on whether isset($_COOKIE[session_name()]), thereby replicating the current behavior of core. These variants were probably unnecessary but confirmed in my mind that everything was behaving as it should (and actually highlighted another minor problem: #592482: Regression: data stored in $_SESSION in hook_boot() or hook_exit() during a cached page response is lost).

chx’s picture

My problem here is that if you have logic that depends on anonymous users having a session variable that now won't fire unless you explicitly set page no caching. That's a terrible developer experience.

Also, I would like to hear about the use case. My example: you have a shop, the session holds the cart and then every page needs to display that the cart has N items in it, it's not cacheable in its entirety any more. So, what is the use case where you have an anonymous session and yet, you want page cache?

gpk’s picture

Yes I think I can see where you are coming from in that if something is stored in $_SESSION then presumably you'd want to do something with it (e.g. modify the page output) and that can't be done if you are serving cached pages.

Our specific use case is not so much concerned with the content of $_SESSION but using the session to track site visitors. We use this for optimising AdWords campaigns since it's possible to associate individual sessions (and hence numbers of pageviews, registrations) with keywords etc. This supplements the analysis available from within AdWords and Google Analytics.

http://drupal.org/project/affiliate appears to do something similar - it stores referral information in the session, but this remains latent until e.g. the visitor actually signs up.

gpk’s picture

Different approach: default Drupal 7.x behavior is unchanged, but you can override the logic that determines whether the current page may be served from the cache or saved to it, by defining custom functions which will be invoked if they exist (à la custom_url_rewrite...).

Not sure what the merits of this are .. except that it should make chx's developer experience less terrible.. ;) I should say the naming of the "custom..." functions fills me with some revulsion.

Probably the !empty($user->uid) in drupal_session_initialize() should be put inside the scope of the custom behavior, to be more symmetric with custom_check_if_page_may_be_served_from_cache(). The latter has to include the necessary logic that prevents cached pages being served to logged-in users (the default isset($_COOKIE[session_name()] covers this case plus any non-empty session in one fell swoop). Of course this means that if the custom cache checks are done wrongly you could end up serving pages from or saving pages to the cache for logged-in users, but there seems no way of absolutely preventing developers from getting this wrong.

With this patch I can get the specific behavior I want by adding the following to settings.php (a custom cache.inc would also work):

/**
 * Enable page cache for anonymous users with non-empty $_SESSION.
 */
function custom_check_if_request_may_be_served_from_cache() {
  // We don't know yet if the user is anonymous or not.
  if (isset($_COOKIE[session_name()])) {
    // Need to check the session ... initializing the session in effect checks
    // whether the page may be served from cache :-) ... first of all checking for
    // $user->uid and possibly invoking custom_check_if_page_may_be_saved_to_cache(),
    // and calling drupal_page_is_cacheable(FALSE) if appropriate. This
    // blocks the serving of a cached page in drupal_page_get_cache().
    drupal_bootstrap(DRUPAL_BOOTSTRAP_SESSION, FALSE);
  }
  // Always return TRUE: if the request should not be served from the cache
  // then drupal_page_is_cacheable(FALSE) will have been called from
  // session_initialize().
  return TRUE;
}

function custom_check_if_page_may_be_saved_to_cache() {
  // If the user is logged in then the saving of the page to the cache will
  // already have been prevented. For anonymous users we are happy to save the
  // page to the cache if $_SESSION is non-empty, unless there are status
  // messages to display. 
  return count(drupal_get_messages(NULL, FALSE)) == 0;
}
gpk’s picture

StatusFileSize
new4.17 KB

Ahhhhh trying again!

Status: Needs review » Needs work

The last submitted patch failed testing.

gpk’s picture

Issue tags: +Performance

Adding tag, though it seems not many other people are affected/concerned by this..!

chx doesn't like #11, and I don't really like #16/#17. Another approach would be to have a simple variable that could be set in settings.php (e.g. 'page_cache_with_anonymous_sessions') that is FALSE by default (giving the current behavior) and if set to TRUE gives the current 6.x behavior.

Whether I get round to rolling a patch and whether there is time for it to be reviewed before 1 Dec remains to be seen :P

Anonymous’s picture

subscribe, late to the party, but this looks interesting.

gpk’s picture

StatusFileSize
new3.92 KB

And this is even later!

Implements #19. When variable 'page_cache_with_anonymous_sessions' is not set there is no change from current behavior and anon visitors never get a cached response if something is stored in their $_SESSION.

When variable 'page_cache_with_anonymous_sessions' is set to TRUE in settings.php then anon visitors can receive a cached response even with a non-empty session, but not if a status message is displayed. Logged in users never get a cached response.

gpk’s picture

Status: Needs work » Needs review

[udpate: just to note that Firebug 1.4.5 running on Firefox 3.5.5 displays the X-Drupal-Cache header even for a 304 response. Obviously with a 304 response you must have hit the cache, and indeed Drupal tries to send the X-Drupal-Cache: HIT header again, but Firebug prefers to show the cached copy of the header which may say MISS. Don't let this confuse you ...! Live HTTP headers on the other hand doesn't show this header when a 304 is received.]

damien tournoud’s picture

Status: Needs review » Closed (won't fix)

Our specific use case is not so much concerned with the content of $_SESSION but using the session to track site visitors. We use this for optimising AdWords campaigns since it's possible to associate individual sessions (and hence numbers of pageviews, registrations) with keywords etc. This supplements the analysis available from within AdWords and Google Analytics.

There is no ground for this: the session cookie is for uniquely identify a session storage bin. If you want/need to do tracking, use another cookie. Do not mix them.

Marking as won't fix. The cache system as it stands is simple and elegant: we only cache a page if there are no session data. Blindly allowing pages to be cached when there is session data is a receipt for failure.

moshe weitzman’s picture

Status: Closed (won't fix) » Needs review

@Damien - I think you are enforcing your own views on app developers. There could certainly be individual pages within a site where it would be safe to serve from page cache even though user has a non-empty $_SESSION. We are discussing how best to allow this. Agreed that "blindly" cacheing these pages is a bad idea.

damien tournoud’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs review » Needs work

@Moshe: the best way to do that is to enforce caching in your proxy cache on a page-by-page basis. You, as the site administrator, is the only one that can know whether this is safe or not. Drupal has no such facility (at least in Drupal 7)...

Such a facility would require three things:

(1) a way to ask all the content displayed on the page for their 'cache keys',
(2) a way to generate those cache keys for a given context (requested page, current user, weather in San Francisco, etc.)
(3) a way to fetch the correct cache entry(-ies) from the cache, based on the best match between the context cache keys and the cache keys stored for a given page

We have none of those three things right now, so this particular issue is a won't fix for Drupal 7. We don't need either to promote bad design (ie. using the session for tracking users) nor to allow site administrators to shot themselves in the foot. Moved to Drupal 8.

moshe weitzman’s picture

Thats all great, for sites that run a reverse proxy.

Another way to do it is for admins to identify paths that are safe to serve from page cache. It can be a simple array in settings.php, for example. I'm thinking of an ecommerce site that has a weekly specials page which is very popular. That page would not show the usual personalized shopping cart icon.

gpk’s picture

@26: #21 might do what you want, by letting you define

$conf['page_cache_with_anonymous_sessions'] = TRUE;

in settings.php, which lets Drupal use the page cache for (non-empty) anon sessions, as in 6.x. However although this variable could be set conditionally, it might be hard to do so intelligently this early in the bootstrap; you'd probably have to do some tests in cache.inc where you could raise the boostrap level without triggering the page cache. BTW, I'm sure I'm teaching granny to suck eggs ;)

@23:
>If you want/need to do tracking, use another cookie
Well yes, I was rather coming to the same conclusion. Much of my reluctance being 1) statistics.module +Views pretty well does what's needed in 6.x (I found this surprisingly useful and identified various usability problems on one site this way); and 2) I have little understanding (yet) of cookie handling! Maybe it's not so hard.

>Agreed that "blindly" cacheing these pages is a bad idea.
Well at least we all agree there!!

sun’s picture

andypost’s picture

is this issue still valid?

TDBA’s picture

Yes it is affecting me right now on Drupal 7.

The fact that a session cookie is set (by my theme) and boost checks is_cacheable breaks boost for me at the moment.

It seems questionable to say "don't cache at all if a session cookie exists".

I wanted to just use patch #21 but it's five years old and some core code changed e.g the patch references "$cache_mode == CACHE_NORMAL" and the current bootstrap.inc has "$cache_enabled" so yeah.

I guess I am just going to edit my session.inc and delete the session check which triggers "drupal_page_is_cacheable(FALSE);".

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

catch’s picture

Status: Needs work » Closed (won't fix)

Going to mark this won't fix again:

1. It's not at all clear how existing code would be notified that it needs to use the kill switch instead of page caching. Additionally, the kill switch only works when you add something to $_SESSION, not when you read from it, and it's the latter that is trickier.

2. Dynamic cache and big pipe work for both authenticated users and anonymous users with a session, so this isn't the all-on/all-off situation it used to be.