If I've correctly understood developments in the cache and session handling this year:
#201122: Drupal should support disabling anonymous sessions
#147310: Implement better cache headers for reverse proxies
#477944: Fix and streamline page cache and session handling
#370454: Simplify page caching
the main benefit (for larger sites) is the possibility of easily handling caching outside PHP/Drupal.
The patch at #201122-3: Drupal should support disabling anonymous sessions was I think the first to disable the page cache for anonymous users with non-empty $_SESSION. From #5 in that issue:
I would assume that if a module puts something in $_SESSION for an anonymous user, then the page output of the following pages is not necessarily identical to that of a completely anonymous user with an empty $_SESSION and thus shouldn't be cached
and from #9:
The replacement of count(drupal_set_message()) == 0 with empty($_SESSION) is not essential to the patch and can be omitted without problems. It was just an attempt to generalize the check, i.e. check not only $_SESSION['messages'] but all of $_SESSION.
How does Drupal know that a page displaying the contents of an anonymous user's shopping cart should not be saved to the cache and displayed to other anonymous users?
To answer the last question - a module that modifies the page output depending on $_SESSION should use drupal_page_is_cacheable(FALSE) to stop the current page from being saved into the page cache. Note that http://api.drupal.org/api/function/drupal_set_message/7 already does this. I don't think Drupal core should unilaterally assume that a non-empty $_SESSION is going to affect the current page output.
Benefit: Improved performance (server and browser) for anonymous users with non-empty $_SESSION.
Patch to follow.
| Comment | File | Size | Author |
|---|---|---|---|
| #21 | page_cache_with_anon_session.patch | 3.92 KB | gpk |
| #17 | page_cache_with_anon_session.patch | 4.17 KB | gpk |
| #11 | page_cache_with_anon_session.patch | 6.9 KB | gpk |
| #9 | page_cache_with_anon_session.patch | 6.9 KB | gpk |
| #3 | page_cache_with_anon_session.patch | 5.71 KB | gpk |
Comments
Comment #1
gpk commentedThis includes some minor cleanup of comments.
Not quite convinced it can be this simple...!
#322104: Allow hook_boot() to disable page cache would also be a useful further extension since then, for example, something like a shopping cart could be displayed to anonymous users who have something in it, but other anonymous users could continue to see cached pages.
Comment #2
moshe weitzman commentedMakes sense to me. Would be good to get the authors of those linked issues to chime in here.
Comment #3
gpk commentedThanks moshe, should I try to track 'em down in IRC????
Slightly modified patch.
I think this could be useful for things like http://drupal.org/project/affiliate, which stores referral information in the session. Also for a requirement we have for anonymous sessions #591348: How to enable anonymous sessions.
Actually #322104: Allow hook_boot() to disable page cache would also be *required* if the current behavior introduced in 7.x (where users with data stored in $_SESSION never get a cached response) is ever needed. A contrib module could then easily examine $_SESSION in hook_boot() and decide whether to force a non-cached response.
Comment #4
moshe weitzman commentedOK, lets move this along ... If this gets committed, needs backport into pressflow.
Comment #5
damien tournoud commentedMake sense. But:
Let's move the cache invalidation related to status message to theme_status_messages(), please.
Comment #6
gpk commentedUmm... unless I've misunderstood the suggestion, this cache invalidation is being done during bootstrap *before* trying to get the page from the cache with http://api.drupal.org/api/function/drupal_page_get_cache/7 ...
Also this is cache invalidation of the "don't serve this request from the cache" variety, rather than the "don't save this page into the cache" variety, drupal_page_is_cacheable() seems to be already used for both.
Comment #7
damien tournoud commentedTrue. Which basically proves that this patch is a no-go. There are no way for modules to prevent a page from being served from the cache as soon as another version of the same page has been cached.
Taking the shopping cart example from above, that would mean that a shopping cart block would have to unconditionally call
drupal_page_is_cacheable(FALSE)on every page it is displayed on, even if the cart is actually empty.Comment #8
gpk commented>prevent a page from being served from the cache as soon as another version of the same page has been cached
#322104: Allow hook_boot() to disable page cache, which would allow the cart module to revert Drupal to the behavior currently in core by doing this in hook_boot():
Or it could boot up to DRUPAL_BOOTSTRAP_SESSION phase and actually examine $_SESSION to determine if the cart is empty or not.
Let's combine the patch at #322104-18: Allow hook_boot() to disable page cache into this one and then repurpose the other issue to look at adding additional parameters to the page cache key as you suggested.
Comment #9
gpk commentedNew patch that also allows modules to use hook_boot() prevent the current page request from being served from the cache. Seems to do what it says on the tin with respect to hook_boot().
The only sub-optimal thing I can find is that if you have an anonymous cookie which then gets removed and subsequently you get a cache hit for a page you viewed while you had the cookie you get a 200 cached response rather than a 304 cached response, because of Vary: Cookie, but that was not introduced by this patch.
Comment #10
c960657 commentedI think this looks good and is RTBC with these two tiny nits fixed:
The second line wraps too soon.
Should be “its” instead of “it's”.
Comment #11
gpk commentedDear me, standards slipping ;)
Thanks for dropping by, c9..., and checking this out.
Comment #12
dries commentedNew patch that also allows modules to use hook_boot() prevent the current page request from being served from the cache. Seems to do what it says on the tin with respect to hook_boot().
Can you clarify the second sentence? Not sure I understand what you mean by that.
Comment #13
gpk commentedSorry, you're right, not making myself clear.
What I meant to say was that I tested calling drupal_page_is_cacheable(FALSE) from within a module's hook_boot() and confirmed that this disabled the page cache for the current request (in answer to Damien's #7). I tried a couple of variants - unconditionally calling drupal_page_is_cacheable(FALSE), and conditionally calling it depending on the local path and also depending on whether isset($_COOKIE[session_name()]), thereby replicating the current behavior of core. These variants were probably unnecessary but confirmed in my mind that everything was behaving as it should (and actually highlighted another minor problem: #592482: Regression: data stored in $_SESSION in hook_boot() or hook_exit() during a cached page response is lost).
Comment #14
chx commentedMy problem here is that if you have logic that depends on anonymous users having a session variable that now won't fire unless you explicitly set page no caching. That's a terrible developer experience.
Also, I would like to hear about the use case. My example: you have a shop, the session holds the cart and then every page needs to display that the cart has N items in it, it's not cacheable in its entirety any more. So, what is the use case where you have an anonymous session and yet, you want page cache?
Comment #15
gpk commentedYes I think I can see where you are coming from in that if something is stored in $_SESSION then presumably you'd want to do something with it (e.g. modify the page output) and that can't be done if you are serving cached pages.
Our specific use case is not so much concerned with the content of $_SESSION but using the session to track site visitors. We use this for optimising AdWords campaigns since it's possible to associate individual sessions (and hence numbers of pageviews, registrations) with keywords etc. This supplements the analysis available from within AdWords and Google Analytics.
http://drupal.org/project/affiliate appears to do something similar - it stores referral information in the session, but this remains latent until e.g. the visitor actually signs up.
Comment #16
gpk commentedDifferent approach: default Drupal 7.x behavior is unchanged, but you can override the logic that determines whether the current page may be served from the cache or saved to it, by defining custom functions which will be invoked if they exist (à la custom_url_rewrite...).
Not sure what the merits of this are .. except that it should make chx's developer experience less terrible.. ;) I should say the naming of the "custom..." functions fills me with some revulsion.
Probably the
!empty($user->uid)in drupal_session_initialize() should be put inside the scope of the custom behavior, to be more symmetric with custom_check_if_page_may_be_served_from_cache(). The latter has to include the necessary logic that prevents cached pages being served to logged-in users (the defaultisset($_COOKIE[session_name()]covers this case plus any non-empty session in one fell swoop). Of course this means that if the custom cache checks are done wrongly you could end up serving pages from or saving pages to the cache for logged-in users, but there seems no way of absolutely preventing developers from getting this wrong.With this patch I can get the specific behavior I want by adding the following to settings.php (a custom cache.inc would also work):
Comment #17
gpk commentedAhhhhh trying again!
Comment #19
gpk commentedAdding tag, though it seems not many other people are affected/concerned by this..!
chx doesn't like #11, and I don't really like #16/#17. Another approach would be to have a simple variable that could be set in settings.php (e.g. 'page_cache_with_anonymous_sessions') that is FALSE by default (giving the current behavior) and if set to TRUE gives the current 6.x behavior.
Whether I get round to rolling a patch and whether there is time for it to be reviewed before 1 Dec remains to be seen :P
Comment #20
Anonymous (not verified) commentedsubscribe, late to the party, but this looks interesting.
Comment #21
gpk commentedAnd this is even later!
Implements #19. When variable 'page_cache_with_anonymous_sessions' is not set there is no change from current behavior and anon visitors never get a cached response if something is stored in their $_SESSION.
When variable 'page_cache_with_anonymous_sessions' is set to TRUE in settings.php then anon visitors can receive a cached response even with a non-empty session, but not if a status message is displayed. Logged in users never get a cached response.
Comment #22
gpk commented[udpate: just to note that Firebug 1.4.5 running on Firefox 3.5.5 displays the X-Drupal-Cache header even for a 304 response. Obviously with a 304 response you must have hit the cache, and indeed Drupal tries to send the X-Drupal-Cache: HIT header again, but Firebug prefers to show the cached copy of the header which may say MISS. Don't let this confuse you ...! Live HTTP headers on the other hand doesn't show this header when a 304 is received.]
Comment #23
damien tournoud commentedThere is no ground for this: the session cookie is for uniquely identify a session storage bin. If you want/need to do tracking, use another cookie. Do not mix them.
Marking as won't fix. The cache system as it stands is simple and elegant: we only cache a page if there are no session data. Blindly allowing pages to be cached when there is session data is a receipt for failure.
Comment #24
moshe weitzman commented@Damien - I think you are enforcing your own views on app developers. There could certainly be individual pages within a site where it would be safe to serve from page cache even though user has a non-empty $_SESSION. We are discussing how best to allow this. Agreed that "blindly" cacheing these pages is a bad idea.
Comment #25
damien tournoud commented@Moshe: the best way to do that is to enforce caching in your proxy cache on a page-by-page basis. You, as the site administrator, is the only one that can know whether this is safe or not. Drupal has no such facility (at least in Drupal 7)...
Such a facility would require three things:
(1) a way to ask all the content displayed on the page for their 'cache keys',
(2) a way to generate those cache keys for a given context (requested page, current user, weather in San Francisco, etc.)
(3) a way to fetch the correct cache entry(-ies) from the cache, based on the best match between the context cache keys and the cache keys stored for a given page
We have none of those three things right now, so this particular issue is a won't fix for Drupal 7. We don't need either to promote bad design (ie. using the session for tracking users) nor to allow site administrators to shot themselves in the foot. Moved to Drupal 8.
Comment #26
moshe weitzman commentedThats all great, for sites that run a reverse proxy.
Another way to do it is for admins to identify paths that are safe to serve from page cache. It can be a simple array in settings.php, for example. I'm thinking of an ecommerce site that has a weekly specials page which is very popular. That page would not show the usual personalized shopping cart icon.
Comment #27
gpk commented@26: #21 might do what you want, by letting you define
in settings.php, which lets Drupal use the page cache for (non-empty) anon sessions, as in 6.x. However although this variable could be set conditionally, it might be hard to do so intelligently this early in the bootstrap; you'd probably have to do some tests in cache.inc where you could raise the boostrap level without triggering the page cache. BTW, I'm sure I'm teaching granny to suck eggs ;)
@23:
>If you want/need to do tracking, use another cookie
Well yes, I was rather coming to the same conclusion. Much of my reluctance being 1) statistics.module +Views pretty well does what's needed in 6.x (I found this surprisingly useful and identified various usability problems on one site this way); and 2) I have little understanding (yet) of cookie handling! Maybe it's not so hard.
>Agreed that "blindly" cacheing these pages is a bad idea.
Well at least we all agree there!!
Comment #28
sunsubscribing
Comment #29
andypostis this issue still valid?
Comment #30
TDBA commentedYes it is affecting me right now on Drupal 7.
The fact that a session cookie is set (by my theme) and boost checks is_cacheable breaks boost for me at the moment.
It seems questionable to say "don't cache at all if a session cookie exists".
I wanted to just use patch #21 but it's five years old and some core code changed e.g the patch references "$cache_mode == CACHE_NORMAL" and the current bootstrap.inc has "$cache_enabled" so yeah.
I guess I am just going to edit my session.inc and delete the session check which triggers "drupal_page_is_cacheable(FALSE);".
Comment #39
catchGoing to mark this won't fix again:
1. It's not at all clear how existing code would be notified that it needs to use the kill switch instead of page caching. Additionally, the kill switch only works when you add something to $_SESSION, not when you read from it, and it's the latter that is trickier.
2. Dynamic cache and big pipe work for both authenticated users and anonymous users with a session, so this isn't the all-on/all-off situation it used to be.