Problem/Motivation

This is step one of #2257695: [meta] Modernize the page cache.

In Drupal 7 it was necessary to enable internal page caching in order make external page caching work. In Drupal 8 this was reversed and now internal page caching is only possible when external caching is configured (i.e. a max-age value has been set). This interdependence is purely artificial. This is due to the fact that the code responsible for generating appropriate Cache-Control for external caches originating in Pressflow 6 was just bolted onto the existing internal page cache implementation during the development of Drupal 7.

Proposed resolution

Extract the code responsible for generating Cache-Control and related headers from drupal_serve_page_from_cache into the FinishResponseSubscriber. Ensure that the internal page cache exclusively operates on Symfony requests/responses and stop passing around the cache-object.

Remaining tasks

User interface changes

API changes

  • Remove the $cache parameter from drupal_serve_page_from_cache()
  • drupal_page_get_cache() returns a symfony response object instead of a cache-object
  • drupal_page_set_cache() returns nothing instead of a cache-object
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

znerol’s picture

Status: Active » Needs review
FileSize
28.69 KB
znerol’s picture

Regrettably the first patch is unreadable around drupal_serve_page_from_cache(), therefore I rearranged the remaining code such that we get a better version. Also used the --histogram algorithm, hope this is easier to read.

Wim Leers’s picture

Issue tags: +Performance, +Fast By Default
FileSize
6.12 KB
29.27 KB

Also tagging with Performance and Fast By Default, because correct HTTP cache revalidation support is essential for good performance.


Great work, as usual :)

I found a few subtle bugs that I fixed (see below) and fixed a bunch of nitpicks. You can use HTTP expert Mark Notthingham's http://redbot.org (also comes as a CLI tool) to verify the correctness of HTTP header usage — it helped me verify that the two bugs I found were actually bugs, and that my fixes actually fixed it.

  1. +++ b/core/includes/bootstrap.inc
    @@ -865,100 +866,60 @@ function drupal_page_header() {
    +  // Only allow caching in the browser and prevent that the response is stored
    +  // by an external proxy server when the following conditions apply:
    +  // 1. There is a session cookie on the request.
    +  // 2. The Vary: Cookie header is on the response.
    +  // 3. The Cache-Control header does not contain the no-cache directive.
    

    Oh my … this is *so* much clearer!

  2. +++ b/core/includes/bootstrap.inc
    @@ -865,100 +866,60 @@ function drupal_page_header() {
    +    if ($if_modified_since && $if_none_match
    +      && $if_none_match == $response->getEtag() // etag must match
    +      && $if_modified_since == $created) {  // if-modified-since must match
    

    This means a 304 response will only be sent if both If-Modified-Since and If-None-Modified request headers are present. Either of them should be sufficient.

    Fixed.

  3. +++ b/core/includes/bootstrap.inc
    @@ -865,100 +866,60 @@ function drupal_page_header() {
    +      // In the case of a 304 response, certain headers must be sent, and the
    +      // remaining may not (see RFC 2616, section 10.3.5).
    +      foreach (array_keys($response->headers->all()) as $name) {
    +        if (!in_array($name, array('content-location', 'expires', 'cache-control', 'vary'))) {
    +          $response->headers->remove($name);
    +        }
    +      }
    

    This should also allow the ETag header to be sent.

    Fixed.

The last submitted patch, 3: 2257709-extract-cache-control-3.patch, failed testing.

Wim Leers’s picture

FileSize
33.34 KB
4.72 KB

The existing test coverage was wrong. It is not required for both If-Modified-Since and If-None-Match to be specified to trigger a 304 response. If both an ETag and a Last-Modified response header are specified, then UAs SHOULD use both for HTTP revalidation, but it's not a requirement.

Fixed the test coverage, should come back green now.

znerol’s picture

If we are changing the behavior anyway, I suggest to just use the Response::isNotModified() method. However, as a result the two assertions checking whether revalidation with "obsolete If-Modified-Since date" works will fail. In my opinion we could remove them which would basically mean reverting #327269: drupal_page_cache_header() should compare timestamp.

Crell’s picture

Status: Needs review » Needs work
  1. +++ b/core/includes/bootstrap.inc
    @@ -865,100 +866,62 @@ function drupal_page_header() {
    +  // Perform HTTP revalidation.
    +  $last_modified = $response->getLastModified();
    +  if ($last_modified) {
    +    // See if the client has provided the required HTTP headers.
    +    $if_modified_since = $request->server->has('HTTP_IF_MODIFIED_SINCE') ? strtotime($request->server->get('HTTP_IF_MODIFIED_SINCE')) : FALSE;
    +    $if_none_match = $request->server->has('HTTP_IF_NONE_MATCH') ? stripslashes($request->server->get('HTTP_IF_NONE_MATCH')) : FALSE;
    +
    +    $last_modified_match = $if_modified_since === $last_modified->getTimestamp();
    +    $etag_match = $if_none_match == $response->getEtag();
    +
    +    if (($if_modified_since && $if_none_match && $last_modified_match && $etag_match) // If both revalidation request headers are present, apply both.
    +        || ($if_modified_since && !$if_none_match && $last_modified_match) // Only Last-Modified must match.
    +        || ($if_none_match && !$if_modified_since && $etag_match) // Only ETag must match.
    +    ) {
    +      $response->setStatusCode(304);
    +      $response->setContent(NULL);
    +
    +      // In the case of a 304 response, certain headers must be sent, and the
    +      // remaining may not (see RFC 2616, section 10.3.5).
    +      foreach (array_keys($response->headers->all()) as $name) {
    +        if (!in_array($name, array('etag', 'content-location', 'expires', 'cache-control', 'vary'))) {
    +          $response->headers->remove($name);
    +        }
    +      }
    +    }
    

    Most of this is already possible directly in the Response object. We don't need to reimplement it.

    f ($response->isNotModified($request)) {
      $response->setNotModified();
    }
    

    In fact, here's the entire cache-handling code I had from a recent Silex project.

    $cache_lifetime = $app['cache.lifetime'] ?: 0;
    
    $response->setTtl($cache_lifetime)->setClientTtl($cache_lifetime);
    
    $expires = new \DateTime("now +{$cache_lifetime} seconds", new \DateTimeZone('UTC'));
    $response->setExpires($expires);
    
    $etag = sha1($response->getContent());
    $response->setEtag($etag);
    
    f ($response->isNotModified($request)) {
      $response->setNotModified();
    }
    

    Symfony takes care of most of this for us...

  2. +++ b/core/includes/common.inc
    @@ -2873,55 +2873,42 @@ function _drupal_bootstrap_full($skip = FALSE) {
    +  if ($response->headers->has('Expires')) {
    +    $timestamp = $response->getExpires()->getTimestamp();
    +    if ($timestamp > REQUEST_TIME) {
    +      $expire = $timestamp;
         }
    

    The initial if() is not necessary, as getExpires() will always return a value. If there is no header the expiration date will always be in the past, so we know that it shouldn't be cached.

    There's also no need to convert back down to ints. DateTime can compare against itself just fine.

  3. +++ b/core/includes/common.inc
    @@ -2873,55 +2873,42 @@ function _drupal_bootstrap_full($skip = FALSE) {
    +  $tags = HtmlViewSubscriber::convertHeaderToCacheTags($response->headers->get('X-Drupal-Cache-Tags'));
    

    I'm sure this isn't the issue adding this static, but WTF? This is a completely unnecessary hard dependency. When was that added?

  4. +++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php
    @@ -57,57 +72,153 @@ public function onRespond(FilterResponseEvent $event) {
    +    // There is no point in sending along headers necessary for cache
    +    // revalidation, if caching by proxies and browsers is denied in the first
    +    // place. Therefore remove ETag, Last-Modified and Vary in that case.
    +    $response->setEtag(NULL);
    +    $response->setLastModified(NULL);
    +    $response->setVary(NULL);
    

    This seems like something to push upstream to Symfony. (Not a blocker here, but a @todo for us.)

  5. +++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php
    @@ -57,57 +72,153 @@ public function onRespond(FilterResponseEvent $event) {
    +    $response->headers->set('Cache-Control', 'public, max-age=' . $max_age);
    

    $response->setMaxAge().

  6. +++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php
    @@ -57,57 +72,153 @@ public function onRespond(FilterResponseEvent $event) {
    +    $response->setEtag($timestamp);
    

    Why is the timestamp a reasonable ETag? Wouldn't a sha1() of the content be more reasonable? It's the last time the *content* changed, not the last-modified header we're already messing with.

  7. +++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php
    @@ -57,57 +72,153 @@ public function onRespond(FilterResponseEvent $event) {
    +  protected function setCacheControlNoCache(Response $response) {
    +    $response->headers->set('cache-control', 'no-cache, must-revalidate, post-check=0, pre-check=0');
    +  }
    

    If there isn't already a single $response-> method to do this, we should submit that upstream, too. (I'd be surprised if there isn't.) The prepare() method will do some cache mangling anyway.

  8. +++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php
    @@ -57,57 +72,153 @@ public function onRespond(FilterResponseEvent $event) {
    +  protected function setExpiresNoCache(Response $response) {
    +    $response->setExpires(\DateTime::createFromFormat('j-M-Y H:i:s T', '19-Nov-1978 05:00:00 GMT'));
       }
    

    Ibid. Although I suppose this one we can leave in, as we have reasons... ;-)

  9. +++ b/core/modules/system/lib/Drupal/system/Tests/Bootstrap/PageCacheTest.php
    @@ -196,15 +215,31 @@ function testPageCache() {
    diff --git a/core/modules/toolbar/toolbar.module b/core/modules/toolbar/toolbar.module
    
    diff --git a/core/modules/toolbar/toolbar.module b/core/modules/toolbar/toolbar.module
    index 0f58cb2..f1efa61 100644
    
    index 0f58cb2..f1efa61 100644
    --- a/core/modules/toolbar/toolbar.module
    
    --- a/core/modules/toolbar/toolbar.module
    +++ b/core/modules/toolbar/toolbar.module
    
    +++ b/core/modules/toolbar/toolbar.module
    +++ b/core/modules/toolbar/toolbar.module
    @@ -120,13 +120,11 @@ function _toolbar_initialize_page_cache() {
    
    @@ -120,13 +120,11 @@ function _toolbar_initialize_page_cache() {
       // If we have a cache, serve it.
       // @see _drupal_bootstrap_page_cache()
       $request = \Drupal::request();
    -  $cache = drupal_page_get_cache($request);
    -  if (is_object($cache)) {
    -    $response = new Response();
    +  $response = drupal_page_get_cache($request);
    +  if ($response) {
    

    Not introduced by this patch, I know, but I have to ask... why is toolbar module doing its own cache serving? That seems... not good.

sun’s picture

why is toolbar module doing its own cache serving? That seems... not good.

That code was invented in admin_menu module, and mostly copied as-is into core by others. I was surprised that it did end up in core almost identically — I always considered it as a gross hack, an acceptable experiment for a contrib module, but never intended that code to be taken over literally into Drupal core (because core is supposed to provide an architecture that makes such hacks unnecessary).

In short, I agree, it's not only bad, it's also a huge can of worms for unexpected bugs/regressions, because that code is (intentionally) stepping out of all regular pipelines. We can and we should do better in D8 now. Prolly a separate issue tho.

znerol’s picture

Status: Needs work » Needs review
FileSize
29.04 KB
9.31 KB

#7.1, #7.6: Let's keep the focus. Rolled changes affecting cache-revalidation from #3 and #5 and opened a followup which will address this separately: #2259489: Leverage symfony for HTTP revalidation

#7.2: Fixed.

#7.5, #7.7: The various response-methods operate on individual directives of the Cache-Control header. However, we need to replace the whole header in order to effectively remove any directives previously set on it.

Re #7.9 and #8: As shown in #2177461: Refactor page caching into a service we can easily support that use case in a proper way with the right architecture (pluggable policy).

Also fixed a problem where controllers add their own Cache-Control headers. Those headers should only be replaced when page caching was cancelled (e.g. by a message).

sun’s picture

Status: Needs review » Reviewed & tested by the community

This looks very good to me. Further improvements/adjustments should be performed in separate follow-up issues, I think.

Wim Leers’s picture

RTBC+1.

Sorry for the derailing in #3 & #5 — I didn't realize you were just moving around existing code. I.e. my remarks were out of scope. My apologies!

Anonymous’s picture

i was pointed at this patch from here #2259823: Make cache->created be to the millisecond.

just wanted to +1 the RTBC, this looks good to go to me.

znerol’s picture

znerol’s picture

Status: Reviewed & tested by the community » Needs review
Crell’s picture

Status: Needs review » Reviewed & tested by the community

I'm not sure it's really necessary to punt using the Symfony code to a follow up, but hey, progress is progress. Let's get this in and move on to #2259489: Leverage symfony for HTTP revalidation as quickly as possible.

Wim Leers’s picture

Agreed; baby steps!

catch’s picture

Status: Reviewed & tested by the community » Needs review
-
-  // Send the remaining headers.
-  foreach ($cache->data['headers'] as $name => $value) {
-    $response->headers->set($name, $value);
-    drupal_add_http_header($name, $value);
-  }

I see this removed, but I don't see it put back anywhere?

znerol’s picture

We're storing the whole Response in the cache (including status code, headers, everything).

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Question answered.

catch’s picture

Status: Reviewed & tested by the community » Fixed

Yes that makes lots of sense and cleans things up a lot.

Committed/pushed to 8.x, thanks!

  • Commit 5a01a26 on 8.x by catch:
    Issue #2257709 by znerol, Wim Leers: Remove the interdependence between...
Wim Leers’s picture

So now on to the next step in #2257695: [meta] Modernize the page cache:

2. Introduce a robust and extensible cache-policy framework

:)

Great work, znerol!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.