| Project: | Drupal core |
| Version: | 8.x-dev |
| Component: | base system |
| Category: | task |
| Priority: | major |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | kernel-followup, Needs profiling, symfony, WSCCI |
Issue Summary
One of the big intended benefits from the Kernel work is the ability to use Http caching logic directly rather than implementing our own caching logic. The Kernel component includes a PHP-space implementation of an Http reverse proxy cache. We want to use that for all kernel-called page segments; that is, the entire page itself plus all of the sub-requests. The easiest way to do that is to move the kernel from the 'kernel' DI entry to some other entry, and then make the 'kernel' entry an instance of the HttpCache object, configured with a Store class of our own that saves stuff to the Drupal cache system.
For now, we should let the HttpCache use whatever logic it uses. We can refine it later after the pieces are in place.
Of course, we'll also want a way to disable that cache object when a site is running behind a real proxy cache like Varnish, which is going to be way faster anyway.
This probably needs to wait for #1595146: Load the HttpKernel from the DI Container
Comments
#1
Tagging.
#2
This needs benchmarks, we may well want to do it even if there's a regression, but let's not have another nasty surprise like #1064212: Page caching performance has regressed by 30-40% in D7 (which was due to a very tiny, innocent patch).
I'm also interested to see how this could tie in with cache tag support. We don't have proper content-based cache tags attached to cache items yet, but there's proof of concept code that integrates with cache tags with varnish already, so if varnish can handle it, then HttpCache should be able to as well.
#3
Agreed entirely with #2. I have no idea how cache tag support could tie in here, frankly, but it should be investigated. :-)
That said, there are key differences between the way Drupal has traditionally cached things and the way Symfony assumes you'll cache things.
In Drupal, we assume "cache forever, but flush often". That emphasizes data freshness over performance or cache efficency, and results in less useful caches.
Symfony, from what I've seen, encourages "cache briefly, don't bother flushing." That emphasizes cache efficiency over data freshness, at the expense of some rendered information being briefly stale.
I don't believe we should switch over to "cache briefly, don't flush" wholesale; however, we should consider where that is a more effective strategy; if the code is simplier and more stable and more reliable if we cache for 5 seconds and accept that "meh, this view could be stale for 5 seconds, whatever", is that OK? Where is that OK?
I don't know, but we should be asking that.
#4
I think that's a site-specific question. You can currently set Drupal up to take either of these approaches via contrib and/or configuration, and we should be careful not to close either of them off to much (for example if you don't want content to clear on cache tags, it's 5 minutes to write a cache backend that ignores them).
#5
Well, then we need to make it code-free to adjust your caching strategy, at least within reason. That won't be a simple task. :-)
Either way, I think the first implementation should be kept as simple as possible and just restructure the code to use/not use HttpCache wrapped around the kernel.
#6
By cache tags you mean a unique identifier for the content, so that if the tags are equal, then the content does not need to be regenerated? HTTP (and Symfony2) supports the validation approach of caching via ETags.
#7
re #6, cache tags as in the cache tags support in Drupal's cache API - http://drupalcode.org/project/drupal.git/blob/refs/heads/8.x:/core/lib/D...
#8
Here's a rough POC that I've been working on.
This should replace our default page caching with the file-based HttpCache. We'll obviously need to make additional storage backends, but I'm not against changing our default MySQL backend for flat files.
The biggest problem I see thus far is that the earliest we can serve a page from cache is after DRUPAL_BOOTSTRAP_CODE. This is much later than I'd like, but as long as we know there's work to be done to instantiate the HttpKernel earlier, this shouldnt be a blocker.
#9
bot snack?
#10
The last submitted patch, drupal_httpcache.patch, failed testing.
#11
+++ b/core/lib/Drupal/Core/HttpCache.php@@ -0,0 +1,14 @@
+use Symfony\Component\HttpKernel\HttpCache\HttpCache as SymfonyHttpCache;
+
+class HttpCache extends SymfonyHttpCache {
+ protected function getOptions() {
+ $config = config('system.performance');
+ return array(
+ 'debug' => TRUE
+ );
+ }
+}
As soon as the bundles patch goes in, which I believe puts the config system into the DIC, we should start injecting it and eliminating the use of the config() wrapper function.
I especially like how this is 2/3 removing code rather than adding it. :-) Is that because the existing Symfony code does what we were doing, or because you just didn't get to that part yet?
#12
Agreed.
A little of both. Its hard to follow at the moment because we left in a bunch of code that calls header() directly after the initial kernel patch. In this case it was easier to just rip that code out and get the cache working as we'd expect it to. A lot of what is removed is handling around 304s, which I think we're getting already. It's possible that we'll need to use HttpCache's validation to do some additional 304 checking.
#13
Gotcha. I'm OK with a temporary regression in the elegance of our 304 handling while we suss out what if anything we need to do atop Symfony's existing support for that.
The original kernel patch had a target of "keep tests passing", so there's definitely some needlessly redundant header() calls lying about still. I think those should all be eliminated entirely.
#14
Bundles patch is in. Hoping someone can get to this soon.
#15
The patch is not what I would expect. The HttpCache is added at the end just before the kernel handle's the request. This is after the complete bootstrap. Shouldn't the cache be utilized in the DRUPAL_BOOTSTRAP_PAGE_CACHE phase. Is this even possible with the Kernel + HttpCache at that early stage?
#16
Another thing we have to think about is the File Store because we can't assume we can use /tmp. How can we retrieve the file store location without getting the configuration from db?
#17
@#15 I guess that full bootstrap is slowly being freed from its fat while services are being ported into the DIC. I hope that in stable 8.x bootstrap will only be a matter of a few millisec.
#18
@#16 I think that this kind of early needed configuration variables should be set in settings.php (pretty much like advanced cache backends configuration).
#19
What would be the proper solution to be able to override HttpCache and Store in for examply your settings.php?
#20
I don't think I can answer that much, I'm, just like you, waiting to see a working patch to know :)
#21
@#17 so should we just aim for removing the DRUPAL_BOOTSTRAP_PAGE_CACHE completely?
@#18 Then settings.php should be changed to have the configuration in it and it should be set during install. Also what to do with the UI where you can set the temporary directory?
@#20 I'll do something similar like the cache_backends:
$class = isset($cache_backends[$bin]) ? $cache_backends[$bin] : $cache_backends['cache'];$cache_objects[$bin] = new $class($bin);
#22
My latest version of the patch. Lets meet up at the sprint.
#23
Forgot to add the HttpCache.
#24
So these are the things I think we need to sort out:
UPDATE: added 11
#25
On 11: D7 stores cache_page (and cache_block) entries as CACHE_TEMPORARY, even with minimum cache lifetime.
If HttpCache assumes a ttl, it sounds like we may be scrapping cache minimum for cache maximum.
#26
Triggering bot
+ return array(+ 'debug' => TRUE
Formatting issue
#27
The last submitted patch, drupal_httpcache.patch, failed testing.
#28
+++ b/core/lib/Drupal/Core/HttpCache.phpundefined@@ -0,0 +1,29 @@
+ $config = config('system.performance');
+ return array(
+ 'debug' => TRUE
Why define $config here if we're not going to do anything with it?
+++ b/index.phpundefined@@ -13,7 +13,9 @@
use Drupal\Core\DrupalKernel;
use Symfony\Component\HttpFoundation\Request;
-
+use Drupal\Core\HttpCache;
+use Symfony\Component\HttpKernel\HttpCache\Store;
Would be nice if we could group all included Symfony libraries and all Drupal libraries, Symfony first.
+++ b/index.phpundefined@@ -30,6 +32,11 @@ drupal_bootstrap(DRUPAL_BOOTSTRAP_CODE);
+if (config('system.performance')->get('cache') && drupal_page_is_cacheable()) {
+ $kernel = new HttpCache($kernel, new Store("/tmp/meuk"));
Another reason not to define $config within Drupal\Core\HttpCache if we're not going to use it. We're loading it again here.
#29
Thanks to everyone who has recently reviewed this patch. Although many points brought up here are valid, I'd like to postpone most of them until we get the basics working with tests passing. For example, I'm purposefully not dealing with alternative stores atm. Just need to get the default working first. Also, being called so late in the bootstrap is a problem, but perhaps not this patch's problem. Let's revisit once it works.
Some changes in #1698108: Update Drupal's dependencies broke this patch pretty badly, so here's a quick reroll which I believe only takes those changes into account. Actual caching of pages seems to work correctly, but the tests that aren't passing are legit. We still need to figure out how and where to handle the logic around "page_cache_invoke_hooks".
#30
The last submitted patch, drupal_httpcache-29.patch, failed testing.
#31
Let's try that again.
#32
#33
The last submitted patch, drupal_httpcache-31.patch, failed testing.
#34
if we intend to completely replace our caching mechanisms with the HttpCache, it might be beneficial to rip that out now and see what breaks. There seems to be a lot of issues involving the caching of user information. It's unclear to me what's the cause. With guidance I can try to reproduce the tests and focus on part of this patch.
#35
Found quite a few problems caused by my subclassing HttpCache from HttpKernel instead of from the framework bundle which the docs show. That said, there's parts of the framework bundle version that we don't want, so I just pulled in the relavant bits into our HttpCache class. Attached is a new patch that fixes some of this.
Unfortunately, I don't think we can fix the bootstrap hook issues with the way things are now. We need to move all of our bootstrap, with maybe DRUPAL_BOOTSTRAP_CONFIGURATION as an exception, inside the kernel. It's very awkward that we're mixing them right now and too much work is already done by the time we even instantiate the kernel.
This would allow us to fix the hook issues and it would also eliminate the performance regression I mentioned earlier, so it's worth doing asap. I'm considering this issue blocked until that is done.
#36
We can eventually replace all our HTML caching with HttpCache (i.e. page/block/drupal_render() in core), but there's plenty else that's not replaceable, and Symfony has plenty of its own caching (just renamed to compiling so it can pretend it's not).
@msonnabaum I'd consider having only one bootstrap path a critical task for Drupal 8 - it's OK if we have a small Drupal bootstrap before the kernel, but it's not really OK to have these two bootstraps mixed in with each other wrapping around.
#37
I agree with catch that we should move it all in, but it turned out that it was a tiny change to get what I need and also not break anything by leaving DRUPAL_BOOTSTRAP_CONFIGURATION out.
However, this has the rest of the bootstrap in the DrupalKernel, not the HttpCache wrapper. We all talked in IRC and were mostly in favor of removing the "normal" cache setting in favor of "aggressive" only. Supporting bootstrap hooks will be messy, so whether we can do it eventually or not, this is the simplest version start with.
#38
Actually, as beejeebus pointed out to me, having the bootstrap in init() doesn't fix anything. It needs to go in boot().
#39
updated patch to add #38.
#40
The last submitted patch, 1597696-39-http-cache.patch, failed testing.
#41
+++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php@@ -83,6 +80,21 @@ class FinishResponseSubscriber implements EventSubscriberInterface {
+ $vary = array('Accept-Encoding');
+ if (!variable_get('omit_vary_cookie', FALSE)) {
+ $vary[] = 'Cookie';
+ }
+ $response->setVary($vary);
On the assumption that this variable_get() will turn into a config object lookup, we should drop a @todo here noting that and that the config object should get injected when we get to that part.
+++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php@@ -83,6 +80,21 @@ class FinishResponseSubscriber implements EventSubscriberInterface {
+ $max_age = (int)config('system.performance')->get('cache.page.max_age');
In fact, that's probably the config object it should be on! :-)
+++ b/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php@@ -83,6 +80,21 @@ class FinishResponseSubscriber implements EventSubscriberInterface {
+ $max_age = !isset($_COOKIE[session_name()]) ? $max_age : 0;
Cookies and sessions should come from the $request. This is where the sessions->symfony issue comes in useful. If we can't do that yet because it's blocked on that patch, at least we should drop a @todo in here.
+++ b/core/lib/Drupal/Core/HttpCache.php@@ -0,0 +1,73 @@
+ \AD::ffs('HttpCache::__construct');
WAT? That could be why this won't install, perhaps?
+++ b/core/lib/Drupal/Core/HttpCache.php@@ -0,0 +1,73 @@
+ public function cacheEnabled() {
+ return config('system.performance')->get('cache.page.enabled');
+ }
Hm. I don't know how we can properly inject that config object. Probably it will have to just be hand-rolled in index.php. This far down, I think I'm OK with that.
-1 days to next Drupal core point release.
#42
updated patch addresses #41.
#43
The last submitted patch, 1597696-42-http-cache.patch, failed testing.
#44
spent some fun time* figuring out why 14k tests fail. the main problem seems to be this flow:
Drupal\Core\HttpCache::cacheEnabled --> config --> Drupal\Core\Config\Config::load --> Drupal\Core\Config\DatabaseStorage::read --> Drupal\Core\Config\DatabaseStorage::getConnection --> Drupal\Core\Database\Database::getConnection --> Drupal\Core\Database\Database::openConnection --> Drupal\Core\Database\Driver\mysql\Connection::__construct --> Drupal\Core\Database\Connection::__construct --> Drupal\Core\Database\Connection::setPrefix
this happens before _drupal_bootstrap_database(), so we haven't had a chance to munge global $databases with the test prefix, so all the db queries hit the parent drupal.
i've fixed that with a total hack of just calling drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE) in Drupal\Core\HttpCache::cacheEnabled(), which should let us see what the next series of fails are.
fixing the way we handle needing the database early in the request is out of scope for this issue.
* that may have been sarcasm.
#45
The last submitted patch, 1597696-44-http-cache.patch, failed testing.
#46
a note for reviewers/testers: as well as turning on the page cache, you need to set a max lifetime.
#47
ok, spent some time actually reading the Symfony HttpCache docs, and in the very helpful #symfony channel.
i don't think the pattern we've started with here is the right way to go, at all.
the symfony docs and the IRC channel make it clear we should only wrap our app kernel *if there's no http cache in front of drupal*.
so i guess we need to work out the best way to implement this pseudo-code for real:
<?php
drupal_bootstrap(DRUPAL_BOOTSTRAP_CONFIGURATION);
if (config('system.performance')->get('use.symfony.http.cache')) {
// We're on a shared host or something, so use a slow, non-scaling http cache.
$kernel = new HttpCache(new DrupalKernel('prod', FALSE), new Store(variable_get('file_public_path', conf_path() . '/files/http_cache')));
}
else {
// We have a real http cache in front of us.
$kernel = new DrupalKernel('prod', FALSE);
}
?>
#48
Hi,
Just my 2 cents after the discussion with beejeebus on IRC to give the same knownledge to others.
Wrapping the kernel in the HttpCache will indeed be bad when you want to use Varnish, as it would continue to handle the ESI in PHP instead of letting Varnish doing it.
However, it is possible to check very easily if the HttpCache is needed. See https://gist.github.com/3551077
I haven't read the full discussion and I haven't checked the latest status of Drupal, so forgive me if the following advice is useless now.
When using the HttpCache, the instantiation of the DrupalKernel should be as lightweight as possible, as it will be done even when the cache is used (and so the drupal code does not need to be called). The heavy initialization should be done only when booting the kernel (which will occur only on cache miss, when the HttpCache needs to forward the request to the DrupalKernel).
If the instantiation of the kernel is heavy, you will have a bigger performance impact when switching between HttpCache and Varnish (as Varnish will also avoid the instantiation when the cache is used of course).
EDIT: here the code so that you don't need to go to the gist, as your comments support syntax highlighting
<?php
use Drupal\Core\DrupalKernel;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpKernel\HttpCache\Esi;
use Symfony\Component\HttpKernel\HttpCache\HttpCache;
use Symfony\Component\HttpKernel\HttpCache\Store;
// Build a drupal kernel
$kernel = new DrupalKernel()
$request = Request::createFromGlobals()
$esi = new Esi()
if (!$esi->hasSurrogateEsiCapability($request)) {
// There is no reverse proxy supporting ESI in front, so add the Symfony HttpCache
// this is not executed when you have Varnish in front and properly configured
$kernel = new HttpCache($kernel, new Store(), $esi);
}
$response = $kernel->handle($request);
?>
#49
The hasSurrogateEsiCapability() function depends on headers from the proxy:
public function hasSurrogateEsiCapability(Request $request)
{
if (null === $value = $request->headers->get('Surrogate-Capability')) {
return false;
}
return false !== strpos($value, 'ESI/1.0');
}
The ESI module already has a client-side JavaScript implementation for ESI, so there's a use-case where ESIs should be delivered, but there are no HTTP headers to indicate this.
I'd prefer to handle it via config instead.
#50
@manarth Could the JS add additional headers instead as part of the downgrade behavior?
#51
If we wanted to use in-browser ESI, we should just use hInclude instead. Symfony already includes support for that in the HttpKernel from the FrameworkBundle, that we're already using.
#52
i've created #1766186: Move test prefix $databases munging earlier in the bootstrap to deal with the test-side global $databases munging.
with that patch, we should be able to call config() right after drupal_bootstrap(DRUPAL_BOOTSTRAP_CONFIGURATION), so i'm going to reroll with that patch included just to keep this moving.
things will shift further as config changes, but at least we can get the 'check config to see if we should wrap' pattern in, and fix the remaining fails.
#53
#48 implies that we don't need a conditional. We just detect headers and we're done with it. What's the use case for doing otherwise? (JS-based pseudo-ESI is not one, because we can either add the headers we need or drop that entirely in favor of hInclude.)
#54
ESI or no ESI, who cares.
'just check http headers' is not going to fly as the sole determinant of 'should we wrap our kernel in HttpCache or not'.
#55
i'm not working on this anymore, sorry if i've held it up.
#56
I completely agree that the "Surrogate-Capability" header is not enough to decide whether or not to use HttpCache. We should use either config or a $conf variable.
One thing that occurred to me while looking at this again however, is that our cache settings don't make much sense now that we've removed CACHE_TEMPORARY. We still have a cache.page.enabled option, yet it does nothing unless you set cache.page.max_age.
I'm thinking we should remove cache.page.enabled and replace it with http.cache.use.internal. If you are using a reverse proxy, all you have to do is set the max_age to something. I started on that but noticed we're checking the existing cache.page variables in some odd places, so I wanted to make sure we had consensus before making that change.
We should also conditionally pass the esi object in \Drupal\Core\HttpCache's constructor since we don't need that running if you don't have the equivalent of block_cache = TRUE. That can probably just be a @todo for now since removing block cache is out of scope for this issue.
#57
Mark: So, you're saying request cache (which for now is synonymous with page cache, but later on won't be) is always-enabled, period, and to effectively disable it you just set the page 0 seconds? I'm OK with that. The default value should probably be something somewhat useful, say 5 min, but I'm easy on that part.
Or do I have it backward?
#58
I'm saying that the max age setting determines whether pages get cached. If it's at 0, drupal sends out the typical "Expires: Sun, 19 Nov 1978 05:00:00 GMT" that you'd get when page cache is off. If it's > 0, we send out the appropriate cache headers.
The only thing you explicitly enable is the internal page cache, which would mean we wrap the kernel with HttpCache.
#59
That sounds fine to me.
#60
I'd be fine with #58, we'll need to update the help text on the performance settings to reflect the change. Also fwiw completely fine with having the default max_age as 5 minutes (probably only in the standard profile).
#61
Related: #1808080: Add an _internal route
#62
Created #1853086: Remove cache.page.enabled in favor of an explicit internal cache setting for the changes I described in #58.
#63
Getting back to this at last...
Here's a new patch, mostly from scratch but with some code copied from Mark's patch above. For those playing along at home there's also a branch in the WSCCI sandbox. Overall, there should be more minus signs than plus signs, which is a good sign.
Changes:
- Removes the old page cache, and the bootstrap phase along with it. Ooo...
- Because there's no more page cache bootstrap phase, and the functions in cache.inc are now tiny, I moved them into bootstrap.inc and eliminated cache.inc. Ooo...
- Wrap the kernel in an HttpCache object, if a setting is set. It defaults to true.
- Sets cache headers that seem to be triggering the cache appropriately. I think. :-) It works in my testing, but could use more.
- Sets the headers conditionally, so if a response object from a controller already has cache-related headers set they don't get overridden. That way individual pages can opt-out of caching, or opt for a longer cache lifetime, or whatever else they feel like doing.
Still todo:
- Lots of testing.
- Figure out what happens to drupal_is_page_cacheable(). That sort of global approach totally won't work anymore. Best suggestion: Eliminate it outright and be done with it. If you want to mess with caching, that's what the response event is for. Fire after FinishResponseSubscriber and do whatever black magic you want.
- We should probably wire clearing the HttpCache page cache into drupal_flush_all_caches(), but I've not done so yet.
- Hook up the ESI support in HttpCache, which I've ignored for the time being.
Why???
- Because we want to have a single cache API all the way through, regardless of where it's getting cached. That cache API is Http. This lets a controller set a response object and headers on it, and they'll be honored regardless of where or how the response is being cached.
- Because we want to eliminate the "anonymous caches forever, authenticated never caches" distinction and move toward "things cache when they should cache, and don't when they shouldn't". This is a step in that direction.
#64
Erm. One thing I forgot to mention. This does include the contents of #1945024: Switch from FrameworkBundle's HttpKernel to rendering strategies, because I expected it would matter. So far it doesn't.
Attached is a patch that contains just the cache changes, or should. :-)
#65
None of this has anything to do with wrapping $kernel with HttpCache.
There are some worthwhile changes here, but it really should be handled outside of trying to use HttpCache. This issue should be about making page cache work with the new Request/Response objects.
#66
+++ b/core/includes/bootstrap.incundefined
@@ -2207,6 +2104,11 @@ function drupal_handle_request($test_only = FALSE) {
$kernel->boot();
drupal_bootstrap(DRUPAL_BOOTSTRAP_CODE);
+ if (settings()->get('use_http_cache', TRUE)) {
+ $store_path = variable_get('file_public_path', conf_path() . '/files/http_cache');
+ $kernel = new HttpCache($kernel, new Store($store_path));
+ }
+
Pretty sure we don't want the kernel booted for cached pages.
#67
The last submitted patch, 1597696-httpcache-2.patch, failed testing.
#68
#69
looking at the patch in #64, my primary comment is along the lines of katbailey's in #66 - if we're going to wrap the main kernel with the cache kernel, it needs to be done a little earlier. i'm not sure about booting/not booting the kernel, but i am sure that we should be slotting in this caching mechanism as early in bootstrap as possible, at least similar to where the old page caching implementation was. that may have implications on where/how we set the storage location var, but that shouldn't be awful -
sys_get_temp_dir()is diiirty, but perhaps adequate for these purposes?beyond that, i agree with #65 - the rest of this issue is about fixing our interaction with Request/Response elsewhere in the code. fixing the failed tests should be a good start to that.