Updated: Comment #71
Problem/Motivation
After #1597696: Consider whether HttpCache offers any significant benefit over the existing page cache has been postponed to 9.x, the page caching mechanism inherited from 7.x needs to be refactored into a container service.
As the current maintainer of Authenticated User Page Caching (Authcache) I have a distinct interest into preparing the Drupal 8 page cache such that it can be easily extended by contrib modules.
Proposed resolution
- Follow the proposition from #2023495-4: Make bootstrap three steps:
Page cache should be a service that works only in index.php and not maintenance mode
- Extract three major components: Storage (record and deliver cached pages), Policy (allow/deny recording/delivery) and Vary / Version control (record/deliver different versions according to properties of the request, i.e. Accept-headers)
The scope of this issue is restricted to the conversion of procedural code into injected container services. It is especially important to maintain execution order, such that the risk of introducing performance regression is minimized. Therefore the delivery of cached pages is attempted immediately after the initialization of the kernel and before drupal_bootstrap(DRUPAL_BOOTSTRAP_CODE) is called in drupal_handle_request.
As a consequence, there will be no page-caching for subrequests/fragments for the moment. Only main-requests delivered through index.php will be cacheable.
If we also succeed in properly splitting up responsibilities like outlined in point 2, this will help with several pending issues. E.g.:
- #2062463: allow page cache cid to be alterable
- Better implementation of #1855260: Page caching broken by accept header-based routing
#40 has a detailed description of the architecture implemented by the patch in #41.
Performance
In order to asses whether the chosen approach is feasible, two different entry points for page delivery from the cache have been benchmarked (see #45): a) Immediately before the kernel and DIC are initialized (pre-kernel) and b) immediately after the kernel and DIC are initialized. The benchmark reveals a performance regression of an order of an order of magnitude (post-kernel) and up to a factor of 5 (pre-kernel). Neither connecting to the storage backend (DB, Redis, Memcache) nor the client-server connection overhead is included in the measurement. When arbitrarily adding 40ms to the whole resultset, the regression drops to around 30% #47.
While post-kernel page caching leads to a performance regression, pre-kernel hurts usability (only configurable through settings.php and no more via performance admin screen). In IRC @catch states that dropping the "Use internal page cache" checkbox on the performance admin page is not an option. Therefore the goal here is to build a system where most components can be reused for both variants.
Security
In order to integrate the toolbar page cache (intended for authenticated users), it is necessary to remove the following hook from session.inc. Removal of this check is compensated by Drupal\Core\PageCache\PolicyRule/NoSessionOpen. This change however needs special attention from a security team member. It would be possible to remove toolbar page caching altogether if this change is unjustifiable. However that would also prevent contrib from using the new API for any other authenticated page caching.
--- a/core/includes/session.inc
+++ b/core/includes/session.inc
@@ -253,9 +253,6 @@ function drupal_session_initialize() {
// anonymous users not use a session cookie unless something is stored in
// $_SESSION. This allows HTTP proxies to cache anonymous pageviews.
drupal_session_start();
- if ($user->isAuthenticated() || !empty($_SESSION)) {
- drupal_page_is_cacheable(FALSE);
- }
}
else {
// Set a session identifier for this request. This is necessary because
Remaining tasks
Decide on the approach- Figure out whether/how this interferes with #1032936: Maintenance mode should neither create nor retrieve cache
- Implement it during SprintWeekend2014
User interface changes
None
API changes
New services / classes:
- page_cache_policy
- Drupal\Core\PageCache\Policy (from factory method
createDefault) - page_cache_veto
- Drupal\Core\PageCache\Veto (from factory method
createDefault) - page_cache_kill_switch
- Drupal\Core\PageCache\VetoRule\KillSwitch (instance of
Drupal\Core\PageCache\VetoRuleInterface, used to prevent that pages are recorded in the cache when e.g. messages are displayed to the user) - page_cache_response_subscriber
- Drupal\Core\PageCache\CacheControlResponseSubscriber (responsible for making sure that appropriate cache-control headers are set on any response)
- page_cache_cid_generator
- Drupal\Core\PageCache\Storage\CidGenerator (responsible for computing the cache key according to a given request object.)
- page_cache_storage
- Drupal\Core\PageCache\Storage\InternalCache (responsible for recording / retrieving cached pages to / from the page cache bin.
- toolbar.page_cache_policy.toolbar_path
- Drupal\toolbar\PageCache\IsToolbarPath (responsible for allowing caching on the
/toolbar/subtreespath, even for authenticated users)
Removed procedural code:
- Remove
DRUPAL_BOOTSTRAP_PAGE_CACHEand_drupal_bootstrap_page_cache - New namespace:
Drupal\Core\PageCache. - Replace
drupal_page_cache_get_cid</li> with CidGenerator::getCid() (service <code>page_cache.cid_generator) - Remove
drupal_page_get_cache(), replaced with\D\C\PageCache\StorageInterface::get - Remove
drupal_page_set_cache(), replaced with\D\C\PageCache\StorageInterface::get - Remove
drupal_page_is_cacheable(), instead use\Drupal::service('page_cache_kill_switch')->trigger();(in order to prevent a page from being recorded to the cache) or add a policy rule to the container and tag it according to your needs (one ofpage_cache_policy.allow,page_cache_policy.denyandpage_cache_veto.rule). - Replace
drupal_serve_page_from_cacheby\D\C\PageCache\CacheControlResponseSubscriber, service <code>page_cache_response_subscriber
| Comment | File | Size | Author |
|---|---|---|---|
| #88 | 2177461-fourth-generation-refactor-page-cache-88.patch | 79.7 KB | znerol |
| #87 | 2177461-fourth-generation-refactor-page-cache-86.patch | 79.95 KB | znerol |
| #86 | 2177461-fourth-generation-refactor-page-cache-85.patch | 79.22 KB | znerol |
| #85 | interdiff.txt | 12.71 KB | znerol |
| #85 | 2177461-fourth-generation-refactor-page-cache-84.patch | 0 bytes | znerol |
Comments
Comment #1
znerol commentedOk, this is a bit nasty to take apart. At least #2168011: Remove all 7.x to 8.x update hooks and disallow updates from the previous major version will kill some of the weirder hacks around page caching.
I think this patch shouldn't do much more than moving around some blocks of code. Until now it moves the following stuff:
core/includes/cache.incis gone. Moving the remaining functionscache()andcache_invalidate_tags()intocore/includes/bootstrap.inc.interface Drupal\Core\PageCache\PageCachePolicyInterfaceclass Drupal\Core\PageCache\DefaultPageCachePolicyInterfaceinterface Drupal\Core\PageCache\InternalPageCacheInterfaceclass Drupal\Core\PageCache\InternalPageCachepage_cache.policy: Drupal\Core\PageCache\DefaultPageCachePolicypage_cache.internal: Drupal\Core\PageCache\InternalPageCachedrupal_page_cache_get_cid(): InternalPageCache::getCid()(protected)drupal_page_get_cache(): InternalPageCache::get()(protected)drupal_page_is_cacheable(): PageCachePolicyInterface::isCacheable()(public)drupal_serve_page_from_cache(): InternalPageCache::populateResponse()(protected)_drupal_bootstrap_page_cache(): InternalPageCacheInterface::deliver()(public)drupal_page_set_cache(): InternalPageCache::set()(protected)drupal_rebuild()entirelyComment #2
znerol commentedActual patch...
Comment #3
znerol commentedComment #5
znerol commentedActually it is necessary to ensure that
database.incgets included somewhere during bootstrap. Because the page cache phase does not exist anymore, lets move that to the variable phase.I still have at least the Text editor administration test failing in spectacular way though.
Comment #6
znerol commentedReroll after #2169267: Replace drupal_cron_run() with a Cron service
Comment #7
znerol commentedInterestingly enough this is only on my machine, the test passes on simplytest.me.
Comment #11
znerol commentedGot around removing most static invocations and instead use injected services. Most notable changes since #6:
page_cache_without_database. After #1764474: Make Cache interface and backends use the DIC has landed, the database connection is only initiated when necessary. Therefore when a cache service is used which does not trigger any database traffic, the db connection will not be initiated in the first place.LanguageNegotiatorstill references this setting, however I do not really understand the code over there. I hope for failing tests in order to get some clues...global $useris populated. #1890878: Add modular authentication system, including Http Basic; deprecate global $user introduced a check for empty$userindrupal_get_user_timezonepathandtitlekeys on the cache object were introduced in order to allow the statistics module to operate for cached pages (#692044: A site with statistics module enabled is much slower in serving cached pages in D7 than in D6). Meanwhile the statistics module switched to Ajax and boot/exit-hooks were removed. Therefore it is save to also get rid of this metadata.Comment #12
dawehnerGreat work so far.
I tried to figure out whether this will let run more code or less code and indeed this does not change the behavior, as we still basically execute after the kernel is ready. This should be better documented in the issue summary to be honest.
Comment #13
znerol commentedComment #14
znerol commentedIssue summary updated, I hope it is clearer now. I'd like to minimize the risk of introducing performance regressions, and therefore I very much would prefer to focus onto the conversion of procedural code into container services. The patch already moves a lot of code around.
Comment #15
znerol commentedReroll after #2167109: Remove Variable subsystem.
Comment #17
znerol commentedScripts should bootstrap to
DRUPAL_BOOTSTRAP_KERNEL, then include legacy API by hand (database.inc). Only care aboutauthorize.phpandstatistics.php, but do not touchupdate.phpbecause most of that crazy stuff will be removed anyway.Comment #18
dawehnerHere is a bit more deep review.
Initial thought:
Just in general it is odd that we need two services to be injected in the finish response subscriber.
The new code is certainly less descriptive.
Is it just me that the configuration should be hidden inside the page_cache.internal service? Maybe we should also find a better name.
This certainly looks wrong.
Is it just me that moving this deprecated methods feels out of scope of this method? Can't we simply load cache.inc a little bit earlier?
What happened with this functionality?
Note: we do have an interface for this now.
Maybe we find a better word for the pageCachePolicy object that this becomes basically obvious.
So what about using PageCacheStorage instead?
The convention is to use "Contains \Drupal..."
Both the name and the documentation should explain what this thing is doing.
maybe we could use something like disableForThisRequest() or something like that?
I wonder whether we need all this public methods if just isCacheable is important at the end..
Nice splitup!
Aren't these two information stored on the request object as well? Request->getMethod() should do that.
So just to be clear, this caches the response of a toolbar not something we consider as "page"?
Mhh it really feels wrong to dynamically exchange the service to be honest, though copying the page cache logic, as it was before, is certainly even worse. ... possible patterns we could use here would be observer (throwing an event on which multiple page cache policies can work on)
Comment #19
znerol commentedRemoved another set of static invocations (especially calls to
drupal_get_http_headeranddrupal_add_http_header).The substitution of the
$boot_headerswith$response->headersinInternalPageCache::populateResponseprobably requires further explanation.drupal_get_http_headerreturns all headers set during a request. When a page was delivered from the cache, the headers added duringhook_bootare returned. Becausehook_bootis gone, we can forget about that. There is still some code around calling the old APIdrupal_add_http_headeron full page builds though. However headers set using the old way already get collected byFinishResponseSubscriber::onRespondand added to the response object. Therefore we can get rid of$boot_headersand use$response->headersinstead.@dawehner: Thanks for the review. The header-stuff took a bit longer than I hoped for. I will address your points in the next iteration.
Comment #22
znerol commentedToolbarController::subtreesJsonpand filed #2184201: ToolbarController::subtreesJsonp has no test coverageComment #23
znerol commentedAddresses #18: 4, 5, 6:
cache.incwhere necessary.drupal_page_is_cacheableis safe, because the risk that a cached version is delivered during the bootstrap does not exist anymore.Comment #25
znerol commentedReroll after #2167039: Regression: page cache tags broken, also: some routes incorrectly use _controller -> No page object in template variables -> Fatal in seven_preprocess_page()
Comment #27
znerol commented25: 2177461-refactor-page-cache-24.patch queued for re-testing.
Comment #28
ianthomas_ukMarking the old function names as deprecated as part of #2183187: [META] Remove or document all obsolete APIs before beta. I've also opened #2184985: Remove cache_invalidate_tags() and cache() to actually remove them, postponed on this.
Doing this I spotted that the use statement in cache.inc was incorrect. I'm surprised this wasn't picked up by tests. Hopefully that's because we are testing the new versions of the functions, rather than the legacy wrappers.
Comment #31
znerol commentedNot really.
Drupal\Core\Cacheis the namespace andDrupal\Core\Cache\Cacheis the class name.Comment #32
ianthomas_ukI clearly need to brush up on my namespaces. This just reverses the change I made to the use statement.
Comment #33
znerol commentedAfter #2168011: Remove all 7.x to 8.x update hooks and disallow updates from the previous major version it is now possible to remove
DRUPAL_BOOTSTRAP_PAGE_CACHEentirely.I'm still exploring ways on how to further improve the design. Especially I'd like to decouple the internal page cache from adding headers for external caches (proxies / browsers). However this might result in larger changes than desirable for this issue.
Comment #34
znerol commentedOk, this is an intermediate update of what I have to this date in the branch with the radical approach. Regrettably I have some tests failing there because for example cache revalidation works a little different in symfony. I'd rather not touch the tests but instead preliminary mimic the behavior of the old implementation if possible.
This attempt has the following key characteristics:
At this point I'd appreciate some comments an the architecture as well as class / interface naming.
Comment #35
znerol commentedThis #34, however there is a remote chance that tests pass here... The interdiff demonstrates the changes necessary to work around the tests exercising legacy-behavior.
Comment #36
znerol commentedComment #38
znerol commentedAlso extract cid generator as a separate service (this will help with #2062463: allow page cache cid to be alterable).
Comment #39
znerol commentedComment #40
znerol commentedNow that tests are passing I'd like to discuss the changes introduced by #39.
Page cache service (event dispatcher) and global policy
The
page_cacheservice is an instance ofPageCacheInterface. When allowed by thepage_cache.policy, it is responsible for emittingPageCacheEventsbefore and after a request is handled by the kernel. Only requests coming in through the CLI or those neither having theGETnor theHEADmethod are rejected by theGlobalPolicy. More restrictive policy rules are then assigned directly to cache backends and filters. For example only allow caching when no session is open implemented viaDefaultPolicy.The
PageCacheInterface::handlemethod largely corresponds to_drupal_bootstrap_page_cachewhilePageCacheInterface::recordmostly incorporates code extracted from theFinishResponseSubscriber.The
PageCacheservice is also responsible for adding default headers (Cache-Control: no-cache, ...andExpires: distant past) when caching is prohibited.When user-specific content is rendered on a page (e.g. messages), page caching can be disabled entirely by issuing
\Drupal::service('page_cache.policy')->isNotCacheable(). This wasdrupal_page_is_cacheable(FALSE)before.Note that
Etagfor non-cacheable requests is gone (as suggested in #1573064: Remove unique ETag hack as it is no longer necessary). If this is not desired, it would be possible to reintroduce it inPageCache::setDefaultHeaders().Internal page caching based on the cache_page bin (event subscriber)
The
CacheStorageclass listens toPageCacheEvents::GETandPageCacheEvents::SETevents. The class largely implements the responsibilities ofdrupal_page_get_cacheanddrupal_page_set_cache. Thepage_cache.storage.cacheuses thepage_cache.policy.cache, an instance ofCacheStoragePolicy. Because this policy derives fromDefaultPolicy, only requests without an open session can be stored. Also the policy rules only evaluate toTRUEwhen internal cache is effectively turned on.In order to #2062463: allow page cache cid to be alterable, the cid-generation is implemented in a separate service, namely
page_cache.cid_generator(CidGenerator:getCidformerly known asdrupal_page_cache_get_cid).Also note that the HTTP status code is now saved along with the cache-entry.
Page compression (event subscriber)
The methods for compressing and conditionally decompressing response content found a new home in
CompressFilterandUncompressFiltercontrolled byCompressFilterPolicyandUncompressFilterPolicyrespectively. Instead of relying on thepage_compressedflag of the cache-data object introduced with #1476810: (followup) drupal_serve_page_from_cache can serve uncompressed data with Content-Encoding gzip header with page_cache_without_database = 1, the standard HTTP headerContent-Encodingis used to specify whether a cache-entry contains compressed content. This allows the separation of the internal cache service from page compression.Cache-Control headers for external caches (event subscriber)
The
ExternalStorageclass is responsible for adding appropriateCache-Control,EtagandVaryheaders, as well as perform cache-revalidation (a.k.a respond with 304 Not Modified) when appropriate. TheExternalStoragePolicyalso inherits fromDefaultPolicyand therefore only allows caching for requests without a session. Additionally caching is only allowed when the site administrator configured amax_age.Custom cache backend implementation for toolbar (event subscriber)
Requests directed to
/toolbar/subtrees/{hash}<?code> are handled by a the module specific <code>ToolbarStoragepage cache backend. When a request is accepted by theToolbarPolicy, this backend adds customCache-ControlandExpireheaders. The policy explicitly does not reject requests from authenticated users and therefore page caching as well as external caching is enabled for all users. As a consequence, the hacky_toolbar_initialize_page_cacheis gone completely.Outlook
Note that due to the modular design, it is now possible to mix and match policies and backends. For example it would be possible to substitute the external backend and issue
Cache-Controlheaders with differentmax_agedepending on the status-code of the response. Also it is easier for contrib modules to provide their own page cache backends and policies.Comment #41
znerol commentedReroll after #2159915: Remove drupal_load()
Comment #42
znerol commentedRegrettably the new architecture has an impact on page delivery performance. I took measurements with Apache Bench and it looks like the OO-architecture makes serving pages from the cache about 3ms slower. By using the APC loader, it is possible to compensate some 5ms.
However what is worrying me more is the performance regression compared to Drupal 7. Especially due to the service container initialization serving pages from the internal cache takes twice as long in Drupal 8. The picture gets even worse when comparing to a tuned Drupal 7 setup with a high performance cache backend (e.g. redis) and
$conf['cache_without_database'] = TRUE;. In this case Drupal 7 performs an order of magnitude better than Drupal 8.I think we should at least provide a way for third party implementations to perform cache delivery before the kernel / service container is initialized.
Comment #43
znerol commentedForgot to add ab-test results.
Comment #44
znerol commentedIn order to complete the picture, I've taken another benchmark. This time with a fake response built and delivered just after bootstrapping to configuration and before the kernel gets initialized.
Comment #45
znerol commentedEdit: 7x-no-db-do-not-test.diff in #46 is the correct/tested patch.
I took another benchmark with this explicit goal in mind: Gather the data necessary to decide whether pre-kernel/container page cache retrieval is worth considering.
Pre-kernel page-caching is harder to implement and also much harder to configure. Backends operating at this level have no access to config, and therefore need to be configured via
settings.php. Also cache-bins are not there at this early stage. Therefore page cache backends would be responsible for setting up connections to storage. Also when delivering pages from the cache, backends would be responsible for adding appropriate headers for external caches (proxies / browsers) to the response. However this problem could be solved by just saving Cache-Control headers along with cached pages. With this method it would be possible to preserve the decoupling of internal storage based backends and external caching.Benchmark method: Measure the time it takes to generate and send a fake-cache-hit-response (see attached patches) under the following conditions:
The goal was to measure raw bootstrapping-performance without any database-overhead. Also in order to minimize connection overhead, HTTP keep-alive was used such that subsequent requests went through the same TCP session. I'm using
abin the following manner:ab -k -c1 -n500 -g /dev/shm/8x-pre-kernel-apc.csv hxxp://localhost:8080/8x-pre-kernel/node > /dev/shm/8x-pre-kernel-apc.logI did run the full test suite once on crappy hardware (Somewhat dated SOHO NAS with slow CPU und spinning disks in LVM RAID-1 configuration) and once on a recent laptop (with SSD).
The results are as follows:
Drupal 7 was released 36 months ago. According to Moore's law CPUs are supposed to have 3 times as many transistors nowadays (and hence should be 3 times more powerful). Lets pretend that this is true and apply it to our results:
8/3=2.6714/3=4.67: D8 post-kernel page delivery compared to D7 is still slower by a factor of 2 through 5 after applying Moore's law as of today.2.5/3=0.8345.0/3.0=1.67: D8 pre-kernel page delivery compared to D7 is faster by a factor of 1.2 or slower by a factor of 1.67, depending on hardware/system configuration. However this is only after applying More's law as of today.The important question is now:
What is more important: Keep the "Use internal page cache" checkbox on the performance admin page or keep the page cache delivery performance of Drupal 7 (taking Moor's law into account).
Comment #46
znerol commentedComment #47
ianthomas_ukI discussed the figures in #42 on irc and pointed out that my real-life D7 site takes around 40-80ms to serve a page from the cache, rather than the 9ms shown in the tests above.
If we assume the network is adding 40ms to the page load time, then moving from 7x DB cache to 8x patch default loader would take the mean load time from 49ms to 63ms. That's still a 14ms regression, but it's now only about 30%.
Max page throughput is likely to be greatly reduced, but if that's a concern then you should probably be using a dedicated cache such as varnish or a CDN rather than Drupal's page cache.
Comment #48
znerol commentedComment #49
znerol commentedAs per #47 and @catch on IRC ("we need to keep the checkbox IMO. This should be enabled by default in the standard profile."), pre-kernel cache delivery is out of the game.
Comment #50
znerol commentedComment #51
berdirDidn't we use to have basically both with the old no_database thing, that you could set in settings.php, before it connected to the database to load variables? Should be easy enough to keep that concept with a setting that wins when set otherwise fall back to a config?
Comment #52
berdirOh BTW, the weird test fails that you had locally have been fixed in #2182077: Editor manager in $form_state causes test failures on PHP 5.4
Comment #53
znerol commentedComment #54
znerol commentedComment #55
znerol commentedRe #51: I actually experimented with something like
$settings['page_cache_pre_kernel'] = 'Drupal\Driver\PageCache\YourClassHere'. Like this it would be possible to provide a location where incoming page requests can be intercepted by third party code before the kernel and the DIC gets initialized. However this might spill over in the proposed architecture in quite ugly ways. Because the DIC is not available, it would be necessary to somehow initialize the cache-policy in advance and reinject it into the DIC on cache miss and pass. The other possibility would be to simplify the proposed architecture and lump everything including the cache-policy back into one class and try to make it inheritable, such that third-party implementors may reuse core logic by subclassing. I only want to go down that path if it is inevitable...Comment #56
znerol commentedI've been busy rewriting the patch again during the last week. This time I'm trying to come up with an architecture which fits both use cases, i.e. pre-kernel and post-kernel (container based) cache retrieval. It is still a bit rough around the edges. Also I didn't have the chance to test the pre-kernel stuff yet.
I plan to implement a simplified storage backend based on redis or memcache in order to gather some hard numbers on the new pre-kernel stuff. Not sure whether I have much time to work on that during the next week though.
Comment #57
znerol commentedThis is another work-in-progress patch, adding a pre-kernel storage factory for database backed page cache. I will now concentrate on another performance analysis and after that on cleaning up the patch.
When reviewing please concentrate your attention on the architecture - aka the big picture.
Comment #58
znerol commentedReroll after #2209583: Remove use of drupal_set_title()/drupal_get_title()/_system_path from page cache, also separate pre-kernel and post-kernel page cache factories (
FastPageCacheFactoryvsPageCacheFactory).Comment #59
wim leersI came here via #2209583: Remove use of drupal_set_title()/drupal_get_title()/_system_path from page cache — didn't know about this at all!
Great work so far, and thanks for your excellent issue summary and comments — that really helps when digesting a patch as big as this one! Also, big kudos for taking this on in the first place, and finally: thanks for cutting away so much cruft!
General feedback
(Partial) code review
"before kernel" vs. "after container"
s/container/kernel/? :)
s/PageCacheHelper/PageCache/ would make the code cleaner.
This comment is now probably outdated.
Just an FYI: splitting these sorts of changes into separate issues would make this patch a bit smaller and easier to review.
'a "Vary: Cookie" header should be"
An override for
setCacheable(), yet disabling caching in ancient browsers. Why?max-ageshould be limited to 1 year in case it's greater than one year, according to Google PageSpeed's docs, which say that's according to the RFC.It's known that (enormously) huge values confuse browsers, so better to be safe than sorry.
s/Entity tag/ETag/
This is not the interface, but a base class.
Apparently also
Vary.I like this separation between HTTP 1.1 and 1.0. Very clear, very explicit. This really helps with long-term maintenance :)
s/does/do/
Additional comments here explaining which things are configurable through config and which are configurable through settings, would be most welcome here.
Im missing docs here that contain a justification to explain why
Symfony\Component\HttpFoundation\HeaderBagis insufficient. That also has aaddCacheControlDirective()method, plus siblings.We always have a newline after the opening of a class/interface and a newline before its closing.
Please fix that here and elsewhere.
s/Expire/Expires/
s/A policy/A page cache policy/
?
Docs don't match.
"set" vs. "list" vs. "compound"
Let's make this consistent.
Does order matter? -> "list"
Does order not matter? -> "set"
"storage selector" vs. "cache control selector"
Confusing.
Why "cache pass"? It's always been called "cache hit".
I appreciate the elegance here.
Maybe rename
InternalCacheConfigtoRequireInternalCacheConfigfor consistency?Instead of returning a cache ID, you could also opt to return cache keys.
Ideally, you'd leverage
Drupal\Core\CacheableInterfacein this patch.Comment #60
znerol commented@berdir told me that someone tested that once during Drupalcon Prague. He said that performance breaks down once the compiled container surpasses apc.max_file_size (or the equivalent limit of any other opcode cache in use).
I can see two strategies to reduce the number of services (and remove some complexity) introduced in this patch:
HeaderBagI actually have some unit tests lying around in the branch for the second generation patch. However I discovered that unit-testing event-driven code (lacking proper interfaces) is no fun at all, and that's one of the reasons that motivated me to rewrite the patch again... I will sure be able to recycle some test-code.
This shouldn't be too much hassle with the current architecture. There is
PageCache\StorageInterfaceandpage_cache.internal_storageservice which can be overridden. In fact I already have a minimal working Redis-implementation.Thanks for your review, I'll save the rest of it for the next train-ride.
Comment #61
wim leersI don't see why that can't happen? :)
Excellent!
Interesting. Would you say that's a valid alternative to circumvent the problems of "before vs. after kernel initialization"?
Comment #62
znerol commentedRe #59 (Partial code review)
1. "After kernel" is a bit ambiguous: The entry point really is after the kernel was construct AND the container is initialized. Instead of describing the entry point in the method name, it might be better to simply use generic terms: deliverPageFromFastCache, deliverPageFromCache.
2. I would like to save the term
PageCachefor the proper service.PageCacheHelperis really a utility class responsible for keeping track of the fast-page-cache instance and such. The service itself inherits fromPageCacheInterfacewhich is completely unrelated toPageCacheHelper.3. In fact the comment still applies. Further down we loop over all cache bins and flush them.
4. Sure. Filed #2212411: Code style cleanup of language system and #2212427: Use interface instead of class name for language manager where appropriate.
5. Right
6. Ok, let's just copy over the docs from the method definition.
7. The highest value one could choose from the UI is 24 hours (86400 seconds). I'd rather limit the number in one of the factories instead of in the
PublicExternalimplementation. I added a sentence to the constructor parameter documentation for now.8. Right. I think our ETag-stuff is broken anyway. I rather would use a hash instead of the time stamp. According to the RFC 2616 (13.3.3 Weak and Strong Validators), ETag based on the timestamp with one-second resolution should be marked as weak-validator. We are claiming a strong-validator here. However I'd rather dig into that in a follow-up (or reuse #1573064: Remove unique ETag hack as it is no longer necessary). A change here will require a lot of research.
9.-13. Done
14. The justification is that Drupal 7 does it like this... I will try and remove the cache-control stuff in favor of Symfony HeaderBag later.
15.-20. Fixed
21. I borrowed this term from Varnish: pass = bypass (not miss, not hit).
22. Great. I was trying to come up with common pairs (Allow/Deny, Accept/Reject, etc.) but that did not really work. Let's use Require and Deny as class-name prefix.
23. The cid (cache-id) and Cache-Tags are two very different concepts. The only input we can use to generate the cid is the request object. Nothing else is known at the point in time when pages should be delivered from the cache. Therefore the
CacheableInterfacedoes not seem to be a good fit for the page cache.Comment #63
znerol commentedThis is a performance analysis of the third-generation-patch. Unlike in #40 the measurements were conducted with real cache backends (mysql database and redis). The rest of the setup was the same (hardware).
The goals for this analysis were:
DIC regression
I found that moving the page caching stuff from procedural
_drupal_bootstrap_page_cacheinto a couple of services results in a performance regression of 6-8% (on crappy hardware). On modern hardware, the new DIC-approach varied between 10% faster (APC not enabled) or 25% slower (APC enabled). The results for modern hardware seem a bit bogus, I probably need to redo this part of the test.Pre-Kernel page caching vs 7x page cache without database
Pre-Kernel page caching is still slow compared to the page-cache without database in 7x. On crappy hardware the regression is between 38%-56% (with better results when APC is on). On modern hardware the numbers are even worse: 56%-79% (also with better results when APC is on).
Before (7x) it was possible to boost page caching up to 6 times (crappy HW, no APC) when switching from DB page caching to Redis. With the third-generation patch an improvement of 2-3.5 is possible when moving from the post-kernel DB cache to pre-kernel Redis.
Comment #64
znerol commentedThis is the fourth generation. I'd like to highlight the following changes:
Cache-Control-stuff before cache-storage and thus simplify pre-kernel cache-delivery a lot\Drupal\Core\PageCache\Policyas the only thing required to be shared between pre-kernel and post-kernel page-cache implementations.ResponseHeaderBagcache-control stuff as much as possibleThis is more or less a complete rewrite - the interdiff is actually bigger than the diff...
This is still wip, I hope to deliver more documentation / performance numbers later.
Comment #66
dawehnerI am sorry that you actually use matlab :P Impressive work in general.
What are your long-term goals here would both code paths merge one day, for example in a stacked http kernel? I guess we always need a pre-kernel page cache for performance?
tbh it is a bit odd that we kind of require the page cache to be part of our main tool chain. I wonder whether we should at least still deliver/send the result in this method.
Do you plan to change that?
These once got coverted to vetos?
Have you tried to lazy-load the service references. Would that speed up the performance? We could for example introduce service proxies to achieve this goal.
+1 for this DX!
Ensure that all classes use "Contains ..."
Is this really a base class?
Do we really need a singleton here?
Comment #67
znerol commentedThanks for the review dawehner.
Actually its GNU octave, that doesn't make the language any better though. Measurements are in milliseconds, I will try to remember adding a label when performing the analysis the next time. Data series for Drupal 8 (without the patch) are labeled "8x DB" and "8x DB APC" on the graph in #63. The class loader and the service container are indeed the worst offenders performance-wise. The APC-series actually were conducted with
$settings['class_loader'] = 'apc';andDrupal\Component\PhpStorage\FileReadOnlyStoragefor the service container.settings.php. In order to make it easy to set up, we always wil need post-kernel (i.e. post-container) cache retrieval. I do not yet have an opinion on whether wrapping the kernel is a desirable goal and I'm not clear yet which consequences this would have on the proposed implementation.$response->prepare($request)(at least that's what Symfony HTTPCache::handle suggests). It would be possible to leave the$response->send()</li> in <code>drupal_handle_request.session.inc. The cache exclusion inutility.incis not necessary anymore because with the new architecture the page cache is only active when the request is handled usingdrupal_handle_request.PageCacheHelperlike Wim suggested before, its only little left of it anyway.Note that the test fails from #64 also revealed #2218463: Private file and image style downloads throw 500 error when page cache is active. Setting to needs review to trigger a test run, still needs work...
Comment #68
Crell commentedCode review only:
We don't have this added protection on any other tagged services. No need to do it here.
Ibid.
Wait, what? this patch has a big OO framework to handle cache policy and then tosses in a global singleton? Does not compute.
The response class already does most if not all of this in the prepare() method.
The name of the class says explicitly that the class is doing too much.
Comment #69
Crell commentedznerol: I appreciate the work you're putting into this issue. Really. But I can't help but think that this is all horribly over-engineered.
1) The baseline for page caching in 2014 is a reverse proxy. We should assume that anyone "serious" is using a reverse proxy (Varnish, nginx, etc.), and our PHP-space page caching is simply for those people who don't have one (but should).
2) That means we *should not* offer more functionality and flexibility than can be represented by HTTP itself, because that's all the proxy will see.
3) That means, effectively, writing a PHP reverse proxy.
4) We already have a PHP reverse proxy in core. It's called HttpCache, and is linked from the issue summary. :-)
5) That issue was bumped to Drupal 9 in June of 2013 on the grounds that Simpletest and the installer prevented us from using a wrapped Kernel, which is what HttpCache is. However, the work you're doing here is all pre-kernel... which means it could just as easily wrap the kernel. Meanwhile, Larowlan has made huge progress on #2016629: Refactor bootstrap to better utilize the kernel. (larwolan++!) It may be we can actually pull that off after all.
6) All of that points to saying that we should revisit HttpCache, as it is *existing code that does what we need* that we're already even shipping, and it would be less work to use than writing our own mini-framework. In fact, the Response object already tracks nearly all of the data we need, and as noted above does the necessary header mangling for various response statuses already. That in turn means that all we need are a few response listeners to set the headers needed, and maybe a service or two to act as an API to the rest of the system, maybe. (Although for partial page caching purposes we really want to make caching automatic when it is practical rather than relying on module developers to get it right.)
That is... I am really not at all convinced this is close to the right approach for us to take.
Comment #70
znerol commentedThanks for the review @crell.
Re #68.1: Actually I copied that over from somewhere. I do not care whether this code is in or not, #66 is pro #68 is con so its 1:1
Re #68.3: Noted. We can get rid of that and replace it with a synthetic service. Reasons for my original choice are stated in #67.9.
Re #68.4:
Response::preparealone does not cut it, we need to useResponse::isNotModified(), like stated in the @todo. The reason because this legacy code is in there is that I do not intend to break existing tests during a refactoring and apparentlyResponse::isNotModified()produces slightly different results than the legacy code.Re #68.5: Noted.
Re #69.1: Agree absolutely. That's why I did put great effort into strictly separating the internal page cache from adding cache-control headers to the response.
Re #69.2-#69.5: I follow. The problem with any pre-kernel page cache (and consequently wrapped kernel) is that we loose the ability to configure it through the GUI. @catch stated in IRC that this is not acceptable. I'm open to revisit this precondition.
Re #69.6: I'm not opposed to that.
Comment #71
znerol commentedAnother iteration... This time I've explored ways to wrap the kernel (to be precise the
\Drupal\Core\DrupalKerneland notDrupal\Core\HttpKernel). This made it possible to get rid of the uglyPageCacheHelperand its static methods with funny names. It is possible to swap out the implementation via thepage_cachesetting. I still need to work on the factory, it looks a bit ugly at the moment.Rebuilding the page cache as a kernel wrapper also revealed that
RequestCloseSubscriberis currently broken: #2220687: Convert system_run_automated_cron into a subscriber for KernelEvents::TERMINATE.Also I was able to get rid of the policy-singleton by splitting it into policy-service and veto-service. The policy is applied before attempting to retrieve a response from the cache while veto is applied before a response is recorded in the cache. Veto must be respected by all possible page level cache implementations (internal, external). For example it can be used to prevent that a page with messages on it is stored. However, because veto only needs to be present at a late stage in the request it can live in the container. This makes it possible that any cache-backend or request-subscriber can use its own policy and therefore it is not necessary anymore to inject the policy via singleton pattern or synthetic service into the container.
Comment #72
Crell commentedThere's no reason for this to be static. A for-reals object should be the default. If you want a static single-method factory, don't bother with a separate class for that.
That said, holy crap you managed to wrap another HttpKernelInterface around DrupalKernel! :-) Please humor me... Can you do that with HttpCache? It's *already there*, so if we can use it directly we should. (With all the usual caveats of "you really should use varnish instead of all of this crap.")
Comment #73
znerol commented@Crell: My understanding is that the
HttpCacheis supposed to wrap theHttpKernel. In contrast the patch in #71 wrapsDrupalKernel. It is possible that I'm completely wrong about that, but I think that the more interesting things implemented byHttpCache(like e.g. ESI) only work properly when all requests (including sub-requests) are passing through it (?). #71 does not work that way, only the main request is passed through the cache, it does not see any sub-requests.Comment #74
znerol commentedUpdate issue summary reflecting #71
Comment #75
znerol commentedComment #76
ParisLiakos commentedThe architecture looks straight forward to me. thanks for the great issue summary @znerol
Tbh, i think i agree with Crell here, but i havent check HttpCache much yet, so i wont comment on that yet
Here is a code review. There are docs missing in several places but i show the @todos so i didnt bother
should be {@inheritdoc}
you probably forgot to assign the passed variable to the property
It is not clear to me as to why the reference is needed here?
per our coding standards it should be
isCliRequestComment #77
znerol commentedThanks for the review. Committed 1,2,4 to my local repository, will be part of the next reroll.
Re #76.3: This is actually not a reference assignment but a bitwise-and operation (
$x &= $yis equivalent with$x = $x & $y). It could be rewritten into a longer form, if this is too confusing (e.g.if (!$y) { $x = FALSE }).Comment #78
ParisLiakos commentedoh, thanks, sorry, i am blind, totally misread this
Comment #79
Crell commentedHttpCache can wrap anything that implements HttpKernelInterface; that includes HttpKernel, Symfony's KernelInterface, or DrupalKernel. That's the point.
HttpKernel subrequests never leave HttpKernel, so the wrapper (or lack thereof) doesn't matter. ESI-esque requests, from the POV of the DrupalKernel, are the same regardless of whether they come from a wrapping kernel or from Varnish/nginx. Remember, the goal is that midrange to high-level sites will be using a real reverse proxy anyway, so using only what HTTP supports; we just need a fallback for sites that don't have that option available.
Comment #80
znerol commentedChasing HEAD, also contains suggestions by #76.
Comment #81
wim leersAwesome work! This is significantly simpler already, which was the major concern for several people in earlier iterations AFAICT.
I think the policy vs. veto concept is almost sufficiently clear. Almost, becasue maybe the naming isn't: AFAICT policies are for requests and vetoes are for responses? If this is the case, this should be made much more explicit.
Then there is
PageCachevs.FastPageCache. The differences there also aren't clear enough. AFAICTFastPageCacheis for the pre-kernel case, andPageCachefor the post-kernel case?This response subscriber contains "default Drupal assumptions". Maybe we should rename it to
DefaultCacheControlResponseScruber?Agreed! I didn't even know we supported this right now.
Can we have parentheses here to remove any ambiguity? :)
Hurray! :) Maybe refer to the RFC, to explain why there is this seemingly arbitrary limit?
The docs for these are identical, though they serve very different purposes. The differences should be more explicit.
What are "lean requests"?
This implies a 304 will only be sent if both an
If-Modified-SinceandIf-None-Matchare sent. That seems wrong.It feels weird/wrong to have this (useful) helper hardcoded in here. I think a
DefaultPolicy extends Policymight be cleaner?Can you explain the value of having the page cache ID generator in a separate interface and class? Why not make it part of the thing that uses it?
This is the number of hours in a year, not the number of seconds :)
But this was apparently already the case in the old code!
(Apologies if the review is off, I'm still wrecked from Drupal Dev Days Szeged.)
Comment #82
znerol commentedThank you for the review Wim.
OK, renaming
PolicytoRequestPolicyandVetotoResponsePolicy. AlsoCacheControlResponseSubscribertoDefaultCacheControlResponseSubscriber. Interdiff-renames.txt is produced withgit diff -M25%in order to keep it reviewable.Comment #83
znerol commentedComment #84
znerol commentedAddressing rest of the review from #81:
\Symfony\Component\HttpFoundation\Response::isNotModified()only in a followup because that will involve changing existing tests also.DefaultRequestPolicy<code> and <code>DefaultResponsePolicy.Comment #85
znerol commentedComment #86
znerol commentedReroll after #2228341: Objectify session management functions + remove session.inc.
Comment #87
znerol commentedReroll after #2161591: Change default active config from file storage to DB storage
Comment #88
znerol commentedReroll after #2208475: Move Settings into Drupal\Core\Site\Settings
Comment #89
znerol commentedI'm tired of constantly rerolling this, let's split this up: #2257695: [meta] Modernize the page cache
Comment #90
mgiffordComment #91
znerol commentedNext step in the meta issue is #2348679: Move the remaining procedural page cache code to the page cache stack middleware
Comment #92
alimac commentedMinor tag cleanup - please ignore.
Comment #93
mgiffordLots of progress has been made on this meta issue #2257695: [meta] Modernize the page cache - should this one still be marked postponed?
Comment #94
znerol commentedYes, the other issue brought in everything.
Comment #95
mile23related: #2549041: Remove deprecated _drupal_set_preferred_header_name()
Ugh. Added to the wrong issue. But still related. :-)