The problem
After #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers) we are still left with a very monolithic API for clearing caches on content changes.
cache_clear_all() is hard coded to the page and block caches.
Node, user, taxonomy and many other modules call cache_clear_all() after any form submission - even though the entity in question may not even be shown on any of the pages or blocks that were cached. This can result in a very low cache hit rate especially on sites with lots of content getting posted. Many sites use varnish for page caching now, however that still leaves a choice of either very short cache TTL, or using varnish commands to flush the cache in the same way as cache_clear_all() does. The same problem exists even if you're doing ESI as well.
Proposal
Introduce support for 'cache tags', adding support is likely to come in three parts:
- Low level implementation: cache()->set($cid, $data, $tags); and cache()->deleteTags($tags). This is the current focus of patches to this issue.
- Add #tags support to drupal_render() caching - so that cached HTML can inherit the tags from child elements. Make it easy for blocks and the page cache to populate this.
- Replace calls to cache_clear_all() with cache()->deleteTags($tags). Eventually remove cache_clear_all() in favour of this mechanism. (contrib/custom modules can still clear bins.)
What's the difference between cache tags and unique IDs
And ID is a unique identifier of a cache record, it might be completely fixed, or dynamically generated.
When dynamic cache IDs refer to rendered content or other dynamic structures, they are usually based on context. This can be described by the DRUPAL_CACHE_* constants - per-user, per-role, per-page, per-language etc. The context the item is generated in determines the output, but different versions are served according to different contexts - a context change (granting a role to current user or switching language) will rarely effect existing cached content - it will just cause different version to be served to the user. Each cache item has a unique ID, even if it might be a dynamically generated variant.
Cache tags are rather a way to describe the items that make up a cache item. Entities are the obvious example - a node might be cached in various view modes, as part of a cached listing of node titles in a block etc. The tag allows us to associate those blobs of HTML with the node itself, so that if the node is edited, we can invalidate those blobs (but not the thousands of other cached HTML fragments that have nothing to do with it).
Each cache item can have multiple tags - for example a cached page might contain 5 nodes, 10 taxonomy terms and 3 users.
Other systems/standards
This issue began with a late night conversation between catch and sdboyer, but has since evolved. It took a while for it to materialize into 'cache tags', but when it did, I immediately found other systems with the same concept.
However, review of those system indicated the support was not very mature, and I've ruled out actually incorporating either the exact APIs or implementations here, a quick summary follows though:
Zend Cache supports cache tags as part of it's base cache class. However many implementations such as their memcache backend don't support it, instead throwing exceptions. Additionally it contains some IMO not very useful API features like "delete all content not tagged like this".
Kohana also has tags as part of their API. But they also have extremely limited support in their implementations, and also have API functions which are not very useful, like allowing for finding content with a particular tag. Again - some implementations do not and cannot support their API, making it unsuitable for Drupal.
There is a fork of memcached (the library, not the PHP extension or Drupal module) that attempts to implement tag support - http://code.google.com/p/memcached-tag/ I have not seen much mention of this though and it looks mostly inactive at this point.
More promising is Varnish's support for regexp invalidation of cache items. See http://www.gossamer-threads.com/lists/varnish/misc/19732#19732 and http://pooteeweet.org/blog/0/1979#m1979 for discussion of this (thanks Moshe). Assuming the API given above, a varnish implementation would need to set headers containing the cache tags, then also clear varnish via regexp on cache()->clearTags(). So if we can get this right in PHP/MySQL/Memcache, it should be 99% there for Varnish as well.
Remaining tasks
We need to agree on the changes to the cache interface, and come up with a scalable implementation in MySQL.
catch's experience of working on wildcard clearing in the Drupal Memcache project suggests it should be possible to support this in memcache as well.
We need to discuss whether to try to replace prefix cache clears with this system as well, there are likely arguments for and against that.
We should be able to quickly use this to replace CACHE_TEMPORARY and cache_clear_all() once the API is available, see #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support.
Supporting this in drupal_render() is yet to be tackled.
Do we want to automate the drupal_render() support in entity rendering too so that blocks and pages get it for free?
As well as invalidating blocks and pages when items contained in them change, we also need to consider the case of a block listing items that needs to change when a new one is added. This would mainly be a convention for tags, such as array('node_posted_to_og' => array($og_ids)) for the block. Organic groups would then need to call cache()->deleteTags() when items are added too.
Our cache API is very much tied to bins. We may need to add a wrapper that clears any bin using tags, or nominate a couple of bins (cache_rendered?cache_content?) to use the automatic drupal_render() etc. handling - and any other bins you can use the API yourself but need to do the extra legwork.
Original report:
Our current methods of cache invalidation are a huge sledgehammer - we essentially let any old function clear any arbitrary cache as much as it likes, and if you need something a bit different, like system_cron() not invalidating your entire page and block cache every time it gets run, you're more or less out of luck unless you work around it in a custom cache.inc.
So, I think we need a new API, have been thinking about this for a while, but didn't really crystallize until the past week or so.
It consists primarily of two parts:
function hook_cache_info() {
$caches = array();
$caches['page'] = array(
'invalidate' => array(
'entity' => array(
'node',
'comment',
),
'callback' => 'system_page_cache_clear'.
),
);
return $caches;
}
function node_save($node) {
// blah blah
cache_invalidate('entity', 'node', 'insert', $nid);
// blah blah
cache_invalidate('entity', 'node', 'update', $nid);
}
function system_page_cache_clear($type, $type_type, $op, $extra) {
// Clear the whole page cache whenever this is called. But there's a
// hook_cache_info_alter() which would allow sites to only clear on delete
// or only on insert, or every cron run, depending on requirements.
So - modules control their own caches via a registry hook and cache clearing callback, which can be altered if you need to. Modules inform the cache clearing API about what they're doing - similar to how field_attach_*() works ("I'm doing this now, here's what I'm doing it with.", then the module controlling the cache decides what to do based on that information.
Also pasting an irc conversation with sdboyer which cleared some of this up:
<catch> sdboyer_: so I want a hook_caches() - which allows modules to expose which caches they maintain, and on what conditions they need to be cleared.
<sdboyer_> catch: i think that'd get too complicated too quickly if the clearing logic is centralized
<catch> sdboyer: so system_caches() says I've got 'page' - I need this cleared when any CRUD operation happens on content.
<catch> sdboyer: node and comment, when doing CRUD operations, do cache_invalidate('entity', $entity, $op, $id);
<catch> sdboyer: system_caches() includes a callback key for actually clearing the cache.
<catch> sdboyer: API isn't quite there yet, but basically we let modules control their own caches and ask for information about when they want to clear.
<catch> sdboyer: other modules don't clear caches directly, they do a field_attach_*() style thing of saying "this is what I'm doing here"
<catch> sdboyer: and then hook_caches() is alterable - so you can make the page cache only invalidate if content was deleted, or whatever.
<sdboyer> catch: so i like cache_invalidate('entity' ...), but that concern about centralized clearing logic still stands. i think i'd rather a system where entities, or other stuff that's often cached, notifies when its been changed and modules can react via a hook, rather than a selectively-triggered callback
<catch> sdboyer: the problem with hooks is you can't disable them.
<catch> sdboyer: so system_cron() does cache_clear_all() every cron run.
<catch> You can't stop that from having an effect unless you write a custom caching backend.
<sdboyer> catch: why would you want to disable cache invalidation?
<catch> sdboyer: because some sites need to clear the page cache every five minutes with no exceptions, some only when content is added, some only if it's taken down.
<sdboyer> catch: right yeah i'm talking about somethin diffrent.
<catch> sdboyer: so cache_invalidate(..... you can't disable.
<sdboyer> catch: cache_invalidate() is an entity's way of saying, "i've been changed - things that cache me, react as you will."
<catch> sdboyer: yes.
<sdboyer> catch: i think we're focused at two different problems, then
<sdboyer> catch: i guess the key point is: [11:05:19] <catch> sdboyer: API isn't quite there yet, but basically we let modules control their own caches and ask for information about when they want to clear. <-- what's the point of asking for information about when they want to clear if they're controlling their own caches?
<catch> sdboyer: well it's that if controlling their own caches means hook_cache_clear($type, $type, $id, $foo, $bar) - and you can never stop a hook being executed.
<catch> sdboyer: and when to clear some caches is a site specific decision - like the page cache.
<sdboyer> catch: ahh i think i see where we're talking about different things there - when i say "controlling their own caches" i don't mean "they decide and it's not configurable"
<sdboyer> catch: i'm just talkin about looser coupling. which might not actually even be necessary, because i don't know the mechanics of how a cache_clear_all() call actually works right now
<catch> sdboyer: in default cache.inc it's almost a delete query builder.
<sdboyer> catch: eew
<catch> sdboyer: so hook -> cache_clear_all() is what we can currently do with hook_node_insert() or whatever, but that's why I want this extra layer of cache clearing callbacks you can swap out somehow.
<catch> sdboyer: because by the time you get to cache_clear_all() your custom cache.inc doesn't know what the hell's going on anyway.
<sdboyer> catch: i think i see where you're going, then - the cache_invalidate() is the big new addition, by adding a layer of indirection that allows caches to decide whether or not to flush, rather than the cache_clear_all() steamroller approach
<catch> sdboyer: but yeah whether cache_invalidate() just invokes hooks, or some kind of setting, or triggers callbacks you can alter out is a bit messy at the moment.
<sdboyer> catch: yeah, that i'd fully support. i think the crucial addition for custom caching backends, or contrib-based caching, would be a very simple thing in cache_invalidate() that triggers registerable notification callbacks with not-too-much granularity
<sdboyer> catch: i think the best approach is to set some basic conditions that determine whether or not a callback is triggered ("is it an entity?" "is it a node?"), but punt the more complicated logic to the callback
<catch> sdboyer: so $page['clear callback']['page_clear_cache'] no?
<sdboyer> catch: that was what i was originally saying about being concerned that a centralized clearing system would quickly buckle under its own weight of custom clearing rules
<catch> sdboyer: OK that could work - trigger the callback at a certain point and just pass arguments in.
<sdboyer> catch: exactly
<sdboyer> catch: yup, that'd basically be it. could almost just be a hook_menu()-style registry thing
<catch> sdboyer: OK now I get it. So cache_invalidate() stays as it is.
<sdboyer> cache_invalidate?
<sdboyer> cmon druplicon!
<sdboyer> ahh wait, nm, it doesn't exist yet :)
<catch> sdboyer: the mythical cache_invalidate which doesn't exist yet stays how it was 20 minutes ago.
<sdboyer> :P
<sdboyer> catch: sounds like it, yeah. then the page cache can register something that says its callback should be fired if entity,node are both true about the invalidated item
<sdboyer> catch: and THEN inside its callback can do something like reading TTL options configured via the UI and decide how to proceed
<catch> so hook_caches() { return array('page' => array('type' => array('entity' => array('node', comment'), 'module'), 'clear' => 'page_clear_cache'); or something bizarre like that.
<catch> With less shit syntax.
<sdboyer> catch: yup, that basic approach seems sound to me
<catch> sdboyer: nice, I'll paste this into an issue if you don't mind.
<sdboyer> catch: absolutely :)
| Comment | File | Size | Author |
|---|---|---|---|
| #198 | drupal-636454-197.patch | 2.04 KB | tim.plunkett |
| #195 | drupal-636454-195.patch | 2.05 KB | tim.plunkett |
| #191 | cache_tags_8.patch | 16.91 KB | berdir |
| #189 | cache_tags_7.patch | 16.91 KB | berdir |
| #187 | cache_tags_6.patch | 16.9 KB | berdir |
Comments
Comment #1
sdboyer commentedsubscribing, and noting that the conversation above is flipped - lines starting with 'catch:' are from me, lines starting with 'sdboyer:' are from catch.
edit: this is now fixed
Comment #2
catchFixed the lack of code tags so it should be back the right way 'round now.
Comment #3
catchI'm moving this back to Drupal 7 for discussion because:
No existing APIs would be changed.
There's a reasonably limited number of places in core which would need to be converted.
Nothing in here would stop people doing cache_clear_all() directly, so it won't really affect modules already ported except they wouldn't be using D7 best practices.
if someone moves this back to D8 I won't mind, just running out of performance ideas at the moment.
Comment #5
andypostSubscribe, this approach looks very promising! If D7 going more entity-related so this could go. Right now this could be done in contrib like clearing cached views on some events.
Comment #6
catchJust a note this wouldn't only be for entities - we'd need things like cache_invalidate('css');, cache_invalidate('menu') and cache_invalidate('modules'); - although some of those are only used internally within certain subsystems - so whether we let them keep using cache_clear_all() or implement and then use this API I don't know yet.
Comment #7
merlinofchaos commentedThis is a very interesting approach, and I think I like it. One interesting addendum to this approach is that hook_cache_info could be used to create schema for cache tables, so that modules wouldn't have to handle that. Right now that's a trainwreck waiting to happen if the cache table schemas ever change again.
Comment #8
moshe weitzman commentedSubscribe. Lets release D8 right after this gets in :)
Comment #9
moshe weitzman commentedI think we want the ability to fire multiple callbacks for an event like: cache_invalidate('entity', 'node', 'insert', $nid). Perhaps a model like FAPI submit handlers is what we want here.
Comment #10
andypostI think better use arrays
Comment #11
catch@andypost: the problem is that node module has no idea about what's caching node content or where. Nodes are cached in blocks, pages, views cache, panels cache. And caches aren't always in bins, for example css and js aggregation. However node module can make the assumption that it's getting cached somewhere, so it can inform the system of what's going on, instead of the current situation of blindly calling cache_clear_all() for hardcoded tables and having other modules operate on hook_node_*() (or often having to add submit handlers to forms for things with less of an APi than nodes, for example if you wanted to cache based on user roles or permissions).
@moshe:
That seems sensible, we could change the example in the first post from:
to
Then
But probably we won't figure out the actual structure until we try to apply it.
Comment #12
catchBetter title.
Comment #13
EvanDonovan commentedSubscribing. Looks very cool.
Comment #14
andypostThere's a great discussion - Drupal Cache Roadmap for D8 http://groups.drupal.org/node/75823
Comment #15
catchThat discussion looks useful, however (fortunately) it's pretty separate from this one, which is only talking about invalidation.
Comment #16
slantview commentedI just found this, and the two conversations are separate, but any decisions made here absolutely affect the D8 cache API roadmap. Let's pull these two conversations together and i'll update the wiki page to reflect the hooks and cache API features we're talking about adding.
This naming sounds a bit cryptic currently, my only suggestion would be to spend some time to make sure we have clear naming conventions.
+1
Comment #17
dries commentedComment #18
sunsubscribing
Comment #19
sunNote that we badly need a quick fix for D7, without an API change. I've opened a separate issue and submitted a stupid-simple proposal in a patch: #918538: Decouple cache tags from cache bins
Comment #20
dave reidSubscribing. Would also find this very useful in several contribs. Maybe we can experiment in contrib for D7?
Comment #21
sdboyer commented+1 to experimenting in contrib. I was actually noodling around with some ideas the other day, thinking about a contrib module for them...
Comment #22
catchI have a start on page cache expiration changes in http://drupal.org/project/performance_hacks. It's ugly as hell at the moment (have to replace node_form_submit() entirely), but is the sort of thing that'd benefit from this change, if someone starts a contrib project for the API please ping me.
Comment #23
catchDowngrading all D8 criticals to major per http://drupal.org/node/45111
Comment #24
carlos8f commentedsubscribing
Comment #25
Anonymous (not verified) commentedsubscribe.
Comment #26
mikeytown2 commentedsubscribe as I'm always hacking around drupal's cache clear all in my various modules. I've been quite successful at it I might add.
Comment #27
pillarsdotnet commentedComment #28
Anonymous (not verified) commentedComment #29
moshe weitzman commentedD8 is open for business. Anyone willing to flesh this out a bit more? If so, assign this issue to yourself.
Comment #30
catchI've been thinking about this, but not had a chance to write it up, this is still extremely rough but has moved on a bit since the last rough idea (and changed a couple of times in the writing of this post).
I think our archetype for complex cache invalidation should be entities - since they show up in so many different kinds of cache (entitycache, blocks, page cache, views etc.) - and most things that we cache that don't include entities are usually low level and maintain their own caches relatively well (variables, locale, system info etc.). That doesn't mean we should build something that's specific to entities + fields, but we should aim to have a system that covers most use cases for those, in the assumption it will work more or less well for other things.
For example, say I want to cache the teaser view of a node - this is something I'd like us to at least support doing in core, rendering entities is one of the most expensive things that happens during page building - currently performance hacks has some support but it is a big hack. We cache rendered entities in the full page cache, blocks, views etc. but apart from views content cache all of these are very 'dumb' caches that have no knowledge of what they're caching, and have to be cleared very aggressively or allowed to get stale.
At the moment, the best you can do rendering cached nodes via the cache API is:
Now if the node is updated, I can clear the cache entry for that node with a wildcard clear:
But say I change the formatter for the image in the teaser view mode, I can't do:
since there is no caching engine that would be able to handle a wildcard prefix (MySQL could run the query, but it's very slow). This leaves us with the option of just clearing everything on that kind of change, or allowing things to be stale for some amount of time.
All of the cache key past the first part is just namespacing, it avoids collisions but it's useless for invalidation.
Similarly, say I delete a taxonomy term. This may be included in the rendered output of one node, no nodes or one million nodes. In this case I have three choices:
1. Clear 'everything' - cache_clear_all()
2. Don't clear anything.
3. Do an entityFieldQuery, find everything that references that taxonomy term, clear any caches I know about, but I might miss a block that renders the node that I don't know about.
I don't really like any of these much, the original idea in the opening post here was to make #3 easier, but it is going to end up messy I think, even if a bit better than what we do now (blow everything away or leave everything stale).
So one thing we could do, is send some metadata in when we do a cache_set(), then have modules decide whether that data is relevant to caching or not, for example:
$realms = array(
'entity' => array($node),
'view_mode' => 'teaser',
// Any key might be valid here potentially.
);
cache_set($cid, $rendered_node, $bin, $realms);
Then we invoke hook_entity_realm_alter() - this would let you add or remove realms that are passed in.
Then, for each realm, we invoke hook_cache_realm($realm, $value); and hook_cache_realm_REALM($value);
This hook would return an array of triggers (not trigger module triggers but I can't think of a better name) - a trigger is a string that is linked to the cache key - when we see the trigger, we should clear that cache item.
Entity module might do this:
And it might also want to store the view mode as a trigger, since we'll want to clear the view mode across all entities when its settings are changed:
Reference could do the same - it would add a trigger for each entity referenced by the entity being saved (or alternatively it could implement hook_cache_realm_alter() and add each of the referenced entities as a realm instead). So a node that references five taxonomy terms, would get a 'trigger' for each term - entity:taxonomy_term:$tid.
Then in the caching backend we need to store the triggers in relation to the cache id. I haven't given this bit a lot of thought, and the implementation may well be sucky, but for arguments sake:
In MySQL we'd have a {cache_trigger} table with:
cid | trigger
DrupalDatabaseCache::set() would now need to insert into the cache table, then do a multi-insert into the cache_property table.
Wildcard clears are now removed completely, and instead you get:
cache_clear_realm($realms);
So, say I clear a taxonomy term, I can then do:
cache_clear_realm(array('entity' => $taxonomy_term));
cache_clear_realm() also invokes hook_cache_realm() and hook_cache_realm_REALM() - and with the same information, builds up the same list of triggers.
Then the caching backend takes the triggers, and does something like:
DELETE FROM {cache_foo} WHERE cid IN (SELECT cid FROM {cache_trigger} WHERE trigger IN (:triggers));(there'd need to be some cron garbage collection which would do DELETE FROM cache_trigger WHERE cid NOT IN (SELECT cid FROM {cache_foo}) if we did this).
This would then remove any cache entry which has registered entity:taxonomy_term:$tid as its context - could include all cached view modes for the term, as well as all cached view modes of all entities referencing that term.
For MongoDB, the triggers could be stored in the same document as the cache item, you'd then have an index on the trigger key of the document, and do a simple delete on all records that have that value.
For memcache, the current version of the memcache module has wildcard support that keeps a count of each wildcard and compares this when items are fetched. Instead of this, the triggers could be stored with the cache item, and then you could keep a count per trigger.
Where this starts to get a bit trickier:
You have a block that caches the latest five items in a taxonomy term, this should update when:
* one of those five nodes is updated.
* a new item is added to the taxonomy term.
* the taxonomy term itself is updated.
It's easy to get the five nodes, and add each node as a realm.
It's easy to add the taxonomy term itself as a realm.
What is a bit tricky, but there might be a simple enough answer, is to trigger a cache clear of the block, when a new node is added to the realm. We might need a trigger for specific to lists of stuff for this.
Advantages of a system like this:
1. We'd be able to do a lot better than cache_clear_all() in the vast majority of cases, much better cache hit rates.
2. Realm and triggers could be customized by contrib and custom modules via alter hooks (both adding and removing).
3. We'd only add this overhead to items that are currently cleared by wildcards or blown away. If there's no realm, there's no overhead.
Disadvantages:
1. For MySQL, it's an extra multi-insert on set, and a big DELETE FROM on clear.
2. For Memcache there would be a runtime cost fetching the counts for the triggers - but we already have this for wildcards.
3. There is the potential for modules to be either over or under vigilant with realms and triggers - same as things go mad with cache_clear_all() or update the node table without using node_save() now.
4. We've exchanged $cid + $bin, for $cid, $bin, $realm and $triggers.
However, $realm could easily be swapped out for the $context object from butler - so you'd have cache_set($cid, $bin, $context);
Potentially, we could automatically generated cache keys based on context - so cache_set($context, $bin); - this somewhat relates to edge side include keys too (where you'd need to be able to build a context object from a string).
Should that all come together, the only addition here would end up being the hooks and the concept of cache clear triggers.
Comment #31
moshe weitzman commentedThis is a hard problem for sure. I do think a new table like {cache_trigger} is the only way to track exactly cache items and events which invalidate them.
I recall that mysql is (was?) really inefficient with subqueries like
DELETE FROM {cache_foo} WHERE cid IN (SELECT cid FROM {cache_trigger} WHERE trigger IN (:triggers));. Needs investigating.Also, I think that nearly every cache item is going to record triggers which will extremely rarely ever activate. Stuff like language and view mode. We still have to multi-insert these over and over again. I guess I should not sweat it but it is a bit of a bummer for slower cache backends like DB.
I can't think of a better name than trigger right now. Another way to think of these are 'the list of potential changes which would cause a given cache item to become invalid and thus get deleted'. i kinda like 'causes of death' but thats a bit too cheerful :)
Comment #32
catchMoshe pointed me to http://framework.zend.com/manual/en/zend.cache.html, which is what Symfony2 uses and I also found http://framework.zend.com/issues/browse/ZF-4253
The latter discussion is interesting since they are having the same problem with 'tags' that Drupal's memcache integration had with wildcards (I say 'had' because while the current code in the memcache project is complex, it's mostly there now).
Looking at that Zend documentation (I did not review the code there yet).
- what they call tags is exactly the same as what I'm calling triggers here.
- in Drupal we would definitely need a mechanism above triggers/tags to allow contrib modules to add their own (field modules dealing with some kind of reference is the obvious example). However we might want to support adding tags directly as well as via context.
While tags is as bad a word as triggers for confusion, I could definitely live with using that.
edit: also found http://code.google.com/p/memcached-tag/
Comment #33
catchLooking at Zend a bit more, I wanted to see how they implement tags with sqlite (the closest they appear to have to our MySQL backend) - http://open.thusa.net/tabs/browser/TABS/trunk/web/library/Zend/Cache/Bac...
edit: that was from four years ago, the latest version is at http://framework.zend.com/code/filedetails.php?repname=Zend+Framework&pa... - however there don't appear to have been many changes in four years.
There's lots of problems there in the implementation (DISTINCT query for no obvious reason, loading all records into memory regardless of result set size, no multi-insert for tags, use of @ all over the place) but the basic concept is very similar.
Comment #34
sunThe proposal is in the right direction, but limiting this whole thing to be a workaround to improve our cache invalidation doesn't make sense to me. If you remove the word "cache" from the last follow-ups, then we are at the actual and essential point: We need a system that is capable of tracking what is being involved where.
Caching and cache invalidation is just one use-case for this information. The very same information is required (and badly missing) for countless of other situations and operations. Be it module updates that need to figure out what is affected by a certain change and needs to be updated, or be it administrative lookups/audits that need to investigate what is possibly going to be affected by a configuration change. All of this (and more) requires the same kind of internal and reliable data source.
Therefore, it would be a huge gain for Drupal at glance if we'd slightly shift the focus and try to solve the what is being involved where problem, which apparently is the heart and primary challenge of the cache invalidation issue. If that info would exist, proper cache invalidation would be a lot easier.
For starters, we already have a simple/suboptimal involvement tracker with the file usage API.
Comment #35
catchWhile this issue does touch on some others that we have elsewhere, there are some parts of it which are unique to caching - for example the fact that cache entries may not exist at all (hence the attempt here to avoid the "query every possible item that could be affected and clear caches for it" pattern), and that we need the caching backend to be able to control the tagging situation for best performance.
If we're able to generalize some aspects of this to other systems, that's great, but I consider this a "stop the bleeding" issue. Someone mentioned last year that our page cache disappears whenever someone breathes, this is the motivation behind this issue.
Comment #36
andypostAgreed with @sun that we need to focus on involved bricks of system. If the butler approach with context will be realized by this way we get context + actual render (or usage of cached item)
Cache tags has a long story but the main trouble with memcache that tags periodically are stale from cache because of nature of memcache (LRU)
Another way I'd like to point into issue #1120928: Move "history" table into separate History module
Core already have node-history table so we can interpolate this approach to cache:
- extend history support for core entities
- run cache clean depending on historical usage of entity within context of block, view and whatever...
Comment #37
catchSummary of Zend Cache research so far:
- it uses the concept of 'tags' the same way that triggers are used here.
- there is no layer around tags to allow them to be modified or added to (probably because it's a framework rather than a framlication)
- the sqlite implementation is the only one (apart from Zend Platform, but that is EOL) that supports tags.
- there is a patch for memcache, but it is currently in the same state as the original wildcard support for http://drupal.org/project/memcache (store all tags in one huge array and hope there aren't too many)
- the sqlite implementation is loading every possible cid into memory via a DISTINCT query then deleting one by one by cid, this is not an option for us.
I also looked at Kohana's cache framework. They have have much the same concept of tags as Symfony.
- The SQLite implementation uses DELETE FROM cache WHERE tags LIKE '%foo%'; - this is just as unacceptable as loading every possible tag into memory since prefixed wildcards cannot use indexes.
- The Memcache tag support is there, but relies on http://code.google.com/p/memcached-tag/ - a fork of memcache.
Neither of these have support for Redis or MongoDB as cache backends, which are potentially (although I've not used either of them in practice as caching backends, and not used Redis at all yet) the best of both worlds to support this sort of thing.
Based on this, I'm reassured that two of the more highly regarded PHP frameworks have more or less the same concept baked into their caching APIs as I'm proposing here. But I'm a bit disappointed the implementations aren't very robust at all so there's unlikely to be much we can steal. CakePHP's caching API doesn't support tags at all.
If people know of better examples, that'd be great, but having looked at these I'm tempted to get some draft code in place for this sooner rather than later. While I was originally thinking the context object would be good to pass in (and it still might), it may be more useful to pass in a render array - this would allow tags to be added a similar way to cid_parts/#attached are now (particularly being able to suck up #attached to the top level would be great here).
Comment #38
catch@andypost:
"Cache tags has a long story but the main trouble with memcache that tags periodically are stale from cache because of nature of memcache (LRU)". We already came up against this in memcache wildsupport, see #1085626: Wildcard entries may go missing, leading to caches not being invalidated properly which fixes this.
The history feature is about when things were viewed by users, it is not whether something appeared somewhere or not, and even less about whether it was cached in that context.
I agree Butler can be useful here, but we can't postpone every core patch on Butler, and it's only potentially useful for a small part of this - there is a lot of work to do here that has nothing to do with butler.
Comment #39
catch@moshe - with the subselect I think you're thinking of #961604: dbtng drivers are not able to have their own condition classes. I'm assuming we can rely on that being fixed in Drupal 8 (and 7).
Comment #40
carlos8f commentedI still have to do some catch-up reading on this issue, but I wanted to mention this in light of @sun's "what is being involved where" comment.
The FlexiCache project provides something like this. You can associate users, nodes, comments, etc. with cache entries via a metadata database. Then clear the cache via those entity IDs, rather than cache IDs or "general wipes". This system sits on top of the traditional cache API as a wrapper, so it's an API addition rather than a total shift.
The main thing holding this back is that the current implementation depends on MongoDB as a metadata backend. That makes it easy to create indexed arrays and query with arrays. I have FlexiCache running on a fairly large site right now, and it has improved our cache hit rate (particularly for block cache) by a *huge* amount.
Comment #41
carlos8f commentedI read #30 now and it basically describes FlexiCache. The key idea is that wildcards are very clumsy to query on, and need to be phased out in favor of a more structured (and query-able) array of metadata.
In catch's example,
cache_clear_realm(array('entity' => $taxonomy_term));would be equivalent to the Flexicache syntax:flexicache_clear_all(array('terms' => $tid)). Instead of "realms" I called them "properties" (although I tend to think of them as tags).FlexiCache has been running in production for 4 months on a social network, and although the garbage collection aspect could be improved a bit (there are 400k+ metadata objects), the concept is working in practice, our cache lifetimes are much better, and I have confidence that this would be a really good core improvement. The only challenge is making this work in SQL without subselect chaos raining down on everyone :)
Comment #42
lars toomre commented@catch: I think this is a great topic and am looking forward to sample code for review.
Comment #43
catch@carlos8f, thanks for the link to flexicache, I hadn't seen this project before and it's good there's already something similar around. So far I've convinced myself that tag handling should be handled by the caching backend (i.e. no separate backends for tag database and cache, although it would of course be possible to build a cache backend that does that internally). If you get a chance to read up on this, or see me in irc, would be great to discuss that more.
One thing my examples above don't cover at all is tracking tag flushes against particular bins - if we clear by tag, we don't want to have to specify the bin, but we also don't want to be clearing by tag for every bin, even if that kind of tag has never been used on a bin (i.e. taxonomy term tags shouldn't be running against cache_bootstrap). I'm not sure what the answer to that is yet, although again memcache module has some logic dealing with this for wildcard flushes that we might be able to adapt even to the SQL backend - the array('terms' => $tids) syntax could help with this (namespace the tag flushes.
One the subselect #961604: dbtng drivers are not able to have their own condition classes is a lot less far on than I thought (I changed the title to reflect this), however I'd still hope we could solve that issue parallel to this and take advantage of the optimization there.
There's also an argument that if you have this many cache items, you should not be using the SQL caching backend (similar to search module), but it depends how bad the queries get with x number of items/tags.
Comment #44
Crell commentedcatch pointed me here for reference. The tags/triggers idea sounds logical to me, and the fact that the other major frameworks haven't been able to do any better means we're probably on the right track. :-) By and large I agree, at least in concept, with what catch is describing. I've not looked at any implementation yet.
As far as butler/context/WSCCI, I agree that should not be a blocker here. Rather, the cache system itself should be able to handle "tags", and then we should, later, try to figure out a good way to automate what the right tags are to use based on context data. For blocks, one thing I've discussed with David Strauss and Sam Boyer is having blocks declare the context keys that they will need. Deriving that information dynamically will be impossible. So if a given block declares that it depends on a node and the language, then you can reliably cache it based on nid and language, and use nid and language as Varnish cache keys, etc. Then we could cache blocks using what catch describes automagically, using that same meta information.
Perhaps we could expand that concept? That is, are there other "cachable thingies" that we could conceivably declare up front vary based on context keys X, Y, and Z? If so, then we could automate their caching based on those keys. Views queries, for instance, I once discussed with Earl as being something we could conceivably cache if we can reliably cache "plugins", or some subset of plugins. A views query object varies based on the arguments fed to it. If there are no arguments, great. If there are, then it varies based just on that. (OK, that's an over-simplification, but you hopefully get the idea.)
If we could define "thing thingie is cachable", then we could declare up front "and these are the keys that matter", and then automatically cache based on that data with no further user involvement. Then we just need to make sure that lots of thingies meet whatever criteria that is.
Comment #45
podaroksubscribe
Comment #46
catchComment #47
catchThere is no direct dependency, but I would like to get this clean-up of cache_clear_all() in (or at least ready) before adding new features to the cache API - #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers).
Comment #48
jherencia commentedSubscribing...
Comment #49
catchUpdating metadata for the current plan.
Comment #50
catchThis is a feature so it should be a feature request.
Assigning to me to write an issue summary, I'll also get some proper pseudo code in here.
Comment #51
moshe weitzman commentedSome interesting Varnish-related ideas at http://pooteeweet.org/blog/0/1979#m1979
Comment #52
EvanDonovan commented@moshe: Very interesting. So in a Drupal context, that would entail things like adding an html header that showed which nids were in a given View, so that the Views-generated page could be invalidated if any of those nodes had been updated?
I don't know if there's anything core could do to support something like that though - seems like it would be best in a contrib module.
Comment #53
catchI've been thinking the tags would look something like this as passed to cache_set().
So that'd be easy enough to convert to a string like:
The API for setting a cache item with tags just looks something like: cache($bin)->set($cid, $data, $tags);
If it's in the db or another backend you have something mapping cids to tags. If you're using the varnish backend then cache()->set() would normally be no-op, but could perhaps set the header with that string instead.
For automatically pulling up tags from views etc. for the page, we could use a similar mechanism to #attached #cache which we already have. Renderable arrays can have cache tags attached to them, and these get pulled up to the highest level (either full page level cache or a block, or an ESI block).
#81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers) is RTBC, I'm planning to shift attention to an initial patch here once that's in.
Comment #54
mikeytown2 commentedIn boost I use the boost_cache_relationships table to keep track of what nodes show up on the pages. Works well, but is a little DB intensive as a page with 10 nodes on it, creates 10 entries in the database; boost_schema(). This is the reason you see people running with Boosted Varnish. Its the same setup we are using at my work; we run don't even use the .html files boost creates unless we have a "bad event" (happens about 2x a year).
Comment #55
catchHere's a very rough start.
It adds a $tags argument to cache()->set()
And it adds cache()->deleteTags($tags)
I made a rough stab at what the database implementation might look like but there's no schema to test it with and I haven't verified the relative efficiency or not of those queries yet.
One thing that needs sorting out here: we currently hard code the concept of clearing on content events to the cache_page and cache_block bins. When using tags this actually has some advantages - in that we don't want to be running delete queries against every cache bin, that might not use tags at all. This may encourage doing something like #1194136: Re-organise core cache bins - we probably want to continue to reserve one or two bins for rendered HTML strings and allow modules to assume their stuff will end up there unless they're rolling their own cache clearing implementations too.
I want to get the SQL implementation firmed up so it's possible to do some basic benchmarks with it.
The other half of this is the collection of cache tags from things we'll be caching (i.e. in drupal_render() at least to start with). Not yet sure if we want to do that in the initial patch or not since that's a very separate layer.
Also I'm still thinking of ways to invalidate lists of content - i.e. not when an item is update or deleted from it, but when an item is added to it. this is likely to be "node added to term ID foo" or "node added to organic group foo", maybe we can do something like node_insert_og_group : 1 and node_insert_taxonomy_term: 3 for things like that.
@moshe thanks for the link to the varnish discussion, some of the conceptual issues (how to collect references to things that should cause cache items to clear, adding a hook system so they can clear caches, how to deal with additions to lists) are exactly the same problems that are being dealt with here. I'm pretty sure the actual varnish integration with this will just be a small add-on though once we've got the rest of the infrastucture in place.
Comment #55.0
catchUpdated issue summary.
Comment #56
catchAdded schema for cache_tags and cache_page_tags along with tests which now pass.
dbtng only allows this table design to work with a subquery as opposed to a join (have a feeling sqlite doesn't support joins in DELETE statements).
Next I generated cache items with this (drop it in drupal root as tags.php and run drush scr tags.php).
<?php
$n = 0;
$values = array();
$values = array_fill(0, 10000, 1);
$namespaces = array('foo', 'bar', 'baz', 'banana');
while ($n < 200000) {
cache()->set($n, $n, CACHE_PERMANENT, array($namespaces[array_rand($namespaces)] => array_rand($values, 3)));
$n++;
}
This puts 200k items into {cache} and 600k into {cache_tags}, with 4 unique namespaces and 10k unique values.
Then I checked how many matches there'd be on a particular tag:
this gave me 20 items to delete, subsquent was between about 15-40.
Then I ran:
So that's 0.8s with the subquery.
JOIN is instant though:
Comment #57
catchForgot patch.
Comment #58
catchchx suggested this syntax in irc, it's better than the straight subquery but the join is considerably better still.
Comment #59
catchTalked to nnewton about this in irc. He pointed me to:
http://bugs.mysql.com/bug.php?id=9021
That means the subquery will be fast in MySQL 5.something (apparently there will be no MySQL 6.0), but may nor may not get backported to a stable 5.x release.
This leaves switching on the driver and using db_query() with the JOIN if it's MySQL, but the db_delete() for everything else. This would present the opportunity to write a big comment about how much MySQL sucks linking to that bug report, but I'm not sure anyone will let me do that. However I'm probably going to write a patch for it to show the issue.
Otherwise there are not really any options (pulling into PHP is bad, LIKE '%foo%' queries on a string will likely be worse than the subquery).
Comment #60
catchAlright. Added support to dbtng to determine if a driver supports JOIN with INSERT/DELETE/UPDATE after talking to chx in irc.
This leaves some quite ugly code in cache.inc to handle the MySQL bug still - but it is limited only to the specific implementation of database caching that core ships with.
This makes the DELETE a fully indexed query in MySQL, with 600k rows in cache_tags it is not even measurable by the MySQL cli so should scale fine.
Uploading the latest patch, more to do here but let's see what the bot thinks and it'd be cool to get feedback on the general API at this level too.
One thing I'm not sure about is the order of expire vs. tags in cache()->set(), I put $tags at the end for now since there is no bc break that way and I dunno which is going to be used more yet.
Comment #61
chx commentedI support this. If your driver cant deal with an unstructured fugly DELETE statement then... well do nothing as the code defaults to no DML JOIN anyways :)
protected $joinDMLSupport = FALSE;Comment #62
Anonymous (not verified) commentedi'm happy with where the ugly is in this patch.
no code consuming the API should ever have to care that we got ugly to make their site fast, because it's an implementation inside a method on an interface.
we can safely gut this later without any BC changes at all.
Comment #63
Anonymous (not verified) commentedadded an updated patch, only change is to truncate the cache tags table associated with a bin on flush.
Comment #64
catchAlso adding deletion of tags when ->delete() or ->deleteMultiple() is called. This is likely to break dozens of tests since we don't have the $bin/$bin . _tags schema everywhere yet.
I really hate the current cache bin creation method, have been planning to move that schema definition into the DrupalDatabaseCache class for a while, this might be the time to do it...
Comment #66
berdirDon't we have to override this method too in DrupalFakeCache? Will probably not be called, but might still be a good idea...
What happens if we call flush() (or deleteTags()) on a cache bin that does not have a tags table defined? Should we check with a tableExists()?
Or alternatively, could we even only have a single cache_tags table which would have an additional 'bin' column? No idea how much slower that would be...
EDIT: Crosspost with catch :)
Comment #67
catchDrupalFakeCache isn't really fake - it deletes whatever it can, so we might not need to override but I'll check.
I'd be really concerned about the performance implications of a since cache_tags bin - we originally split the cache tables (back in 2006 or earlier) to avoid contention on a single table, and tags are many to one vs. cache items. So if this part of the API got used a lot it could be a massive, massive database table with a lot of i/o.
This patch at least simplifies the schema code a bit, moves it out of system_schema() and into DrupalDatabaseCache::schema() - which makes creating a bin a bit of an easier task.
I'd rather move towards #1167144: Make cache backends responsible for their own storage so that if you don't use the database cache you don't have the redundant database tables at all - we'll need the schema in the DrupalDatabaseCache class for that, but creating the bins themselves would no longer be done by modules at all, so this is a bit of half way towards that. I might try to move this hunk into that patch and get that one updated though.
Comment #69
catch#67: cache_tags.patch queued for re-testing.
Comment #71
catchFixed installation and a few other bits. The schema code is making me said so going to switch gears to #1167144: Make cache backends responsible for their own storage.
Comment #73
catchWithout the debug.
Comment #75
pillarsdotnet commented#73: cache_tags_4.patch queued for re-testing.
Comment #77
berdir#73: cache_tags_4.patch queued for re-testing.
Comment #78
catchI've hijacked #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support for the integration with content caching, since that's likely to not be a small job, and it's (by design) going to happen completely outside cache.inc except for the eventual removal of cache_clear_all().
Since tests pass I think this is at the point where it needs actual reviews - we'll be adding a method that's not called anywhere except tests yet, however that already exists for isEmpty().
Comment #79
miro_dietikerLooks very promising to me. I think that's the right direction.
Although you clearly showed the disadvantages, we hopefully will have much less clear-everything cases in future. This will allow much better long-running performance. It's just very important to eliminate all those clear-all cases in regular code to make the cache perform better for future.
EDIT: If a caching system will support tagging natively... Will we be able to cover that with best performance? :-)
Comment #80
catchI'm not aware of any dedicated caching systems that support tagging natively except for this experimental (and dormant) fork of Memcached which doesn't look like it's getting much use.
However, MongoDB would definitely support this (you could store tags as part of the cache item document, add an index on the array, and delete using that index).
I haven't played much with Redis yet but that also ought to be able to handle this well from what I know of it.
Moshe linked to lsmith's post about doing a similar thing with varnish (store tags with the content, then invalidate via a regexp).
Since we'll let the caching plugin handle both the way any tags are stored, and also how they're invalidated, it'll be up to the plugin author to figure out the best way to handle that with the storage engine.
Comment #81
owen barton commentedI haven't figured out the exact mechanism, but this could probably help improve the cache efficiency of #152901: Caching (per user) for authenticated users quite a bit.
Comment #82
carlos8f commentedI'm starting to work on FlexiCache 2.x (http://drupal.org/project/flexicache) which will store the cache data in Mongo, have memcache-like version flushing for tags and wildcards, and no longer require a core patch (I hope). I need to look at the SQL patches here and take ideas from that, and possibly have an approach that will work for both SQL and noSQL.
Comment #83
xjmUntagging.
Comment #84
moshe weitzman commentedLooking good.
Comment #85
catch@moshe - that's in #1167144: Make cache backends responsible for their own storage. I re-used some of the stuff from there here, but did not want to combine both patches.
Cache tag value length - how long is a uuid?
Comment #86
andypostA UUID is a 16-byte (128-bit) number. But most database storage engines can manipulate with UUID by COLUMN definition.
I think each cache definition(_info) could define some limits (native_UUID = TRUE, key_limit = 64, chunk size = 1M) also I lke to remember about cacherouter module - each router have it's own key generation override
Comment #87
carlos8f commented#73: cache_tags_4.patch queued for re-testing.
Comment #88
carlos8f commentedQuick re-roll after #1275684: Image module creates a cache bin then never uses it
Comment #89
carlos8f commentedFor the sake of experimentation, I'm introducing a couple of alternate implementations.
The concept is to do simple invalidation of tags instead of synchronously deleting all the data on a cache clear. This shifts some weight to cache get/set, since the get needs to check against the current "version" of a tag, and the set may need to get/attach the current "version" to the data record. However, I think we still win in the end.
Having run a Mongo-based cache tags system in production for most of 2011, cache clears have been the biggest bottleneck, as I was using a "select all from these tags" -> "delete all that stuff" combination which frequently ends up in unhappy cursors and timed out requests once there is a good deal of tagging going on. Shifting to a lightweight invalidation scheme I think is the way to avoid that. It will also make cache tags very do-able in simple key-value stores, Mongo, Redis, whatever :)
RE: the two approaches, I prefer the counter-based approach because it avoids weirdness when time sync is slightly off between servers. The timestamp approach has faster cache_set() though, since it doesn't need to query for the latest tag checksum.
Of course, benchmarks should be done. I will try to get some up for baseline vs. catch's, baseline vs. each approach here, and breakdowns of performance loss or gain for each cache operation. The benchmarks need to operate on a large dataset to be realistic; I may be able to export my production cache tags from Mongo to play with.
Comment #90
carlos8f commentedOops, fixed cache-install invalidate() method name.
Comment #91
miro_dietikercarlos8f if the decision is not clear, why not having both strategies available but configurable?
I'm not sure if we'll find the perfect decision here that matches best for all cases.. Are you sure we will?
Comment #92
carlos8f commentedThe DB schema in my patch is very different from catch's, that would make switching between the two difficult. Plus it would be a lot of baggage to carry around multiple implementations. I am just trying to play devil's advocate and see if there's interest in other approaches.
Comment #93
carlos8f commentedI ran some benchmarks:
I ran this 3 times for each patch, and took the middle result for each stat.
(numbers expressed in milliseconds)
Although the approaches are only semi-comparable, we can see some of the theoreticals expressed in numbers here. Subquery/JOIN-based tag clearing is clearly much more expensive than simple wildcard clearing. Curiously, the subquery was always faster than the JOIN on my system. There is also significant overhead in multi-inserting tags for each cid, whereas my method keeps that low by storing them in a serialized field and having one row per tag.
The overhead in querying/computing the checksum is actually lower than I thought, so I'm going to discontinue the timestamp-based approach in favor of the checksum. The performance of checksum-based tags is only *slightly* worse than no tags at all, so it looks very promising if we can work out garbage collection.
It's also worth noting that ideally tags can be cleared across multiple bins. My patch stores tags separately from cid's or bins, making that trivial. I need to tweak the patch to implement that though, since the cache class is so coupled with the bin currently.
I have made some fixes and tweaks to catch's patch and mine to do the tests, so I am posting that also. The main bug in catch's patch was that it only cleaned up tag records in non-JOIN mode (probably skewing benchmarks in the JOIN favor), and the null cache needed fixing.
Comment #94
carlos8f commentedFollowed up on the cross-bin clearing, this version abstracts cache tag stuff into its own class.* That allows cache tag storage to be extendable and still able to expire across multiple bins.
Cleared out some of the cruft from previous patches, and expanded on the tests to include multi-bin expiration. No longer includes #1167144: Make cache backends responsible for their own storage.
* needs docs :)
Comment #95
carlos8f commentedI've started a contrib project to follow this issue: Cache Tags
I also have a working Mongo cache backend with tag support committed there, and it's fast! Benchmark using #93 method:
set() - 0.068ms
get() - 0.16ms
expire_tagged() single - 0.06ms
expire_tagged() multi - 0.26ms
Whew, blazing!
Comment #97
carlos8f commentedUpdated patch, renamed cache_expire_tagged() to cache_invalidate().
Comment #98
carlos8f commentedDeployed the 6.x Mongo version of Cache Tags yesterday to a production site, and so far there have been 500k entries across 18 bins in the cache, 38k tags and 546k tag invalidations. Server load is very low, transfer rate is about 36Mb/s, and the site's overall speed is much improved. It's working so smoothly, I am a little shocked :)
Edit: after the weekend we've hit 1.5m cache entries, 124k tags and 2.8m invalidations. Working beautifully.
Comment #99
berdirCaptain Obvious here, documentation missing :)
Also, having an empty __construct() in the interface seems pointless, if nothing is passed to it then there is no requirement to implement it?
Also, typo: "invaldate"
Missing documentation too here, and should maybe be protected?
You could have the abstract class implement the interface, then you don't need to repeat the implements part.
Not sure if completely understood the schema changes but this is certainly missing update functions and seems to only change the {cache} bin.
I guess we'd want to wait with this stuff until we decided who manages database cache bin creation and when it's done. Because this brings up an interesting point, namely schema updates...
-22 days to next Drupal core point release.
Comment #100
carlos8f commentedThanks for the review. All good suggestions.
- I added docs!
- took out empty __construct()
- fixed silly method name typo
- flatten() is now a static function (making it protected prevents DrupalDatabaseCacheTags from calling it)
Haven't dealt with the schema update issue, and we might want to leave that to #1167144: Make cache backends responsible for their own storage. I'm not sure if making a special hook for cache bins is really good, since that is making dependencies on the module API for the cache API. However it seems like it doesn't block this issue since we can just provide a hook_update_N() which alters in new fields for cache_* tables in the mean time.
Comment #101
carlos8f commentedAnd...
Comment #102
carlos8f commentedAdded upgrade path, and this time the bot should pick it up.
Comment #103
catchSo the checksum here is going to do an additional query for each cache item retrieved. Memcache does a similar thing for wildcard clears (with a multi get instead of an IN()) and we saw a big performance degradation with this until extra layers of optimization were added on. It'd be interesting to see how much effect this has on page caching for example compared to core and the original patch I did - where the cache queries will be a larger proportion of the request.
However even if the raw read performance is worse, we might still want to go ahead with this - since it ought to scale well, we may be able to do some similar optimizations to memcache (i.e. don't try to fetch tags for bins that never use them), and even if raw read speed is slower better cache hit rates should make up for it in terms of actual server load.
I think we could drop the cache_invalidate() function and just use cache_tags()->invalidate($tags);
I still need to do a thorough review of this but looks like great progress.
Comment #104
carlos8f commentedIt won't be exactly an additional query per cache get, but rather, an additional query per tag on cache entries fetched per page load. Individual tag invalidation counts are statically cached, and summed with counts fetched from the db. This is very cheap stuff though, as the query component of a cache get takes less than 1/5 of a millisecond on a humble Mac Mini.
So far in production the system has been scaling very well, and at over 2.8 million tag invalidations, and
40200+ queries per second, the cache server load is typically at 0.00. I think it works! :)Comment #105
aspilicious commentedtrailing whitespace
Comment #106
aspilicious commentedWhoops, not needed
Comment #107
catch@carlos that's a good point, since the tags are stored with the cache items we're already skipping the lookup if an item isn't tagged.
Comment #108
Anonymous (not verified) commentedi like this.
took me a while to get my head around the tags implementation, but i can't find any obvious problems with this approach.
i found the 'checksum' stuff hard to follow at first, because that made me think of this:
http://en.wikipedia.org/wiki/Checksum
so, how about something like this in the interface:
Comment #109
catchIt'd be great to get #1272706: Remove backwards compatibility layer for cache API in so we don't have this duplicate any longer.
I'm not sure we can enforce that this is a site-wide setting. One use case I have in mind for this is storing cache tags in headers for varnish, then using varnish regexp clearing to invalidate items based on the tags. That would require being able to use specific tag implementations with specific storage. This would mean multiple cache tags backends which all get called each time tags are cleared. It'd be possible to write a cache tag backend that itself calls back to multiple other backends I guess.
In irc msonnabaum also mentioned redis which may well be able to handle directly invalidating wildcards, mongodb can likely do this as well.
Since tags can exist across multiple backends, maybe we just need to allow for multiple cache tag backends - but not per-bin - when invalidating a tag you'd invalidate it for each backend.
Also discussing this in irc, beejeebus mentioned that if this is the case we should move checksum() out of the interface - even if it's used in multiple backends we can allow for alternative implementations that do something else.
Other minor stuff:
Trailing whitespace.
Why the early return here? Also we don't allow if statements without braces.
-24 days to next Drupal core point release.
Comment #110
dixon_@catch I've been thinking the exact same thing regarding Varnish. But I'm not sure per-bin storage will help since you don't necessarily do
cache_set($cid, $data, ..., $tags)for the page bin (i.e. Varnish).So, my thinking was that we add a function called
cache_tag_page()that add tags to aX-Cache-TagsHTTP header. It would work like this:Then we add a hook to
cache_invalidate($tags)and then things could look like this:Comment #111
catch@dixon_ so getting things into varnish I was thinking the varnish cache backend could do that i.e. drupal_set_header() in cache('page')->set() and the terminal run in cache_tags()->clear(). If we allow more than one cache tags backend that's equivalent to having a hook (except we're able to skip a hook system dependency). I'm not sure if the drupal_set_header() trick would actually work when setting the page cache now, but seems like it ought to be possible to make that work.
Comment #111.0
catchUpdated issue summary.
Comment #112
carlos8f commentedRight, we can't really invoke hooks in the cache API. Our goal is to uncouple dependencies like that in D8, not create more. module_invoke_all() is only really usable after a full bootstrap (unless we do some huge lazy-loading change in D8), and the cache API needs to be available much before that. So we would probably just loop over a list of backend plugins and call their respective invalidate() methods, or else create a lower-level hook system for includes to use amongst themselves.
I rather like the idea of cache_tag_page(), but would be concerned about the tagging getting out of control. Also I would generalize it more to be cache_tag_request(), since in D8 we're trying to move away from page-centric thinking and more towards a REST server.
Comment #113
xjmLooks like this got re-tagged through a crosspost; fixing.
Comment #114
carlos8f commentedNow that I think of it, tagging the request might be better as a part of the new context initiative. That way the metadata isn't only for the cache layer, but usable as general information about the request.
At this point we're at 'needs work', but not really sure what needs the work.
- 'checksum' is meant to be a signature to track whether something has changed, it happens to be the total invalidation count in this case, but might be something else in a different implementation. I think it's conceptually a checksum then.
- Running parallel backends for varnish headers seems edge-casey, and could be accomplished by one backend proxying to another (or parent::invalidate, etc).
- Haven't heard a good argument against site-wide tag storage and invalidation.
- Tagging the request may have to wait so we can coordinate that with D8 context and re-use metadata.
- Minor code formatting will get fixed with the next re-roll.
Comment #115
mikeytown2 commentedDon't forget about ETags.
Comment #116
catch"- Haven't heard a good argument against site-wide tag storage and invalidation."
I'm fine with that but this doesn't mean we shouldn't allow storage backends to do immediate tag invalidation (i.e. delete rather than checksum), in which case we should not put that implementation detail into the interface. And since a storage backend could do this, and we allow storage backends to be mixed and matched, then it also doesn't make sense to me to impose this on people building sites. Either turning tags backends into an array and executing all of them, or a proxy backend will do this though, so these are both fairly small but I'd not like to see us lock these two things out of the design when they're easy to account for.
While checksum is technically correct it brings to mind a hash to many people (including everyone who discussed it in irc), so it might make sense to use something more specific, but I'd be fine with using checksum() and a better comment (since we'd move it out of the interface to the SQL implementation anyway).
Comment #117
Anonymous (not verified) commentedre #114, i still don't think checksum() is the right name, but really, i don't care that much - we all know my blue shed looks better than your red one ;-)
i'm much more concerned about keeping the concept in the interface. so, i'll RTBC a patch that still has checksum() in the default sql implementation, but won't support a patch that keeps that in the interface.
Comment #118
dixon_The invalidation will only work for cache backends that can run
cache_tags()->isValid($checksum, $tags)after fetching a cache entry. Varnish will never be able to invalidate cache entries withcache_invalidateas it is. But I agree that a hook is probably not the best solution either.The best solution would probably be the one catch mentioned -- iterating over cache backends and run
->invalidate($tags)on them. But then we need to register cache backends in a different way i.e. not with magic variables but an array, as catch also mentioned.Comment #119
carlos8f commentedI can't see how we can remove checksum() from the interface, without requiring the cache implementation to have special knowledge about the cache tags implementation. isValid() requires a $checksum, so if we remove checksum() we can't have isValid() either. Then we're left with an interface with one method, pointless. Can you illustrate what your concern is?
Comment #120
Anonymous (not verified) commentedyes, we'd remove them both. i don't see why an interface with one method is useless.
most big sites won't use the db implementation. they'll probably use a backend that will invalidate the tags directly, and have useless stub methods for the checksum stuff so that php doesn't choke.
not sure what else to say - the checksum concept seems like an implementation detail, for the default backend that most big sites just won't use. i've said my piece, if everyone else wants that in the interface, so be it, i won't block the issue.
Comment #121
carlos8f commentedSounds like there is a misunderstanding -- the checksum method is much, much faster than clearing tags directly, from benchmarks I've done and real deployment experience. And there is no preference for SQL. In fact, the approach targets NoSQL by keeping things key-value, but it also works great in SQL. I'm already using Cache Tags Mongo version in a large deployment with amazing results, compared to the older cache tags implementation we were running with direct clearing. @dixon_ contributed a wonderful Memcache backend using the checksum interface, and I'm sure it wouldn't be hard to port to APC or Redis either.
A backend that doesn't choose to go with checksums could physically clear data in invalidate(), return 0 in checksum(), and always return TRUE in isValid(). Likely this will be an edge case though, and varnish has special needs so we shouldn't take that as the baseline.
Comment #122
Anonymous (not verified) commentedjust to make sure its clear - please continue with the interface with checksum().
i don't agree with that detail (and i'm probably wrong), but i really, really don't want to block this. this is a great feature for our cache system, and i'm fully behind it.
Comment #123
dixon_Semi-related note...
Here's a friendly fork of carlos' CacheTags module, which adds support for tagging the page cache. A working Varnish implementation is included. Read more here #1341306: New core patch and support for tagging of the page cache and here #1341322: Support for tagging and invalidating the Varnish cache
Basically what you can do is to register the
cache_pagebin as an "external bin" insettings.php. Cache invalidation of external bins can't be made in passive fashion (i.e. with achecksum). External bins needs to actively invalidate the cache.This is not fully relevant to this thread, since we need a better way to solve it in Drupal 8, so please, read more in the issue links above.
Comment #124
Anonymous (not verified) commentedsimple reroll for core directory changes, lets see what the bot thinks.
Comment #125
moshe weitzman commentedJust a re-upload to see if bot will test
Comment #127
Anonymous (not verified) commentedwaiting for catch to commit the 'get rid of BC layer' cache patch before i work on fixing the tests.
Comment #128
msonnabaum commentedHere's a new version that removes the procedural wrappers.
Comment #129
msonnabaum commentedComment #130
Anonymous (not verified) commenteddiscussed this a bunch today with msonnabaum. we're both now leaning toward making the different cache backends responsible for handling tag invalidation. pseudocode stolen from msonnabaum's gist:
so, we iterate over the backends for a site, and if the backend implements DrupalCacheTagsInterface, we call invalidate().
the downside to this is that tags are stored per-backend.
the upside is that it is way simpler, and makes it very easy to handle sites with more than one backend.
Comment #131
msonnabaum commentedThe main motivation behind this is handling reverse proxy backends like varnish or akamai. It's in no way an edge case, so we should handle it without having to work around the current implementation, which seems to be written with mongo in mind.
I'm also not totally convinced that we need checksum/isValid in the interface. I trust it's faster for mysql and mongo, and we should use it there, but I don't know if it's worth the complexity for backends like redis that can probably actively invalidate just as fast. They also won't be used at all for reverse proxy backends.
Comment #132
catchDon't have much time to type, but caught up on the updates and spoke to beejeebus briefly in irc. Allowing cache storage to implement tags, then calling it once per implementing class for invalidation (not per bin, per class - regardless of how many bins it's assigned to), sounds great.
The only thing that's a bit tricky is we need a wrapper (currently cache_invalidate()) that collects all classes in use, checks if they implement that interface and calls the same method on them with the same argument. That's a little messy since it introduces a procedural wrapper to the cache system other than cache(), but on the other hand it will nuke the existing one that's still in there as cache_clear_all() hard coded to the page and block caches, so it'd be an improvement compared to now if not ideal keeping things self-contained - and the actual invalidation logic is still entirely in the backends, this is just a helper.
Comment #133
msonnabaum commentedJust to compare with the numbers above, here's redis with the checksum implementation and with immediate invalidation:
with checksum:
immediate invalidation (using SUNION and DEL):
So while deleting the entries at the time of invalidation is more expensive, I don't know that it's enough to warrant the overhead on get/set for this particular implementation.
Comment #134
catchYeah, at a minimum I think we want to leave it open for implementations to do direct invalidation, even if everything (apart from varnish) ends up doing the checksum method, there's no reason to bake the implementation details into the interface and prevent people from trying different things out.
Comment #135
Anonymous (not verified) commentedok, first rough cut, cache tags tests pass.
todo:
- make a real cache_get_backends(). not 100% sure how to do this, as it will require discovery.
- make the tests cover more than one active backend
- docblocks need work
setting to needs review for bot, but definitely still needs more work.
Comment #136
Anonymous (not verified) commentedthis time without the commenting out of other test methods.
Comment #137
catchThis is fortunately no longer my baby.
Comment #138
msonnabaum commentedHere's a new version that changes invalidate() to invalidateTags() and replaces the cache_default_class and cache_class_* variables with a new cache_classes variable so that we can discover the classes easier. Here's how it would be used:
Comment #140
msonnabaum commented#138: cache-tags-636454-138.patch queued for re-testing.
Comment #142
msonnabaum commentedShould fix the testing error.
Comment #143
msonnabaum commentedComment #145
msonnabaum commentedOne more time.
Comment #146
pounardfunction cache_invalidate(Array $tags) {Even if PHP is not really case sensitive, it should be a lowercase "array".Comment #147
Anonymous (not verified) commentednice work!
a couple of nits:
can we make that:
also, would probably be good to make cache_get_backends() use 'cache_backends' as the variable name? or go the other way, and make it all cache_get_classes()? not a big deal either way.
Comment #148
msonnabaum commentedNew patch with some of the above comments incorporated.
Comment #149
Anonymous (not verified) commentedcoolio, this looks good to me. will wait for the bot before RTBC.
Comment #150
msonnabaum commentedFixing code standard violation.
Comment #151
aspilicious commentedFlattenS
/**
* Defines a common interface cache backends that support tags.
*/
See http://drupal.org/node/1354#classes
InvalidateS
InvalidateS
GetS
AddS
ModifieS
Srry for posting the same "doxygen error" that many times but using modify in stead of modifies is just a habbit we have to forget (except when creating hook documentation).
-4 days to next Drupal core point release.
Comment #152
msonnabaum commentedTagging novice to reroll with aspilicious' suggested cleanups.
Comment #153
aspilicious commented:) I'm going to mark this neesds review again for the code. Btw if someone is going to reroll this don't pay attention to the missing docblocks if they are to hard.
Comment #154
msonnabaum commentedNew patch with doc cleanups.
Comment #155
pounardDo the
DrupalCacheTagsInterfacemeans a backend that supports tags? If yes, it's possible to make it more explicit 1) by naming, and 2) by interface inheritance:Could be:
Then the naming sucks (I'm thinking of the cache as PHP 5.3/PSR-0 naming), something like:
But this is a minor note, the patch seems good.
Comment #156
bibo commentedThis issue is currently for D8. Is it likely there will be a backport to D7?
I can see this having huge performance potential, especially when used with Varnish :) I also happen to have rather immediate and specific needs for http tagging & and cache clearing with Varnish, which means that I'll just create my own module, for now. If cachetags didn't need a core patch I would use it partially.
Anyway, my module will have a proper UI, context & rules support, and some trick for esi + user specific caching, which should make it very flexible and reuseable. Even so, I'm definitely also awaiting this core feature eagerly!
Comment #157
catchIt looks to me like if the same tag is used across multiple bins, then while it will be stored in a single {cache_tags} table, we'll be doing duplicate queries for it since the class is instantiated once for each bin.
I'd been thinking we'd allow a separate tags backend to the actual storage, but that's not happening here now. If that's the case, I'm not sure there's any value to a separate interface here - we have no way to enforce which modules put cache items in which bins and whether they try to use tags or not, nor which backends site admins assign to each bins - so why not put this directly into DrupalCacheInterface since it's going to be a requirement anyway?
Can't do this, the original schema definition has to be copied here or put into a helper, otherwise there's no way to know which version of the schema sites install with if we change it.
Also we can't rely on module_list() to find cache tables, since some modules may be disabled (say block module or something). So this would need to use db_find_tables() I think.
-1 days to next Drupal core point release.
Comment #158
catchOK I've re-rolled this.
- updated for PSR-0/namespaces.
- tags support is added to the CacheBackendInterface now, no separate interface.
- modified the upgrade path to not depend on hook_schema() or module_list() (did not run upgrade tests though, out of time).
That should be all the meaningful changes in the patch I think.
Comment #160
aspilicious commentedNeeded?
5 days to next Drupal core point release.
Comment #161
pillarsdotnet commentedObviously not.
Comment #162
pillarsdotnet commentedComment #163
tstoecklerIs there a reason this doesn't use cache_get_backends()? It has a different default, but it seems that cache_get_backends() still has the pre-PSR-0 one.
This doesn't exist anymore, the if can simply be removed.
Same question here: Couldn't this use cache_get_backends()? It's the installer so there is probably little benefit to having this configurable, but still.
This could use a function header. Should be the simple
"Implements CacheInterface::invalidateTags()." though.
Comment #164
lars toomre commentedCan we add type hinting here? I believe it should be an array.
There is a missing @return directive here (ideally with type hinting).
I think the wrong type hint is given here for tag param. Also I think it should be $checksum and $tags. Finally, there needs to be a blank line before the @return directive.
Another blank needed before @return directive and type hint needed for $tags variable. Several other $tags vaiables also could use type hinting too.
Comment #165
catchThis should address Lars and tstoeckler's reviews.
With install.inc, we're specifically setting the cache backend to the install backend, that's a hack required for very obscure install system bugs, so I've left that bit unchanged.
Comment #166
tstoecklerYes, that definitely makes sense. (I had somehow overlooked that we are a setting a different class in install.inc)
I couldn't find anything more to complain about from visual inspection, but have yet to play with it, so not setting RTBC.
Comment #167
dixon_The last iteration looks very good to me. But before we RTBC this, we probably should have some tests for this, no?
I'll try to give the last patch a ride very soon, and write some tests for it.
Comment #168
tstoecklerAlso noticed that there are no tests yet.
I'll have a go at that, maybe that's a good way to try this out.
Comment #169
catchHmm we have tests for this but they went awol in one of the re-rolls, added those back!
Comment #170
tstoecklerOops, that was a crosspost. (Great minds think alike :))
Alright then @_dixon: first come, first serve. Go for it :)
Comment #171
catchI think that was another crosspost ;)
Comment #172
tstoecklerWow, another one...
Comment #173
tstoecklerincredible...
Comment #174
lars toomre commentedDoing a final review of this patch, I noticed that this docblock is missing a @return directive.
This is also missing type hinting.
Comment #175
catchFixed the docs up.
Comment #178
catchAnd the installer wasn't updated for the type hinting...
Comment #179
lars toomre commentedThanks @catch. The documentation fixes look good.
The one thing I am unsure about is the current practice for the translation of assertion messages. I think the best practice now is to remove all t() calls and use format_string() where necessary. However, I may well be wrong.
Comment #180
pounardNothing to do with the patch itself, but aren't you worried that the cache tags feature might no be implemented easily with some backends, and that we may end up with bailing out some of them because of it? Aside of that, it's an excellent feature, but I'm still afraid of loosing some great backends (memcache, redis, every one of them that are key/value stores in fact).
Comment #181
catchWith the checksum method of checking the tags, it should be completely fine to implement this with both memcache and redis.
The worst thing for key/value stores is wildcard cache clears, which are implemented by the memcache backend in 6.x and 7.x but the code to do so is extremely complex and quite fragile. Once this is in, I'd want to see if we can replace wildcard clears with tags (or potentially we could move to something like cache namespaces which the stash project uses). But yeah this has been considered in depth in this issue and it should hopefully make things better rather than worse.
Comment #182
pounardI hope so, but I still see some weaknesses, even if the checksum method there are chances of cache stall and memory bloat if the checksum validity check is made on access, some cache entries may not be accessed for a while (note that this is also true for actual cache usage anyway so it doesn't really matter).
Thanks for the update, it seem perfectly doable indeed.
Comment #183
catchFixed the noticed and removed t() from asserts.
Comment #185
catchComment #187
berdirChanged self::tagCache to self::$tagCache, the first one is unfortunately not PHP :)
Let's see if this is better now.
Comment #189
berdirFixed a remaining $self->tag_cache and initialized self::$tagCache to an empty array to prevent array_merge() errors.
Comment #191
berdirGrr... and now without syntax errors :)
Comment #192
msonnabaum commentedThis is looking good to me. There are a few improvements that could be made I think, but could wait for separate issues. I don't think we're going to get much more insight on this until we start using it in core.
Comment #193
Anonymous (not verified) commentedjust a me too comment - i think this is ready to fly.
Comment #194
catchI was heavily involved in this one, but it's had the "Favorite-of-Dries" tag for a while, so I'm comfortable committing it given it's had plenty of review from plenty of people at this point.
Committed/pushed to 8.x. We'll need a change notification for the API addition, and hopefully see people over in #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support and other follow-ups.
Comment #195
tim.plunkettWhile attempting to write a change notification for this, I was reading the docs for CacheBackendInterface and I noticed it still had references to cache_default_class and cache_class_*.
Maybe this should be a follow-up issue, but I'm uncertain of any change notification until the in-code docs match.
For the record, I'm not sure my change is actually correct, especially since it makes 1 line into 3, and its all non-CMI variable_get/set stuff.
Comment #196
tim.plunkettPer xjm's suggestion, retitling for now but leaving priority.
Comment #197
msonnabaum commentedThat looks correct, but it's kinda silly that we give the example using variable_get/variable_set. No one does that. You use use $conf in settings.php.
As far as I know, CMI has not tried to deal with the settings.php settings, which this would be, so it seems like a good idea to just update the docs using $conf.
Comment #198
tim.plunkettThe only usage of this in core is
$conf['cache_classes'] = array('cache' => 'Drupal\Core\Cache\InstallBackend');in includes/install.core.inc, so that fits.However, doesn't this also need a hook_update_N for any existing cache_class_* variables and cache_default_class?
Comment #199
msonnabaum commentedIf everyone sets their cache classes in $conf, an update hook wouldn't really do anything, so I'm leaning towards no.
Comment #200
chx commentedOf course we set them there, it'd impossible not to. The very reason you set an alternative cache backend because you want to avoid a database, right :) ?
Comment #201
tim.plunkettWrote up http://drupal.org/node/1534630 and http://drupal.org/node/1534648, someone who actually knows what is going on, please review them :)
Comment #202
berdirWasn't aware of the changes related to these variables, confirmed that the new documentation is correct by looking at the code.
Note: RTBC is for the documentation cleanup patch in #198 (wow), so this needs to set back to needs review once that is commited for the change notices unless someone confirms that they are ok in the meantime.
Comment #203
berdirAbout the change notices:
> "Which would now be"
Maybe something along the lines of "Instead of a separate variable, the cache backend of the default bin "cache" is now used as the fallback, which can be set like this". (Edit: If we'd rename the default cache bin e.g to "default" instead of "cache", this would make even more sense ;))
The wording can probably be improved/simplified, but I think it deserves a sentence that actually explains why it's like that. Looks confusing otherwise IMHO.
Similar for the actual cache tag change notice, I'm not sure if that's common, but cache tags aren't just a replacement for cache_clear_all(), they're a completely new API. So a short introduction along the lines of "Cache Tags can be used to flag certain cache entries which allows to invalidate them based on the used tags later on. For example, invalidate all cache entries that have been tagged with a certain node id if that node is updated."
Also, again not sure about the procedure, but cache_clear_all() already already been dropped for all uses except the no-arguments case that clears page and block cache. So maybe you should use the new $cache->delete* methods? Also, maybe a better example would actually be how you can use it to clear all caches tagged with a node id instead of using a prefix/wildcard.
Example code for that:
I feel something like this explains the advantage of the tags system better.
Comment #204
tim.plunkettBerdir, I was just posting my limited understanding of it after an hour of staring at the patch, please feel free to edit the change notice!
Comment #205
catchCommitted/pushed the followup, back to CNR for the change notice.
Comment #206
tim.plunkettPer #203, the change notices need work. They're linked in the bottom of the issue summary.
Comment #207
berdirUpdated the change notices. Should be closer to a real world use case now. Back to needs review.
Comment #208
catchLooks good to me, marking this one fixed.
Comment #209
plachRestoring title
Comment #211
claudiu.cristeaThe only thing not consistent is the "cache_tags" table name which has the same pattern as a regular cache bin table. This is confusing but not critical.
Comment #212
aspilicious commentedComment #213
claudiu.cristeaRenaming to
cachetagswill fix but, I know, it's ugly :)Comment #214
pounardIt's not really ugly, both "cache" and "tags" word and readable and short enough for us to name the table "cachetags" without making it less readable. Or we could just change the cache system to be configured to use only explicitely registered bins by modules, and never allow to register the "tags" bin, this should be the most secure and elegant way to do this, plus it would allow us to get rid of the infamous hook_flush_caches() hook implementation for most modules in favor of something like hook_bin_info().
Comment #215
claudiu.cristea+1 for registering bins OR, more simple, redesign the cache table pattern. Something like:
cache_bin_*.For consistency I would rename also the
cachetable.Examples:
cache_path=>cache_bin_pathcache_filter=>cache_bin_filtercache=>cache_bin_drupal<<< switch the main cache table to a consistent patternComment #216
pounardYeah, good idea. For the record, there is actually an issue opened for this, it didn't moved since a long time ago, let me try to find it.
#1167144: Make cache backends responsible for their own storage Found it!
Comment #217
claudiu.cristeaSo, what is your option? Registering bins or change cache tables pattern?
Comment #218
berdirSee #216, there is an issue for registering cache bins. I don't think we need to discuss this here. I recommend to close this issue again.
Comment #219
aspilicious commentedClosing again... Lets meet in the other issue :)
Comment #220
fubhy commentedDidn't want to reopen this... When reading through the DatabaseBackend implementation for invalidateTags() I noticed that there may be a major performance issue. I may be wrong though: #1800768: DatabaseBackend tag invalidation performance
Comment #221
sunFollow-up: #918538: Decouple cache tags from cache bins
Comment #221.0
sunAdding followup issue
Comment #222
David_Rothstein commentedI think we should consider backporting this so it can be used in Drupal 7 core (and so it can be more easily used by Drupal 7 contrib modules).
#2598340: variable_set() never clears the page cache is an example of a core issue that could use this.
The Drupal 7 module (https://www.drupal.org/project/cachetags) doesn't have much usage, but I used it for a project once a couple years ago and found it to be absolutely great. It basically looks like something that could be dropped into core almost as is. The only modification it makes to existing core functions is an extra (optional) parameter to cache_set().
Comment #223
bigjim commentedI used the D7 cachetags project a few years ago, it worked like a charm. Has most of the patches we need to backport to D7.
Comment #224
wim leersI'm sorry, but I can't help but answer this bluntly:
Do you have any idea how much work it was to update all the things to have them support cache tags?
It literally took more than a year of me working quasi-full-time on "cache tags for all the things" in Drupal 8. And that was with extensive help of many others. And it required API changes. Lots of them. But I think by "cache tag support" you really mean just "API support" and not "make everything in Drupal 7 support it". If so, please adjust the title, because I just suffered a heart attack in your place :P
Or, much better, open a separate issue instead of reopening an issue that was closed for almost 3 years, and that is already beyond 200 comments.
That being said, even using this "just" for one-tag-per-variable (which is what you suggest at #2598340: variable_set() never clears the page cache) is fraught with problems (see #2598340-2: variable_set() never clears the page cache).
Comment #225
pounardActually having the API does not mean that everything must use it, but having it in core would be a great help for site developers (not builders) to put in place smart tags based caching policies on very specific business oriented sites. Put that in place and possibly a hook_cache_set_alter($item, $bin) and the developer would have total control over his site cache policy.
I'm saying that because I do work on long-run maintenance on multiple complex D7 sites which are never going to migrate on D8, but will continue to be maintain for a long time.
Comment #226
nielsvm commentedI don't think this is a good idea at all, for several reasons in fact:
invalidateTagsmethod creates a false expectation and more importantly confusion to site builders and developers.Although D7 ships a
DrupalCacheInterface, nothing will stop anyone from providing a module that swaps or wraps the standard backend introducing new methods such asinvalidateTags.But no, I don't think this should be backported - to core - in any form.
Comment #227
catchGenerally agreed with others that this is better discussed in a new issue relative to the current state of 8.x and 7.x contrib rather than here. Whatever could be added in 7.x is not going to be an actual backport of this patch.
Comment #233
damienmckennaIt has been nine years and nobody has shown any interest in the major rewriting it would take to add this to D7, so would one of the maintainers please make sure the attribution is correct and mark this fixed? Thank you.
Comment #234
catch(hopefully) fixed the issue credit, and moving back to 8.x/fixed.