For more background, see http://www.metaltoad.com/blog/how-drupals-cron-killing-you-your-sleep-si..., which was written for D6, but the same problem exists in D7. Here's a summary:
- system_cron() calls cache_clear_all(NULL, $table) for a bunch of tables, including 'cache_page'.
- comment_form_submit(), node_form_submit() and a bunch of other functions call cache_clear_all() with no arguments, which results in cache_clear_all(NULL, 'cache_page') being called.
- Records in cache_page have the expire field set to CACHE_TEMPORARY, because that's what drupal_page_set_cache() sets it to.
- The default class that implements the 'cache_page' cache bin is DrupalDatabaseCache, whose clear() method when NULL is passed as $cid deletes all records with an expire of CACHE_TEMPORARY.
- There is a cache_lifetime system variable, settable on the admin/config/performance page, that can be used to ensure that DrupalDatabaseCache::clear() doesn't run more than once within the corresponding interval, regardless of how often cron runs or website content gets changed.
This setup doesn't make sense to me. If I setup cron.php to run once an hour, why should my page cache be emptied each hour? Maybe I have a site whose content is only changed once per week on average: why should my page cache be emptied on cron at all? And while, yes, I can tune the cache_lifetime variable to be longer than my cron interval, doing so also means that when content is changed (i.e., a node or comment form is submitted), then my page cache really is stale for up to cache_lifetime.
And while, yes, I can override DrupalDatabaseCache with some other class, I'm not sure that helps, since I don't think it would have any way to distinguish whether clear() was called due to a node/comment submit or to a cron run.
Does anyone have thoughts on this? Knowledge of why system_cron() clears caches at all?
Related issues:
#1279654: Page cache is CACHE_TEMPORARY and does not honor cache_lifetime
Comments
Comment #1
effulgentsia CreditAttribution: effulgentsia commentedAs an aside, someone mentioned the 'page_cache_maximum_age' system variable that we now have in D7, but from what I can tell, it doesn't affect this issue, since all it does is control external proxy caches, and not Drupal's cache_page table/bin.
Comment #2
Damien Tournoud CreditAttribution: Damien Tournoud commentedIt doesn't. system_cron() clears *temporary entries* from the cache table. It's a garbage collection mechanism that is very necessary.
Why pages and blocks are cached with CACHE_TEMPORARY is the actual question.
Comment #3
effulgentsia CreditAttribution: effulgentsia commentedThanks Damien. If CACHE_TEMPORARY is incorrect for page and block caches, then cache_clear_all() may need to be changed or else node_form_submit(), content_form_submit(), and friends may need to change how they call it. In any case, retitling issue to get at the heart of the problem from the perspective of a site owner.
Comment #4
catchIf page_cache_maximum_age is set, this is used as the value of $cache->expire, see http://api.drupal.org/api/function/drupal_page_set_cache/7
However block module always does CACHE_TEMPORARY -
cache_set($cid, $array, 'cache_block', CACHE_TEMPORARY);
We could look at changing this to use variable_get('block_cache_maximum_age', $something), but that means either breaking string freeze and a new dropdown to the performance page, or a hidden variable.
Comment #5
effulgentsia CreditAttribution: effulgentsia commentedDoes this mean if page_cache_maximum_age is 2 weeks, and node_form_submit() is called 1 week after a page is cached, then the node_form_submit() will fail to clear that page from the cache? Do we need to change node_form_submit(), comment_form_submit(), and friends to cache_clear_all() with $wildcard=TRUE? Sorry, I don't know much of the history and thinking that's gone into our page and block caching systems. My very naive assumption is that content changes should fully clear page and block caches regardless of their expire field. And I'm not clear on what purpose page_cache_maximum_age and (a hypothetical) block_cache_maximum_age serve. Why would you not want those to be very large, as long as you're emptying those caches when content changes invalidate them?
Comment #6
Jerome F CreditAttribution: Jerome F commentedI don't understand everything in this fread, but I'm experimenting performance issues on a drupal 6 website with very low frequentation. The first one to access the page will have to wait a while for the page to show up. Once that's done the other visitors have reasonably fast access to the website. I suspect that's due to the cache cleared by cron. If that's not totally out of subject here, could you please share hints about how to handle cache for a small D6 website ?
edit :
I updated to the latest version of boost, it has become very userfriendly and it seems to solve my problems.
Comment #7
David_Rothstein CreditAttribution: David_Rothstein commentedSubscribing.
Yeah, I am scratching my head trying to think of why CACHE_TEMPORARY ever makes sense to use in cache_set()... it seems like it rarely would? (It's sort of like a declaration that "I don't trust my code to clear caches correctly when it's supposed to so instead I'm going let cron clean it up whenever it's run" which is just bad.)
Comment #8
catchJust a note that #4 was wrong, $cache->expire will be set if there's an Expires header, but not from page_cache_maximum_age.
Comment #9
Ian Ward CreditAttribution: Ian Ward commentedI can confirm the behavior described by effulgentsia in the original post, and add that 'page_cache_maximum_age' is only used to set the 'Cache-Control' header "max-age" value in Drupal core. When you use something like Varnish, the max-age tells Varnish how long it should serve that page from its cache before checking the back end for an updated version. It also tells the browser how long to hold the cached page for. I cannot get Varnish to send cached pages unless max-age is set to something greater than 0. That said, I could configure Varnish to manipulate the Cache-Control header when it comes in and force it to cache objects, but it does by default respect this max-age setting, controlled by page_cache_maximum_age.
I also do not know a lot about the history of why the settings are the way they are. Another argument could be, just change CACHE_TEMPORARY to CACHE_PERMANENT in drupal_page_set_cache() and then some contrib module could take care of the user expectations that page caches should be cleared when nodes and comments are updated. Views, panels, and other sections and page callbacks in Drupal provided by contrib modules and composed of nodes or other entities will also come with this expectation. It seems like the API functions are there to let a module like http://drupal.org/project/cache_actions or http://drupal.org/project/expire or future module deal with that complexity. If core just clears pages aliased to nodes or when comments are added/updated on nodes aliased or not to pages (cids) it seems too specific where a more general module could take its place. It could all go into one of these contrib modules, as long as the pages are getting stored with the correct 'expire' value (not CACHE_TEMPORARY).
Patch attached to use CACHE_PERMANENT instead of CACHE_TEMPORARY
Comment #11
David_Rothstein CreditAttribution: David_Rothstein commentedLooks like the testbot shows why that won't work (and it is pretty darn awesome that there is a test - in the poll module, no less!! - to catch that).
I think what this means, unfortunately, is that this bug will be difficult to fix in Drupal 7 without something resembling an API change. (Though you could certainly argue, as @effulgentsia says, that node, comment, as well as any contrib module should always have been clearing the entire page/block cache anyway in these situations, because you never know who else might have stored something permanent in there.)
Comment #12
catchI think we should consider setting an expires header if page_cache_maximum age is set, that would give some kind of sanity to this, although reading through the original reverse proxy issue there might be a problem with expires headers and certain proxies, but I'm not really up on exactly how much of a problem that is.
All of this can be done in contrib though, so I'd be tempted to move this to 'major' and D8, and solve it in contrib for D7, and there's already work started on that in performance_hacks module.
Comment #13
Damien Tournoud CreditAttribution: Damien Tournoud commentedThat sounds like a decent plan to me.
As noted above, we also need to change the cache_clear_all() function to clear all the entries in block and page forcefully ($cid = '*' and $wildcard = TRUE).
Comment #14
ohnobinki CreditAttribution: ohnobinki commented+
Comment #15
Agileware CreditAttribution: Agileware commentedSubscribing
Comment #16
catchmemcache has various workarounds for this, the latest version is in #1134242: Review cache_lifetime behaviour..
While it is tempting to kill CACHE_TEMPORARY that needs some work and can't be done in a way that could be backported to Drupal 7, there's an interim measure we can do to improve things a bit before then though:
* add a garbage collection method to the cache interface.
* the core cache.inc will have the same code for content clears and garbage collection, at least for now, that's fine.
* add hook_cache_bins() - this would stop modules that do things other than declare cache bins in hook_flush_caches() from having their code executed every cron run.
* system_cron() would then get all cache bins, load the class that controls them, and run cache_gc() on each.
* this could be backported, we can partially workaround the cron issue for the core cache.inc by only running the garbage collection once every x hours, instead of every single cron run.
Also something like this may end up blocking a full memcache 7.x release - since the 6.x branch solves some of this by changing the logic in cache_clear_all() - cache_clear_all() itself is no longer pluggable in 7.x.
Comment #17
catchComment #18
Fabianx CreditAttribution: Fabianx commentedSubscribe
Comment #19
pillarsdotnet CreditAttribution: pillarsdotnet commentedxref: #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers)
Comment #20
catchThis actually blocks a stable release of Drupal 7 memcache module now, so I'm bumping to major.
n Drupal 6, all of cache.inc was swappable, this meant cache backends could do their own logic in cache_clear_all().
In Drupal 7, cache_clear_all() is hard-coded in cache.inc, and just calls cache_clear_all(NULL, 'cache_page');.
That means there is no longer any way to differentiate between cache_clear_all() triggered by node or comment posting, and cache_clear_all() triggered by cron's garbage collection (see system_cron()).
In core/db caching, this means that the page cache gets wiped every cron run.
In memcache, it's the opposite - the cache doesn't get wiped on wildcard clears.
See #1246796: Memcache fails its own tests. and #1103478-16: Skip some wildcard fetches for background where this issue was fixed in memcache - which required doing custom logic in cache_clear_all() when called with no arguments.
The way I'd like to fix it is by adding a new method to the cache API (either to the interface or optional, depends how we feel about bc for other cache backends), which gets called when cache_clear_all() is called with no arguments.
In the database backend this could just do what cache_clear_all() does now, but for memcache it would allow the workarounds in the 6.x-1.x branch to be implemented.
That new method already exists in this patch #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers), however that contains a lot of changes that aren't necessary to fix this bug, so bumping this one to major for the memcache blocker.
The fix proposed here of setting an explicit expire on page and block cache entries (more or less deprecating CACHE_TEMPORARY) will fix the database issue, but it won't fix it for memcache at all. We might need to split it out into two issues, but trying to keep it in one place for now.
Comment #21
jamiecuthill CreditAttribution: jamiecuthill commentedI've recently come across the problem with Drupal 7 and memcache not clearing the page cache after the page cache maximum age is passed. After searching through the code I was surprised that the page cache maximum age is not used when adding the page into the cache.
To fix the memcache issue and other backends wouldn't it be sensible to set the expire time of the cache_set in drupal_page_set_cache (common.inc:4964) to expire at REQUEST_TIME + page_cache_maximum_age?
Comment #22
RichardLynch CreditAttribution: RichardLynch commentedI may be misunderstanding this test or even the whole thread and where things stand, but just to weigh in:
NOT caching all pages because a poll wouldn't work properly from a cached page, to me, indicates that the poll implementation needs to be fixed to clear the cached page upon submission, or to ajax in the results, or to disallow a vote from anonymous based on a cookie, or something along those lines.
Nuking all pages in the cache every time cron runs so polls can work is swatting a fly with a cannon.
I've already had to patch core in D6 for this issue. Please consider this carefully for D7 and D8.
cache_page should honor cache_lifetime, imho, as that's the naive expectation of every user who visits the Performance page.
cache_page getting totally wiped out every time cron runs as default behaviour is definitely "unexpected behaviour"
Even Drupal community members who should know better got this wrong in the comments on the metaltoad page referenced. :-)
As much as I love that a test caught this, I really do not want to maintain my one-liner D6 core patch when I install D7/D8, just for the sake of making a great test for a mis-implemented poll feature "pass" :-) :-) :-)
Comment #23
catchHere's a rough start on what I outlined in #20. The only core change is in system_cron() and DrupalDatabaseCache.
This should be more or less backportable as is to D7.
Cache backends may now specify a garbageCollection() method.
system_cron() by default only does garbage collection every 24 hours.
Not expecting this to pass tests but you never know.
Comment #24
catchUse 'cron' as the flood identifier.
Comment #25
BTMash CreditAttribution: BTMash commentedBased off the discussion in #1279654: Page cache is CACHE_TEMPORARY and does not honor cache_lifetime, the issue with this patch would still be on content pages that are created prior to the content actually being expired (and the cache expiry time being set to CACHE_TEMPORARY (which is -1)) with the garbage collection running. For example:
Under this scenario, the content should not expire until atleast 4am. However, because the content has an expiry time set to -1, the page will still expire at midnight because its expiry time is less than the current time of garbage collection. I **think** one of the changes that should be happening is that the page expiry time should be set to the current time plus the page lifetime so the cache affects the relevant expired pages.
Comment #26
catchSo the problem with using $cache_lifetime as the value for expire is that this turns it into a maximum - if your cache lifetime is 5 minutes then once #534092: cache_get() returns expired cache items is fixed you're in serious trouble since items will last shorter than CACHE_TEMPORARY items do now.
However you're right that even if we reduce the frequency of garbage collection we still have a problem, so here's a new patch.
Instead of using a variable prevent cache items from being cleared more often than cache_lifetime, it instead runs two queries. One to remove expired items, and one to remove CACHE_TEMPORARY items that were created longer ago than variable_get('cache_lifetime')
Still untested, however if that query is OK it should solve the two conflicting issues here.
Comment #28
catchWell apart from that.
Comment #29
catchRemoved method_exists() since this is part of the interface in D8 now.
Note I am specifically not adding an update to remove the cache_flush_ . $bin variable since some alternate cache backends still use it.
The Drupal 7 update here will be tricky since we need to find all cache_* tables and add an index to them, however I believe there are schema functions for this so it ought to be possible without invoking hooks.
Comment #30
BTMash CreditAttribution: BTMash commentedAfter a few times of reading through and trying to understand what is going on, it finally makes sense to me :) However, is there a reason for garbageCollection() in the D8 version if its just a wrapper for expire()?
But aside from that, the code that actually matters looks sound. So its a +1 from me towards RTBC.
Comment #31
catchThe reason to have the two methods is so that memcache and similar which handle expiry of cache items themselves are able to add a no-op method there. This could use some tests for the cron logic.
Comment #32
catchWanted to see how it runs with a million items in cache_page so hacked cache.inc to use time() and made a script to put 1m items in there (after creating 100k with a different script first before I realised that probably wasn't enough).
That ends up with:
Found a point just over half way:
That query is nicely indexed:
Then deleted those:
then I tried deleting 500k items with the old query:
So the new query is about 60% slower than the old one with 500-600k items (although I'm only running this once each on my laptop so there's a potentially wide margin of error, so main thing is that it's not magnitudes different). Either way the cache hit rate should be improved, and deleting 500k items all at the same time should be extremely rare.
I did a similar test with 100k rows in the table, deleting 20k, and that took 8ms to delete.
Comment #33
RichardLynch CreditAttribution: RichardLynch commentedMaybe it's just me, but it seems like the CSS cache clearing routine, at least in D6, is quite brain-dead in regards to pages that might be cached with references to the CSS files in question...
Step 1) Get your Page to cache for a long time.
Step 2) Something calls drupal_clear_css_cache.
Step 3) Surf to a Page in the cache.
Step 4) View perfectly valid HTML that references 404 CSS files. Lovely.
Maybe you guys have this covered already in D8, but I'm still trying to get decent performance out of D6 page_cache...
I don't claim to know the right answer to this issue, but perhaps each CSS file should have a registry of cache_id values that reference it or something... Ouch.
Comment #34
catch@richardlynch that's off topic for this issue, and there are contrib modules like advagg that deal with it in D6.
Comment #35
BTMash CreditAttribution: BTMash commentedMy generate script for 1 million rows was:
(note that I did not go through cache_set as this gave me a bit more control on the created time). I have to figure out and write a test to see that it works well (though running it through 1M items is a bad idea given it took nearly 20 minutes just to generate the items on my comp). But the tests by eye so far indicate that it is working as expected.
Comment #36
RichardLynch CreditAttribution: RichardLynch commented@catch
Unfortunately, my naive patch to just make page_set_cache honor cache_lifetime appears to run afoul of the CSS brain-dead clear-all...
We seem to be having a theme-less (CSS-less) page every couple days, and I believe my patch is the cause.
It's quite the crisis right now, and they're wanting a whole lot of analysis before I revert my patch in PRODUCTION...
The more I study D6 caching, the more it looks like spaghetti just waiting for a tiny blunder to cause an avalanche of issues.
If you can make the page cache lifetime longer, and *NOT* have the CSS required by that page disappear out from under it, you're a better man than I.
I'm just going to live with poor performance and let Drupal take the fall for it. Sorry.
Comment #37
RichardLynch CreditAttribution: RichardLynch commentedNOTE:
If you can keep your page lifetime smaller than CSS lifetime, you should be fine...
But I was wanting both to be like 6 months or a year, and only managed to make a page out-live its CSS and JS requisites.
Ouch.
Comment #38
Fabianx CreditAttribution: Fabianx commented@RichardLynch: That is still off-topic here. As catch already said advagg deals with that in D6 very nicely. Boost also has solutions for that (re-caching CSS files at another place), so please use one of those solutions and be happy :-).
To the original issue: Nice progress here. I've also been bitten by the cache clear on cron, but with Varnish in front its not as bad.
Comment #39
pillarsdotnet CreditAttribution: pillarsdotnet commented#29: lifetime.patch queued for re-testing.
Comment #40
catchRe-rolled for /core.
Comment #41
catch#40: lifetime.patch queued for re-testing.
Comment #43
iamEAP CreditAttribution: iamEAP commentedLooks like there have been significant changes to core's cache.inc since last post.
Where does this issue stand?
Comment #44
iamEAP CreditAttribution: iamEAP commentedI don't intend to hijack this thread, but I'm coming to this issue with my primary concern being that page cache entries last only as long as the time of the last cron run + cache_lifetime. (Many issues relating to that one issue are marked as dupes of this, which is why I'm here.)
A can't find a concise, accurate summary of this anywhere, so here's my crack at it:
In the worst case, that means the page cache is wiped on every cron run. If an administrator sets their cache_lifetime to, e.g. 6 hours, then cache entries are wiped every six hours, regardless of when the page cache entry was created (so even if a cache entry is created at 5:50pm, if cron runs at 6:00pm and its been cache_lifetime since the last time page cache was flushed, the 10-minute-old entry will also be wiped too). As has been mentioned before, this is unintuitive and very bad for performance.
Rather than making very large changes to how the cache clear happens, I think it makes more sense to make very small changes to how drupal_page_set_cache operates.
Catch's workaround proposed in #12 (setting an expires header in contrib based on page_cache_max_age) is almost workable, but I can think of cases where decoupling external cache lifetime and internal page cache lifetime is desirable.
Why not just make drupal_page_set_cache() set a default expiration of REQUEST_TIME + cache_lifetime if a cache_lifetime exists, otherwise default to CACHE_TEMPORARY?
Patch for D8 supplied knowing nothing about D8, but hoping for a direct backport to 7.
Comment #45
catchiamEAP, that's patch is not going to get committed - it means the minimum and maximum cache lifetime becomes the same, which is not the right fix.
#40 could do with a re-roll though, the actual code hasn't really changed much, just moved around.
Comment #46
iamEAP CreditAttribution: iamEAP commentedCould you clarify how the minimum would also become the maximum cache lifetime in that situation? Also, is it even relevant? All the minimum cache lifetime guarantees is a minimum cache lifetime. Beyond the lifetime, nothing can or should be guaranteed.
The text on the performance page (in 7) reads as follows:
As noted many times in this thread, this is not actually happening in any current version of Drupal. The patch in #44 makes the least number of changes (no API changes, unless I'm misunderstanding "API" in this context), stays consistent with the text on the page, clarifies the meaning of cache_lifetime, and could easily be written to apply to all current versions of Drupal, whereas the approach from #40 makes more significant API changes and doesn't have as clear a backport to 7 (let alone 6).
Comment #47
catchif $expire = $minimum_cache_lifetime + REQUEST_TIME, then as soon as that timestamp is reached, the item will be invalid, hence, it becomes the maximum. This will actually make things considerably worse rather than better for sites that have a short cache lifetime and longer cron intervals.
Comment #48
iamEAP CreditAttribution: iamEAP commentedRe-roll of #40 (due to #22336: Move all core Drupal files under a /core folder to improve usability and upgrades).
Comment #49
catchMoved the 24 hours logic out of system cron to the database backend itself, some other small cleanup as well.
Comment #51
catchThis should fix the install failure.
Comment #52
catchHere's a test that fails without the patch, and passes with it.
Also a new patch because I hadn't updated properly for the config conversion.
Comment #54
catch.htaccess cruft.
Comment #55
catchRe-rolled after #1015946: Eliminate $user->cache and {session}.cache in favor of $_SESSION['cache_expiration'][$bin].
Comment #56
catchWith the attachment.
Comment #57
Berdir(Mostly) code style review.
Trailing space here.
AFAIK it's should be "it is" as short versions shoudn't be used. At least for interface text, but I guess that applies for comments as well.
No idea why, but every single line here has exactly one trailing space. Weird.
That should probably have a Overrides DatabaseBackend::__construct() docblock.
I guess this needs an update function?
Comment #58
iamEAP CreditAttribution: iamEAP commentedI believe the plan for D8 page cache via WSCCI is to replace it with Symfony Kernel's HttpCache.
Given that work would almost definitely not be back-portable, would it be worth splitting this into separate issues where this one focuses on a "quick" fix for D7 (utilizing work from the existing patch), and we create a separate issue for D8 for page cache re-implementation via Symfony HttpCache?
Comment #59
David_Rothstein CreditAttribution: David_Rothstein commentedYes, the possibility of switching to Symfony's HttpCache definitely sounds like something that should be discussed in a separate issue, rather than here.
But this issue would still need to be fixed in D8 first (before backporting to D7).
Comment #60
catchYes discussing HttpCache is completely out of scope here, this is a straight bugfix for backport and even if the page cache is completely replaced, that wouldn't necessarily touch the code here. There's too much coupling in core still between the page and block cache and the cache API (i.e. the fact that cache_clear_all() hasn't been fully removed yet), but theoretically CACHE_TEMPORARY and minimum lifetime which are fixed here is an independent part of the API from the block and page caches, and I've seen them used as such in custom code before.
Attached re-roll deals with Berdir's review I think, two things not included:
- rather than the 'overrides' docblock for InstallBackend::__construct() I just removed the function since it's the same as the DatabaseBackend anyway.
- there's no update in the 8.x patch, one will need to be added for the D7 backport though.
Comment #61
catchAdding 'contributed project blocker' since memcache is currently having to differentiate between these via debug_backtrace().
Comment #62
olli CreditAttribution: olli commentedWhere did this go?
What is this for?
Trailing space.
Comment #63
catchThanks! Actually all the session stuff is completely unnecessary now, as is the variable_set(), so those are removed from the patch, I'd forgotten how much clean-up this allows. Also fixed the trailing whitespace.
Comment #64
catchHmm ovesslin pointed out by e-mail that we can't remove the $_SESSION stuff here, got over-enthusiastic:
Before patch:
- CACHE_TEMPORARY items are stored until either cron runs or cache_clear_all() is called, as soon as that happens they're deleted.
- if minimum cache lifetime is enabled, then for the current user they're flushed immediately due to a timestamp stored in $_SESSION, for all other users they're flushed when the current timestamp is greater than variable_get('cache_flush_' . $bin); + $cache_lifetime
After patch:
- CACHE TEMPORARY items are stored until cache_clear_all() is called, or for up to 24 hours (older items are cleared out on cron).
- if minimum cache lifetime is enabled, then items older than the minimum cache lifetime are deleted immediately (no variable tracking, just deleted), for the current user, all items are marked as invalid via the $_SESSION variable. Items are remove by garbage collection on cron when they're older than 24 hours or minimum cache lifetime, whichever is the greatest.
This just removes the stray variable_set() and the trailing whitespace per #62.
Comment #65
no_commit_credit CreditAttribution: no_commit_credit commentedComment nitpicks.
Comment #66
olli CreditAttribution: olli commentedStill missing... Tests could check for it.
When minimum cache lifetime is enabled and there's two comments/content updates in $cache_lifetime seconds, then anonymous users
- before: will see both changes in $cache_lifetime seconds
- after: will see both changes in 24 hours
Is this correct?
Should we add a way for modules to figure out in hook_flush_caches() that we're just collecting garbage?
Comment #67
catchNot quite:
Before: will see both changes in $cache_lifetime seconds (kinda, there's some edge cases in the current code)
After: will see changes immediately for any cache item that's older than cache lifetime. Cache items that are younger will 'survive' until they're either garbage collected or if there's a content change once cache lifetime has passed for them.
With hook_flush_caches() do you mean for modules that are doing something other than returning cache bins? While hook_flush_caches() is completely abused for various purposes and calling it during cron is a bug, that's a bit different to this issue, however it's partly dealt with in #1167144: Make cache backends responsible for their own storage and I think a separate issue to add hook_cache_bin_info() to decouple it from hook_flush_caches() would be a good idea.
I'll try to re-roll shortly. We added some tests for cache lifetime recently, so I'm surprised they're all passing with the broken patches, but yeah would be good to add to those a bit.
Comment #68
olli CreditAttribution: olli commentedYes, that's what I mean with hook_flush_caches(): field and features do system_rebuild_module_data() in system_cron() while update and ctools introduce workarounds. Decoupling sounds good.
Comment #69
sunAlso note that we're attempting a major decoupling in #1404198: Separate database cache clearing from static cache clearing and data structure rebuilding - might possibly resolve this idea already, not sure.
Comment #70
olli CreditAttribution: olli commentedIn some cases it might take time to clean the session because page cache is disabled if session is not empty.
For items with expire > 0 caches are still cleared the same way on cron as on content change.
It would be good to add an expired item (expire < REQUEST_TIME) and one that is not expired to the test.
For current user the temporary item should be removed.
To prevent younger items surviving, could we have "the timestamp of the last global flush" just like memcache module?
Comment #71
pillarsdotnet CreditAttribution: pillarsdotnet commentedDries committed #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support on Wed, 13 Jun 2012.
Therefore, according to Catch, this is now a 7.x-only issue.
Comment #72
iamEAP CreditAttribution: iamEAP commentedDoes this warrant another round of discussion now that we don't need to worry about a backport, or is a straight port of the D8 patch in progress sufficient?
Comment #73
iamEAP CreditAttribution: iamEAP commentedComplete shot in the dark. Mostly what's found above sans tests, plus the update function.
Comment #75
catch@iamEAP, I was writing the 8.x patch explicitly for backport to D7, and with the longer term aim of adding cache tags in mind, so apart from the bugs that olli identified personally I think it's fine as a straight port.
Comment #76
iamEAP CreditAttribution: iamEAP commentedSame...
Edit: Good to know. Thanks, catch.
Edit again: missing semi-colon...
Comment #78
iamEAP CreditAttribution: iamEAP commentedAlright. First stab at a more or less direct port of #65.
Additionally...
Comment #80
iamEAP CreditAttribution: iamEAP commentedAdding an explicit check in the update function to replace the index if it actually dropped it. Also, minor comment cleanup in system.module.
Comment #81
Anonymous (not verified) CreditAttribution: Anonymous commentedJust a quick question. Will this patch be implemented in the next Drupal 7 release?
Comment #82
joachim CreditAttribution: joachim commentedIf it's reviewed (see http://drupal.org/patch/review) and a maintainer commits it in time, yes.
Comment #83
iamEAP CreditAttribution: iamEAP commentedThere are some problems with the last patch:
I've also started a contrib module to address this issue: http://www.drupal.org/project/adbc
Comment #84
catchThis update could run during an update from Drupal 7 to Drupal 8 (it's unlikely but at the moment it could), this and any disabled modules may mean that module_invoke_all('flush_caches') doesn't return a complete list.
The cache tags patch already had to deal with this issue for 8.x, and the 7.x code should be the same:
Comment #85
iamEAP CreditAttribution: iamEAP commentedFixed the update function according to catch's comments in #84. Also fixed my remaining comments from #83:
Comment #87
catchClearing on cron every 24 hours is by design, minimum cache lifetime is supposed to be a minimum, not a maximum, and the cache will still be cleared if there's a content change.
Comment #88
iamEAP CreditAttribution: iamEAP commentedI suppose it is configurable. Here's #80 with flood_register_events fix and the new update function.
Comment #89
iamEAP CreditAttribution: iamEAP commentedI'm a little concerned there are some contrib modules that don't use the word "cache" in their cache table names (which I don't believe is explicitly required), but I would imagine that number is small enough not to worry too much.
Comment #90
smitty CreditAttribution: smitty commentedClearing cache every 24 hours is better than art every cron run. But one question of the original issue is not answered yet imho: Why has system_cron() to clear caches at all?
Comment #91
catchsystem_cron() is supposed to only do 'garbage collection' for non-permanent cache items when they've expired. If an item has an explicit TTL then that's OK, the problem is with CACHE_TEMPORARY which has always already expired as soon as it's created (apart from minimum cache lifetime which is disabled by default and not very well understood).
The bug here is that core never differentiated between garbage collection and explicitly clearing temporary items due to content changes, as well as expiring CACHE_TEMPORARY items too frequently during garbage collection (i.e. every cron run).
Comment #92
smitty CreditAttribution: smitty commentedDo I understand this right: With this patch only the temporary cache will be cleared every 24 hours. But because e.g. the page cache is now permanently it will be only cleared if there's a content change.
Comment #93
iamEAP CreditAttribution: iamEAP commentedWhere in D7 does the cache get cleared on content change? I see in D6 there's an empty call to cache_clear_all(), at the end of node_save, but nothing of the sort in D7.
Comment #94
catchIn Drupal 7 those cache clears were moved to submit handlers, mainly so that things like mass imports of tens of thousands of nodes can handle cache clearing themselves at the end instead of once per node.
Comment #95
Fidelix CreditAttribution: Fidelix commentedWorks for me, although it did not apply cleanly.
Comment #96
iamEAP CreditAttribution: iamEAP commented#88: d7-cache_garbage_collection-891600-88.patch queued for re-testing.
Comment #97
catchWe worked around this in memcache by running a debug_backtrace() to determine what fired the cache clear. That's considerably less than ideal, but given the workaround exists I'm going to demote this from major.
Comment #98
mgifford#88: d7-cache_garbage_collection-891600-88.patch queued for re-testing.
Comment #99
jackbravo CreditAttribution: jackbravo commentedThis is a re-roll of #88. Work great for me.
Comment #100
jackbravo CreditAttribution: jackbravo commentedSo this works fine for Fidelix, iamEAP and myself. Do we need more people to test it before marking as RTBC?
Comment #101
jackbravo CreditAttribution: jackbravo commented#99: d7-cache_garbage_collection-891600-99.patch queued for re-testing.
Comment #102
jackbravo CreditAttribution: jackbravo commentedmarking as RTBC.
Comment #103
lotyrin CreditAttribution: lotyrin commentedWhy 24 hours? What do I do If I want to have a longer minimum cache lifetime than that - not that I see the use case in doing so, but what justifies adding a constant here?
Comment #104
lotyrin CreditAttribution: lotyrin commentedNevermind. I read the logic backward. Makes significantly more sense now.
Comment #105
David_Rothstein CreditAttribution: David_Rothstein commentedOverall this seems to take Drupal 7 caching into a much nicer direction, but I saw some issues:
I don't understand why this is being removed (and if it is being removed, why the other code that uses $_SESSION['cache_expiration'] isn't).
If you remove it, don't you run into the exact problem that it was designed to solve? (Example: Configure the site to cache pages and to have a minimum cache lifetime of 10 minutes, then give anonymous users permission to edit content. After the patch, an anonymous user who edits content won't see their own edits fully take effect right away; instead they'll have to wait 10 minutes.)
Why is this being removed from getMultiple() but left behind in isEmpty()? Doesn't that mean cache_get() and cache_is_empty() can return inconsistent results?
Partially related to (or perhaps dependent on) the answer to the above question... but aren't we worried about these meltdowns? After the patch, garbage collection will do a db_delete() twice every single time it is called, and it can easily be called on normal page requests.
After the patch, it can disappear after 5 minutes if a cache clear happens then.
That may very well be the correct behavior, but it could use some discussion. (For example, I wonder if it could cause issues on sites that may be relying on long minimum cache times for everything.)
Also a few other issues that are more minor:
I'm all in favor of checking 'cid' and 'data', but why aren't we checking 'expire' and 'created' too (since those are the actual columns that the code below relies on being there)? Probabaly not too likely to matter in practice, but seems like an obvious omission nonetheless.
Why are we switching to the private _cache_get_object() function all the sudden? Doesn't cache_clear_all(NULL, $table) still work?
Similar question here (and actually throughout the test): Why is this using _cache_get_object() rather than the standard API?
Comment #106
caktux CreditAttribution: caktux commentedIs this still an issue in D8?
Comment #107
catchNo this has been fixed in 8.x. Via #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support which isn't backportable.
Comment #107.0
catchUpdated issue summary.
Comment #108
deanflory CreditAttribution: deanflory as a volunteer commentedSo, what is the current recommended approach for D7?
Comment #109
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedUse some Contrib modules that expire way less:
e.g.
- advanced page cache
- boost
- blockcache_alter (for blocks)
- render_cache (for full nodes)
Then on a cache miss you at least don't have to rebuild everything.
--
However this patch could still have a chance to go in overall.
Comment #110
brad.bulger CreditAttribution: brad.bulger commentedreroll of #99 against current 7.x (as of 7.43)
Comment #111
SocialNicheGuru CreditAttribution: SocialNicheGuru commentedNeeds to be rerolled for the latest stable drupal version, 7.54, because of a conflict with update 7081.
Comment #112
Munavijayalakshmi CreditAttribution: Munavijayalakshmi at Valuebound commentedComment #113
Munavijayalakshmi CreditAttribution: Munavijayalakshmi at Valuebound commentedComment #114
izmeez CreditAttribution: izmeez commentedPatch in #113 needs to be re-rolled for the latest stable drupal version because of a conflict with system_update_7082().
Meanwhile it is still not clear what is the best approach for D7. Comment #107 states that the fix in 8.x is cannot be back-ported and comment #109 suggests using contrib modules that "expire less".
Comment #115
izmeez CreditAttribution: izmeez commentedAttached is the patch in #113 re-rolled against the latest drupal 7.68 we have been using the patch in its different rolls for sometime now. It seems better than the default.
Comment #116
delacosta456 CreditAttribution: delacosta456 commentedhi
@izmeez patch is working for me on config below:
--Drupal 7.69
--Install--profile--Panopoly (panopoly-7.x-1.72)
-- php 7.1.31
Thanks
Comment #117
brad.bulger CreditAttribution: brad.bulger commentedreroll for 7.70 to avoid conflict with system_update_7083() - we'll just keep doing this, i guess
Comment #118
izmeez CreditAttribution: izmeez commentedRe-roll for drupal 7.71 to avoid conflict with system_update_7084()
Comment #119
brad.bulger CreditAttribution: brad.bulger commentedYet another reroll to bump the fake update hook number for 7.95.
Comment #120
brad.bulger CreditAttribution: brad.bulger commentedIs continually bumping up this fake update_N hook in the system module preventing the real hooks from running? Like, d7-cache_garbage_collection-891600-119.patch defined system_update_7086(). Now core Drupal 7.98 has a real system_update_7086() function. But won't my installation think that it has already run?
Comment #121
poker10 CreditAttribution: poker10 at ActivIT s.r.o. commentedYes, that is a good point, thanks!
In case someone has run the update hook from this patch, it is likely that the real
system_update_7086
introduced by #2164025: Improve security of session ID against DB exposure or SQL injection in Drupal 7.98 will not run. This will cause that these sites would not use the security improvement, because the session hashing will not be enabled. See: https://www.drupal.org/node/3364841I recommend that all sites using this patch should carefully check that change record and a committed patch and take measures to ensure that they have the session hashing enabled.
Comment #122
brad.bulger CreditAttribution: brad.bulger commentedI've never been quite clear about the system_update_N function in this patch. It tries to find all existing cache tables and replaces any existing "expire" index with an "expire_created" index. Running in an update function like this, that's a one-time check, right? If new cache tables are created with an "expire" index, this would do nothing about that. I don't know if that would happen. But if a one-time check will suffice, then it seems like instead of interfering with the real update hooks of the system module, that check should move somewhere else.
Comment #123
poker10 CreditAttribution: poker10 at ActivIT s.r.o. commentedThere are two changes in the
system.install
file. Change insystem_schema
:And the new update function
system_update_7086()
.These two changes cover two usecases - if the site already has cache tables, then the update hook will update the indexes there. On installation (or creation of a new cache_xx table) the
system_schema()
comes to play and it will ensure that all new tables would haveexpire_created
index. This is because all other cache tables are using the main cache table definition. See:So both changes in the install file are needed - one for new installations (and new cache tables) and the update hook for existing tables.
Comment #124
brad.bulger CreditAttribution: brad.bulger commentedOK, so if future cache table use (or are supposed to use) the basic cache table schema, then there's no need to redo any of these changes for future updates. Is that right?
It seems like putting the update hook into a new module would be a better way to handle that. There are some alternate caching modules mentioned above, it's over my head to evaluate them though.
Comment #125
poker10 CreditAttribution: poker10 at ActivIT s.r.o. commentedYes, that should be correct. In case the full patch was/is still applied, all existing cache tables should have been updated in the past and new are created by the
system_schema()
code. So there is no need to run the update hook again. Sites just needs to ensure that the newsystem_update_7086()
from D7.98 have run.Comment #126
brad.bulger CreditAttribution: brad.bulger commentedI've moved the code to replace 'expire' indexes with 'expire_create' from the fake hook_update_N function into
system_requirements
(). In the runtime phase (eg when you load the Status report on your site), it will report a warning that the update script needs to be run if it finds at least one 'expire' index on a cache table. In the update or install phases, if it finds an 'expire' index on a cache table, it will drop it and then create an 'expire_create' index if one does not already exist, reporting all of these actions to the user. So running drush updatedb or the update.php page should clean up the indexes even if there are no update hooks needing to be run.I reset the system module schema number back to before the first hook from this patch that we applied to our site - 7074 - and reran all of the real system module hooks since then, without much incident. YMMV