Closed (fixed)
Project:
Redis
Version:
7.x-2.3
Component:
Code
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
26 Apr 2013 at 06:52 UTC
Updated:
22 May 2014 at 22:07 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
Island Usurper commentedI feel like a better solution is to let Redis itself expire things. Just because the Drupal Database Cache does things a certain way doesn't mean it's the right way for another cache backend.
This patch gives CACHE_TEMPORARY items a somewhat arbitrary TTL. It's based off the cache_lifetime variable, but it is at least a minute. We don't want Redis to expire things as soon as we set them, after all. After that, we should only get many keys if the $wildcard parameter is TRUE. If you give cache_clear_all() a $cid of '*' without setting $wildcard, that's an error.
Comment #2
pounardHow about this patch?
Comment #3
greggles"If caller is stupid a this operations gives"
There are two typos: an extra "a" after stupid and operations should be a singular operation. Saying more explicitly what is stupid would be more helpful than just being vaguely judgmental. How about
"If caller sends an expiry time before now this operation gives"
Comment #4
pounardSo you don't like a bit of humor in comments :) Thanks for revealing the typos, will fix it tomorow morning (it's almost night here).
Comment #5
pounardTagging
Comment #6
pounardI fixed the typo as greggles suggested. Commiting the fix within the day, thanks everyone.
Comment #7
pounardComment #8
msonnabaum commentedOk, so this is really bad.
That means if you have cache_lifetime at 0, which is a recommended setting, the page cache essentially doesn't work, because it expires itself immediately. You may as well not do the cache set at all at that point.
Comment #9
msonnabaum commentedThis issue is a bit confusing, but if I'm understanding it right, we're just trying to make redis handle CACHE_TEMPORARY properly. However, I don't get why CACHE_TEMPORARY items are ever being expired, they should only be cleared on cache_clear_all().
Attached patch is how I would do this, by simply keeping an additional list of our CACHE_TEMPORARY items, and then clearing them all on a cache_clear_all().
Comment #10
msonnabaum commentedAlso, that patch is 100% untested, just sketching out the idea.
It would also be good to set a far future expire on those items so that they'd get cleaned up when redis' LRU kicks in.
Comment #11
pounardYes it is untested, and yes it is really bad, sorry for this. Fixing it right now, thanks.
Comment #12
pounardShould fix it, what's your opinion about this?
Comment #13
pounardHere is a patch that I'm going to commit right away. It fixes both Predis and PhpRedis backend, added some documentation in code and complete unit tests for this issue and expiry checks. Thank you very much once again, that's a huge bug you found!
Comment #14
pounardNew 7.x-2.1 release on its way, thank you again.
Comment #15
msonnabaum commentedThat seems much better, thanks for fixing that so fast.
What do you think of my idea on just keeping a list of cache temporary items so the behavior can better match existing cache backends? With the current patch that went in, won't they never be expired, even on a cache_clear_all()?
Comment #16
pounardNot sure this would be a good idea to keep lists of cache items: I would be afraid to uselessly raise CPU and RAM usage for cache handling which would IMO defy the purpose of it. But if you have a concrete scenario or real life measures that prooves me wrong I'd be happy to discuss.
Nevertheless, that seems to be quite a good workaround for wiping them out, I'd be a good thing to experiment.
I didn't see that side of the fix but I think you are right: with the current patch temporary items with a cache lifetime set to 0 are never been cleared when cron kicks in.
Looking at the actual database backend code confirms that items are dropped at cache_clear_all() time when they are temporary which does not happen into the Redis module. I know that this creates a different behavior between core and this module. Now pragmatically this happens during every hook_cron() run: we know that on a lot of hardcore badass sites, we need cron to run up to every 5 minutes: this means that all temporary caches (and page and block) will be dropped every 5 minutes with original algorithm: this is *very* bad for performances.
I'm thinking that I'd like to keep Redis not flushing them out arbitrarily and keep them for their lifetime or until a flush all caches call happen (or when the business code explicitely wipe them out). At least keeping that behavior configurable in settings.php.
What do you think?
Anyway, all of this needs a new issue, I will try to manage some time the next week to do that.
Comment #17
omega8cc commentedThis totally breaks proper invalidation for the
cache_pagebin in D6 (we use Pressflow), and causes stale content displayed even if the node has been edited.Here is our default configuration: http://drupalcode.org/project/octopus.git/blob/HEAD:/aegir/conf/global.i... which worked prior to this issue/commits.
And the fork with both last commits reverted to get it back to work (as before) properly: https://github.com/omega8cc/redis/commits/7.x-2.x-o8
I will try to debug this and provide a patch, so just a heads up for now, that the current stable is broken for D6 sites.
Comment #18
pounardD6 only?
Comment #19
omega8cc commentedNo, it affects both D6 and D7. Just tested this again.
Comment #20
omega8cc commentedWhile debugging this I have found something else related to D6 only, which is totally odd to me, and may affect caching in general.
Note the line:
This is the only place in D6 core where such variable is used, so it is most probably a mistake/typo in the core which can do some really nasty things to the caching logic on its own. It should be
cache_lifetimeas everywhere else, no?Comment #21
pounardNot sure, Drupal 6 maintain a separate cache lifetime for pages if I remember correctly. I'm going to try to fix the Redis module right now.
Comment #22
omega8cc commentedI mean, it may indirectly affect also Redis integration and is an obvious core bug to me, because the
page_cache_lifetimestring literally exists in this one line only, so it is not used/effective. I'm going to open a bug report in core for this.Comment #23
pounardI want to slap so hard those who wrote the Drupal 7 cache backend interface... Seriously WTF most of the behaviors hardcoded in the DatabaseCacheBackend never have been documented at the interface level nor implemented at a higher level: this is the real problem.
Technically the Redis module cannot do that in the current state: either we force a full clear() (including permanent items) or we don't drop anything, there is no compromise easily possible except by iterating over the full bin and manually expire keys that are non permanent when cache_lifetime is set to 0.
But Redis has evolved so quickly, there is probably a way to do a better select operation in hashes.
Comment #24
omega8cc commentedSide note: I have opened a separate issue for this
page_cache_lifetimething in D6: #2130865: There is no such variable like page_cache_lifetimeComment #25
pounardGood thing you did.
I'm blocked with this very stupid cache usage here. There are a couple of "easy" solutions:
What do you think about those?
Comment #26
omega8cc commentedSince relying on cron is not enough here, because cache_page bin has to be properly managed on various events, like node edit etc, and adding complexity, with possible performance degradation (thanks to checks multiplied) may introduce more problems and regressions than we are trying to solve, I would prefer to keep it simple and just purge the bin when needed. Speed is not enough when people see stale content, checkout doesn't work and orders are lost etc. We need some balance between aggressive enough cache clears and cache backend consistency and reliability.
Comment #27
pounardI will add the SET temporary item cid storage in order to achieve ISO functionality with the database backend meanwhile I'll leave this configurable with two additional behavior: replace the temporary flush by a full flush (legacy behavior of this module pre 1.0 release) and no temporary flush (for people that want aggressive caching). If I'm not too lazy I'd make this configurable per bin and put sensible defaults where only page and block are wiped out accordingly to spec.
But I probably won't be able to do it before next week.
Thanks for your time, it's highly appreciated, if you have any other comment or observation, please do! Thanks again.
Comment #28
omega8cc commentedWow, that sounds exciting! Looking forward to see and test patches. Thank you for your continued support.
Comment #29
pounardOk, I fixed it! I think.
I implemented what I explained upper: cache backends have three behaviors and put sensible defaults depending on the cache bin. I also added massive unit tests about this specific issue, and fixed the lock backend unit tests at the same time (you can now run full core unit tests using this module as well as run both PhpRedis and Redis unit tests in the same PHP runtime, which is great).
See this commit #45dad1f.
And for a detailed explaination and avanced usage read this (link towards the README.txt file in the git repo).
Please review and test it and report any bugs. Any suggestions or comments about this patch are welcome.
Comment #30
pounardThis patch also fixes PhpRedis specific problems due to behavior changes in the PHP extension as well as fixing a few typo errors, minor optimizations, and moreover test it all!
Comment #31
pounardI pushed two releases do not use the 7.x-2.2 which ships with a potential fatal error when using PhpRedis. 7.x-2.3 fixes that and embed all the previous fixes.
Comment #32
omega8cc commentedWow, just wow! It works! I have tested various non-default modes to see if I can break something, but the defaults just work! Marvelous! We just released BOA-2.1.1 with previous version included, but I will now add 2.3 as a hot fix to stay up to date. Thank you!
Comment #33
pounardBe careful with lock implementation, it's still experimental and we experience random deadlocks sometime. I should have documented this. I you ever reproduce such locks please provide feedback as soon as possible, I didn't manage to reproduce it on a development box.
Thanks you very much for your very quick feedback.
Comment #34
omega8cc commentedWe have tested Redis lock both with D7 and D6 sites for a long time already, and there were literally zero issues, so we are enabling it by default now, which may help to catch edge case issues, if any will appear, so I will report them for sure. Thanks again!
Comment #35
pounardYou're welcome. Gald to see this is all working fine for you, having feedback is always a pleasure.
Comment #36
omega8cc commentedHave to reopen this, since we have tested this on D7 only (ouch!), while D6 still doesn't work and emits warnings:
Comment #37
pounardOh my bad, I didn't tested with cache_backport. I will fix that, thanks for reporting.
Comment #38
omega8cc commentedMaybe it is because in D6 empty
variable_get()doesn't default to NULL, like it is in D7, so the check for NULL never works here?Comment #39
omega8cc commentedI mean the check here:
Comment #40
omega8cc commentedThis commit fixes at least these warnings, but I will test it further to make sure everything else works as expected also in D6. Patch for review attached.
Comment #41
omega8cc commentedStatus update.
Comment #42
omega8cc commentedJust realized that proposed fix to avoid warnings in D6 will change the logic, since it will no longer be able to check for D7 specific NULL and simply default to
$mode = 0;which is probably not exactly what it is supposed to do, since it will never reach defaults defined in theelsepart.Comment #43
omega8cc commentedAttached patch should do the trick.
Comment #44
omega8cc commentedYet another patch for
variable_get()to avoid silent fails on D6 sites.Comment #45
pounardPass null if the call expects null where there is no value set! Easy :) Thanks for the patches I'll review this tomorow morining.
Comment #46
omega8cc commentedBy the way, re: "Be careful with lock implementation, it's still experimental and we experience random deadlocks" -- maybe this happens in multisite, since lock implementation doesn't seem to be multisite-aware? At least, it doesn't use any prefix as cache bins do.
Comment #47
pounardRight nice catch, this worth open an issue for adding prefixing to lock backend; Opened #2134001: Add missing key prefix usage to lock as well.
Comment #48
pounardGit commit #ad6b76c should fix #42, #43 and #44. I did not apply directly your patches because the logic was indeed a bit broken 'cache_lifetime' has a default value which is supposedly "0" in core: so I added some class constants to fix that. Once again I have to thank you. I am releasing a new version (7.x-2.4) that should shows up soon on project page. Feedback will be appreciated as always.