Core does a cache_clear_all() on every node and comment submit. If you have a lot of nodes and comments being posted (say thousands/day), that's thousands of variable_set() per day too.

Seems like the best option for this would be to store it in memcache itself (similar to the wildcards logic) - a single entry for all cache bins, fetch it once each request into a static, update it on set. That's an extra memcache request every time compared to putting it in a variable, but it might help.

If we add cache lifetime per bin, could also only do the variable_set() every cache_lifetime / 4 (or some other fairly high fraction) - so if one hour, you could only ever trigger a flush every 15 minutes. That'd mean no new cost on every request, and would probably reduce the variable_set() to a reasonable level.

Comments

catch’s picture

Read through core cache.inc issue again and it's less of an issue there, although that has its own problems separate from this.

catch’s picture

Assigned: Unassigned » catch
Status: Active » Needs review
StatusFileSize
new3.27 KB

Here's a Drupal 7 patch. Will do some testing of this, then backport to D6.

catch’s picture

Title: Avoid variable_set() on bin flushes » Avoid variable_set() on bin and wildcard flushes
StatusFileSize
new3.29 KB

Updated patch, need to set this->memcache before doing anything else.

catch’s picture

StatusFileSize
new5.21 KB

Alright while working on the D6 patch I got so annoyed by having to copy the same code around I ended up making memcache_variable_set() and memcache_variable_get() functions to simplify it a bit.

These would be used only for the following:

* the variable is volatile (i.e. not set via configuration) and liable to stampedes.
* it doesn't matter if the value gets completely blown away in the case of a full memcache restart.
* you can live with an extra dmemcache_get() for each variable when using memcache_variable_get() (I made cache_flush into an array for this reason so we don't get more and more dmemcache_get() the more bins there are).

Will reroll the D7 patch using this since that could be useful there too.

catch’s picture

StatusFileSize
new4.11 KB

D7 patch with memcache_variable_set() and memcache_variable_get().

Both of these latest patches are untested but I'm out of time for the next 3-4 hours.

catch’s picture

StatusFileSize
new6.49 KB

I've now profiled and load tested the D6 patch, and it passes the wildcard clear test.php script jandrews was using on other issues.

New patch with various bugs found via that.

catch’s picture

StatusFileSize
new4.14 KB

Same bug fixes for D7 - still need to test memcache-test.php with the D7 version though.

catch’s picture

Release early, release often...

catch’s picture

StatusFileSize
new6.45 KB

arggh and the d6 patch was the wrong file. Promise this is it for tonight.

jeremy’s picture

> * it doesn't matter if the value gets completely blown away in
> the case of a full memcache restart.

What about a partial memcache restart?

Let's say you have 10 memcache servers. The 'memcache_wildcard_flushes' variable hashes to server 3. Server 3 restarts or otherwise fails, and you lose the 'memcache_wildcard_flushes' variable. However, you've still got cached data on the other 9 servers. In this case, wouldn't you end up with potentially stale cached data being served?

catch’s picture

Yes in that case you could end up with stale data.

One possibility with this at least for the wildcard_flushes variable would be to immediately set this when it's empty with a key containing a timestamp like $flushes['initialized'] = time();, then check that against the created timestamp of cache entries as well as specific wildcard flushes. This would then mean that all current cache entries would be invalidated in the case that this variable gets 'lost'.

jeremy’s picture

Status: Needs review » Needs work

I think it makes sense to initialize the variable with the current time. I don't see any other way to guarantee that we're serving only valid cache entries.

This will cause us to make 1 memcache set per cache bin on the first page load, but I don't see a problem with that.

The remaining problem is that restarting 1 cache server would now be capable of causing all N of your cache servers to be invalidated. This is very much undesirable, and could lead to unplanned website outages. For example, it would no longer be possible to do a rolling upgrade of all your memcache servers, spacing out the restarts to minimize the impact on your caches. One possible solution is to store the variable multiple times, allowing it to hash to multiple memcache servers. All the versions of the variable could be retrieved together with a multiget. If one goes missing (ie a server is restarted), it could be restored from one of the other copies. (With the memcached driver we can specify which servers we store keys to, but not with the memcache driver.) All versions of the variable would also have to be updated together.

catch’s picture

Status: Needs work » Closed (won't fix)

I think we should abandon this and try to improve variable_set() instead, #987768: [PP-1] Optimize variable caching is my current focus for this.