Memcache and cache_clear_all wildcard

Yoran - June 16, 2009 - 19:45
Project: Memcache API and Integration
Version: 6.x-1.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: Yoran
Status: active
Description

For my last contract I had to make cache_clear_all work with wildcards on memcache. As you know, the main problem with memcache is that you can't retrieve a set of keys with any kind of query. With some work you can retrieve all the keys you need using the stats command, but it's a pretty slow process.

This is a big problem when you build a high-performance site with Drupal where all users are authenticated and you have to manage caches by role (e.g. keys like "block_12_role1_role2"). In this situation, when it comes time to update the cache, you have to remove all keys matching "block_12*". With the original memcache implementation, you lose all your caches if you don't know the exact name of the key you need to remove.

To solve this, the approach I adopted is to maintain a key index in memcache itself:
- each time a key is set, the index for the associated bin is updated and marked 'dirty'
- each time a key is removed, its entry is removed from the associated index, which is marked 'dirty'
- each time a set of keys should be removed with a wildcard (e.g. cache_clear_all("user_11","cache",true)), the corresponding index is loaded from memcache and the matching keys are removed.
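The three rules above can be sketched as follows. This is only an illustration under stated assumptions: the IndexedBin class and its method names are hypothetical and are not taken from the attached files, and a plain PHP array stands in for memcache so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for one memcache bin plus its key index.
// A real implementation would talk to memcache instead of these arrays.
class IndexedBin {
  private $store = array();  // key => value, plays the role of memcache
  private $index = array();  // key => TRUE, the key index for this bin
  public $dirty = FALSE;     // TRUE when the index has to be written back

  public function set($key, $value) {
    $this->store[$key] = $value;
    // Rule 1: every set updates the index and marks it dirty.
    $this->index[$key] = TRUE;
    $this->dirty = TRUE;
  }

  public function get($key) {
    return isset($this->store[$key]) ? $this->store[$key] : FALSE;
  }

  public function delete($key) {
    unset($this->store[$key]);
    // Rule 2: every delete drops the index entry and marks the index dirty.
    unset($this->index[$key]);
    $this->dirty = TRUE;
  }

  // Rule 3: a wildcard clear walks the index and deletes matching keys.
  public function clearPrefix($prefix) {
    $removed = 0;
    foreach (array_keys($this->index) as $key) {
      if (strpos($key, $prefix) === 0) {
        $this->delete($key);
        $removed++;
      }
    }
    return $removed;
  }
}

$bin = new IndexedBin();
$bin->set('block_12_role1_role2', 'A');
$bin->set('block_12_role3', 'B');
$bin->set('block_13_role1', 'C');
$removed = $bin->clearPrefix('block_12');
```

After the call, only the two "block_12" entries are gone and the "block_13" entry is still cached, which is the behaviour the wildcard form of cache_clear_all is expected to have.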

To make this work I modified existing functions in dmemcache.inc (from the original memcache module) and added more:
- dmemcache_lock / dmemcache_unlock, to use memcache as a semaphore. The point is to prevent dirty caches from being written while someone else is also writing them.
- dmemcache_maintain_ley_list, the central function for maintaining indexes, with $operations like get/put/delete/save.
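One common way to use memcache as a semaphore relies on the fact that an add operation only succeeds when the key does not exist yet. The sketch below assumes that behaviour and emulates it with a small FakeMemcache class so it can run without a server; example_lock and example_unlock are illustrative names, not the functions from the attached dmemcache.inc.

```php
<?php
// Hypothetical stand-in for memcache: add() only succeeds when the key is
// absent, which is the property that makes it usable as a semaphore.
class FakeMemcache {
  private $data = array();

  public function add($key, $value) {
    if (isset($this->data[$key])) {
      return FALSE;
    }
    $this->data[$key] = $value;
    return TRUE;
  }

  public function delete($key) {
    unset($this->data[$key]);
    return TRUE;
  }
}

// Try to take the lock a few times, sleeping briefly between attempts.
function example_lock($mc, $name, $retries = 5, $wait_us = 1000) {
  for ($i = 0; $i < $retries; $i++) {
    if ($mc->add('lock_' . $name, 1)) {
      return TRUE;
    }
    usleep($wait_us);
  }
  return FALSE;
}

// Release the lock so the next writer can take it.
function example_unlock($mc, $name) {
  return $mc->delete('lock_' . $name);
}

$mc = new FakeMemcache();
$first = example_lock($mc, 'index_cache');      // lock is free
$second = example_lock($mc, 'index_cache', 2);  // already held
example_unlock($mc, 'index_cache');
$third = example_lock($mc, 'index_cache');      // free again
```

Against a real server the lock key would normally also get an expiry time, so that a request that dies while holding the lock does not block everyone else forever.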

When the time comes to shut down the page (I use a PHP registered shutdown function for this), all dirty indexes are stored to memcache. This can take as long as it needs, since the data has already been sent to the web client (same concept as the poormanscron module).
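A minimal sketch of the write-back at shutdown, assuming the indexes are held in memory during the request. ExampleIndexRegistry and its properties are hypothetical, and the $saved array stands in for memcache; only register_shutdown_function is a real PHP call here.

```php
<?php
// Hypothetical per-request registry of key indexes, one per bin.
class ExampleIndexRegistry {
  // bin name => array('keys' => ..., 'dirty' => TRUE/FALSE)
  public $indexes = array();
  // Stands in for memcache in this sketch.
  public $saved = array();

  // Write back every index that changed during the request.
  public function flushDirty() {
    $written = 0;
    foreach ($this->indexes as $bin => $index) {
      if ($index['dirty']) {
        $this->saved['index_' . $bin] = $index['keys'];
        $this->indexes[$bin]['dirty'] = FALSE;
        $written++;
      }
    }
    return $written;
  }
}

$registry = new ExampleIndexRegistry();
$registry->indexes['cache'] = array(
  'keys' => array('user_11_a' => TRUE),
  'dirty' => TRUE,
);
$registry->indexes['cache_menu'] = array(
  'keys' => array(),
  'dirty' => FALSE,
);

// PHP calls this once the script has finished producing the page.
register_shutdown_function(array($registry, 'flushDirty'));

// Called directly here only to show the effect; the shutdown call that
// follows finds nothing dirty and writes nothing.
$written = $registry->flushDirty();
```

Only the index that was actually modified is written, which keeps the extra work per request small.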

Now tell me whether this is of any interest for this project. I attached both memcache.inc and dmemcache.inc. There are some logging functions in this code; I can clean it up if this is useful.

Attachment: memcache_with_indexes.tar_.gz (4.15 KB)

#1

robertDouglass - June 18, 2009 - 08:56

It's definitely of interest, and I'm glad you've solved this problem. The other solution which I've been considering is to never clear caches to begin with, but to version them. I'd store the version in the db as this would be a small query, and the version of the cache would become part of the key. When a cache is to be cleared the version is updated, thus all caches of the previous version would not be found, and they'd be regenerated with a new key.
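The versioning idea could look roughly like this. It is a sketch under assumptions: VersionedCache is a hypothetical class, one array stands in for memcache and another for the small version table in the db, and the "bin:vN:key" layout is just one possible way to put the version into the key.

```php
<?php
// Hypothetical sketch of versioned cache keys. Nothing is ever deleted;
// bumping the version makes all older entries unreachable.
class VersionedCache {
  private $store = array();     // stands in for memcache
  private $versions = array();  // stands in for the version table in the db

  private function version($bin) {
    return isset($this->versions[$bin]) ? $this->versions[$bin] : 1;
  }

  // The current version of the bin becomes part of the real key.
  private function fullKey($bin, $key) {
    return $bin . ':v' . $this->version($bin) . ':' . $key;
  }

  public function set($bin, $key, $value) {
    $this->store[$this->fullKey($bin, $key)] = $value;
  }

  public function get($bin, $key) {
    $full = $this->fullKey($bin, $key);
    return isset($this->store[$full]) ? $this->store[$full] : FALSE;
  }

  // "Clearing" a bin only increments its version.
  public function clear($bin) {
    $this->versions[$bin] = $this->version($bin) + 1;
  }
}

$cache = new VersionedCache();
$cache->set('cache_block', 'block_12_role1', 'old markup');
$cache->set('cache_menu', 'primary', 'menu markup');
$before = $cache->get('cache_block', 'block_12_role1');
$cache->clear('cache_block');
$after = $cache->get('cache_block', 'block_12_role1');
```

The old entries are still physically in the store after the clear; they are simply never asked for again and are left for memcache to evict.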

#2

mwillis - September 15, 2009 - 06:59

Robert,
Save the version in memcache instead, as memcache is sometimes used to reduce the total number of queries to the database, in addition to reducing the number of complex queries.

Also, if there were some way to unify how everyone builds their key names in various modules, such that the memcache module could systematically split them and inject a key (à la Alex Rickabaugh's comment on the blog http://www.aminus.org/blogs/index.php/2007/12/30/memcached_set_invalidation ), a similar approach could be done behind the scenes in dmemcache.inc, so that the memcache module would be one step closer to being a drop-in replacement with full functionality. Maybe that becomes part of the standard install doc for the memcache module, "you must ensure your cache keys match this syntax" (or we cobble together patches for popular contrib modules). It took me a bit to grasp Alex's concept, but here's a lame attempt at a Drupal-like example (I know $user->mail is available, so bear with me):

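// Look up the current namespace key shared by all 'email_' entries.
$cached_key = cache_get('email_', 'cache');
if ($cached_key && $cached_key->data) {
  $key = $cached_key->data;
}
else {
  // No namespace key yet: create one and store it.
  $key = md5(rand());
  cache_set('email_', $key, 'cache', CACHE_PERMANENT);
}
// The namespace key is part of every real cache ID.
$cid = 'email_' . $key . '_' . $user->uid;
$cached = cache_get($cid, 'cache');
if ($cached && $cached->data) {
  $email = $cached->data;
}
else {
  // db_query() returns a result resource; db_result() extracts the value.
  $email = db_result(db_query('SELECT mail FROM {users} WHERE uid = %d', $user->uid));
  cache_set($cid, $email, 'cache', CACHE_PERMANENT);
}
return $email;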

And somewhere else, when we need to invalidate the whole data set, instead of calling cache_clear_all('email_', 'cache', TRUE); we would do:

$key = md5(rand());
cache_set('email_', $key, 'cache', CACHE_PERMANENT);

Then like you said, with a new $key value, all of the old cache values are inaccessible and will be garbage collected eventually.

#3

robertDouglass - September 15, 2009 - 07:29
Version: 6.x-1.2 » 6.x-1.x-dev

Great comments, mwillis. When we start coding with this (or a similar) approach, we'll open a 6.x-2.x branch of the module. I hope that time is coming soon.

 
 
