For my last contract I had to make cache_clear_all() work with wildcards and memcache. As you know, the main limitation of memcache is that you can't retrieve a set of keys with any kind of query. With some work you can retrieve all the keys you need using the stats command, but it's a pretty slow process.

This is a big problem when you build a high-performance Drupal site where all users are authenticated and you have to manage caches by role (e.g. keys like "block_12_role1_role2"). In this situation, when it comes time to update the cache, you have to remove all keys matching "block_12*". With the original implementation of memcache, you lose all your caches if you don't know the exact name of the key you need to remove.

To solve this, the approach I adopted is to maintain a key index in memcache itself:
- each time a key is set, the index for the associated bin is updated and marked 'dirty';
- each time a key is removed, it is dropped from the associated index, which is marked 'dirty';
- each time a set of keys should be removed with a wildcard (e.g. cache_clear_all("user_11", "cache", TRUE)), the corresponding index is loaded from memcache and the matching keys are removed.
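The steps above can be sketched roughly as follows. This is a hypothetical illustration, not the actual dmemcache.inc code: a plain PHP array stands in for the memcache server, and the function names are made up.

```php
<?php
// Illustrative sketch of the per-bin key index (hypothetical names).
$memcache = array();   // stands in for the memcache server
$index = array();      // per-bin key index
$dirty = array();      // bins whose index must be rewritten on shutdown

// Each time a key is set, record it in the bin's index and mark it dirty.
function index_set(&$memcache, &$index, &$dirty, $bin, $key, $value) {
  $memcache["$bin:$key"] = $value;
  $index[$bin][$key] = TRUE;
  $dirty[$bin] = TRUE;
}

// Wildcard clear: walk the bin's index and delete every matching key.
function index_clear_wildcard(&$memcache, &$index, &$dirty, $bin, $prefix) {
  foreach (array_keys($index[$bin]) as $key) {
    if (strpos($key, $prefix) === 0) {
      unset($memcache["$bin:$key"], $index[$bin][$key]);
      $dirty[$bin] = TRUE;
    }
  }
}

index_set($memcache, $index, $dirty, 'cache', 'block_12_role1', 'a');
index_set($memcache, $index, $dirty, 'cache', 'block_12_role2', 'b');
index_set($memcache, $index, $dirty, 'cache', 'block_13_role1', 'c');
index_clear_wildcard($memcache, $index, $dirty, 'cache', 'block_12');
// Only block_13_role1 survives the wildcard clear.
```

On shutdown, each dirty index would then be written back to memcache, one set() per bin.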

In order for this to work I modified existing methods of dmemcache.inc (from the original memcache module) and added new ones:
- dmemcache_lock() / dmemcache_unlock(), which use memcache as a semaphore. The issue was to prevent a dirty index from being written while someone else was also writing it.
- dmemcache_maintain_key_list(), the central method for maintaining indexes, with $operation values like get/put/delete/save.

When the page shuts down (I use a PHP registered shutdown function for this), all dirty indexes are written to memcache. This can take as long as needed, since the data has already been sent to the web client (the same concept as the poormancron module).

Let me know if this is of any interest for this project. I have attached both memcache.inc and dmemcache.inc. There are some logging functions in this code; I can clean this up if it would be useful.

Comments

robertdouglass’s picture

It's definitely of interest, and I'm glad you've solved this problem. The other solution I've been considering is never to clear caches at all, but to version them. I'd store the version in the db, as this would be a small query, and the version would become part of the cache key. When a cache is to be cleared, the version is updated; all caches of the previous version will then no longer be found, and they'll be regenerated under a new key.

mwillis’s picture

Robert,
Save the version in memcache instead, as memcache is sometimes used to reduce the total number of queries to the database, in addition to reducing the number of complex queries.

Also, if there were some way to unify how everyone builds their key names in various modules, such that the memcache module could systematically separate them and inject a key (à la Alex Rickabaugh's comment on the blog http://www.aminus.org/blogs/index.php/2007/12/30/memcached_set_invalidation ), a similar approach could be taken behind the scenes in dmemcache.inc, so that the memcache module would be one step closer to being a drop-in replacement with full functionality. Maybe that becomes part of the standard install doc for the memcache module ("you must ensure your cache keys match this syntax"), or we cobble together patches for popular contrib modules. It took me a bit to grasp Alex's concept, but here's a lame attempt at a Drupal-like example (I know $user->mail is available, so bear with me):

$cached_key = cache_get('email_', 'cache');
if ($cached_key && $cached_key->data) {
  $key = $cached_key->data;
}
else {
  $key = md5(rand());
  // Drupal 6 signature: cache_set($cid, $data, $table, $expire).
  cache_set('email_', $key, 'cache', CACHE_PERMANENT);
}
$cached = cache_get('email_' . $key . '_' . $user->uid, 'cache');
if ($cached && $cached->data) {
  $email = $cached->data;
}
else {
  $email = db_result(db_query('SELECT mail FROM {users} WHERE uid = %d', $user->uid));
  cache_set('email_' . $key . '_' . $user->uid, $email, 'cache', CACHE_PERMANENT);
}
return $email;

And somewhere else, if we needed to invalidate the data set, instead of doing cache_clear_all('email_', 'cache', TRUE); we would do:

$key = md5(rand());
cache_set('email_', $key, 'cache', CACHE_PERMANENT);

Then like you said, with a new $key value, all of the old cache values are inaccessible and will be garbage collected eventually.

robertdouglass’s picture

Version: 6.x-1.2 » 6.x-1.x-dev

Great comments mwillis. When we start coding with this (or similar) approach, we'll open up a 6.x-2.x branch of the module. I hope the time is coming soon.

drewish’s picture

subscribing

crea’s picture

I'm in the process of writing the Namespice module, which will bring the concept of tags to cache entries. Each tag actually emulates a cache namespace, as recommended by the memcached FAQ: each tag will be appended to the cache key with its serial number; Robert mentioned this above as "cache versioning". So it will be possible to flush entries using a combination of tags. Unfortunately, because the combination makes the key unique, it won't be possible to cache_set() with one combination of tags and cache_get() with another. I don't know if this is a problem. Most importantly, partial wildcard flushing will be possible for any cache backend, with the only limitation that a module needs to be changed to use the Namespice wrapper functions in order to work with tags.

I already implemented the same approach in my Advanced Panels Cache module; you may check it out. I'm in the process of refactoring Advanced Panels Cache so that it will use Namespice for tag and serial management. I decided to make Namespice a separate module rather than contribute it to the Memcache API project, because other cache backends could benefit from it too (yes, that's when you realize what a pity it is that we don't have an established central cache management layer such as Cacherouter).
In addition to the serial management functions, Namespice will include wrappers for the core cache functions (cache_get, *_set, *_clear_all) that are aware of tags and will use them.

david strauss’s picture

Jeremy just mentioned that this issue was in the queue. I produced my own implementation, which should be far, far faster than the one suggested here or any of the options at http://www.aminus.org/blogs/index.php/2007/12/30/memcached_set_invalidation.

https://code.launchpad.net/~davidstrauss/pressflow/memcache-6.x-patched/...

jeremy’s picture

Attachments: 2.43 KB, 6.03 KB

I have committed the attached patch, which is what David links to above, with only one minor change (switching from sleep(1) to usleep(1000) when grabbing a semaphore) and some trivial whitespace cleanup.

With this patch applied, memcache's internal counters are going to show a cache hit even when we're actually getting a cache miss. Logic could be added to flush an item as soon as we notice that it's invalid which would minimize this effect.

The mc1.php script is a simple test suite, also provided by David.

If you enable the minimum cache lifetime, then wildcard flushes will flush the entire bin -- I am leaving this issue open until this is fixed and the minimum cache lifetime is updated to use this updated cache flushing logic.

We also need to update the documentation. Bins are now only needed when the contents are too big, or when you don't want the contents of one "bin" to push other contents out of memory.

The commit message:
http://drupal.org/cvs?commit=310354

jeremy’s picture

Attachments: 2.47 KB, 6.11 KB

Attached is a port of this patch to 7.x-1.x-dev.

crea’s picture

Will that "prefix directory" scale to millions of cache entries?
Also, putting all bins together seems like a bad idea because of their different cache sizes, flush strategies and frequencies.
Good read: http://blog.evanweaver.com/articles/2009/04/20/peeping-into-memcached/

david strauss’s picture

@crea Yes.

crea’s picture

I worry that you load it on basically every page request. Getting a prefix directory containing millions of prefixes can consume a lot of RAM and can be slow.

crea’s picture

What happens if a prefix counter is evicted?

david strauss’s picture

I worry that you load it on basically every page request. Getting a prefix directory containing millions of prefixes can consume a lot of RAM and can be slow.

The premise "getting a prefix directory containing millions of prefixes" is nearly impossible. Check out the code: the prefix directory gets at most one addition per wildcard call to cache_clear_all(). You'd have to do millions of unique wildcard clears on the cache to get to that point: the same sort of clearing that would flush the entire cache bin before my changes.

Now, I can understand setting a threshold to the prefix directory size where, at a certain point, the whole bin gets flushed. But conditions where that would be helpful are rare.

What happens if a prefix counter is evicted?

A prefix counter is accessed at least once per request that accesses any item matching the prefix, so prefix counters are unlikely to be evicted unless the matching items are also unused. We ought to handle the worst possible case (prefix evicted but not the cache item itself) by adding a check for $memcached_counters[$table][$prefix] === FALSE and treating any item as invalidated if the check returns TRUE. That's a minor change. (And thanks for raising the concern.)
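A minimal sketch of that check, under the assumption (from the discussion above) that each cached item stores the counter value that was current when it was written; the names here are hypothetical, not the patch's actual variables:

```php
<?php
// Hypothetical validity check: an item is valid only if its stored counter
// matches the live counter, and a missing (evicted) counter also
// invalidates the item, per the worst-case handling discussed above.
function item_is_valid($item, $live_counters, $prefix) {
  // Counter evicted from memcache: treat every matching item as invalid.
  if (!isset($live_counters[$prefix]) || $live_counters[$prefix] === FALSE) {
    return FALSE;
  }
  // Counter incremented since the item was written: item was cleared.
  return $item['counter'] == $live_counters[$prefix];
}

$counters = array('node_12_' => 3);
$item = array('data' => 'body', 'counter' => 3);
var_dump(item_is_valid($item, $counters, 'node_12_'));  // bool(true)

$counters['node_12_'] = 4;                              // wildcard clear
var_dump(item_is_valid($item, $counters, 'node_12_'));  // bool(false)

unset($counters['node_12_']);                           // counter evicted
var_dump(item_is_valid($item, $counters, 'node_12_'));  // bool(false)
```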

crea’s picture

Well, if you have lots of nodes and use node caching, that will bring you exactly to the huge directory. Please correct me if I'm wrong, but depending on prefix length (I don't know if it's 1 byte per character or 2) the directory can grow much faster. Taking simple numbers: assuming 1 byte per char x an average of 10 chars per prefix x 100,000 nodes = 1 MB per PHP process. Take bigger numbers, e.g. more nodes and longer prefixes, and there's trouble.
In short, we can't base this feature on assumptions about prefix length or number of items. Was it Bill Gates who once said "640kb will be enough for everyone"? I can't remember.

Regarding prefix counters: in the Namespice module I came to the conclusion that the safest way to store counters (I call them serials) is to use a separate bin for them. With a shared bin you can't control evictions. If a counter is evicted while invalidated cache entries are still in memcached, and you reinitialize the counter, a counter collision can happen with a previously used but already invalid number. The probability of this is small, but still not zero.
When storing counters separately you need to flush the related content cache bins together with the counter bin (to prevent collisions too), but that's a different problem.

If the case above applies to this patch, the check for $memcached_counters[$table][$prefix] === FALSE leading to a "cache miss" is not enough, because the invalid cache entry is still in memcached and can be returned in the case of a collision.
I haven't looked deeply at this patch; I just wanted to raise a concern about counter problems.

crea’s picture

@David Strauss: I don't see the check you mentioned ($memcached_counters[$table][$prefix] === FALSE). Was it a planned feature, not yet implemented, or am I missing something? I'm looking here: http://drupalcode.org/viewvc/drupal/contributions/modules/memcache/memca...

The difference between Namespice and this patch is that here you store the counter (serial) right inside its cache entry, which simplifies things. It allows flushing an entry directly in cache_get(), so there will be no counter collisions, because invalid entries with missing counters are removed on sight.
That's actually a very good idea; for some reason I never thought of it (shame on me!). In Namespice I only use serials in the cache id, which means that in the case of a missing serial, e.g. due to evictions, I can't flush (read: delete) cache entries directly because I can't construct the proper cache id.

david strauss’s picture

Well, if you have lots of nodes and use node caching, that will bring you exactly to the huge directory. Please correct me if I'm wrong, but depending on prefix length (I don't know if it's 1 byte per character or 2) the directory can grow much faster. Taking simple numbers: assuming 1 byte per char x an average of 10 chars per prefix x 100,000 nodes = 1 MB per PHP process. Take bigger numbers, e.g. more nodes and longer prefixes, and there's trouble.
In short, we can't base this feature on assumptions about prefix length or number of items. Was it Bill Gates who once said "640kb will be enough for everyone"? I can't remember.

I'm not sure what you're talking about. There are no assumptions about prefix length or number of items. With only one exception (unparameterized calls to cache_clear_all()), directory entries are only created by calls to cache_clear_all([something], [something], TRUE).

Here's a complete listing of cache_clear_all([something], [something], TRUE) calls in Drupal 6:

./includes/common.inc:    cache_clear_all('*', $table, TRUE);
./includes/locale.inc:  cache_clear_all('*', 'cache_page', TRUE);
./includes/locale.inc:  cache_clear_all('*', 'cache_page', TRUE);
./includes/locale.inc:  cache_clear_all('locale:', 'cache', TRUE);
./includes/locale.inc:  cache_clear_all('locale:', 'cache', TRUE);
./includes/locale.inc:  cache_clear_all('locale:', 'cache', TRUE);
./includes/menu.inc:    cache_clear_all('links:'. $menu_name .':', 'cache_menu', TRUE);
./includes/menu.inc:    register_shutdown_function('cache_clear_all', 'links:'. $menu_name .':', 'cache_menu', TRUE);
./includes/menu.inc:  cache_clear_all('*', 'cache_menu', TRUE);
./includes/theme.inc:  cache_clear_all('theme_registry', 'cache', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($format .':', 'cache_filter', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($form_state['values']['format'] .':', 'cache_filter', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($form_state['values']['format'] .':', 'cache_filter', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($form_state['values']['format'] .':', 'cache_filter', TRUE);
./modules/locale/locale.module:        cache_clear_all('locale:', 'cache', TRUE);
./modules/locale/locale.module:      cache_clear_all('locale:', 'cache', TRUE);
./modules/system/system.install:  cache_clear_all('router:', 'cache_menu', TRUE);
./update.php:    cache_clear_all('*', 'cache_update', TRUE);

Most of them use a hard-coded prefix that cannot bloat the directory (which is also true for the excluded, unparameterized calls to clear the page and block caches).

The following calls clear a prefix that *can* vary:

./includes/menu.inc:    cache_clear_all('links:'. $menu_name .':', 'cache_menu', TRUE);
./includes/menu.inc:    register_shutdown_function('cache_clear_all', 'links:'. $menu_name .':', 'cache_menu', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($format .':', 'cache_filter', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($form_state['values']['format'] .':', 'cache_filter', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($form_state['values']['format'] .':', 'cache_filter', TRUE);
./modules/filter/filter.admin.inc:  cache_clear_all($form_state['values']['format'] .':', 'cache_filter', TRUE);

The first two deal with clearing all items for a named menu. Out of the box, Drupal only has a couple of named menus. Even the largest sites only add a handful. The last four calls only happen following administrative action with filters. And even then, they're limited to the number of configured formats.

I do not see anything in Drupal 6 core that could bloat the directory. Rather, it looks like the directory would converge on under 30 items for core.

Regarding prefix counters: in the Namespice module I came to the conclusion that the safest way to store counters (I call them serials) is to use a separate bin for them. With a shared bin you can't control evictions. If a counter is evicted while invalidated cache entries are still in memcached, and you reinitialize the counter, a counter collision can happen with a previously used but already invalid number. The probability of this is small, but still not zero.
When storing counters separately you need to flush the related content cache bins together with the counter bin (to prevent collisions too), but that's a different problem.

If the case above applies to this patch, the check for $memcached_counters[$table][$prefix] === FALSE leading to a "cache miss" is not enough, because the invalid cache entry is still in memcached and can be returned in the case of a collision.
I haven't looked deeply at this patch; I just wanted to raise a concern about counter problems.

This isn't designed to be the final word on managing prefix clearing. It's designed to be a massive improvement on the current system. Jeremy and I have already discussed offering the option of using a separate bin for counters; it's a trivial change.

And the system is actually already safe from the scenario you describe (provided we add the counter cache miss logic in discussion). A counter only gets initialized when first added to the directory. If the counter gets evicted, running increment on it will not re-add the item. So, you get the correct but unfortunate effect of all items with that prefix being considered invalid until the whole bin is cleared.

It might be prudent to flush the whole bin if prior counters are found to be evicted in order to restore usability of cache items with that prefix. Or, we could instead use a microtimestamp, which is slower and not guaranteed to provide perfect invalidation behavior, but allows re-initializing evicted counters.
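The microtimestamp alternative can be sketched as follows, with plain arrays standing in for memcache and all names illustrative: each prefix stores the time of its last wildcard clear, each item stores its creation time, and an evicted timestamp can simply be re-initialized to "now", conservatively invalidating matching items.

```php
<?php
// Hypothetical sketch of the microtimestamp scheme described above.
function prefix_clear(&$clear_times, $prefix) {
  $clear_times[$prefix] = microtime(TRUE);
}

function item_is_valid($item, &$clear_times, $prefix) {
  if (!isset($clear_times[$prefix])) {
    // Timestamp was evicted: re-initialize; existing items become invalid.
    $clear_times[$prefix] = microtime(TRUE);
  }
  // Valid only if the item was written after the last wildcard clear.
  return $item['created'] > $clear_times[$prefix];
}

$clear_times = array();
prefix_clear($clear_times, 'node_12_');
usleep(1000);
$item = array('data' => 'body', 'created' => microtime(TRUE));
var_dump(item_is_valid($item, $clear_times, 'node_12_'));  // bool(true)

usleep(1000);
prefix_clear($clear_times, 'node_12_');                    // wildcard clear
var_dump(item_is_valid($item, $clear_times, 'node_12_'));  // bool(false)
```

The trade-off is exactly as stated above: clock-based invalidation is not perfect (clock skew between web servers can mis-order writes and clears), but it survives evictions gracefully.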

david strauss’s picture

@crea Also, I have looked at your Namespice module. I understand that it modifies the cache key when you invalidate sets of items. While this succeeds in correctness and initial performance, it bloats the cache contents with invalidated items. In your system, separating serials into a separate bin is critical because even modest cache usage eventually converges on filling the non-serial bin(s). My design does not converge on filling the cache in the same way.

david strauss’s picture

@crea An additional concern I have with your Namespice architecture is that it provides no efficient facility for stampede protection or minimum cache lifetime support. We intend to support these on top of my existing work, and it's trivial because we still load items and analyze their validity instead of simply missing them, which is what your approach does.

david strauss’s picture

I don't see the check you mentioned ( $memcached_counters[$table][$prefix] === FALSE). Was it planned feature, not yet implemented or am I missing something ?

We still need to add that check. It will go on the same line as the existing counter validity checks in cache_get.

crea’s picture

@David Strauss:
About the assumption: you quote Drupal modules published on drupal.org and how they use cache_clear_all(), and that's exactly the kind of assumption I mentioned above. Drupal is also a framework, so we should base our code on the API, not on existing examples of code. What kinds of custom caching modules do sites use that are never published here? No one knows.
The API docs for cache_clear_all() say:

$wildcard If $wildcard is TRUE, cache IDs starting with $cid are deleted in addition to the exact cache ID specified by $cid.

Note that there are no guidelines about the length and variation of $cid.
Consider the following scenario. In your custom module you cache different parts of a node, or different themed forms: say, the themed body, the themed teaser, etc. It's very logical to use keys prefixed with "node$nid": on a node update event you will then be able to flush all cache entries related to the node being updated.
Now, if we implement this feature based on the assumption that modules listed on drupal.org don't use many prefixes in cache_clear_all(), we break (well, not exactly break, but make perform badly) the Memcache API for you and your custom cache module.
This is only a simple example. In fact you could have a cache prefix like "node$nid-$year-$month" and want to call cache_clear_all("node12345-2009"...) to flush all entries related to node 12345 in 2009. The API allows you to do so.

About counters and Namespice: indeed, your approach in this patch is better.
In fact, I planned to move the features of Namespice into a cache backend such as Memcache API. There were only two concerns:

  • There are several cache backends out there (Memcache API, Cacherouter, etc.), so supporting only one is bad, while implementing it everywhere is too hard.
  • The Drupal community could benefit from one central cache backend API (Don't Repeat Yourself).

  • I would abandon Namespice if the cache backend could provide flushing by an arbitrary substring of the cache id, i.e. a feature similar to Namespice's "tags".
  • I understand that this feature is not supported by the current core function signatures, but I believe that by extending the signatures (in a backward-compatible way) or by implementing additional functions we could greatly help modules flush groups of cache entries. Currently core says you should prefix cache entries with a tag, and I think that's very limiting.

david strauss’s picture

Drupal is framework also, so we should base our code on API, not on existing examples of code.

Optimizations should be based on common cases, not what you could theoretically do to break the optimization. While I haven't broken down every cache clearing call I've seen in contributed modules, they're substantially similar to the usage in core. It's uncommon to clear large sets of different prefixes.

I previously considered having no directory and just performing a batched lookup of all possible prefixes. That's reasonably efficient on a single memcached instance but very bad across a distributed cluster. So, we'd have to ensure prefix co-location. We'd also *have* to use a separate bin for counters because there wouldn't be a directory allowing us to detect counter eviction.

A compromise would be only allowing prefixes at breakpoints like underscores, but that changes the API.

Consider following scenario. In your custom module you cache different parts of node, or different themed forms: say, themed body, themed teaser, etc. It's very logical to use keys prefixed with "node$nid": on node update event you will be able to flush all cache entries related to the node being updated.

Like I said, we could put a cap on directory size and flush the whole cache once it hits the limit. That would cause the system to fall back to something comparable to the old behavior.

If we want to get fancy, we could use a tree structure for the prefix directory. While initial traversal would require a few memcached requests, we could cache known directory content in APC (or equivalent) -- the directory can only get larger. The tree structure would ideally be based on branching only as much as necessary:

(1) Directory starts empty.

(2) Clearing of "node_34_*":

node_34_

(3) Clearing of "node_35_*":

node_3
  4_
  5_

(4) Clearing of "node_20_*":

node_
  20_
  3
    4_
    5_

(5) Clearing of "node_21_*":

node_
  2
    0_
    1_
  3
    4_
    5_

(6) Clearing of "user_5_*":

user_5_
node_
  2
    0_
    1_
  3
    4_
    5_

(7) Clearing of "user_6_*":

user_
  5_
  6_
node_
  2
    0_
    1_
  3
    4_
    5_

So, the directory could expand to accommodate a very large set of prefixes with minimal traversal time.

We could also make the tree design append-only to simplify maintaining the APC part and minimize locking. This would converge on branching at the right places with a small compromise on efficiency.

(1) Directory starts empty.

(2) Clearing of "node_34_*":

node_34_

(3) Clearing of "node_35_*":

node_34_
node_3
  5_

(4) Clearing of "node_20_*":

node_34_
node_3
  5_
node_
  20_

(5) Clearing of "node_21_*":

node_34_
node_3
  5_
node_
  20_
  2
    1_

(6) Clearing of "user_5_*":

user_5_
node_34_
node_3
  5_
node_
  20_
  2
    1_

(7) Clearing of "user_6_*":

user_5_
user_
  6_
node_34_
node_3
  5_
node_
  20_
  2
    1_
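For illustration, the matching side of such a tree can be sketched as a small trie. This is hypothetical code, not the patch (the real directory is currently flat): each wildcard clear adds its prefix, and matching a cid walks the tree character by character, so lookup cost depends on key length rather than directory size.

```php
<?php
// Minimal trie sketch of the prefix directory idea (illustrative only).
function trie_add(&$trie, $prefix) {
  $node = &$trie;
  foreach (str_split($prefix) as $char) {
    if (!isset($node[$char])) {
      $node[$char] = array();
    }
    $node = &$node[$char];
  }
  $node['#end'] = TRUE;  // marks a cleared prefix
}

function trie_matches($trie, $cid) {
  $node = $trie;
  foreach (str_split($cid) as $char) {
    if (!empty($node['#end'])) {
      return TRUE;       // some cleared prefix is a prefix of $cid
    }
    if (!isset($node[$char])) {
      return FALSE;
    }
    $node = $node[$char];
  }
  return !empty($node['#end']);
}

$trie = array();
trie_add($trie, 'node_34_');
trie_add($trie, 'node_35_');
trie_add($trie, 'user_5_');

var_dump(trie_matches($trie, 'node_34_teaser'));  // bool(true)
var_dump(trie_matches($trie, 'node_40_teaser'));  // bool(false)
```
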
crea’s picture

Optimizations should be based on common cases, not what you could theoretically do to break the optimization.

That would make sense if this patch had been part of core from the start. There are modules that existed long before this patch. So it's not the cases that break optimizations, it's the optimization that can break some cases. If a module follows the core API, why should it break or perform badly?

While I haven't broken down every cache clearing call I've seen in contributed modules, they're substantially similar to the usage in core. It's uncommon to clear large sets of different prefixes.

It may be uncommon, but it's not impossible, and it's very logical. My example with nodes is similar to what I use in the Advanced Panels Cache module. So it is a real example, with the only difference that I use node tags rather than prefixes. If I had used core cache functions and a prefix instead of Namespice and a tag, Memcache API would have worked inefficiently for me before this patch (flushing the whole bin), but could break horribly after this patch, assuming the number of nodes grows.

Personally, I'm fine with this prefix directory if it's configurable. Let the user disable it and fall back to bin clearing or whatever. A limit on its growth is OK too.

crea’s picture

The APC idea is good. There's a different problem, though, if users of Memcache API want to use something other than APC as their opcode cache.
Actually, if the news about APC becoming part of PHP in the future is true, then requiring APC is OK.

crea’s picture

Status: Needs work » Active

Please correct me if I'm wrong. From looking at the patch I understand that you use the prefix directory to provide the relationship between keys and prefixes, and store prefixes and their counters inside the related cache entries. Then, to invalidate the cache for a prefix, you increment the counter for that prefix, and all keys containing a prefix with an invalid or missing serial are automatically invalidated on cache_get().
To summarize the problem: we need to make relationships between cache ids and cache prefixes, and this relationship needs to be fast and efficient. So are we reinventing relational storage in memcached here?

Btw, I think this kind of check can be slow anyway, and you do it very frequently:

// Check if the item being fetched matches any prefixes.
foreach ($memcached_prefixes[$table] as $prefix) {
  if (substr($cid, 0, strlen($prefix)) == $prefix) {
    // ... (the counter comparison follows in the patch) ...
  }
}

During one page request, cache_get() can be called many times, and you basically scan the whole directory every time.

crea’s picture

Status: Active » Needs work
crea’s picture

Btw, using Memcached's getMulti or getMultiByKey (the Memcached extension uses libmemcached), we could fetch all possible prefixes up to a maximum *supported prefix length* in a single request. We need a supported prefix length anyway, because we can't flush prefixes of arbitrary length; with the current prefix directory model it is too expensive.
So the logic would be:
1) When flushing, check the prefix length. If $length > $max_length, flush the whole bin. This is not super efficient, but no worse than before this patch.
2) If the length is OK, store the prefix counter in memcached itself.
3) To get all flushed prefixes for a key, fetch all of its substrings up to the maximum supported prefix length in one getMulti request from memcached. Should be fast enough.

How about opening a 2.x branch aimed at Memcached support, and starting from there? The Memcached extension is superior to Memcache anyway. If we start there, I may also incorporate the Namespice features.

crea’s picture

A small coding style note: it's better to move the prefix variables from globals to statics in a dedicated function, to avoid collisions in the global namespace.

crea’s picture

Overall, I think the prefix directory is a bad idea if we can't predict the directory size. With control of the maximum size it can work: maintaining a small directory and falling back to a full bin flush when it grows would be an improvement anyway over what we have now.
We can agree that this is a partial, initial improvement, and that further improvements could be implemented separately, such as in a 2.x branch. So for the 1.x branch I suggest sticking with this approach, as long as the directory size limit can be configured by the user. Setting this limit to 0 would also fit nicely for disabling the feature, basically falling back to pre-patch behavior.

david strauss’s picture

Status: Active » Needs work

@crea in #26

I already mentioned and dismissed the possibility of looking for all prefixes in #21:

I previously considered having no directory and just performing a batched lookup of all possible prefixes. [...]

Also, fetching multiple keys is already possible using the current memcache PECL extension:

http://www.php.net/manual/en/function.memcache-get.php

@crea in #27

I disagree, especially because the variables are named $memcached_[something]. Creating functions that purely encapsulate static variables is a shameless hack. This all belongs in real objects, but I didn't want to handle that rewrite yet.

@crea in #28

I'm actually leaning toward the prefix directory tree now. It's pretty easy to implement, and it can scale well (especially with APC-like cache support). And it's easy to support correct and efficient operation even with the possibility of parts being evicted.

crea’s picture

@David Strauss

About Memcache: that's nice. I missed that it also has multiple-key fetching. Thanks!
So why is having a separate bin for counters and using it for batch counter lookups wrong? We already advise using several bins for different cache tables; users would be OK with it. Scaling would be OK too, since the whole directory could fit on one server (counters are small enough). Is possible eviction that big an issue?
I'm asking also because I'm thinking about using that approach in Namespice (while also switching from inserting counters into the cache id to encapsulating them in the cache object).

About static variables: it's not a hack, because it wouldn't be purely encapsulating. It's very logical to split all prefix and counter management into a separate function; then static caching fits nicely. It's what I have in Namespice. OTOH in this patch you fetch this directory in several places, repeating code. If prefix management were abstracted into a separate function, switching from storing it in memcached to a directory tree, APC or whatever would also be easier.

david strauss’s picture

So why is having a separate bin for counters and using it for batch counter lookups wrong? We already advise using several bins for different cache tables; users would be OK with it. Scaling would be OK too, since the whole directory could fit on one server (counters are small enough). Is possible eviction that big an issue?
I'm asking also because I'm thinking about using that approach in Namespice (while also switching from inserting counters into the cache id to encapsulating them in the cache object).

I didn't say having a separate bin was the problem; I said looking up all possible prefixes was a problem.

About static variables: it's not a hack; it's very logical to split all prefix and counter management into a separate function, then static caching fits nicely. It's what I have in Namespice. OTOH in this patch you fetch this directory in several places, repeating code. If prefix management were abstracted into a separate function, switching from storing it in memcached to a directory tree, APC or whatever would also be easier.

Yes, there is code duplication right now, and I've already put forth that it should be fixed using proper objects. It is inappropriate to use functions merely to encapsulate variables, despite how often Drupal does this, because you end up with funky arguments designed to perform get and set behavior. Even a single class with static functions is better.

crea’s picture

I said looking up all possible prefixes was a problem.

Ok, what about the approach I described above? Say we have a max prefix length of 10 and the wildcard prefix "abcdefghij".
Lookup of the following would be fast if it's done in a single memcached request to a single memcached server:
"a"
"ab"
"abc"
"abcd"
"abcde"
"abcdef"
"abcdefg"
"abcdefgh"
"abcdefghi"
"abcdefghij"

That would scale depending only on the configured max prefix length, not on the number of prefixes or the total size of the directory.
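A minimal sketch of this sub-prefix lookup, written in Python with a dict-backed stand-in for a real memcache client (the `prefix:` key naming and function names are hypothetical, not the module's actual API). The point is that all counters for every sub-prefix of a cache id can be fetched in one multi-get:

```python
class FakeMemcache:
    """Minimal in-memory stand-in for a memcache client."""
    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value

    def get_multi(self, keys):
        # A real client would issue a single request per server here.
        return {k: self.store[k] for k in keys if k in self.store}

def sub_prefix_keys(cid, max_len=10):
    """All sub-prefixes of the cache id, up to the configured max length."""
    return ['prefix:' + cid[:i] for i in range(1, min(len(cid), max_len) + 1)]

def is_flushed(mc, cid, stored_serials, max_len=10):
    """An item is stale if any prefix counter was bumped after it was stored."""
    current = mc.get_multi(sub_prefix_keys(cid, max_len))
    return any(current.get(k, 0) > stored_serials.get(k, 0) for k in current)
```

With a max prefix length of 10, `sub_prefix_keys` always produces at most 10 keys for `get_multi`, regardless of how many distinct prefixes exist, which is exactly the scaling property claimed above.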

david strauss’s picture

Lookup of the following would be fast, if it's done in single memcached request to single memcached server

Yes, but the key phrase there is "single memcached server." This solution is not cluster-able for high availability without a big compromise on performance.

crea’s picture

Well, addByKey, getByKey and setByKey could achieve the prefix colocation you mentioned above. Memcached only, though.

Another idea: use the prefix key as an "ok" flag, and the absence of the prefix as a "flush" flag. To flush a prefix, delete it instead of incrementing it. On cache_set(), set the "ok" flag. Then a prefix missing because of a failed server in a cluster will also lead to a flush, and we get the desired cluster-ability without the need for prefix colocation.

david strauss’s picture

Another idea: use the prefix key as an "ok" flag, and the absence of the prefix as a "flush" flag. To flush a prefix, delete it instead of incrementing it. On cache_set(), set the "ok" flag. Then a prefix missing because of a failed server in a cluster will also lead to a flush, and we get the desired cluster-ability.

That won't work because you can have this:

(1) Set test_123
(2) Clear test_*
(3) Set test_456

You can have a mix of pre-clear and post-clear items matching the prefix.
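The failure can be seen in a few lines. This is a sketch of the "ok flag" scheme as proposed (dicts stand in for memcache; all names are illustrative), run through exactly the set/clear/set sequence above:

```python
cache = {}
flags = {}  # presence of flags['test_'] means "prefix is valid"

def cache_set(cid, value, prefix):
    cache[cid] = value
    flags[prefix] = True          # (re)arm the "ok" flag

def cache_get(cid, prefix):
    if prefix not in flags:       # missing flag means "flushed"
        return None
    return cache.get(cid)

def cache_clear_prefix(prefix):
    flags.pop(prefix, None)       # flush by deleting the flag

cache_set('test_123', 'old', 'test_')   # (1)
cache_clear_prefix('test_')             # (2) test_123 is now invalid
cache_set('test_456', 'new', 'test_')   # (3) re-arms the shared flag...
# ...which resurrects the pre-clear item: stale data is served again.
assert cache_get('test_123', 'test_') == 'old'
```

Because the flag is shared by every item under the prefix, step (3) cannot distinguish pre-clear items from post-clear ones.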

crea’s picture

Yep, I get what you mean. So "test_456" will set the "ok" flag for "test_" again, bringing the already-invalidated "test_123" back to life.

crea’s picture

OK, what's the problem with prefixes spread over multiple servers? AFAIK, the Memcache client library manages to do parallel fetching. When a prefix counter is missing, we can just reinitialize it and clear the corresponding cache table (to prevent counter collisions). That's what I want to do in Namespice. So when a server in the cluster fails, we get a full flush for every cache table that had at least one prefix counter on the failed server.
This kind of logic can handle counter evictions too, because they just lead to over-flushing. IMHO we should only care about under-flushing (meaning invalid cache somehow becoming valid again), not over-flushing.
Sure, it's a performance hit, but a failing Memcache server is not a common situation, and at least we degrade gracefully: we don't break, but return to pre-patch behavior in the case of a failed server or counter eviction.
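A sketch of the "reinitialize and over-flush" policy described here: a missing prefix counter (evicted, or lost with a failed server) is reinitialized to a fresh value, which implicitly invalidates every cache entry stamped with the old counter. All names are illustrative; `itertools.count` stands in for whatever fresh-serial scheme a real implementation would use to avoid counter collisions:

```python
import itertools

_epoch = itertools.count(1)
counters = {}  # prefix -> serial; entries may vanish (eviction, dead server)

def get_serial(prefix):
    if prefix not in counters:
        # Counter missing: start a new epoch. Entries stamped with the old
        # serial now compare as stale -- over-flushing, never under-flushing.
        counters[prefix] = next(_epoch)
    return counters[prefix]

def is_valid(stored_serial, prefix):
    """An entry is valid only if its stamped serial matches the live one."""
    return stored_serial == get_serial(prefix)
```

A lost counter thus costs a full flush of the affected prefix, but stale data can never be served as valid.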

david strauss’s picture

AFAIK Memcache client library manages to do parallel fetching.

We need a lot better certainty than "AFAIK" if we might be querying 10+ servers to get the data. I agree on the over- versus under-flushing strategy.

crea’s picture

@David Strauss:
It should work in parallel, starting from version 3.0 of PECL Memcache. See:
http://lists.danga.com/pipermail/memcached/2007-May/004354.html
http://pecl.php.net/package-info.php?package=memcache&version=3.0.0

Also notice there's a redundant-storage configuration option, so for redundant storage of prefix counters one could have a cluster of several memcached servers where keys are not split but duplicated.

crea’s picture

Huh, that 3.0 branch is even backward-compatible.

Version 3 introduces a new class "MemcachePool" which implements the new API, the old class "Memcache" is still retained (but is deprecated) with the same interface for backwards compatibility.

http://cvs.php.net/viewvc.cgi/pecl/memcache/README?revision=1.3.2.5&view...

crea’s picture

Also, I don't exactly understand why a site admin would run 10 servers for our dedicated prefix counter pool. Unless we open a 2.x branch of the Memcache API depending on either libmemcached-based Memcached or the Memcache 3.x branch, we can instruct people to run a dedicated pool of at most 2 servers, since 2 is enough for failover. There can be general pools with 100 cache servers, but for prefix counters there will be a separate one with 2. That means even if there are 20 prefixes spread between 2 servers, they will be fetched in 2 multi_get requests.
And if we instruct people to run only 2 and an admin decides to run 10, that means he's shooting himself in the foot. Should we care?

crea’s picture

If everyone is comfortable with the approach I described above, I can try to provide an initial patch: just let me know. I don't want to spend my time on it without some confidence that it will eventually be accepted.
I can also implement it in a generic way so that any module will be able to use the cache tag functions, making the Namespice module obsolete. To check whether a prefix was flushed, this patch will use the tag functions with a tag array consisting of the sub-prefixes up to the max prefix length.

david strauss’s picture

@crea Which approach, specifically, do you mean?

crea’s picture

@David Strauss:
Well, here's a summary of my ideas:

  • Store prefix counters in a separate memcache cluster.
  • The recommended setup for this cluster would be 2 servers with PECL Memcache 2.x, or any number of servers with PECL Memcache 3.x.
  • Abstract the "tags" feature, so that there are separate functions similar to ns_cache_get(), ns_cache_set(), and ns_cache_clear_all() in Namespice. These functions will require an array of tags to work with. Because we store the tags (or prefixes and their counters) in the cache entry itself, unlike Namespice in its current state, we can provide stampede protection (returning an invalid cache entry while rebuilding) and the same cache lifetime behavior as before.
    These functions will get all tag serials (prefix counters) in one atomic get request. Depending on the setup, this will be handled automatically by the client library, in parallel or sequentially.
  • Make the core cache functions use these tag functions with a tag array consisting of all possible sub-prefixes, up to the max prefix length, of the current cache id.
  • For prefixes longer than the max length, simply flush the entire bin. A safe max prefix length could be determined by additional performance tests.
  • Eviction of tag serials (prefix counters) is not a big issue: the counters are small, and evictions can still be monitored using Memcached statistics. Moreover, this logic can still deal with missing counters via over-flushing.
    We could also monitor evictions directly ourselves, and use an eviction counter to distinguish a counter missing because of a new tag (prefix) from an evicted one. Then eviction would become a critical problem reported to site admins.
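The tag functions summarized above can be sketched as follows. This is a rough Python stand-in, not Namespice's actual code: dicts represent the two memcache pools, `itertools.count` stands in for a real serial scheme, and the function names simply mirror the ns_cache_* naming. The key design point is that each entry carries its own snapshot of tag serials:

```python
import itertools

_serial = itertools.count(1)
tag_pool = {}    # dedicated pool for tag serials (prefix counters)
data_pool = {}   # regular cache pool

def tag_serial(tag):
    if tag not in tag_pool:
        tag_pool[tag] = next(_serial)
    return tag_pool[tag]

def ns_cache_set(cid, data, tags):
    # Snapshot every tag serial at write time (one multi-get in real life).
    data_pool[cid] = {'data': data,
                      'serials': {t: tag_serial(t) for t in tags}}

def ns_cache_get(cid, tags):
    entry = data_pool.get(cid)
    if entry is None:
        return None
    # Valid only if no tag has been flushed since the entry was written.
    if any(entry['serials'].get(t) != tag_serial(t) for t in tags):
        return None
    return entry['data']

def ns_cache_clear_all(tag):
    # Flushing a tag just bumps its serial; matching entries become stale.
    tag_pool[tag] = next(_serial)
```

Because the serial snapshot is stored per entry rather than as one shared flag, re-setting one item after a flush cannot resurrect other pre-flush items, which avoids the "ok flag" problem discussed earlier in the thread.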
crea’s picture

Can we at least have the previous patch posted here reverted? It's now in a broken state, since the prefix directory does not scale at all.

jeremy’s picture

How does it not scale? I have deployed this on a few live websites, and run it through load testing without experiencing any issues. I don't see any real-world problems here.

jeremy’s picture

Status: Needs work » Fixed

We've run this through comprehensive load testing (using jmeter) and found no performance impact from this patch. And it greatly simplifies the configuration of memcache. The patch was already committed to the 6.x-1.x-dev branch, and is now committed to the 7.x-1.x-dev branch. Thanks, David!

crea’s picture

@Jeremy:
This only works assuming one has few wildcard prefixes (core Drupal). Given core plus arbitrary modules (Drupal is a framework too, right?), it may not scale.
It won't scale if one is flushing many different wildcard prefixes. For example, if you tie your cache entries to nodes so that cache ids start with "node$nid", you cache_clear on prefixes like "nodeN", and you have many nodes, the prefix directory will consume lots of memory and slow down your site.
Please read #14 and #20 for a more in-depth description.
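An illustration of the growth concern (not the module's code): any directory approach has to remember every prefix ever flushed so that later lookups can be matched against it, so with one wildcard prefix per node the directory grows linearly with the number of nodes. The dict here is a hypothetical stand-in for the stored directory:

```python
prefix_directory = set()

def wildcard_flush(prefix):
    # A directory-based scheme must record every prefix ever flushed so
    # that later cache operations can match keys against it.
    prefix_directory.add(prefix)

# One wildcard flush per node, as in the "node$nid" scenario above.
for nid in range(1, 10001):
    wildcard_flush('node%d' % nid)

# The directory now holds one entry per flushed node prefix; this whole
# structure must be stored in memcache and consulted on cache operations.
assert len(prefix_directory) == 10000
```

Ten thousand nodes means ten thousand directory entries to serialize, fetch, and scan, which is the memory and slowdown cost being described.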

crea’s picture

Status: Fixed » Needs work
jeremy’s picture

I've read your earlier comments, and they're all hypothetical problems as far as I can see. What modules actually do what you describe? If these modules actually exist, have you tested them against the latest code? I have indeed tested this code both on production websites, and with targeted load testing tools, and seen excellent results.

crea’s picture

I've read your earlier comments, and they're all hypothetical problems as far as I can see.

Yes, it's my habit to foresee potential problems before they arise.

What modules actually do what you describe?

My Advanced Panels Cache module does the same thing, using "node$nid" tags (a kind of prefix). I started it before this patch, so I needed my own way of dealing with flushes. OTOH, with this patch I can't use Memcache wildcard flushing either, because of the potential directory size growth.

If these modules actually exist, have you tested them against the latest code?

I would have to rewrite it to use Memcache API wildcard flushing instead of Namespice tags.

I have indeed tested this code both on production websites, and with targeted load testing tools, and seen excellent results.

Of course this will work fine in most general cases. But if you want to test the growth of the directory, you need to flush many different prefixes.

Personally, I'm OK with this approach if it's documented, i.e. if there's a warning about prefix directory size growth.
Whether this is a bug or not depends on whether you target Drupal as a CMS or Drupal as a framework. If you target Drupal the CMS, this is OK as long as there are no modules published on Drupal.org that break because of it. If you target Drupal the framework, this should not break for any module (even a hypothetical one) that follows the API.

jeremy’s picture

Status: Needs work » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.