We've had ongoing reports of problems caused by using memcache with some bins (cache_form), and some bins make less sense going into memcache to begin with (cache_page). This patch adds the ability to exempt any bin from memcache using syntax like this:

$conf['memcache_bins'] = array(
  'cache' => 'default',
// Any bin that goes to 'none' will fall through to database caching.
  'cache_form' => 'none',
  'cache_menu' => 'none',
);

I solved the general problem of how to include memcache.inc and cache.inc at the same time by copy and paste. Ugly, yes, but super easy to maintain. It adds almost no complexity yet makes the module much more flexible.

I've abandoned the idea of memcache.db.inc, and actually think it should be removed from the project. It is woefully outdated and essentially unmaintained.

The implementation of this particular feature (excluding bins) in Drupal 7 is easier, and will look completely different, I imagine.

Comments

Mark Theunissen’s picture

Out of curiosity, what is the problem with cache_form? And why doesn't cache_page make sense?

We have a need for what we call a 'durable' bin, that doesn't get cleared during deployments. For that, we have a separate memcached daemon cluster running and map a bin to that.

robertdouglass’s picture

Mark, there are rumors and suspicions, unconfirmed afaik, that some multi-step forms don't function properly with memcache under certain circumstances. Part of the problem is that cache is supposed to be impermanent - you should be able to lose your entire cache and not lose functionality or state on the site. The cache_form table, however, isn't really a cache, but a temporary state store in the case of multistep forms. As I understand it, if the cache_form table went to /dev/null, some forms would not work. This makes memcache less than suitable for caching this information.

As for the page cache, it is easy to exceed the available limits of memcache, so it's nice to be able to route page cache to something with adequate storage available (like the db).

rwohleb’s picture

subscribe

bleen’s picture

Loving this idea ... testing on one of my sites right now ... so far so good

bleen’s picture

... note that using this patch (and excluding cache_form from being memcachified) has completely alleviated my issue (#962578: If cron runs while a form with an imagefield is open, error: "... This form was missing from the server cache..." on img upload ) which IMO confirms that #966114: cache_clear_all ignores cache "expire" attribute is, in fact, a legitimate problem.

rwohleb’s picture

I'm running this in a large production environment without issue.

robertdouglass’s picture

Component: Code » Documentation
Status: Needs review » Needs work

http://drupal.org/cvs?commit=452056

Committed to 6.x-1.x. Need documentation updates to README.txt, project page.

bleen’s picture

Status: Needs work » Needs review
StatusFileSize
new1.29 KB

sweet

emackn’s picture

I'm a little confused about something. Why would you set up a memcache bin, and then exclude it?

What are the gains/trade-offs for simply not specifying a memcache bin versus specifying a bin but excluding it?

Thanks.

bleen’s picture

emacken check out this from teh README.txt:

$conf = array(
  'cache_inc' => './sites/all/modules/memcache/memcache.inc',
  'memcache_servers' => array('localhost:11211' => 'default',
                              'localhost:11212' => 'default',
                              '123.45.67.890:11211' => 'default',
                              '123.45.67.891:11211' => 'cluster2',
                              '123.45.67.892:11211' => 'cluster2'),

  'memcache_bins' => array('cache' => 'default',
                           'cache_filter' => 'cluster2',
                           'cache_menu' => 'cluster2'),
);

With this setup, anything that would have normally gone into the cache_form, or cache_block, etc... tables will autmatically be sent to the "default" cluster even though neither are specified in this setup. By explicitly excluding cache_form (or whatever you want to exclude) you are telling memecache to follow this algorithm when cache_set() is called:

  1. Check if the bin for the item to be cached is specified
  2. IF NO: cache it in the "default" bin
  3. IF YES: cache it in the specified bin unless that bin is called "none" in which case use the appropriate Drupal cache table instead of memecache

Does that make sense?

emackn’s picture

yes, thanks, didn't realize that not specifying a bin would send the cache sets to the default.

bleen’s picture

StatusFileSize
new1.29 KB

I had a minor spelling error in #8

digi24’s picture

A suggestion regarding the committed code patch:
Could you please move the whole _cache_get, _cache_set and _cache_clear_all function into a separate file? The reason is simple: Many users probably need to exclude certain bins from memcache and handle them another way. (Cacherouter is IMHO no option, as it provides a very good idea, but neglected and inefficient engines support.)

Having the alternative engine functions in a separate file would make custom cache engines besides memcache API way easier to maintain. And users not requiring this functionality in this patch, could just comment it and save some CPU cycles per request.

robertdouglass’s picture

Status: Needs review » Fixed

@bleen18, thanks - added the docs.
@digi24 - provide a patch and I'll review it. Thanks.

jeremy’s picture

Status: Fixed » Needs work

I get the feeling this is an attempt to mask another bug -- if cache_forms is not working in memcache, why? Is moving cache_menu into the database even an option for large Drupal websites? I also don't like how this was implemented, cutting and pasting code from core. Inevitably as core changes this file will get out of sync. If this logic has to stay in, I agree it should be moved into an include file as requested in #13.

jeremy’s picture

robertdouglass’s picture

Jeremy, I agree that this might be a way to mask other bugs. I also argue that disk space is cheap and memory is expensive, and some people use cache_set strategies (like rendered node caching) that quickly fill up memory, and they should have the option of sending high volume caches to disk.

I'll make a separate file and include it with the extra code if that helps people.

digi24’s picture

@robert
Regarding the patch - I just moved the functions to seperate files and put in a require_once. This could maybe made nicer and use a better name then 'none'. So I am not uploading it.

One thing I came across, I would suggest to also add the following patch, which moves the cache_clear_all logic out of the db-specific function, by simply slightly postponing the specific functions:


Index: memcache.inc
===================================================================
--- memcache.inc        (revision 2166)
+++ memcache.inc        (revision 2177)
@@ -152,11 +152,6 @@
  *   match. If '*' is given as $cid, the table $table will be emptied.
  */
 function cache_clear_all($cid = NULL, $table = NULL, $wildcard = FALSE) {
-  // Handle database fallback first.
-  $bins = variable_get('memcache_bins', array());
-  if (!is_null($table) && isset($bins[$table]) && $bins[$table] == 'none') {
-    return _cache_clear_all($cid, $table, $wildcard);
-  }

   // Default behavior for when cache_clear_all() is called without parameters
   // is to clear all of the expirable entries in the block and page caches.
@@ -165,6 +160,13 @@
     cache_clear_all('*', 'cache_page', TRUE);
     return;
   }
+
+  // Handle database fallback first.
+  $bins = variable_get('memcache_bins', array());
+  if (!is_null($table) && isset($bins[$table]) && $bins[$table] == 'none') {
+    return _cache_clear_all($cid, $table, $wildcard);
+  }
+
   if (empty($cid) || $wildcard === TRUE) {
     if ($cid == '*') {
       $cid = '';

and in the specific _cache_clear_all:


Index: db_fallback.inc
===================================================================
--- db_fallback.inc     (revision 2207)
+++ db_fallback.inc     (working copy)
@@ -142,14 +142,6 @@
 function _cache_clear_all($cid = NULL, $table = NULL, $wildcard = FALSE) {
   global $user;

-  if (!isset($cid) && !isset($table)) {
-    // Clear the block cache first, so stale data will
-    // not end up in the page cache.
-    _cache_clear_all(NULL, 'cache_block');
-    _cache_clear_all(NULL, 'cache_page');
-    return;
-  }
-
   if (empty($cid)) {
     if (variable_get('cache_lifetime', 0)) {
       // We store the time in the current user's $user->cache variable which
ball.in.th’s picture

Subscribing. Very interested in this.

jeremy’s picture

Priority: Normal » Major

Bumping up the priority, as this is currently blocking a release. There have been some performance related fixes that I'd like to release, but not until this patch is either backed out (very tempting), or the logic is moved into an include file.

> I also argue that disk space is cheap and memory is expensive

I see two problems with this argument: 1) your patch sends a configurable number of caches into the database which is also going to end up in memory at least partially, and 2) disks are slow while memory is fast -- if you're deploying the memcache module you're doing it because you want to improve performance. While more expensive than disk space, memory really isn't that expensive these days.

ball.in.th’s picture

Could 'none' be changed to something else, e.g., database, core, etc, to avoid confusion? It seems 'none' is used in cacherouter to literally mean no cache -- http://drupal.org/node/797610 .

bleen’s picture

re #20: Please do not back this out ... this change solves a whole bunch of issues that occur when cache is cleared while a form is being filled out. The form cache really is a special case in Drupal and this lets me treat it as such.

jeremy’s picture

> this change solves a whole bunch of issues that occur when cache is
> cleared while a form is being filled out. The form cache really is a
> special case in Drupal and this lets me treat it as such.

Where is the bug report for this? Have you tried the latest development snapshot to confirm whether or not it's necessary to split out cache_form? If it's still true with the latest development snapshot, update the bug report with steps on how to duplicate this, and we'll fix it properly.

robertdouglass’s picture

Jeremy - I agree that hunting down the weirdness with cache form is a good idea, but there's a 90% chance that it is not a memcache problem, rather a core architecture problem. I still maintain that independent of that particular case in point, this functionality is needed and wanted, and wish to hold on to it in the 6.x version (7.x doesn't need it). I'll get the code put into a separate file today, as requested.

robertdouglass’s picture

Jeremy, I didn't understand this sentence:

1) your patch sends a configurable number of caches into the database which is also going to end up in memory at least partially

"which is also going to end up in memory, at least partially" - how do you mean that?

A typical case that I run into is a site that has 4-5 web servers that are CPU bound and never exhaust their memory, so memcache is put onto the webservers (~100M or so per machine). If a site is caching nodes, or has millions of something, then memcache quickly gets exhausted, and most cache calls miss. This is a side effect of deciding that *all* cache_sets need to be stored in memory. The long tail of cache items that account for the "millions of something" are better off in persistent storage. This helps sites like the one I described avoid needing to buy a dedicated memcache machine with 16G Ram just to serve the cache.

The problem might not be as acute if we were tuning slab management to be most efficient for the typical cache item size we're sending in, but we don't do that.

In this way, the cacherouter module has the right idea - being able to direct specific caches to specific storage - but I never wanted to introduce that whole abstraction to this module. This is a compromise.

robertdouglass’s picture

@digi24 I don't think the argument that you could just comment out the include or require statement actually holds. You'd be risking a fatal function not found error if you ever got into a situation where one of the _cache_* were called.

I'm not opposed to putting the _cache_* functions into a separate file, but the only advantage I see of doing so is making it easy for someone else to write a new set of _cache_* functions (say to mongodb) and including that file instead of the database file. Honestly, very few people are going to do that.

So I'll provide a patch to move them into an include file, but I dispute the value of doing so.

robertdouglass’s picture

StatusFileSize
new2.72 KB

Here's a patch that I'm committing to change 'none' to 'database'.

robertdouglass’s picture

Status: Needs work » Needs review
StatusFileSize
new15.66 KB

Here's a patch to move the _cache_* functions to a new file, which I called database.inc.

@digi24 I haven't integrated any of your suggested changes from #18 yet. I'm a little reticent to do so as the "easy to maintain" part of this solution (copy and paste from cache.inc) starts to fall apart.

ball.in.th’s picture

Status: Needs review » Reviewed & tested by the community

The code looks good. Maybe the _cache_* functions in database.inc should be renamed to memcache_database_cache_* instead to avoid name collisions -- http://drupal.org/coding-standards#naming .

robertdouglass’s picture

@ball.in.th - what name collisions? Since we hijack cache.inc altogether I don't expect there to ever be a name collision. I'm not opposed to the idea, though.

ball.in.th’s picture

@robertDouglass
Yes, collisions are not likely. Just might be a good idea to have module name as a prefix.

bleen’s picture

@jeremey: just to answer your question from #23, the bug report can be seen here: #966114: cache_clear_all ignores cache "expire" attribute

@robertDouglass: Thanks for keeping this on your radar!

jeremy’s picture

@bleen18 that issue should now be fixed in the latest development snapshot, and hence was closed.

@robert:

> "which is also going to end up in memory, at least partially" - how do you mean that?

Databases use memory for performance. Especially if using innodb, then it's going to load all indexed columns into RAM.

If you have a cache bin that is abnormally large and is pushing other cache bins out of memory, you should move it to a dedicated memcached instance so it only affects itself. Memcache uses LRU to evict pages so the most popular items will live in the cache.

> I agree that hunting down the weirdness with cache form is a good idea, but there's
> a 90% chance that it is not a memcache problem, rather a core architecture problem.

What core architecture problem are you talking about?

robertdouglass’s picture

Caches should be disposable information. You should be able to run the website (albeit inefficiently) without any caching. cache_form is essential (as I understand it) to multi step forms, and Drupal actually breaks if it isn't available. That's the architectural problem I meant.

jeremy’s picture

Status: Reviewed & tested by the community » Needs work

@robert: I believe the goal was to make this database.inc file configurable via settings.php, much like 'cache_inc' is configurable in core.

Specifically, this line:

+require_once 'database.inc';

Should look more like this line (from bootstrap.inc):

      require_once variable_get('cache_inc', './includes/cache.inc');
thebuckst0p’s picture

Subscribe. I've been having problems with cache_form in memcache causing form_build_id tokens to become invalid and return empty data from CCK "Add another" buttons. This patch looks promising.

robertdouglass’s picture

StatusFileSize
new11.41 KB

Ok, I have made it so. The new variable is called "memcache_extra_include". I wrote some documentation and have committed this change.

robertdouglass’s picture

Status: Needs work » Fixed
StatusFileSize
new19.25 KB

#959200 by robertDouglass, bleen18: Added Add ability to exclude bins (tables) from memcache.

This time with -N so the new file shows up.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

quicksketch’s picture

Though this issue is now somewhat old and already done with, I want to back up something robertDouglass pointed out:

Caches should be disposable information. You should be able to run the website (albeit inefficiently) without any caching. cache_form is essential (as I understand it) to multi step forms, and Drupal actually breaks if it isn't available. That's the architectural problem I meant.

That's right. The cache_form table is not really a "cache" at all. It's a temporary storage of in-progress forms. Any form that utilizes multi-step through $form_state['storage'] or #ahah will automatically make entries into the cache_form table. The problem with this table is that it's content's aren't "temporary" in the manner that they can be disposed of at any time. For example let's say you start a node at node/add/image, using ImageField as an upload. If someone flushes the caches while you're typing out the title to your image, then you try to upload after the caches have been cleared, you'll find that it's impossible. ImageField will helpfully give you the error, #795004: An unrecoverable error occurred. This form was missing from the server cache. Try reloading the page and submitting again..

Now moving cache_form into the database instead of memcache still does not entirely solve this problem, but it does prevent Memcache from pruning in-use "cache" entries that may contain a form because it has used up all its space or from restarting memcache causing forms to break. So kudos, thanks for the patch and I'm looking forward to using 1.8+.

pieter_duijves’s picture

for Drupal 7, someone managed to exclude a bin from memcache and have it fall back to the db with these lines in settings.php :


/**
 * Settings for memcache
 */
$conf['cache_backends'][] = 'sites/all/modules/contrib/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
// The 'cache_form' bin must be assigned no non-volatile storage.
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
$conf['memcache_servers'] = array(
  'memcache.local:11211' => 'default',
);