Hi there,

We have a high-load Drupal website that uses memcache as its cache backend.

We're running the memcache daemon with a 5 MB object size.

Extension:

extension=memcache.so
memcache.hash_strategy="consistent"

settings.php:

$conf += array(
  'cache_inc' => './sites/all/modules/memcache/memcache.inc',
  'memcache_persistent' => 'TRUE',
  'memcache_stampede_protection' => 'TRUE',
  'lock_inc' => './sites/all/modules/memcache/memcache-lock.inc',
  'memcache_servers' => array(
    'XXX.XXX.XXX.XXX:11211' => 'instance1',
    'YYY.YYY.YYY.YYY:11211' => 'instance1',

    'XXX.XXX.XXX.XXX:11212' => 'instance2',
    'YYY.YYY.YYY.YYY:11212' => 'instance2',

    'XXX.XXX.XXX.XXX:11213' => 'instance3',
    'YYY.YYY.YYY.YYY:11213' => 'instance3',
  ),

  'memcache_bins' => array(
    'cache' => 'instance1',

    'cache_admin_menu' => 'instance2',
    'cache_block' => 'instance2',
    'cache_menu' => 'instance2',
    'cache_path' => 'instance2',

    'cache_page' => 'instance3',
    'cache_views' => 'instance3',
    'cache_views_data' => 'instance3',
  ),
);

The issue is that from time to time we get an accidental menu rebuild. After investigating, we concluded that at some point the variables cache is simply empty.

Any help is greatly appreciated,

Thanks in advance.


Comments

catch’s picture

Status: Active » Needs review
FileSize
825 bytes

I don't have an answer for why the variable cache would get set to empty, but we did add a memcache_variable_set() function to memcache 6.x-1.9 to reduce the expense of setting variables that memcache uses to track cache flushes internally.

Here's a patch that does some additional sanity checking - untested but it's a simple enough patch. If you're able to test this and confirm whether or not the bug persists for you that would be great.

druninja’s picture

Thank you for your response. We've applied that patch, but the cache is still being lost for some reason. Could you propose some debug code to track down and catch the issue?

Thanks.

csavio’s picture

I believe we're seeing this in a similar environment under load with shared bins. I can refresh a page with variables and see the variables content appear and disappear as it bounces between memcache servers. On our load test environment I saw the content disappear entirely until a variable was saved.

Maitreya’s picture

Try adding "'cache_form' => 'database',"

catch’s picture

What do you have set for memcache.hash_strategy in phpinfo()?

csavio’s picture

Ours is set to "consistent" in the production and load test environments.

Jeremy’s picture

What sort of traffic are you seeing? If you enable memcache_admin and go to the admin memcache statistics page, what does it say for aggregate get/s and set/s? Also, what do you see under available memory and evictions?

catch’s picture

FileSize
611 bytes

You could try replacing all calls to memcache_variable_set() with variable_set() just to rule it out. However, it may also be that you have a pre-existing issue that was being 'corrected' by the variable cache being explicitly deleted and rebuilt instead of written through.

For debugging, you could try putting a backtrace in cache_set() if $cid == 'variables'. See attached patch.
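A rough sketch of what such debug code could look like. This is not the attached patch: debug_variables_cache_set() is a hypothetical helper name, and the idea is that you would call it at the top of cache_set() in your cache include. It relies only on PHP's debug_backtrace() and error_log().

```php
<?php
/**
 * Hypothetical debug helper (not part of the memcache module).
 * Returns a one-line summary of who is writing the 'variables' cache
 * entry, or NULL for any other cache ID.
 */
function debug_variables_cache_set($cid, $data) {
  if ($cid != 'variables') {
    return NULL;
  }
  // Collect "function (file:line)" for each frame of the call stack.
  $frames = array();
  foreach (debug_backtrace() as $frame) {
    $where = isset($frame['file']) ? basename($frame['file']) . ':' . $frame['line'] : 'internal';
    $frames[] = $frame['function'] . ' (' . $where . ')';
  }
  $size = is_array($data) ? count($data) : 0;
  $message = 'cache_set(variables): ' . gettype($data) . ', ' . $size
    . ' items; called via ' . implode(' <- ', $frames);
  // Writes to the PHP error log; inside Drupal, watchdog() would also work.
  error_log($message);
  return $message;
}
```

Entries that report a NULL type or 0 items are the interesting ones, since the call chain shows which code path wrote the empty value.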

csavio’s picture

I looked at your patch and realized we're one version back (6.x-1.8) so ours is possibly not the same issue as the initial poster even if the symptoms are similar. 6.x-1.8 did not have memcache_variable_set.

catch’s picture

You should definitely upgrade to 6.x-1.9, which has a lot of bug fixes and improvements.

csavio’s picture

Are these the statistics you were looking for, Jeremy?

Server 1

curr_items		434973
cmd_get			2662
cmd_set			29
evictions		0
mem_used		0.01%

Server 2

curr_items		345532
cmd_get			2275
cmd_set			41
evictions		0
mem_used		0.01%

Thanks catch, we'll definitely upgrade to the newest version in the next build/publish. We'd just updated to 6.x-1.8 a few weeks back and I hadn't realized a new version had come out in the meantime.

I wrote a quick bootstrap script to check the variables in production. I'll get it deployed to an out of band web server to check against each memcache server.

Jeremy’s picture

> Are these the statistics you were looking for, Jeremy?

In 1.9 these statistics are calculated for you. I was just curious how busy your site is -- but knowing you're running the 1.8 version of the module it's most likely that you're experiencing an old bug. I highly recommend a quick upgrade to 1.9 if you're experiencing issues.

koenvi’s picture

Did you ever manage to locate this issue? We seem to be having a similar one: our variables sometimes get lost, which leads to a menu rebuild and the creation of a new drupal_private_key, which in turn causes problems with form validation. We are running version 6.x-1.10 of the memcache module.

techgirlgeek’s picture

We're having the same issue that koenvi described in the comment above. We just updated to 6.x-1.10 and started seeing more variables being rewritten. We were able to get past the drupal_private_key issue by hard-coding the key into settings.php. We would love to hear more ideas about how to fix this going forward.

In addition to seeing the private key being reset, we are seeing page manager variables being changed:
the variable page_manager_node_view_disabled is being set to TRUE, which is not the current setting.

Our site is pretty high traffic, high load, AND we are using Pressflow. I would be interested to know whether others seeing this behavior are also using Pressflow.

Thanks,

Karyn

cedarm’s picture

We loaded up our production code with debug watchdogs and finally figured out what's going on for us. "Something Bad" causes MySQL to barf in variable_init(), as in "Lost connection to MySQL server during query" or "MySQL server has gone away". I suspect various sites have multiple underlying causes, but whatever the cause is, $variables ends up NULL.

    $result = db_query('SELECT * FROM {variable}');
    while ($variable = db_fetch_object($result)) {
      $variables[$variable->name] = unserialize($variable->value);
    }
    cache_set('variables', $variables);

So here's the catch. If you have a site using core database caching, then the cache_set() will probably fail to store the bad result in the database because the database has "gone away", and the problem doesn't get cached. If you're using a non-database caching system (like memcache), then the cache_set() will successfully store the empty variables. Boom!

If you want to simulate this, add some code to your development sandbox that randomly sets $variables = NULL 1% of the time. This produces various wide-ranging effects, most of which we have observed in production at one time or another, such as drupal_private_key being regenerated, the theme getting messed up, and page manager handlers being disabled.
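For anyone who wants to see the failure mode outside of Drupal, here is a minimal standalone sketch. Everything in it is made up for illustration: fake_cache_set() is an in-memory stand-in for a non-database cache, and simulate_variable_init() only mimics the shape of the variable loading code quoted above. The failure is passed in as a flag so the result is repeatable.

```php
<?php
// In-memory stand-in for a non-database cache such as memcache (illustrative only).
$GLOBALS['fake_cache'] = array();

function fake_cache_set($cid, $data) {
  // Like memcache, this stores whatever it is given, including NULL.
  $GLOBALS['fake_cache'][$cid] = $data;
}

/**
 * Mimics the shape of the variable loading code quoted above.
 * $db_fails simulates "MySQL server has gone away"; in a sandbox you
 * could pass (mt_rand(1, 100) == 1) to fail roughly 1% of the time.
 */
function simulate_variable_init($db_fails) {
  $variables = NULL;
  // Hypothetical rows; a failed query yields no rows at all.
  $rows = $db_fails ? array() : array('site_name' => serialize('Example'));
  foreach ($rows as $name => $value) {
    $variables[$name] = unserialize($value);
  }
  // No check here, so a NULL result is cached just like a good one.
  fake_cache_set('variables', $variables);
  return $variables;
}

simulate_variable_init(FALSE);
var_dump($GLOBALS['fake_cache']['variables']); // populated array
simulate_variable_init(TRUE);
var_dump($GLOBALS['fake_cache']['variables']); // NULL: the good copy was overwritten
```

The second call is the interesting one: a single failed load replaces a perfectly good cache entry with NULL, and every later request that trusts the cache inherits it.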

I think the fix is simple, and twofold:

  1. Never cache_set() variables if it's NULL.
  2. Refuse to overwrite variables with cached data if the data is NULL.

And #1 should make #2 unnecessary, but better to be safe in case someone else messes directly with the variables cache (like memcache_variable_set()).
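The two rules above can be sketched roughly as follows. This is not the code from the attached patches: the function names and the in-memory cache array are hypothetical, and treating an empty array the same as NULL is an extra assumption on top of what is described here.

```php
<?php
// In-memory stand-in for the cache backend (illustrative only).
$GLOBALS['sketch_cache'] = array();

/**
 * Rule 1: only cache variables when the load produced a non-empty array.
 * Returns TRUE if the value was stored, FALSE if it was rejected.
 */
function sketch_cache_set_variables($variables) {
  if (!is_array($variables) || empty($variables)) {
    return FALSE;
  }
  $GLOBALS['sketch_cache']['variables'] = $variables;
  return TRUE;
}

/**
 * Rule 2: only trust cached variables if they are a non-empty array;
 * otherwise fall back to the caller-supplied loader (e.g. the database).
 */
function sketch_get_variables($loader) {
  $cached = isset($GLOBALS['sketch_cache']['variables'])
    ? $GLOBALS['sketch_cache']['variables']
    : NULL;
  if (is_array($cached) && !empty($cached)) {
    return $cached;
  }
  // Cache miss or bad cached data: reload, and cache only a usable result.
  $variables = call_user_func($loader);
  sketch_cache_set_variables($variables);
  return $variables;
}
```

With both checks in place, a failed load still returns NULL to the current request, but it can no longer poison the cache for everyone else.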

So here are the patches for Drupal core and memcache. Even if you don't use memcache I think the core patch is important. If this does indeed solve problems then we can work on getting this into core, even in light of #973436: Overzealous locking in variable_initialize().

Choose your core version, apply, and test!

[edit] Oops. Missed an operator precedence mistake. Use new patches in #16.

cedarm’s picture

catch’s picture

Hmm, that's interesting. Could you open a core bug about the situation with "MySQL server has gone away"?

I'd like to see a bit more testing of this before it goes in, but I'd be fine adding this to the custom memcache_variable_set().

valderama’s picture

Hey,

we are still fighting a similar problem with nodes: sometimes memcache serves empty nodes. Our suspicion is that it happens under high traffic, so it might be that the MySQL server is struggling at that time.

The question I would be glad to get help with is: if this can happen with variables, could it just as easily happen with nodes? Or is the caching of nodes handled differently? (We have the problem on a D7 site, so nodes are entities.)

Thanks,
Walter

zerolab’s picture

Thanks for the patches, cedarm.

We are using the memcache one and a simplified version for Pressflow, together with a patch from #561990: Avoid variable_set() and variable_del() stampedes. That fixed our problems on a large site.

Attaching the patches for core/Pressflow, in case someone needs to use them with the work from #561990: Avoid variable_set() and variable_del() stampedes.

Cheers,
Dan

gram.steve’s picture

Adding another voice here. Sites that use memcache are a special case compared to those using the standard db caching in Drupal core.

In the latter case, if the db has gone away, there is no consequence. With memcache, there is.

This really needs to be fixed. cedarm fixed it. The fix needs to be included, unless some other consequence of including it is identified.

The consequences of not adding it are heavily skewed toward large sites (which make up a larger percentage of memcache users). But those sites rarely understand how to identify the problem, so they blame Drupal.

I have been following this issue hoping it would be incorporated.
