If there are many threads working with a constantly changing variables table, the variables entry in the cache table becomes "unstable".

Duplicate entry 'variables' for key 1 
INSERT INTO cache (cid, data, created, expire, headers, serialized) VALUES ('variables', 'a:876:{... 

Issue has to do with cache_set() being called by multiple variable_init(), from multiple threads in the bootstrap phase.
See #559480: Crawler & Do not cache if error on page variable cache db atomic issues

Comments

mikeytown2’s picture

Title: variable_init not atomic » Use locking framework for cache_set()
Status: Active » Needs review
StatusFileSize
new1.07 KB
kenorb’s picture

What about performance?
I see that lock_acquire() using some db queries (db_insert()).
What is the reason of caching something in memory, if it use database anyway.

damien tournoud’s picture

Title: Use locking framework for cache_set() » Use locking framework for variable_initialize()
Version: 6.x-dev » 7.x-dev
Status: Needs review » Needs work

Pressflow already added a lock in variable_initialize(). I think it makes sense.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new2.93 KB

I've ported the pressflow version from http://bazaar.launchpad.net/~pressflow/pressflow/6/annotate/head:/includ... to a core patch.

Main changes vs. HEAD are:

1. When you call variable_set() or variable_del(), it sets a global, which is then checked in drupal_page_footer() to trigger a cache clear - that way if you submit a long system settings form you don't clear the cache for each one at a time.

2. Adds a lock to variable_initialize() on cache misses, using exactly the same pattern as #802446: Cache stampede protection: drupal_render() and page cache.

I didn't do any testing of this beyond checking HEAD would still bootstrap, but I know that nnewton will need to in the next couple of days in an environment where this issue is getting hit.

Also according to twitter, this is the last core patch from Drupal 6 pressflow that hasn't made it into D7, would be great if Drupal 7 pressflow can start from a clean slate.

It might be possible to add a write through cache to this, but I think if we want to add such a thing, we should do it for the page cache first in #762978: Prevent cache stamped for page cache table.

catch’s picture

Title: Use locking framework for variable_initialize() » Avoid variable_set() and variable_del() stampedes
damien tournoud’s picture

Ok for the lock in variable_initialize(), but the delayed clearing is a really bad idea from a consistency point of view.

Status: Needs review » Needs work

The last submitted patch, variable_stampede.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new1.54 KB

These are really two separate patches anyway. Removing the delayed clearing for now. Let's see if that helps the tests at all too.

catch’s picture

StatusFileSize
new1.54 KB

And whitespace.

Status: Needs review » Needs work

The last submitted patch, variable_stampede.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new1.53 KB

The lock_initialize() causes the single test failure (in lock tests, presumably that's setting global $locks to empty and allowing a lock to be acquired, see if that causes problems elsewhere if we just remove it.

catch’s picture

StatusFileSize
new2.16 KB

lock_initialize() should just take care not to overwrite the global if it's called twice.

damien tournoud’s picture

Aren't you just missing the lock_release() call?

catch’s picture

StatusFileSize
new2.19 KB

I am, but that's not why the test is failing. If you add the lock_release(), but also remove the check from lock_initialize() that test still fails.

damien tournoud’s picture

Please move lock_initialize() earlier in the bootstrap (for example in _drupal_bootstrap_variables()). Having variable_initialize() call lock_initialize() feels slightly messy.

catch’s picture

StatusFileSize
new2.26 KB

That's a better fix, it's only slightly earlier in the bootstrap process so no real bloating, and that means no more need for the isset() check in lock_initialize().

damien tournoud’s picture

Status: Needs review » Reviewed & tested by the community

#17 looks good to me.

nnewton’s picture

Status: Reviewed & tested by the community » Needs work

I've been load testing #17 and found an issue. In variable_init, when you hit the lock and re-call the function we arn't passing the $conf array in, so all settings in settings.php are lost. The upshot of this for a high-traffic site is that once you hit the lock, you lose memcache.

    if (!lock_acquire($name, 1)) {
      // Another request is building the variable cache.
      // Wait, then re-run this function.
      lock_wait($name);
      return variable_initialize();
    }

I'm continuing testing as under high traffic we are also getting a stampede of the semaphore table, which is problematic and may require a memcache lock.inc over-ride.

damien tournoud’s picture

Oh, d'oh for the first issue. For high traffic sites, of course we will need a memcache (or similar) lock.inc. Something that doesn't require polling would be nice. The default db-based implementation is not designed to scale.

nnewton’s picture

Status: Needs work » Needs review
StatusFileSize
new2.26 KB

Here is the version of the patch I'm currently using. Ya, a memcache plugin for the locking framework definitely, but it actually isn't that bad once you fix this.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Well done. This looks RTBC to me.

As for the memcache version lock.inc - might not be needed once the lock.inc improvements from #802446: Cache stampede protection: drupal_render() and page cache are committed

catch’s picture

Ouch on $conf :(

memcache-lock.inc has a patch here #596908: memcached implementation of D7 locking, I re-rolled it yesterday, but didn't test it and that issue is quite old, but looks pretty far along to me.

dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

dries’s picture

Updating status.

mikeytown2’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Needs review
StatusFileSize
new1.63 KB

Bump this back down to 6.x
Backport from the 7.x patch

achton’s picture

Subscribing.

mikeytown2’s picture

@achton
Can you test the patch #27 and report if it works for you? It helps our 7 web servers a lot.

mikeytown2’s picture

Status: Needs review » Reviewed & tested by the community

Been using #27 in production for the last 3 months; works very well.

ehudash’s picture

Applied the patch from #27.
Now I get the same error but for a different table:

Duplicate entry 'variable_init' for key 'PRIMARY' query: INSERT INTO semaphore (name, value, expire)

(Using D6 with Boost)

dmendo11’s picture

Mikey,

I did the samething like #31 and I am getting that error as well. Do we have to delete that and see if it works?

Any help is appreciated.

David

mikeytown2’s picture

This error Duplicate entry 'variable_init' for key 'PRIMARY' query: INSERT INTO semaphore (name, value, expire) shouldn't be thrown, as the query for inserting that has an @ in front of it (lock_acquire()), which should suppress any errors thrown by it (see http://php.net/operators.errorcontrol). In reality this is not the case so lock_acquire appears to have a bug, as it doesn't suppress the race condition error for acquiring a lock; the patch in #27 doesn't have a bug in regards to this error, error is in lock.inc.

dmendo11’s picture

Mikey,

What I did is just install number #27. Do I have to do anything prior to Patch #27 then? Should I do the number #17 or #22 as well?

The reason I am interested is because I would get about 10 to 20 errors in the watchdog page like this:

 boost 11/02/2011 - 15:26 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:25 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:19 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:17 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:14 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:13 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:12 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:09 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:07 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:06 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:04 There are php errors on this page, ... Anonymous  
 boost 11/02/2011 - 15:01 There are Drupal messages on this ... Anonymous 

Thanks for your quick response.

David

mikeytown2’s picture

Port from pressflow with some modifications to it from my pressflow fork https://github.com/mikeytown2/6

Status: Reviewed & tested by the community » Needs work

The last submitted patch, drupal-561990-35-fix-variable_init.patch, failed testing.

mikeytown2’s picture

Status: Needs work » Needs review
StatusFileSize
new2.85 KB
ehudash’s picture

Thanks mikeytown2

I Applied patch #37...
It solved the original error like expected, but now I get a slightly different error

Duplicate entry 'variable_cache_regenerate' for key 1
query: INSERT INTO semaphore (name, value, expire) VALUES ('variable_cache_regenerate', '63824228350d1e4bc9b4692.81439270', 1355932890.636)

Is this related, or a whole different issue?

mikeytown2’s picture

@ehudash
It is a whole different issue. Luckily this is actually a database error you can ignore... odd I know. The code for lock_acquire actually suppresses this error so where is this error being reported?

ehudash’s picture

I see it in the /admin/reports/dblog under "boost" type:

There are php errors on this page, preventing boost from caching. ERROR:

Array

      [type] => 512
      [message] => Duplicate entry 'variable_cache_regenerate' for key 1
query: INSERT INTO semaphore (name, value, expire) VALUES ('variable_cache_regenerate', '151849102550d2f24540f987.74939510', 1356001891.2661)
      [file] => /var/www/mysite/includes/database.mysql.inc
      [line] => 135

Is this what you were looking for?
Drupal 6, Boost, NGINX, InnoDB tables (most of them)

EDIT: When boost is set to cache pages even with PHP errors, the error is ignored in the DB log.
Bug in Boost module?

mikeytown2’s picture

There is a bug is in the boost module. It sees the error but doesn't ignore it.

zerolab’s picture

Hi mikeytown2,

thank you for a great patch!
Using it in production on a large site and it works well.

Attaching a patch for Pressflow 6.28.111 (from https://github.com/pressflow/6/pull/37) in case anyone needs it.

Cheers,
Dan

Status: Needs review » Needs work

The last submitted patch, 561990-42-fix-variable_init-pressflow.patch, failed testing.

rfay’s picture

Status: Needs work » Needs review

Moving back to "Needs review" for #37 which is the patch in play here for D6, if I read this right.

jainmca4444’s picture

#11: variable_stampede.patch queued for re-testing.

jainmca4444’s picture

#27: drupal-561990-D6.patch queued for re-testing.

jcisio’s picture

Issue summary: View changes
StatusFileSize
new1.64 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch, 47: 561990-fix-variable_init-pressflow-47.patch, failed testing.

jcisio’s picture

Status: Needs work » Needs review
StatusFileSize
new665 bytes

Patch 47 failed because it was rerolled against Pressflow.

rfay’s picture

One thing I do notice - although #37 solves the stampede problem, I regularly see

Duplicate entry 'variable_cache_regenerate' for key 'PRIMARY'

In several different contexts, both in core areas and contrib. Anybody else seen this or started to understand it.

jcisio’s picture

Re #50 I get this, a lot, usually when there is cron. Two traces with this error:

in mysql_query called at /home/ttcn/live/www/includes/database.mysql.inc (143)
in _db_query called at /home/ttcn/live/www/includes/database.mysql-common.inc (42)
in db_query called at /home/ttcn/live/www/includes/lock.inc (117)
in lock_acquire called at /home/ttcn/live/www/includes/bootstrap.inc (569)
in variable_init called at ? (?)
in mysql_query called at /home/ttcn/live/www/includes/database.mysql.inc (143)
in _db_query called at /home/ttcn/live/www/includes/database.mysql-common.inc (42)
in db_query called at /home/ttcn/live/www/includes/lock.inc (117)
in lock_acquire called at /home/ttcn/live/www/includes/bootstrap.inc (569)
in variable_init called at /home/ttcn/live/www/includes/bootstrap.inc (1528)
in _drupal_bootstrap called at /home/ttcn/live/www/includes/bootstrap.inc (1443)
in drupal_bootstrap called at /home/ttcn/live/www/index.php (15)

In my guess, it is because of the concurrency between two SELECTs to check for the lock, then two INSERTs in different requests. But it's a separate issue.

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.