Closed (outdated)
Project:
Drupal core
Version:
7.x-dev
Component:
base system
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
22 May 2012 at 18:41 UTC
Updated:
26 Jul 2016 at 10:21 UTC
Comments
Comment #2
unitedwallabies commented
Attaching the version of the patch that conforms to the `-p1` flag.
Comment #3
unitedwallabies commented
Comment #4
brockjo commented
Patch tested on:
-Drupal 7.14
Browsers Tested:
-Firefox 12.0
-Google Chrome 19.0.1084.52
-Safari 5.1.7
Themes Tested:
-Default "Bartik" theme 7.14
-Two of my custom subthemes (Zen based)
There were no negative effects observed from the patch, and I haven't noticed anything different since implementation. Because variable_initialize() runs on every Drupal request, the patch gets exercised constantly: I threw it on a site I am building, and it has been stable for 3 days now. Due to the nature of the patch, I haven't really noticed any differences either, so it is hard to test completely.
Comment #5
marcingy commented
This needs to be implemented in D8 and backported.
Comment #6
unitedwallabies commented
Thank you very much for the testing/confirmation, brockjo.
marcingy, what is the customary workflow — is the original submitter supposed to (re-)implement for D8, or will someone (you?) do that? Thanks!
Comment #7
tstoeckler commented
This could use some parentheses to make the order of operations clear.
Why are you removing the lock_wait()? That seems to be point 2. in your original post, but you should provide some reasoning for this.
This is using a tab, please use spaces. Also this should be on one line.
In general this could use some benchmarks. If you can prove that this is an improvement for people using DatabaseCache I can see at least point 1. going in. (Again, I don't get what you're trying to achieve with 2.)
EDIT: Removed Dreditor double-post.
Comment #8
unitedwallabies commented
I could never understand how the order of `and` (the multiplication) and `or` (the addition) could be confused — by anyone... These are such basic operations of elementary middle-school calculus...

Because, if this instance could not acquire the lock on the first attempt, it means another instance got it and is already updating the cache — and the current instance need not bother. This not only makes sense at all times, but is also safer: when the locking begins to fail repeatedly (such as due to some database-server problem making it only partially available, or memcached being restarted), the current code will aggravate the problem by sending PHP into a tight loop, repeatedly trying to obtain a lock and recursively invoking variable_initialize() — we have seen it here. The proposed implementation will react to such degradation more gracefully — returning variables even if placing them into the cache fails.

Sorry, I was unaware of Drupal's coding style guide — and used that of FreeBSD by default :-) Easy to change, of course. If there is a formal style guide for Drupal, could you please post a link? I'll try to make this and future patches compliant.
Well, the improvements will certainly be there, but will be hard to measure. Consider the DatabaseCache case. Currently the data is obtained from {cache} for most hits (one SELECT). When the cache is considered stale, the same data is obtained from {variable} (a simpler SELECT) and then stuffed into {cache} (a hundred or so INSERTs). But this refreshing of the cache does not happen very often. So one would need to pound the server for a while for the difference to be noticeable — and even then it may not exceed the margin of error. Also, the unqualified SELECT from {variable} is, of course, cheaper than a more complicated one (with a WHERE clause) from {cache}. But modern machines are so fast, it may be hard to measure the difference. I'll try to set things up, but I cannot afford to build a giant collider to confirm what is obviously there :-)
Comment #9
tstoeckler commented
The lock_wait() point is, as far as I can tell, moot. The lock_wait() pattern was introduced exactly to allow multiple processes not to fail while one process is rebuilding the cache. Please open that as a separate issue, if at all.
Without some benchmarks I don't see any point in doing this at all.
If this makes something faster, then you can measure it; it's as simple as that. Stating the opposite is quite simply an excuse. For this patch, you could for instance insert a couple hundred variables and then measure how long rebuilding the variable cache takes.
That is quite condescending, but I won't fight it. I just think parentheses make these operations easier to read, but I'm not really attached to that.
Comment #10
marcingy commented
Moving to postponed. This needs benchmarks before it can be considered further.
Comment #11
unitedwallabies commented
In the current implementation, all of the processes that encounter a stale cache will rebuild it — one after another. In my implementation, only the first such process will rebuild the cache. The subsequent ones — those that find the cache stale but are unable to acquire the lock on the first attempt — will proceed without trying to rebuild the cache (with the values obtained directly from the database).
This saves some time in all cases (though it is still hard to measure, because cache rebuilding is very infrequent). In unusual cases, where the locking itself might be failing for a while, it prevents more serious problems, as each process only attempts to obtain the lock once — better not to use the cache at all than to get stuck forever trying to rebuild it.
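The flow being described can be sketched as follows. This is a hedged reconstruction of the idea from the comments in this thread, not the actual patch; it assumes a bootstrapped Drupal 7 site where cache_get(), cache_set(), lock_acquire(), lock_release() and db_query() are available.

```php
<?php
// Sketch of the proposed behavior: try the lock exactly once. The winner
// refreshes the cache; everyone else proceeds with values read straight
// from the database. Reconstructed from the issue description.
function variable_initialize_proposed($conf = array()) {
  if ($cached = cache_get('variables', 'cache_bootstrap')) {
    $variables = $cached->data;
  }
  else {
    // Every process reads the origin table directly; a plain SELECT
    // needs no lock protection.
    $variables = array_map('unserialize',
      db_query('SELECT name, value FROM {variable}')->fetchAllKeyed());
    // Only the single process that wins the lock refreshes the cache.
    // Losers skip it entirely: the winner is already doing that work.
    if (lock_acquire('variable_init', 1)) {
      cache_set('variables', $variables, 'cache_bootstrap');
      lock_release('variable_init');
    }
  }
  foreach ($conf as $name => $value) {
    $variables[$name] = $value;
  }
  return $variables;
}
```

The key design point is that lock_acquire() is non-blocking here: a failed attempt simply means another request is rebuilding the cache, so the current one carries on with the freshly read values.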
Light travels at different speeds in vacuum and in denser environments (such as gas or liquid). But measuring the difference is very difficult...
I've devised the new code and offered a theoretical proof that it is no slower and usually faster than the existing implementation. I don't understand why you are placing this burden of proof on me. Is it the coding style, or the attitude of my comments, or some other non-technical reason?
Could you elaborate? How can I measure the time of this one function? What I am trying to do now is hit the freshly-installed Drupal site on my server with the Apache benchmarking tool (`ab`). But there are so many other variables (I can't even ensure the server is not used by anything else during the benchmark) that the timings differ even from run to run — without any changes to the code. As I suspected, the margin of error is higher than the theory-suggested improvement.

The server is a 4-core machine running FreeBSD/amd64. The web server is Apache 2.2.21. PHP is 5.3.8. Drupal is 7.14. Vanilla install (no memcache, no APC). Benchmarking from a separate host on the same LAN with the following command line:

ab -r -t 90 -k -n 100000 -c 7 -H 'Host: dp7.example.net' http://narawntapu/

Without the patch:
With the patch:
So there is some improvement, but I don't think it is conclusive... Until I started using the `-k` flag ("keep-alive"), the overhead of establishing a new TCP connection for every hit was hiding all improvements completely.

I then installed PHP's APC module and got a major improvement out of that, as expected. My patch still makes some difference, but is it conclusive? I would not be sure if I did not know the theory behind it already...
Without my patch:
With my patch applied:
In addition, here are some benchmarks talking to the MySQL server directly. Using the server's own BENCHMARK function, I demonstrate that my approach is not any slower even when the cache is tiny ({cache_bootstrap} only has 6 rows here). Accessing the {variable} table directly:

Obtaining the same data from {cache_bootstrap}, as the code in cache.inc is doing:

So, why bother caching the data if the retrieval from cache is no faster than the retrieval from origin?
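The exact statements used for this comparison were not preserved in this archive. A comparable timing of the two reads can be sketched in PHP; this is a hypothetical harness, assuming a bootstrapped Drupal 7 site, the default {variable} and {cache_bootstrap} tables, and the column list DatabaseCache reads (the precise column set is approximated here).

```php
<?php
// Hypothetical sketch: time N repetitions of the two reads being compared.
// Requires a bootstrapped Drupal 7 site (db_query() available).
function time_query($sql, $reps = 1000) {
  $start = microtime(TRUE);
  for ($i = 0; $i < $reps; $i++) {
    db_query($sql)->fetchAll();
  }
  return microtime(TRUE) - $start;
}

// The unqualified read from the origin table.
$t_variable = time_query('SELECT name, value FROM {variable}');

// The qualified read from the cache table, approximating what
// cache.inc's DatabaseCache backend performs.
$t_cache = time_query(
  "SELECT data, created, expire, serialized FROM {cache_bootstrap} WHERE cid = 'variables'");

printf("variable: %.4fs  cache_bootstrap: %.4fs\n", $t_variable, $t_cache);
```

Note that with MySQL's query cache enabled, repeated identical SELECTs will mostly measure cache-hit latency, which is exactly the scenario discussed later in this thread.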
Comment #12
marcingy commented
Why bother changing it if there is no considerable performance improvement?
Comment #13
tstoeckler commented
The reason I am placing this "burden of proof" on you is that this patch introduces less abstract and somewhat "smelly" code. That is because variable_init() should really not know anything about cache backends, and this introduces a tight coupling on a specific backend.
Now if this would lead to a measurable improvement for users of the default cache backend then in this case the performance benefit would definitely outweigh the small amount of code smell, but if not there really is no point. So if you cannot show that the rebuild operation happens often and is significantly faster, then this patch is "won't fix".
Additional notes:
- The benchmarks you did, I think, simply accessed the home page and so did not measure the performance impact of rebuilding the cache. To benchmark I would use a simple script such as this one (untested):
Again, I don't think it would even be worth it to write such a script, because variable rebuilding happens so seldomly that the improvement would have to be huge for this patch to be worth it.
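The script referenced above did not survive in this archive. A hypothetical harness along the lines suggested (insert a couple hundred variables, then time the rebuild) might look like this; it assumes a bootstrapped Drupal 7 site, and the variable names and repetition counts are made up for illustration.

```php
<?php
// Hypothetical benchmark sketch: populate a few hundred variables, then
// repeatedly invalidate the variable cache and time its rebuild.
// Requires Drupal 7 to be bootstrapped (variable_set(), cache_clear_all()
// and variable_initialize() all available).
for ($i = 0; $i < 500; $i++) {
  variable_set('bench_dummy_' . $i, str_repeat('x', 100));
}

$total = 0;
$runs  = 50;
for ($i = 0; $i < $runs; $i++) {
  // Drop the cached copy so the next call has to rebuild it.
  cache_clear_all('variables', 'cache_bootstrap');
  $start = microtime(TRUE);
  variable_initialize();
  $total += microtime(TRUE) - $start;
}
printf("Average rebuild: %.2f ms\n", $total / $runs * 1000);
```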
- That is not true; please read the current implementation again. As I said, if we want to rework the lock pattern, please open a new issue, although I don't think you will get far with that, to be honest.
Leaving at "won't fix" per the previous post.
Comment #14
tstoeckler commented
...
Comment #15
unitedwallabies commented
Actually, come to think of it, my benchmarks did show an important improvement. While in most cases there is no difference, the maximum times decreased substantially after my patch. I explain this by the code no longer copying the data into the cache — the current implementation does not do it every time, but, when it does, it takes measurable time. Here are the steps that are currently done in this case:
1. SELECT from {cache_bootstrap} — happens for each hit
2. INSERT into {semaphore} — create the lock
3. SELECT from {variable}
4. INSERTs into {cache_bootstrap}
5. DELETE from {semaphore} — remove the lock

The proposed implementation would simply do a single SELECT at all times — and that may be why the "max" timing is consistently lower in my benchmarks when my version of the function is used. Thus, my implementation is just as fast in the normal case, but much faster (14% without APC, 76% with APC enabled) occasionally.
I agree with you on the "smelly" sentiment — I just do not know the API well enough to make it better. But you do... Perhaps it should be possible to set the caching provider to "None" — either for the entire bootstrap bin or just for variables somehow?

And then there is the other aspect — in the current implementation there is no check for why lock_wait() has failed. If the failure is due to a database problem or (if memcache is used to implement locking) due to memcached being restarted, the existing recursion becomes very deep and, if the server continues to process hits, can bring the server to its knees. We have seen it here... My implementation is not recursive and will only attempt locking once — providing for more graceful degradation in case of a problem.

There is no need to hold the lock while loading the values anew — MySQL can handle thousands of simple SELECT queries without a hitch. It is only when updating the cache that locking (to prevent a "stampede") is justified...
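For reference, the recursion being described is the existing Drupal 7 pattern, which in simplified, paraphrased form (treat the details as approximate, not a verbatim copy of bootstrap.inc) looks roughly like this:

```php
<?php
// Simplified paraphrase of Drupal 7's variable_initialize() recursion;
// approximate, not a verbatim copy of core's bootstrap.inc.
function variable_initialize_current($conf = array()) {
  if ($cached = cache_get('variables', 'cache_bootstrap')) {
    $variables = $cached->data;
  }
  else {
    $name = 'variable_init';
    if (!lock_acquire($name, 1)) {
      // Another request is rebuilding the cache: wait for it, then call
      // ourselves again. If locking keeps failing (DB trouble, memcached
      // restart), this recursion is what deepens on every retry.
      lock_wait($name);
      return variable_initialize_current($conf);
    }
    $variables = array_map('unserialize',
      db_query('SELECT name, value FROM {variable}')->fetchAllKeyed());
    cache_set('variables', $variables, 'cache_bootstrap');
    lock_release($name);
  }
  foreach ($conf as $name => $value) {
    $variables[$name] = $value;
  }
  return $variables;
}
```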
I had to modify your script a little:
The results:
Comment #16
marcingy commented
Your tests were based on Apache Benchmark; those are not valid tests for benchmarking code. We need real benchmarks against the PHP code, as suggested in #13. Also, this approach does indeed fail the smell test. And the lock prevents a cache stampede, which is why it was added to this function; remove the lock and you introduce a stampede. Back to needs info.
Comment #17
unitedwallabies commented
I used the method suggested in #13 — and appended the results to #15. I think the results are rather impressive (a 3.5x win!) — not that I expected anything less :-). Try it yourself...
The cost of stampede prevention is higher than the stampede itself in this case :-) Consider the situation where N processes encounter cache staleness at the same time.
My way:
1. N SELECT queries to get the expired values anew from {variable}.
2. N attempts at the lock's INSERT (N-1 of the attempts failing).
3. The winner DELETEs the lock, while the other N-1 process their requests.

The current way:
1. N attempts at the lock's INSERT (N-1 of the attempts failing).
2. 1 SELECT query to get the expired values anew, while N-1 processes sleep for 25 milliseconds (in lock_wait()).
3. N-1 SELECT queries initiated by lock_may_be_available() — invoked from lock_wait().
4. The winner stuffs the values into the cache (the INSERTs).
5. The winner DELETEs the lock.

I posit that the N SELECTs of my implementation are not any more expensive than the N-1 SELECTs initiated by lock_wait() (and a certain K of these N-1 may have to repeat the SELECT again)... Moreover, while waiting, these N-1 processes aren't serving the clients — they are sleeping in the current implementation. In my implementation, they all go on processing the request with the values obtained from the database directly (while the very first one goes on to rebuild the cache under lock).

And then, after step 5, once the cache is rebuilt, these N-1 unfortunates still need to read the data from it — which means N-1 more SELECTs from {cache_bootstrap} (or something faster, depending on the caching implementation used) — my way, they all have read the data already in step 1.

To rephrase: while the cache rebuilding should, indeed, be done by just one process, reading of the original data straight from the database (with a single SELECT) needs no such protection.

Although it may be possible that some alternative implementation of lock_wait() is fantastically faster than the default one, N identical SELECT queries cost most databases (with things like query caching enabled) just as much as a single one. In other words, the N SELECTs from my step 1 are not any costlier than the 1 SELECT from the current step 2.
Comment #18
unitedwallabies commented
Come to think of it, it is not even immediately obvious that protecting even the cache rebuilding from a stampede is always worthwhile... For example, when caching is done by memcache, but locking is still done by the default lock.inc (using the {semaphore} table), it may be cheaper to modify the cache by N processes simultaneously than to have N INSERTs and one DELETE...

But this may be too radical, and some caching implementations may even get corrupted by such simultaneous updating, so I still agree that cache rebuilding should be done under a lock. Yet I maintain that there is no need for a lock to cover the reading of data from the DB...
Comment #19
damien tournoud commented
The theory (which was definitely true at one point) is that retrieving a single row from the database + deserializing it is faster than retrieving all the rows and deserializing them one by one.
Not sure if it is still true (given that MySQL buffers result sets), but anything that changes this should be heavily benchmarked.
Comment #20
unitedwallabies commented
Damien, do you have references to the past benchmarking showing this to be true at some point? I wonder when this was last measured and under what conditions...
I don't agree with the title change — when using faster cache implementations (like APC or memcache), it may still be better to cache variables. But it makes no sense to do it with the default (DB-based) cache...
I'd add more... For a small site with a single database server, it makes fairly little difference either way — on modern (or even last year's) hardware. Even if the response time spikes on occasion (as it does now), no one will notice: despite a thousandfold difference, humans can't distinguish a millisecond from a microsecond anyway :-)
However, for bigger setups — those using multiple database servers with replication (either master->slave or multi-master) — each write (whether an INSERT, an UPDATE, or a DELETE) to the database becomes more expensive, while the reads remain, for most intents and purposes, free.

This means using the database as a caching back-end should be discouraged in general — not just for variables. It is the simplest method for somebody just getting started, but I doubt variable loading is the only place that's unduly pessimized by using it.
Comment #21
marcingy commented
There is nothing to review at the moment.
Comment #22
mpotter commented
On several of our sites I have found that performance *is* improved by using the cache, especially if you do a lot of variable_set calls for any reason. A simple test page is one that makes a for-loop of 100 variable_set calls (with 100 different variable names). It's an edge-case test, but it helps identify the problems caused by this recursion and locking mechanism around variables. Hit that test page with "ab -n 100 -c 100 ..." and you'll start to see the problems.
There are a couple of issue threads around on this whole variable cache and locking recursion, so I didn't want to create "yet another". But since this thread has gone "off the rails" the least (I don't want to get into D8 and Symfony stuff to deal with my problem on D6), I will link to a pull request for Pressflow 6 that implements a similar solution to this problem: https://github.com/pressflow/6/pull/52. It should be pretty easy to port this patch to D6, D7, and D8 without getting into over-complex cache-building schemes.
My approach started with just removing the cache, as in the first patch of this issue. But because I found performance decreases, I added a single check to see if the variable cache lock could be acquired. If the simple lock_acquire is successful, then I go ahead and write through the cache. Otherwise I clear the cache so the next page request will rebuild it.
So most of the time you get the benefit of the variables cache, but it removes all of the recursion that leads to so many locking problems and waiting worker threads.
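The write-through variant described above can be sketched as follows. This is a hedged reconstruction from the comment and the linked pull request's description, not the actual Pressflow code; the function name is made up for illustration.

```php
<?php
// Sketch of the write-through approach: if the non-blocking lock is
// available, refresh the cache ourselves; otherwise just invalidate the
// entry and move on, so a later request rebuilds it without any
// recursion or waiting. Reconstructed from the comment, not the patch.
function variable_cache_write_through(array $variables) {
  if (lock_acquire('variable_init', 1)) {
    // We won the lock: safe to refresh the cache ourselves.
    cache_set('variables', $variables, 'cache_bootstrap');
    lock_release('variable_init');
  }
  else {
    // Someone else holds the lock: clear the cache entry so the next
    // page request rebuilds it, and carry on serving this request.
    cache_clear_all('variables', 'cache_bootstrap');
  }
}
```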
If I get some time, I'll try to post real patches for D6 and D7 here, but hopefully this will give others another place to look for a potential solution to this tough problem. I'm also interested in getting other smart eyes on this potential solution from people who have already been thinking about this issue.
Comment #23
andrea.campi commented
I found this issue while investigating a surprising performance problem.
Some background: multi-node installation with APC, FPM, and both caching and locks on memcache. We are monitoring performance with NewRelic.
What we see from time to time is that a request will spend seconds (tens of seconds at a time) in lock_wait.
The "funny" thing is that once the lock is acquired, the recursive call to variable_initialize will complete in ~100ms.
We are investigating in several directions (what is invalidating cache_bootstrap, and why would the lock be held for that long), but that's beside the point.
At least in our setup, I have to agree with unitedwallabies: letting the requests that "lose" the race go through and read the variables from the DB would be a huge win for us.
Comment #24
mrfelton commented
I'm experiencing severe locking issues, with the MySQL process list showing TRUNCATE cache_field, followed by several entries of SELECT filename, name, type, status, schema_version, weight FROM system WHERE type = 'module', all stuck in the 'Opening tables' state. Searching on those query strings brought me here. This is with D7.16, and it's happening on an Aegir migrate operation.
Comment #24.0
mrfelton commented
Correct the description of the second problem being addressed.
Comment #25
moshe weitzman commented
Variables are gone from D8.
Comment #26
marcingy commented
Reclassifying.
Comment #27
mpotter commented
Here is a patch for D7 that reflects the work done in the D6 Pressflow patch mentioned in #22. This moves the variable cache rebuild to shutdown and removes the bad recursion of variable_initialize().
Comment #28
stefan.r commented