Justin says in #973436: Overzealous locking in variable_initialize().

as a D8 aside, now that we're trying to fix the race here (which exists in D6 as well), i wonder at what point we want to ditch the "hey, lets read it all in one go pattern". as long as we stick to that, we make the race-window much, much bigger than it ought to be.

we can't lock at the scope of the change, instead we have to lock at the scope of the whole variable table. it kinda reminds me of those poor applications that get stuck with MyIsam on mysql. oh, wait ;-)

i wonder if in D8 we couldn't move to something more like the old-registry caching system, where we let the persistent variable cache build up over time.

This is a great idea, shouldn't be too hard to do, and if it was sufficiently worthwhile could be backported to D7 without an API change.

We'd need a single cache entry, that caches both the value of variables, and whether they're not set anywhere (can't see a way to distinguish between not set and a cache miss otherwise). In a cold start situation, you'd have a lot of little queries, and a lot of cache_set(), but it's going to be a much smaller cache entry than with most sites, so shouldn't be too bad, and could be benchmarked with siege easily enough.

CommentFileSizeAuthor
#125 variable_cache.patch5.98 KBcatch
#123 variable_cache.patch4.68 KBcatch
#119 variable_cache.patch11.41 KBcatch
#117 variable_cache.patch11.39 KBcatch
#115 variable_cache.patch11.85 KBcatch
#111 multi-get.patch10.88 KBAnonymous (not verified)
#107 multi-get.patch10.3 KBAnonymous (not verified)
#104 multi-get.patch9.8 KBAnonymous (not verified)
#103 multi-get.patch9.8 KBAnonymous (not verified)
#102 multi-get.patch9.8 KBAnonymous (not verified)
#101 multi-get.patch9.73 KBAnonymous (not verified)
#99 variable.patch11.19 KBcatch
#98 var_patch_30_variable_get_cache_miss.png53.03 KBcatch
#98 var_patch_30_cache_miss.png55.84 KBcatch
#98 var_patch_30_cache_hit.png61.23 KBcatch
#98 var_2000_cache_miss_head.png73.21 KBcatch
#98 var_2000_cache_hit_head.png46.51 KBcatch
#98 var_30_d8_cache_miss.png76.28 KBcatch
#98 var_30_d8_cache_hit.png46.01 KBcatch
#96 variable.patch12.96 KBcatch
#94 variable2.patch12.6 KBcatch
#91 variable.patch12.76 KBcatch
#90 variable.patch11.28 KBcatch
#88 variable.patch11.82 KBcatch
#86 variable.patch11.52 KBcatch
#83 variable.patch10.44 KBcatch
#80 variable.patch10.47 KBcatch
#78 variable.patch8.91 KBcatch
#74 variable.patch16.02 KBcatch
#67 variable_cache.patch15.94 KBcatch
#65 variable_cache.patch14.3 KBcatch
#63 variable_cache.patch13.62 KBcatch
#62 variable_cache.patch14.08 KBcatch
#59 variable_caching.patch13.6 KBcatch
#54 variable_987768.patch13.35 KBcatch
#52 variable_987768.patch13.35 KBcatch
#51 variable_987768.patch13.31 KBcatch
#47 variable_987768.patch13.44 KBcatch
#46 variable_987768.patch13.38 KBcatch
#43 variable_987768.patch13.43 KBcatch
#40 variable_987768.patch13.96 KBcatch
#39 variable_987768.patch13.95 KBcatch
#38 variable_987768.patch15.76 KBcatch
#36 variable_987768.patch15.77 KBcatch
#32 variable_987768.patch14.16 KBcatch
#31 variable_987768.patch14.9 KBcatch
#29 variable_987768.patch14.95 KBcatch
#27 variable_987768.patch14.44 KBcatch
#24 variable_987768.patch12.7 KBcatch
#20 variable_987768.patch12.71 KBcatch
#18 variable_987768.patch10.72 KBcatch
#17 variable_987768.patch11.32 KBcatch
#14 variable_987768.patch11.05 KBcatch
#10 variable_987768.patch10.95 KBcatch
#8 variable_987768.patch8.89 KBcatch
#6 variable_987768.patch8.11 KBcatch
#4 variables.patch6.94 KBcatch
#3 variable.patch5.77 KBcatch
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Building the variable cache over time would be nice for memory, and would probably have a measurable performance impact on normal page caching where variable_init() still happens (and not a lot else). In fact it might even allow for skipping the variable loading altogether on some sites, since we'd not actually need to have variable_initialize() at all.

On a D6 site I found over 2,000 records in the variable table, of which around 300 were from uninstalled modules that hadn't cleaned their variables up over time. This is also something that could be eventually backported to D7 if it wasn't too intrusive.

larowlan’s picture

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Active » Needs review
FileSize
5.77 KB

While I doubt this'll get in before release, I'm moving it back to D7 for a look since it's at least backportable (and potentially to Drupal 6 much since not a lot has changed).

The variable cache on a Drupal 6 site I'm profiling uses 1.2mb of memory in variable_init(). This site has 2272 entries in {variables}. Each page needs around 100 unique variables (to the page, most of these are shared between pages) to actually be served.

Another Drupal 6 site has 1276 entries in that table, this is after I did a major prune (down from 1800 or so) a year or so back.

A standard install has 30 variables set in the variables table by the time you hit the front page for the first time.

So there's some trade-offs here:

1. Every time a new variable is requested, we have to do a new cache_set(). I didn't see a way around this without locking cache writes for the duration of a request (between the first variable cache miss and drupal_exit()) which seems a bit too long. However the cache is going to be very small when it's first getting built, so at least those are smallish queries.

2. We have to cache information for variables that have never been set as well, to avoid looking up in the database just to see if they've been set or not. That means if you have more variables using their default value than set, the patch won't be more efficient.

So overall this makes me think the patch will be more or less memory neutral on a clean install, but should come into its own once a few settings pages have been saved.

Every Drupal site I've ever looked at that's been around for more than a year or so, has always had dozens if not hundreds of variables left stale in the {variable} table due to contrib modules being disabled or uninstalled and not cleaning up after themselves properly. Currently we load an unserialize these on every single request despite them never being used.

I'm also hopeful that this should help to unblock the hook_variables() patch, which has been stalled on performance for a while, since with our current model it's necessary to build the full $conf array up front and having multiple sources for that can pretty much kill page caching.

One @todo here is that I'd like to add a variables_are_cacheable(FALSE); function similar to drupal_page_is_cacheable() - and call this from system_settings_form() to stop just viewing a form polluting the cache. That's not done yet, but Drupal installs and clicks around OK.

Once this passes tests I'll do a D6 backport and try to test it on one of those sites, so that we can get an idea how big the cache entry gets with a realistic data set.

catch’s picture

Issue tags: +Performance, +memory
FileSize
6.94 KB

Removed some debug cruft, improved the locking.

Status: Needs review » Needs work

The last submitted patch, variables.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
8.11 KB

New patch:

* Removes all locking.
* Only sets the variable cache once at the end of the request if new variables were found.
* Gets a fresh copy of the variable cache immediately prior to setting it - this means we ensure that the variable caching is always additive - we won't overwrite variables cached by other processes during the course of the request, and we also skip writing to the cache if another request has built exactly the same set of variables. This should reduce stampedes while still not blocking any requests via lock_wait().
* Changes variable_set() and variable_del() to explicitly set the cache arrays to empty arrays rather than clearing the cache, this allows us to detect whether we're in a cold start situation or have just had a variable crud operation in variable_write_cache(), so should avoid stale cache entries being set.
* There are likely some very small race conditions with these two, but they should be smaller than both current HEAD and Drupal 6.

Haven't run tests on this, nor profiled it again yet.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
8.89 KB

Cleaned up some errors.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
10.95 KB

This one should fix simpletest.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Or not. So the issue is that now when there's a variable_get() after variable_init() that doesn't find anything in $conf or $cache, it will go off and query the database, but in DrupalWebTestCase::setUp() the db is pulled from underneath it, it tries to query a table that doesn't exist, and you get exception without a stackframe on line 0. Don't have a fix for this yet.

catch’s picture

Status: Needs work » Needs review
FileSize
11.05 KB

Let's try this.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Better idea for memory usage:

Keep variable_initialize() the same as here, but this does cache_get('variable_bootstrap_cache', 'cache_bootstrap'); instead

In _drupal_bootstrap_full(), right at the end, add two functions:

1. variable_bootstrap_cache() - caches the variables requested between variable_initialize() and the end of full bootstrap (if new variables were requested) - this is a global cache across the site same as now.

2. variable_initialize_full() - this is the same as variable_initialize() except it starts at the end of full bootstrap, and caches per strtok(current_path(), '/'); - so an entry per 'node', 'taxonomy', 'admin'.

Sites are going to have maybe 10, 15 top level system paths like that, so this is more cache entries, but each one should be a lot smaller - since they only have stuff collected on those paths, and don't contain anything already loaded during bootstrap.

Implications of this:
- we may see a significant speed up with normal page caching - this currently does a full variable_initialize() and the cache entry to load and unserialize is going to be 60-95% smaller depending on the site. Will try to do benchmarks and profiling of this once there's a patch.
- it's two database queries rather than one if you use database cache, but loading and unserializing less data will be less. Also if you really care about the extra db hit, you can install a non-db backend.
- memory should be /much/ lower than currently, reckon we could save a full mb of memory on some sites.
- it will split the variable system between bootstrap.inc and common.inc, but that's an existing problem we have for many different things already.

catch’s picture

Status: Needs work » Needs review
FileSize
11.32 KB

Here's a patch, still needs work but basics are in place.

I created 500 variables on a stock D7 install (it started out with around 50 after saving a handful of forms) to simulate a site where a lot of variables have been set. Testing with a more realistic setup would be welcome. However a very rough survey of a couple of sites shows only around 50-60 variables actually requested on most pages, whereas these had 1-2,000 in the database, so it might not be too far off, or even a bit conservative.

All 500 variables look like:

array(
 'foo',
 'bar',
 'baz',
);

Numbers:

HEAD:
Memory used at: devel_boot()=1.98 MB, devel_shutdown()=5.71 MB, PHP peak=6.25 MB.
Patch:
Memory used at: devel_boot()=1.39 MB, devel_shutdown()=5.12 MB, PHP peak=5.5 MB.

So that's 600kb of memory saved, as noted above, the full cache can be more than 1.2mb on some sites.

Also benchmarks with normal page caching:

HEAD:

Server Software:        Apache/2.2.14
Server Hostname:        d7.7
Server Port:            9090

Document Path:          /
Document Length:        9068 bytes

Concurrency Level:      1
Time taken for tests:   3.022 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      4741000 bytes
HTML transferred:       4534000 bytes
Requests per second:    165.47 [#/sec] (mean)
Time per request:       6.043 [ms] (mean)
Time per request:       6.043 [ms] (mean, across all concurrent requests)
Transfer rate:          1532.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     6    6   0.5      6      15
Waiting:        5    5   0.4      5      13
Total:          6    6   0.5      6      15

Percentage of the requests served within a certain time (ms)
  50%      6
  66%      6
  75%      6
  80%      6
  90%      6
  95%      7
  98%      7
  99%      7
 100%     15 (longest request)

And with the patch:

Server Software:        Apache/2.2.14
Server Hostname:        d7.7
Server Port:            9090

Document Path:          /
Document Length:        9068 bytes

Concurrency Level:      1
Time taken for tests:   2.611 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      4741000 bytes
HTML transferred:       4534000 bytes
Requests per second:    191.50 [#/sec] (mean)
Time per request:       5.222 [ms] (mean)
Time per request:       5.222 [ms] (mean, across all concurrent requests)
Transfer rate:          1773.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5    5   0.8      5      18
Waiting:        4    5   0.4      5       8
Total:          5    5   0.8      5      18

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      5
  90%      6
  95%      6
  98%      7
  99%      8
 100%     18 (longest request)

Would be great to get this verified on someone else's system, but that looks like a ~15% improvement for normal page caching.

Not in the patch yet, but this also allows us to only clear the main variable cache if a variable is actually cached in it.

There's still some cruft in here from the previous iterations of the patch, but CNR for the bot to look at it.

catch’s picture

FileSize
10.72 KB

Slightly less cruft.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Title: Build variable cache over time » Optimize variable caching
Status: Needs work » Needs review
FileSize
12.71 KB

This should pass tests, other @todos still outstanding but uploading for the bot.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

FileSize
12.7 KB

Should..

catch’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
14.44 KB

That upgrade test is tricky, this one should fix everything but three failures in one upgrade test plus batch.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
14.95 KB

Have batch and upgrade tests passing locally now.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
14.9 KB

Now fixed the cron test, let's see if the bot is a bit happier.

catch’s picture

FileSize
14.16 KB

Minus some debug cruft from batch.test. There's some extra optimizations doable here - for example we don't need to clear the full cache on cache_set() any more, but this could do with review.

podarok’s picture

Status: Needs review » Reviewed & tested by the community

#32

7.0 release w/o patch

ns2# ab -c5 -n1000 http://test.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking test.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname:        test.com
Server Port:            80

Document Path:          /
Document Length:        29103 bytes

Concurrency Level:      5
Time taken for tests:   11.993 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      29515000 bytes
HTML transferred:       29103000 bytes
Requests per second:    83.39 [#/sec] (mean)
Time per request:       59.963 [ms] (mean)
Time per request:       11.993 [ms] (mean, across all concurrent requests)
Transfer rate:          2403.44 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.9      0      22
Processing:    50   60  14.1     57     250
Waiting:       50   59  14.1     57     249
Total:         50   60  14.2     57     250

Percentage of the requests served within a certain time (ms)
  50%     57
  66%     58
  75%     60
  80%     61
  90%     67
  95%     77
  98%     96
  99%    129
 100%    250 (longest request)

7.o release w #32 patch

 ns2# ab -c5 -n1000 http://test.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking test.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname:        test.com
Server Port:            80

Document Path:          /
Document Length:        29103 bytes

Concurrency Level:      5
Time taken for tests:   11.146 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      29515000 bytes
HTML transferred:       29103000 bytes
Requests per second:    89.72 [#/sec] (mean)
Time per request:       55.730 [ms] (mean)
Time per request:       11.146 [ms] (mean, across all concurrent requests)
Transfer rate:          2585.97 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0      10
Processing:    49   55   6.9     54     137
Waiting:       48   55   6.9     54     137
Total:         49   56   6.9     54     137

Percentage of the requests served within a certain time (ms)
  50%     54
  66%     55
  75%     56
  80%     57
  90%     60
  95%     68
  98%     77
  99%     85
 100%    137 (longest request)

7.5% speedup
nice "catch"

catch’s picture

Thanks for the benchmarks podarok. Could you do a SELECT COUNT(*) FROM variable; and post the results here?

There's still a couple of things I want to do here and it could do with some additional reviews, so I'm marking it back to CNR for now.

catch’s picture

Status: Reviewed & tested by the community » Needs review
catch’s picture

FileSize
15.77 KB

Here's the modification to variable_set() and variable_del() - when using these functions, the variable cache in the bootstrap won't be cleared - either because the variable wasn't in there in the first place, or because we load the cache entry, modify it, and write it back. Batch and variable tests continue to pass.

podarok’s picture

SELECT COUNT( * )
FROM variable

SQL result

Host: localhost
DB: test
Час створення: Лют 12 2011 р., 15:33
Згенеровано: phpMyAdmin 3.3.7 / MySQL 5.1.50-log
SQL-запит: SELECT COUNT(*) FROM variable;

Рядки: 1

COUNT(*)
508
catch’s picture

FileSize
15.76 KB

Thanks podarak.

Uploading a new revision - this fixes a bug in variable_set() - was setting 'defaults' rather than unsetting, not a functional bug since defaults never gets consulted unless the variable is set anyway, but means the array could be left bigger than necessary until the next actual clear.

catch’s picture

FileSize
13.95 KB

Spoke to chx and beejeebus in irc. Trying to simplify this a bit:

1. I dropped caching a separate array of variables that should use defaults, in favour of caching the default value itself as part of a single cache entry. This simplifies things a lot, the only trade off is that changing the default value of a variable will require a cache clear for it to be picked up, but that's it.

2. I took out the count tracking between initialization and write (although left in the counts at the end to keep some write stampede protection), this also simplifies things a fair bit, and chx pointed out that if you had both a variable_del() and a variable_set() during one request, the count tracking won't catch that anyway, so meh. We already have race conditions in the variables system, and that needs its own dedicated issue.

catch’s picture

FileSize
13.96 KB

Missed on strtok(), also missed an unset in variable_del(), and one comment typo cleaned up.

Anonymous’s picture

subscribe.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
13.43 KB

Removes variable_get_from_storage(), only used once anyway.

4 files changed, 160 insertions(+), 58 deletions(-)

sun’s picture

Interesting.

Though, why does variable_set() re-retrieve the cache when everything should be readily available already?

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
13.38 KB

Removed the cache updating from variable_set() and variable_del() and allowed variable_write_cache() to do it instead.

catch’s picture

FileSize
13.44 KB

This should help to answer sun's question, all comments relating to that are in one place now.

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

Anonymous’s picture

I'm still on the fence about this patch.

Right now, I'd prefer to allow modules to signal which variable's to always load, and use that to set a flag in variable_set().

IMHO this is easier to think about, and more powerful.

catch’s picture

Setting a flag on variable_set() won't help with variables that are requested, but aren't set in the database, of which there can be a lot. I'm also trying to keep changes here to ones that can be applied to D7 - whether they go into HEAD or the D7 branch of Pressflow, rather than a full refactoring.

Will try to have a look at installation failures later.

catch’s picture

Status: Needs work » Needs review
FileSize
13.31 KB

Fixed the installer. Tried to use variable_cache_reset() in DrupalWebTestCase - that bit's untested.

catch’s picture

FileSize
13.35 KB

Now it installs there were errors in variable_write_cache().

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
13.35 KB

Status: Needs review » Needs work

The last submitted patch, variable_987768.patch, failed testing.

emdalton’s picture

If the patch in #38 works and fixes the performance problem, could it be backported to D6? I seem to have a lot of these variables left over from old modules (some from D5).

catch’s picture

I would like to backport this at least to Pressflow, very much doubt anything like this will get into D6 core since that's stable now, but would like to see an agreed approach make it into D7/8 first.

DanPir’s picture

subscribe

catch’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs work » Needs review
FileSize
13.6 KB

Fixed the exceptions, haven't run all tests though.

I also modified the cache miss tracking a bit, so it's 1. a bit more self documenting 2. should now be around half the memory usage of previous patches. Will re-profile and re-benchmark soon, but would be nice to have more tests passing again first.

catch’s picture

Issue tags: +Needs backport to D7

Adding tag.

Status: Needs review » Needs work

The last submitted patch, variable_caching.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
14.08 KB

This fixed exceptions in one test and fails in another, did not run full suite again.

catch’s picture

FileSize
13.62 KB

Without the .htaccess change.

Status: Needs review » Needs work

The last submitted patch, variable_cache.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
14.3 KB

grr, down to two fails locally.

Status: Needs review » Needs work

The last submitted patch, variable_cache.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
15.94 KB

Alright, fixed those two tests, and fixed some performance regressions in the patch itself.

This is now neutral for performance with out of the box Drupal 7 standard profile, both with and without page caching. I still need to test it with a large-ish variables table which is when the performance and memory improvements would kick in.

Anonymous’s picture

Status: Needs review » Needs work

the more i look at this patch, the more i think we're headed down the wrong road.

seems we're still open to races in 'cache_bootstrap' cid, as we can write back cache values in one process that miss an update to the canonical data from another thread.

i stand by the first couple of paragraphs catch quotes in the original post, but i think we need to bite the bullet and make some interface changes, or make variable loading pluggable, or both.

catch’s picture

Status: Needs work » Needs review

We already have race conditions with variable caching in all versions of Drupal. Since the purpose of this issue is not to fix all of those conditions, that's not a reason to mark it needs work.

I think if the configuration stuff happens in Drupal 8, we won't have a recognisable variable system in D8. So I'm mainly looking for something that can stop the bleeding (of which there is a lot on every site I look at in relation to variable_set()) in 7 and D6 pressflow. The patch here is definitely ugly though, and it's not getting any prettier dealing with some of the odder uses of the variable system in core.

catch’s picture

Discussed this a bit with Justin in irc he suggested +http://drupalbin.com/18286 <---- variable_write() simplified as a cleanup to variable_write_cache() - if we do that we can actually clean up some more stuff.

I need to:

1. Add a lot of comments
2. Add an issue summary
3. Generate a bloated variable table to get better numbers on what this saves.

Dries’s picture

Status: Needs review » Needs work

I'm intrigued by this patch. It translates to some great memory and performance savings.

That said, I'd love to see a cleaner, more intuitive API -- the current API feels rather convoluted.

+++ b/includes/bootstrap.inc
@@ -822,14 +826,82 @@ function variable_set($name, $value) {
+ * @param $options
+ *   Optional paramater to allow updating the cache for a specific variable.
+ *   This is an array in the form of array('name' => $name, 'value' => $value)
+ *   or array('name' => $name, 'unset' => TRUE).
+ */
+function variable_write_cache($cid, $bin, $options = array()) {

For example, I'm not a big fan of the $options parameter. It's too much of a Drupal-ism.

Lars Toomre’s picture

I come to this issue from #973436: Overzealous locking in variable_initialize() and now have read through all of the comments as well as the most recent patch. I very much am in favor of this incremental building of the static variables array. It will help both to lessen memory use and to reduce compute cycles.

I too have seen variable tables in D6 with over 2000 entries and wondered why so many had to be loaded with each and every page load. I also have wondered why it was necessary to rebuild with each variable_set()/variable_get() with so many entries.

Regarding Dries' comment in #71, it appears that there are three calling patterns (is pseudo code):
a) variable_write_cache($cid, $bin),
b) variable_write_cache($cid, $bin, $op='unset', $name), and
c) variable_write_cache($cid, $bin, $op=other, $name, $value).

Perhaps it makes sense then to use the function profile instead:
variable_write_cache($cid, $bin, $name=NULL, $value=NULL, $clear=FALSE)

Looking at the code and pretending that I needed half of the above 2000+ variables to construct a page request, I gather that I would need to do on the order of 1000 data base requests because of the db_query now embedded in variable_get(). While it is an extreme case, it makes me question whether in the hypothetical case whether it is not better to retrieve all 2000+ variables in one data base call.

That leads me to the thought that we might want to create another function like variables_static_load(). Ideally, this would take either a variable name pattern or an array of variable names. It then would do a (potentially incremental) bulk loading of that subset of variables with one data base call into the $conf GLOBAL variable.

Hence, if on a node page (with taxonomy and comment modules enables) one could bulk load 'node_', 'comment_' and 'taxonomy_' variables in at most three and potentially one data base retrieval call. Similarly, contrib modules could bulk load their module specific variables with one data base call. Variables missing from the bulk load would still be caught by this new variable_get() as a fall back. This way cache misses would be reduced and variable_get() usually quite quick.

The thought with the list of variable names is that one might want a group of variables that do not fit a pattern. For instance, in an ideal world for anonymous visitors with page caching enabled, one might bulk load a subset of those variables that were in the $conf[] array x% of the time for the last y page loads. In essence, one would be pre-loading a high percentage of the variable cache hits in one go and yet further improving on performance.

To implement this bulk load thought, we probably also would need to add a hook in the variable_initialize() to allow modules to bulk load upon variable_initialization. We probably also want to define another hook that would be called at the end of the page creation process that would allow for the recording of variables in memory under whatever conditions need to be recorded.

One final thought: Perhaps we want to rename the function variables_initialize() [plural form], since it clearly would be doing a plural operation? We also might want to introduce the functions variables_set(array) and variables_del(array) for bulk operations that minimize the rebuild and data base calls.

Thoughts??

pillarsdotnet’s picture

catch’s picture

FileSize
16.02 KB

Looking at the code and pretending that I needed half of the above 2000+ variables to construct a page request, I gather that I would need to do on the order of 1000 data base requests because of the db_query now embedded in variable_get(). While it is an extreme case, it makes me question whether in the hypothetical case whether it is not better to retrieve all 2000+ variables in one data base call.

That leads me to the thought that we might want to create another function like variables_static_load(). Ideally, this would take either a variable name pattern or an array of variable names. It then would do a (potentially incremental) bulk loading of that subset of variables with one data base call into the $conf GLOBAL variable.

Hence, if on a node page (with taxonomy and comment modules enables) one could bulk load 'node_', 'comment_' and 'taxonomy_' variables in at most three and potentially one data base retrieval call. Similarly, contrib modules could bulk load their module specific variables with one data base call. Variables missing from the bulk load would still be caught by this new variable_get() as a fall back. This way cache misses would be reduced and variable_get() usually quite quick.

Since the variables to be cached are in drupal_static(), nothing stops custom code or a contrib module from doing this in hook_boot() or hook_init(), however it would complicate the current patch (which is already very complicated) to add this explicitly.

On the 1000 database queries, those will be spread out over a lot of requests, over time. In testing, most requests ask for at most 60-80 variables each, at least 30 of these from during the bootstrap process, so it's going to be a maximum of 10-50 database queries on cache misses per page (a full variable_initialize() at the moment takes around 100ms, so that's probably going to be about the same even worst case) - and most sites only have a few public facing system path roots ('node', 'user' 'taxonomy' - then some views and panels or similar).

One final thought: Perhaps we want to rename the function variables_initialize() [plural form], since it clearly would be doing a plural operation? We also might want to introduce the functions variables_set(array) and variables_del(array) for bulk operations that minimize the rebuild and data base calls.

That's such a good idea there's already an issue for it ;) #317930: Allow multiple values in a single variable_set() call

Uploading a new patch - this does not pass tests locally, I get one fail in the page cache test which is proving hard to debug. Uploading to see if the testbot can reproduce, and more importantly if there's other failures which might expose the bug better.

This also changes the function signature of variable_write_cache()ys

variable_write_cache($cid = 'variable_cache', $bin = 'cache_bootstrap', $op = NULL, $name = NULL, $value = NULL)

Before anyone says "but we removed $op!!" - we removed it for hooks, it still makes sense when the argument is only ever going to be called from two functions.

catch’s picture

Status: Needs work » Needs review

CNR for the bot only.

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

Fabianx’s picture

Subscribe.

catch’s picture

Status: Needs work » Needs review
FileSize
8.91 KB

Alright, starting from scratch here in terms of code, using CacheArrayObject from #402896: Introduce DrupalCacheArray and use it for drupal_get_schema().

Functionally the patch differs from some of the revisions here, I'll try to summarize the two approaches:

Approach 1 (represented by early patches and this one):

- Keep a single cache entry.
- build that cache entry over time (only set it once per request though)
- to deal with NULL/defaults, store information about variables that were never set.
- update cache items when variables are changed or deleted rather than clearing the cache and starting again.

Approach 2 (represented by #74, does not yet have CacheArrayObject implementation):

- keep a bootstrap cache entry which is global (up to hook_init()), then a per-system-path-root (admin*, node*, user* etc.) cache for everything that's not included there. Everything else is the same.

The CacheArrayObject version is much easier to backport, so I will check the difference in memory on a D8 site with a few settings pages saved, vs. a D6 site where the current variable cache is around 1.3mb - if it's a wash for the first and an improvement for the latter, then we should likely keep the single cache entry for simplicity, and because it'll have a higher cache hit rate.

This also changes the locking behaviour for the cache entry. If there is a cache miss, we no longer do a recursive lock_wait() waiting for it to be rebuild, instead every process will individual request variables from the table (up to 50+ queries per page but all fast selects) - then only one process gets to write to the cache at any one time, if processes miss the lock write, they just throw away the cache item. There is some work to do here on making it atomic.

I was able to install D8 with this patch and profiling suggested the caching was working, interested to see how badly it fails the test bot so marking CNR for that.

Here's a diffstat (if you take the CacheArrayObject hunks out of the patch, they are there because it depends on them):

catch@catch-toshiba:~/www/8$ diffstat variable2.patch
 bootstrap.inc |   58 ++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 26 deletions(-)

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
10.47 KB

drupal_web_test_case::setUp() is not friendly to object destructors that hit the database. This is the same as #1188316: DrupalUnitTestCase incompatible with any classes with destructors that access the database. Simpletest's own tests pass locally with this patch but I didn't run the full test suite.

Anonymous’s picture

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
10.44 KB

A lot of the exceptions are because unsetting a non-existent array key on ArrayObject throws a notice whereas it doesn't for a 'real' array. The only places I can find that do this are variable_del() and drupal_web_test_case.php so far.

This still has some test failures, but should be a lot less. Apologies for abusing the test bot like this (not that it usually stops me).

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
Issue tags: -Performance, -memory, -Needs backport to D7

double post.

catch’s picture

Issue tags: +Performance, +memory, +Needs backport to D7
FileSize
11.52 KB

OK so a few things going on here, this slightly differs from some of the other CacheArrayObject patches in the queue, will try to explain how:

variable_get() is a weird function since, especially with the default value, it works when there is no database, for items that are set in settings.php, in the db, or neither. This makes it tricky to refactor since it is literally anything goes. That is a large reason why the patches on this issue got progressively uglier the closer they got to working.

There are a couple of possible ways to do this with CacheArrayObject, but I went with making $conf an instance of the class directly. The usual tricks like $conf['foo'] = 'bar'; work as they would normally. I needed to override offsetUnset() to avoid unset() on a non-existing key from throwing a notice - could not find documentation for that behaviour on php.net so wondering if it's a bug. Will see if I can do a test case and a bug report to link to.

There is one search test that shifts from DrupalUnitTestCase to DrupalWebTestCase - if a unit test is indirectly calling varlable_get() then it's not a unit test (i.e. if the site running the tests has the variable set in settings.php it might break), this previously silently worked, but now missing variables can trigger a cache miss and hit the database, which DrupalUnitTestCase doesn't like. This is the sort of thing CMI should help with - presumably we'll be able to dependency inject configuration for unit tests once that is in.

Architecturally the main change from the other patches is that CacheArrayObject now defines/redefines some public methods. These allow variable_set() to directly manipulate variable caching. If the variable API was stricter, we could have done that with offsetSet() - but that is used for $conf['foo'] = 'bar'.

The patch continues to write to cache once per page when building the cache, but immediately writes to cache when variable_set() is called - this is partly for testing purposes and partly to maintain current behaviour.

I haven't run all tests, so there still might be a couple of failures, but this new patch should fix some obvious ones.

There's one @todo I have in mind, which is to prevent caching of variables via system_settings_form() or system_settings_form_submit(), this needs some thought but it would help to avoid cache pollution if we can do it cleanly. Just clearing the cache from system_settings_form_submit() might also help.

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
11.82 KB

Apparently that fixed some failures but added others, this ought to improve things.

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
11.28 KB

Should be getting a bit closer. The unit tests fails because they are calling variable_get(), which makes them not real unit tests, didn't get to all those yet but this should fix some others.

catch’s picture

FileSize
12.76 KB

Potentially down to batch API and one or two others now.

catch’s picture

Fixed locale UI. I have an alternative approach in mind that would fix the empty() issue, variable_init() and variable_get() get a bit more complex with this, but would leave $conf as what it currently is.

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
12.6 KB

Here's the alternate approach, only ran some tests, this leaves $conf as is.

Status: Needs review » Needs work

The last submitted patch, variable2.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
12.96 KB

Most of those failures dealt with, this feels a lot better for backwards compatibility. Having trouble with that one batch test still, it's doing very strange things with variable_set() / variable_get(). May be time to take a break from fighting tests and get some performance numbers.

I think we can probably move the 'unit tests' back to DrupalUnitTestCase, although those are extremely fragile as they are currently - unit tests shouldn't be calling variable_get().

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

catch’s picture

Here's some performance numbers.

I profiled two situations on PHP 5.3 with xhprof:

1. A clean Drupal 8 install with the standard profile. There are 31 variables set on this install. On the front page, 50 variables are requested.

2. The same Drupal 8 install, except I did INSERT INTO 8.variable SELECT FROM big_d6_site.variable which increased the number of variables in that table from 31 to 2021. During previous surveys of Drupal 6 sites, every one has had between 1,000 and 2,200 variables in the system. This is at the top end of what I'd expect, but not that unusual.

So with 31 variables, this is how it looks:

Since the data set is very small, variable_initialize() is negligible both for cache hits and cache misses with and without the patch in terms of wall time. It takes around 10ms.

However memory usage varies quite a bit (note inclusive, and inclusive peak memory were very close for this one, so I'm not differentiating):

Drupal 8:
cache_hit() variable_initialize() takes 175kb of memory.

cache miss it takes 750k of memory.

Patch:
cache_hit: 180kb (5kb is likely the cost of having the object + caching NULL for variables that are requested but not set)
cache_miss: 175kb of memory

With the patch, a cache miss also causes variable_get() to do individual lookups:

This adds 22ms wall time, (50ms CPU time) and 107kb of memory.

So in this case the cache miss is a bit slower (and some other pages will also have to request variables and set the cache again), but the memory usage less than half (280kb vs. 750kb).

This is the worst case for the patch, it adds 20ms to a cache miss (ish), and saves 600kb of memory. Cache hits are about even.

With lots of variables, you start to see where the current pain points are a lot clearer:

With the patch, both cache hits and misses look exactly the same (a couple more variables are set that are being requested, so it looks like maybe 5-10kb more memory usage but barely noticeable). This is what you'd expect since the size of the cache and number of cache misses is still dependent on what is requested (which hasn't changed), instead of what's in the db (which has).

Without the patch, things get more interesting.

variable_initialize() is now loading and unserializing just over 2,000 variables, this means:

440ms of wall time (nearly half a second). Note that we currently lock every, single, request while this is happening - so that means that if the variable cache is cleared, everything is going to hang for half a second until the lock is released.

variable_initialize() also takes 2.7mb of peak memory (30% of total memory usage on this minimal install), and 1.8mb of inclusive memory (does not get freed).

With a cache hit, variable_initialize() takes 1.4mb of peak memory, and 1.2mb of inclusive memory (i.e. never freed during the request). So in this case the patch is an 80% reduction in memory usage (although over time the gap will slowly close as more variables are requested and cached).

Attaching xhprof screenshots, I'm probably going to backport this to Pressflow next and profile on a real install - that should give better numbers when more variables are actually in use. The other thing to look at is how many variables tend to be cache misses when visiting multiple different pages.

catch’s picture

Status: Needs work » Needs review
FileSize
11.19 KB

Slight update, overriding exchangeArray in CacheArrayObject so it zeroes out keysToSave(). Also removing s/Web/Unit to see if we're still on one fail.

Status: Needs review » Needs work

The last submitted patch, variable.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
9.73 KB

new patch attached, based on a different approach i discussed with catch in #drupal-contribute.

instead of storing the keys and values, we just store the keys for the cache items.

when initialising, we grab the keys, then multi-get to get the values.

while this adds a single round trip to the cache, it makes the issues related to locking around the variables table much simpler. also, it means that we clear/update cache items based on a single $cid.

i've ripped out the locking code - we may need to add some locking back, but it should be much easier to manage now.

Anonymous’s picture

FileSize
9.8 KB

catch pointed out i'd missed the bit where we actually, you know, write to the cache, so updated patch to write through cache misses in VariableCache:: resolveCacheMiss().

Anonymous’s picture

FileSize
9.8 KB

oh dear, another typo.

+    parent::__construct($cached ? $cache->data : array());

fixed, is now:

+    parent::__construct($cached ? $cached->data : array());
Anonymous’s picture

FileSize
9.8 KB

more dumbness fixed, this actually behaves as expected now.

catch raised the idea of a new base class for this pattern, as its right for a more than just variables, but not right for "build it all at once" type caches.

catch - with this approach, i couldn't get it to work with this:

+  $variables = $cache->exchangeArray(array());

not sure what i'm doing wrong.

also, perhaps we need a 'variables' bin?

catch’s picture

The exchangeArray() in recent patches here was mainly to free up memory (and avoid the foreach()), if we are not caching values except when consulting the db directly (i.e. never relying on the values in CacheArrayObject storage which could be affected by $conf), then #104 seems fine doing it that way.

So the trade-offs with this one:

- extra round trip (mainly noticeable with page caching, or cache backends that can't handle multi get).
- cold starts are going to have a lot more cache_set() (although at most a few hundred, which is nothing compared to the menu links caches if you have a few dozen pages).

However we never have to clear the whole cache (except complete cache flushes), there is less chance of stampedes etc. so it might well be a good trade-off. Will need to do some testing with this, interested to see if it has better luck with the test bot.

Status: Needs review » Needs work

The last submitted patch, multi-get.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
10.3 KB

updated patch checks for 'language_default' before trying to unset it.

i'm not sure if we can get rid of that line in DrupalWebTestCase entirely, or if this indicates a larger issue.

new patch does this simplest thing, so we can focus on the real fails.

Status: Needs review » Needs work

The last submitted patch, multi-get.patch, failed testing.

Anonymous’s picture

in url(), we do:

  if (!empty($GLOBALS['conf']['clean_url'])) {

not sure WhyTF it does, but this doesn't play nice with ArrayAccess. even though 'clean_url' is set to 0, that condition always evaluates as if its not empty.

php--

messing with $GLOBALS directly--

drupal--

catch’s picture

There's a lot of issues replacing $conf itself with ArrayObject, that was the reason for the exhangeArray approach in previous patches. empty() feels like a php bug, I found a won't fix issue from 2005. Also unset throws a notice when offers don't exist, this could be worked around by overriding offsetUnset() I think but didn't get far with that until changing tack entirely.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
10.88 KB

lets see what the bot thinks looks like if we don't mess with $GLOBALS.

that $GLOBALS smells like a performance hack. so this patch will probably have to add a hack to get around that hack. sigh.

Anonymous’s picture

re. #110 - ouch and ouch. i didn't really grasp how ugly and complicated this is. if there's no clear way through with the approach i started in #101 soon, i guess we should ditch it and move back to #99.

catch’s picture

It might be doable to use exchangeArray() or otherwise leave $conf as a normal array along with the multiget, sorry I didn't notice this when seeing the patch first time, would have tried to warn earlier. Won't be on irc until later tonight but let's discuss next time we're both on.

Status: Needs review » Needs work

The last submitted patch, multi-get.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
11.85 KB

Started converting #99 to the ArrayAccess version from #402896: Introduce DrupalCacheArray and use it for drupal_get_schema(). I think Justin's plan makes sense for a lot of things but it's going to be even harder to deal with than this is currently.

Main thing was needing to implement a version of exchangeArray() (since this is now just a custom method I changed that to replaceStorage()). However now rather than just batch tests failing, I appear to have broken batch API altogether, still in the middle of debugging that but uploading what there is, and to see what the bot thinks about it.

Status: Needs review » Needs work

The last submitted patch, variable_cache.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
11.39 KB

Removing .htaccess changes.

Status: Needs review » Needs work

The last submitted patch, variable_cache.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
11.41 KB

This fixes the installer/simpletest itself. Uploading to see where we are with test failures. I have a feeling this is so similar it's going to just be the 1 failure from before, have confirmed that one still fails but not run the rest of the test suite yet.

Status: Needs review » Needs work

The last submitted patch, variable_cache.patch, failed testing.

TonyGuy’s picture

Subscribed

suryaden’s picture

Status: Needs work » Needs review

#36: variable_987768.patch queued for re-testing.

catch’s picture

FileSize
4.68 KB

drupal_static_reset() in the batch test fixes that.

Status: Needs review » Needs work

The last submitted patch, variable_cache.patch, failed testing.

catch’s picture

Fixed those new test failures:

1. Notice was genuine, needed to override offsetUnset() in VariableCache.

2. UpgradeTestCase needed similar changes to DrupalWebTestCase (resetting drupal_static() when the database has been moved out of the way doesn't work if you static cache a class that accesses the database in __destruct().

Did not run full test suite so it's not impossible something else was introduced.

catch’s picture

Oh lovely green, how long it has been.

donquixote’s picture

Not sure what is the technical approach of the most recent patch (issue summary would be nice :) )

So, here is just my own thought before I found this issue:

New things in the db:
- Remember a "last changed" timestamp with each variable.
- Remember a "last cached" timestamp for the variable cache.

On variable_initialize,
- First load all variables from cache.
- Then load those variables with a more recent "last changed" timestamp directly from the db table, and override the values from the cache.
- If there is more than just a few values that were changed recently, we write the full array to the cache. Otherwise, we leave the cache untouched. (we might want a lock for this, not sure yet)

On variable_set():
- Write the new value to the db, with a timestamp.
- Don't do anything about the cache.

On cache reset:
Either just flush, or flush + rebuild. That's the same arguments as already discussed, I think.

This needs a bit of protection against race conflicts, but it's not any worse than other solutions or what we have now.
And it's a lot faster, I imagine.
Especially, a single module can no longer cause the variable cache to be flushed / rebuilt on each request.

catch’s picture

Status: Needs review » Closed (duplicate)

I'm going to mark this as duplicate of #973436: Overzealous locking in variable_initialize() and upload the latest patch from here to there.

Fabianx’s picture

Title: Optimize variable caching » [PP-1] Optimize variable caching
Issue summary: View changes
Status: Closed (duplicate) » Postponed

#973436: Overzealous locking in variable_initialize() decided to use a different approach, so marking this one postponed on that one.

catch’s picture

Version: 8.0.x-dev » 7.x-dev
Issue tags: -Needs backport to D7
yonailo’s picture

it's a pity that Drupal 7 doesn't take this approach by default, #973436 will not address the memory footprint improvements that this ticket does, so I dont quite understand why #973436 has more priority...

izmeez’s picture

Now that issue #973436: Overzealous locking in variable_initialize() has been committed should this issue be made active again and is the patch in comment #125 a further enhancement?