Standardize static caching
chx - May 4, 2008 - 11:07
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | patch (code needs work) |
Description
Instead of using static variables, we can use something like this patch does. If people like this one then we can refactor the rest of core in subsequent patches, this is just bootstrap.inc and whatever calls those functions, This is better than global variables because reset always resets to whatever the original function specifed as default. Also, for testing purposes we could include a reset of all static variables in one fell swoop. I would not to like to refactor evertything in one patch.
| Attachment | Size |
|---|---|
| drupal_static.patch | 12.11 KB |

#1
nice! will review this shortly...
#2
I do like the idea, it will solve lots of problems with static variable (especially the strange function signatures in some cases).
But I'm not convinced by the "default" argument, that basically duplicate the variable in all cases (including when the default is not useful at all). This looks like a waste of memory.
#3
Damien, I do not get your problem, why would it duplicate...? The default stays unchanged and the data itself gets changed by the caller because it's returned by reference. Also, the memory implications are rather small, the default is typically like 0 or '' or array().
#4
interesting. folks interested in this might also be interested in context module. It provides a static cache api as well ... Look especially at context.module itself, and not at its helper modules. Context has been useful for me on a couple of projects already.
#5
@chx: I was worried that some function could have a large $default. If its not the case (just something simple like
array()orNULL), I'm ok with it, but it should be documented.#6
context does than this very simple facility and it does not have a reset. It's a whole different thing, really. Mine is a search-replace drop in:
<?phpfunction function_name() {
static $foo = bar;
}
?>
becomes
<?phpfunction function_name() {
$foo = &drupal_static('function_name', bar);
}
?>
and then without getting
function_nameinvolved, it can be reset bydrupal_static_reset('function_name')nice and easy.#7
I'd like this to be added very much. I am currently working on refactoring Taxonomy to use one centralized place to save the term data. I would figure more modules are duplicating data. With this function a lot of static's will be checked or they can be merged to one place.
It would also allow you to break for example into 'roles' which are normally cached in a static allowing you to add roles on the fly much easier. It could however lead to unexpected behaviour because non-core modules will be able to change data they shouldn't, this should be checked for each static.
#8
(subscribing, possibly testing tomorrow)
#9
Subscribe.
#10
Subscribe.
#11
Adding to my watch list.
#12
<?phpfunction function_name() {
$foo = &drupal_static('function_name', bar);
}
?>
Ok I have two comments on this: one, why don't we use debug_backtrace? Two, what if we have multiple static variables func?
#13
One, debug_backtrace is not too fast, two, I was just explaining what the code does, if you would have looked at the code you'd have seen
$phase_index = &drupal_static('drupal_bootstrap_phase_index', 0);so it's not mandatory that you use the function name as the static name, it's just a possibility.#14
I strongly support this idea but the API is not quite right yet, though I am not yet sure I know what the right API is.
A major requirement we need to declare for Drupal coding standards is that using static caching as a "performance enhancement" in any way that makes it possible for any caller of any public API function to tell that static caching is enabled is a bug as severe as "the API function does not work at all." This does not apply to functions that intentionally build up internal state like drupal_set_header(), but consider:
* $node = node_load($nid)
* modify $node
* node_save($node)
* $new_node = node_load($nid)
Currently, the second node_load() returns the original $node. That's severely wrong; it should return the modified node.
The current drupal_static() API makes doing things right easier but not quite easy enough. Suppose that node load caches a loaded node with:
<?php$nodes = &drupal_static('node_load', array()); // replaces current static $nodes = array();
$nodes[$nid] = $node;
?>
then node_save() can call
<?php$nodes =& drupal_static('node_load');
unset($nodes[$nid]);
?>
to ensure that a later node_load() does the right thing. There are a few problems here: (1) It requires two lines of code for what should be one line. (2) It exposes all of $nodes to damage. (3) It requires duplicating the default value for the 'nodes' key ("array()") in two places or, as in my (buggy) example above, omits it from the second case and therefore leads to subtle bugs when someone calls node_save() before node_load().
If I could write
<?phpunset(drupal_static('node_load')[$nid])
?>
I think the simplest solution is to support, in addition to the &drupal_static() function, explicit drupal_static_set(). drupal_static_get(), drupal_static_clear() functions. Then, node_load could do
<?phpdrupal_static_set('node-'.$nid, $node)
?>
<?phpdrupal_static_clear('node-'.$nid);
?>
While I'm thinking about it, another benefit of central static cache control is that we can limit the amount of memory used by the static cache (perhaps with FIFO or even an LRU GC if we get ambitious, though that is probably overkill). Not something we should try to do in this patch.
That's a lot of rambling without a specific suggestion. Sorry 'bout that.
#15
Barry, I already have drupal_static_reset, adding a clear is very very easy, like
<?phpfunction &drupal_static($name, $default = NULL, $reset = FALSE, $unset = FALSE, $key = '') {
elseif ($reset) {
$cache[$name]['data'] = $cache[$name]['default'];
}
elseif ($unset && is_array($cache[$name]['data']_) {
unset($cache[$name]['data'][$key]);
}
function drupal_static_clear($name, $key) {
drupal_static($name, NULL, FALSE, TRUE, $key);
}
?>
so that's quite solvable.
About using cache_set, I am unsure, I will study. Thanks for the review.
#16
This could also get rid of lots of global variables that exist simply because they want to be cross-function statics. So, overall, +1. However, a bit of thought needs to go into naming conventions, as this is going to be very prone to duplicates. I propose:
'origin_function__variable_name'. This way it will be very hard to create conflicting names, you must really be trying hard in order to break it.Another fun idea: a drupal_static_set function:
<?phpfunction drupal_static_set($name, $value) {
$temp = &drupal_static($name);
$temp = $value;
}
?>
This would allow for modules to perhaps hook into core a bit more by being able to alter core's statics... which can be a very good thing.
Also agreed, a drupal_static_clear() function would be awesome too.
#17
However, when I suggested this to Dries, he preferred using the cache_get/set/clear() functions with a special table name of 'memory' (which I'd suggest we refer to as CACHE_TABLE_MEMORY). This is a good idea from the developer's point of view because it means there is one "caching API" instead of two. We can make cache_get return a reference (only useful in the 'memory' case) but we can't really add the "default" semantics you have for drupal_static() which makes the almost search/replace update possible.
A lot of static variable caches could be cached longer, i.e. across multiple page requests. The caching in node_load() and node_save(), per Barry's example, could potentially be cached for an extended period of time. We don't necessarily want to cache this in the database, but in presence of memcached, we might want to cache it a bit longer. In absence of memcached, we could use a static variable.
Think of the static variable as a poorman's solution in a more versatile cache API. Using a single cache API will make it easier to do things like this. I don't quite know what the API would look like, and how what semantic sugar is required to communicate cache lifetimes, but it is well-worth thinking about this more.
Because Drupal is so flexible, it's also difficult to asses how feasible this all is without breaking any modules. I guess it requires some more thought and discussion. It might be nothing but a crazy idea.
#18
Yesterday I followed your chat in #drupal-dev with Crell and bjaspan. Generally agreed was:
a. Caching should be a simple 'get data' and 'set data', which can be done with or without OOP
b. The backend for a cache would be specified at 'table' level. Though I think this name should be renamed because table is outdated
and only for databases.
c. Different drivers could be included and users could specify in their settings file or in a cache info hook what table to use what driver. Drivers
that can be shipped with core could be: 'db', 'fsystem', 'static variable' and maybe even 'memcached' or 'apc'.
What are the benefits? One general Cache API which can be tweaked for each website. People could use EARLY_PAGE_CACHE with fsystem caching for cache_pages, use database caching for views and use memcached caching with expiration 5 minutes for node_load.
The static cache variable would just be another driver for the cache api because it is all just getting data and setting data. I really like the idea and it could lead to some nice performance gains.
#19
#18 is a pretty good summary. I am going to try and draw up some code ideas shortly and post them, probably on my blog since they'll be longer, but we'll see.
#20
This was not agreed these were merely Crell's ideas and too big a change for now.
#21
Hmmz, I thought you liked the idea too, sorry for that. What is the reason it is too big of a change, you'd rather do it in stages?
#22
I will introduce a
DRUPAL_STATIC_CACHEand replace static with cache_get/set into that table. cache_get will get a default option too. Meanwhile, I will cahnge the cache_get to return only $cache->data and have a cache_get_found call to determine whether cache_get found the data in the storage and a cache_get_headers to grab the headers. This makes it possible to replace the static with a simple cache_get because it no longer returns a complex data structure though we need a cache_set at the end of the function.This has absolutely nothing to do with drivers and such. Once this is in, we can ponder on that.
#23
chx: Put me on record as saying that a separate cache_found() function without an object to handle protected data members is about as nasty an API as you could ask for there. It makes no sense whatsoever.
#24
FYI, we have a module today that allows you to specify different cache strategies for different parts of your site. See http://drupal.org/project/cacherouter
#25
Here is much unfinished work. I am debating w/ myself, should I use arrays or the kind of cache ids that drupal_get_filename now uses? If the latter, how we are going to reset? I was thinking on arrays as cid, actually, but not yet there.
#26
First questions first: all this stuff just to avoid globals? Are they really really so bad? Why not just change those statics -- we do not to change them all, not at all -- that needs a reset to a global?
#27
Since fixing this issue is a breaker on running all tests at once (see http://drupal.org/node/260360 ) I'm changing it to a critical bug.
#28
Oh sure it is critical -- Barry mentioned it blocks fields in core too.
#29
Please handle a new generalized caching API in a new issue.
Edit: using cache* for this issue is ok, adding necessary facilities for static is also ok but to totally revamp cache* -- only if we think that's the absolute best and I am not convinced at all.
#30
Here is a summary so far: the problem is that we need to be able to reset static in function A from function B. Solutions:
$cid = array('node_load' => $nid)or something similar as cid?$nodes = cache_get('node_load'); unset($nodes[$nid_to_reset]); cache_set....The inner workings of cache_get is hardly relevant to this issue.
#31
I can't talk to anyone without having
cache('static')->set('value', $data);thrown at me. That's a different issue and I am out of this.#32
#33
having a global seems the best solution if we care about performance (i.e. avoiding zillions of extra function calls). We then just rely on a clear convention rather than an API per-se. For example:
$GLOBALS['cache']['node_load'][$nid] = clone $node;#34
I guess you could throw an extra dummy function call into every function that currently has a static to gauge the difference?
#35
I won't fixed this because boombatower and David Tournoud are already on the problems with simpletest in two issues ( http://drupal.org/node/260882 and http://drupal.org/node/261258 ) and the rest can be fixed invidually. If we do not want to convert all statics into this new api -- and it would be a big work -- then why bother at all.
#36
I think that unfortunately all of you misunderstood the correct application of static caching.
Static caching should not be meant to live across multiple page views. it should not be meant to be shared by more than one site user. Thus static cache should not be implemented with persistent caching mechanisms (db, memcached and what not).
what i think static cache should be used for is reducing redundant query and computation execution from many times down to one time.
1)
Q) why cant we cache this for more than 1 page request?
A) Some data is just too dynamic and cannot be cached, one such case is access permissions. functions return different data based on user access. do not eliminate these function calls, eliminate their repetitiveness.
2)
Q) where can we see benefit from this?
A) have you ever seen devel query log? it is not uncommon to see repetitive calls like user_load(UID) or taxonomy_get_vocabularies(VID) when UID and VID do not change. this is very common when these calls are spread around multiple modules or functions when these functions need similar data. so if we have a function that fetches us data in a simple manner, without any complex computation, just data fetching, then why can we not static cache it? now consider nested cases that appear inside hook_nodeapi a single static cache implementation will reduce function and query calls by large order.
the other day i went around our code base and static cached a lot of such functions, i managed to reduce query calls down from ~1000 to ~650 thus reducing page execution time down from 4 seconds to 300ms, ponder on that.
#37
i am setting this to patch (code needs work) because unfortunately our status messages do not include 'architecture (design needs more thought)'
#38
subscribing
#39
subscribing.
#40
Changing a few aspects of this issue to hopefully get some more eyes on it.
Also, it would be great if we could get more eyes on this or someone to push this through to RTBC because right now there are many more uses for static caching that we're starting to see, like #281596: Performance: static caching of user objects in user_load(), respecting conditions and #91786: user_load() static caching.
Dries spoke on this to keep it limited to static caching across a single page load. It seems like there were discussions of pushing caching further, but I agree with others in that this is out of scope for this issue.
Should there just be a standard coding convention on using a
$GLOBALSarray instead. Seems like it's the most efficient and easiest way to solve this issue. I'm not sure why abstraction through more function calls would actually help this situation, but for some reason people want to add that.#41
Add #223075: Adaptive path caching to Susurrus' list.
#42
Wait, so use global variables instead of some
drupal_staticfunction? This could somehow be indirectly related to #113435: Data API: first steps.#43
Note that extending the cache system for multiple drives is WAY more than this patch wants to do. Can we get back to #1 ? Edit: original patch.
#44
I'm looking at the original patch (at the top of the issue). fails to apply (fails on install.inc, bootstrap.inc, install.php). I think this is basically the right direction to go. We need a simple API, and don't need to confuse this with cache_set/get. The key piece this original patch is missing is a way to reset all the statics at the same time, which would be very useful for testing, etc.
so, in response to chx in #30, I'd argue that #1 is the right general approach, with several possible implementations.
#45
Subscribing. Getting sick of committing these patches that add a $reset variable to $foo function, so would be nice to have this stuff centralized.
#46
@Dries - what you describe sounds very similar to my proposed caching enhancement (with a dated patch) at http://drupal.org/node/152901 - this implements a similar static cache "collector" and can save and preload the caches based on language, path and (optionally) per-user, and also add items to the cache later on that weren't cached originally.
The static cache element of that patch was one of the parts of that patch I was not done with, so I am happy we are looking at this here. The cache_collect function in my patch lets you provide some additional metadata that describes the cachability of that data, and the code should automatically manage saving it all via cache_set (if appropriate) at the end of the page load.
#47
FWIW, here's a reroll of #1 that applies to the new unstable tag.
#48
#49
this is still missing a facility for resetting all the statics in one call.
Also, I think the logic here is wrong:
if (!isset($cache[$name]['default'])) {should be:
if (!isset($cache[$name]['data'])) {since I might want to have no default value (NULL) and I think the code as-is would overwrite the previously stored data on every function call.
#50
Also, I'm now recalling a proposal Crell had for an "environment" for each page load. Let's see how this could fit in with that, or perhaps be the start of it.
#51
subscribing.