Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
As follow-ups to #497118: Remove the registry (for functions), we should introduce back some modest memory and function call optimizations.
The elements are 2-fold:
- allow hook implementations to live in a single .inc file per module. This would be for info-type hooks like hook_menu() or hook_theme() that are only called rarely. E.g. modulename.hooks.inc
- Change functions like module_invoke_all() so that we cache a list of modules the implement certain hooks (and possibly whether the implementation is in the new .inc file). This would let us avoid doing a file_exists() + require_once for every hook invocation which the first element alone would need, plus make all such invocations more efficient.
Comment | File | Size | Author |
---|---|---|---|
#191 | 557542-cache_module_implements-191.patch | 1.13 KB | artkon |
#190 | 557542-cache_module_implements-190.patch | 1.37 KB | dbehrman |
#186 | 557542-cache_module_implements-185.patch | 1.92 KB | dbehrman |
#184 | 557542-cache_module_implements-184.patch | 1.63 KB | dbehrman |
#179 | 557542-module_implements-cache-179.patch | 8.88 KB | mikeytown2 |
Comments
Comment #1
pwolanin CreditAttribution: pwolanin commentedThis is effectively like the page-split patch in that the first part is a totally optional optimization, and the second part is an internal optimization that would not affect the API. Hence it shoudl be ok post-code freeze.
Comment #2
webchickThe second part I'm fully on-board with, and we might want to look into this sooner than later since we added a bunch more hook calls this release with the expectation that they were 'free' with the inclusion of the registry.
On the first part, I'd like to see benchmarks to see if this makes much of a difference. Also, if it's only for certain hooks, let's definitely not name it *.hooks.inc. As a module developer, if I saw files named that in core and tried to copy them for my own module, I would naturally assume that that file was for hook_init() as well. If we decide to move forward with this, let's make *sure* that the code is split up around easily memorizable lines. *.admin.inc and *.pages.inc and *.install are all very easy.
Comment #3
pwolanin CreditAttribution: pwolanin commentedwell, catch suggested something like "modulename.hooks-not-on-every-page-you-idiot.inc", but that might be a little long ;-)
Comment #4
webchickLet's figure out what hooks we're actually talking about first, and pick a name from that. ;)
Comment #5
pwolanin CreditAttribution: pwolanin commentedhook_menu, hook_menu_alter, hook_theme, hook_theme_registry_alter for sure. maybe hook_action_info and hook_node_info?
other rare hooks:
hook_modules_enabled, hook_modules_disabled, hook_system_info_alter
Comment #6
Crell CreditAttribution: Crell commentedContrib has a few dozen of these as well, so we cannot just have a hard-coded list. We need some way to declare "and oh yeah, this is an info hook so it has these extra properties".
Comment #7
pwolanin CreditAttribution: pwolanin commentedThis is really a documentation question - the idea is that we would look for *any* hook in this file, but we document that only info-type hooks can be (optionally) placed there for best performance.
Comment #8
Crell CreditAttribution: Crell commentedHow would we look for any hook in an optional file without loading and parsing the file to know if we need to load it or not?
Comment #9
pwolanin CreditAttribution: pwolanin commented@Crell - right, that's where the caching comes in - so once we check the .module and .inc for hook, we can mark a particular module as implementing it or not.
Comment #10
Crell CreditAttribution: Crell commented... It sounds like you're heading toward a Reflection API-based reimplementation of the registry, especially if you're talking about it working for any hook. I considered Reflection API but rejected it on the grounds that requires parsing all of the code into memory, so the initial scan would still be ridiculously huge on memory usage.
Comment #11
mattyoung CreditAttribution: mattyoung commentedsubscribe
Comment #12
catchSubscribing.
Comment #13
pwolanin CreditAttribution: pwolanin commented@Crell - I don't follow why this needs to be huge.
All I need to store is an array linking each hook name to every module that implements it (and whether the .inc needs to be included)
Then any time I need to invoke a hook, I build this data for the one hook if it's not cache (e.g. cached in a drupal variable). I iterate the array, and only invoke hooks known to exist. If the value is TRUE, I call module_load_include() to make sure the hooks.inc file is in memory.
Also, we shoudl also optimize module_load_include() so it's like drupal_load() and knows what it already loaded.
Comment #14
Crell CreditAttribution: Crell commentedAnd is that array dynamically built or developer-defined? If developer defined, then we're at hook_hooks(). (May not be a bad place, but that's where we are.)
If dynamic, how do you propose we build that list in the first place? The options I am aware of are:
1) Parse code into memory and use Reflection. This is memory intensive.
2) Tokenize and parse code to analyze where code lives. This is the registry that was just removed.
3) Require developers to explicitly define, somewhere, "I implement this hook and it is in this file".
I am not aware of an option 4.
Comment #15
pwolanin CreditAttribution: pwolanin commentedbuild the array automatically:
option 4:
function_exists() || (module_load_include() && function_exists())
Comment #16
Crell CreditAttribution: Crell commentedThat's still a variant on option 1. "Keep loading code until we find it or know that we're not going to find it". With an empty cache, that could be a non-small operation.
Comment #17
pwolanin CreditAttribution: pwolanin commented@Crell - yes, it would not be a small operation, but rare.
Comment #18
Crell CreditAttribution: Crell commentedI could keep asking questions about implementation, performance, and stability, but I suppose seeing code will answer most of those. :-)
Comment #19
fagoWhat about .info.inc ?
So the name suggests to only add some metadata providing hooks there, which area ideally cached or not performance related. Hm but module.info.inc could be easily confused with module.info.
@implementation:
What about going with one fixed file name and make sure it's included when needed with a simple helper function? Should be simple and effective.
Comment #20
pwolanin CreditAttribution: pwolanin commented@fago - yes - the idea was to have one fixed file name - well, or a fixed [modulename].X.inc, where X is fixed.
Comment #21
fagoWell, so what about that + manually invoke module_load_all_includes('inc', X) for hooks living in module.X.inc ?
Comment #22
Crell CreditAttribution: Crell commentedA general "sub component of a module" includer like fago suggests is a good idea, but to be honest I think the naming convention for that function is and has always been backwards. :-) We really should just standardize on $module.$subcomponent.inc and be done with it.
That's not a complete solution for the memory problem, but it helps if we have proper standards. (Earlier versions of the registry considered a similar concept, but I kept ending up with overly-complex designs and the self-learning logic seemed much more robust.) It could be useful for contribs, too, like $module.views.inc.
Comment #23
pwolanin CreditAttribution: pwolanin commentedI think the static to track if it's already loaded should be in module_load_include
Comment #24
catchIs there any way we can unify module_load_include() and drupal_load() - so we can have one central function for require once, and then change it to require?
Comment #25
pwolanin CreditAttribution: pwolanin commentedit seems like we shoudl be able to - but probably needs to be a post-freeze exception.
Comment #26
Crell CreditAttribution: Crell commentedExpanding this idea a bit, what if we did the following:
1) Change module_invoke_all() to be:
module_invoke_all($hookname, $args_array, $groupname);
2) module_invoke_all() then does the following:
a) Have we already cached which modules implement $hookname? If so, load that array and call those hooks. (We can protect against a stale cache when a module goes missing with a simple function_exists() check, which should be at worst no more expensive than what we do now.)
b) If not, load all $module.$groupname.inc files. Then iterate all module/hook combinations to get a list of all hooks that are implemented. Cache that. Then follow step A. (If $groupname is not specified, skip that part.)
This establishes that for a given hook, it can always live either in $module.module or in $module.$groupname.inc. What $groupname is, that's left up to whoever is implementing the hook. That is, menu_rebuild() arbitrarily decides to use a group of "info", which means hook_menu() implementations could live in either $module.module or in $module.info.inc. Views can specify a group of "views" for hook_views_plugins(), and so those hooks can live either in $module.module or in $module.views.inc. And so on.
Very low overhead, Very easy to extend in contrib, no extra parsing, and no BC break on file organization because .module is still a viable place for any hook. That makes the entire thing "opt in" for module developers who don't want to have zillions of 5 line files. But we still get the ability to move virtually any hook off to another file, using a common, recognizable pattern. (Callbacks would still have to be specified manually as they are in hook_menu and hook_theme.)
The only caveat is that module_invoke_all() currently takes a variable number of parameters rather than an array of arguments. That means we could do one of two things:
module_invoke_all($hook, $group, $list_of_args): This way, we still have a list of args but the $group comes first. If you don't want to use it, specify NULL explicitly.
module_invoke_all($hook, $args_array, $group): Very easy to leave off $group if you want, but you now have to specify your arguments as an indexed array. (Arguably we could encourage the use of groups by making $group the second param, so you do have to specify NULL if you don't want to have an optional group.)
Either one is going to be an API change to module_invoke_all(), which means not quite trivial to write. We'd have to either really really hurry or allow this as a post-freeze exception. I'm willing to write it if people like this approach, but as I'm about to get on a plane the odds of me being able to finish it pre-freeze are not high unless we allow an exception for it. Drieschick, I defer to you on that front.
Comment #27
fagoNot directly related, but changing to an $args_array would have the advantage that we could support passing arguments by reference by using array($arg1, &$arg_by_ref). That way it wouldn't be necessary anymore to bypass module_invoke_all() just to don't loose the reference.
Comment #28
pwolanin CreditAttribution: pwolanin commentedIn general this sounds like a good idea, but I'm a little leery of the groupname thing. I guess that means in some cases we can force the hook to be in .module? I can see how this might be useful for something like modulename.views.inc
Comment #29
catchOf the two options, I think I prefer:
Although it seems like the caching should really be maintained by module_implements() so we'd have module_implements($hook, $group) too. This'd mean module_implements() caching would also count for drupal_alter().
Comment #30
catchmodule_implements() + module_hook() are currently taking up around 10% of page execution time, so we are missing that caching a fair bit.
Comment #31
catchDoh, that was a page with a lot of cache misses, immediately after an apache restart, it's actually more like 7% when cached hooks are taken into account.
Comment #32
Crell CreditAttribution: Crell commentedSo this has some test failures still, and is going to be very fragile with other parallel patches, but this shows what I was talking about. It adds a $group parameter to module_invoke(), module_invoke_all(), etc. that is optional. If not specified, nothing changes. If it is specified, then we first try to include $module.$group.inc. In order to support optional parameters, $args is also converted to an array, not a variable length
Notes:
- module_load_include() is a horrifically stupid function, as its signature and documentation sucks. I am not touching it here. We fix it another time. :-)
- This patch does potentially introduce a lot of file stats. However, file stats are cached by most modern OSes so should be fairly cheap.
- This does not YET introduce any caching, but that could be added. My concern is that there's so many layers of function hand-off here that it may not be as easy to cache nicely as we'd like.
- I've moved node_menu() off to another file to show how it works.
- The important parts of the patch are quite small, in module.inc. There rest of the patch is just converting code to the new API, which means throwing lots of array() around stuff. :-)
I could definitely use help tracking down the last test failures.
Comment #34
alexanderpas CreditAttribution: alexanderpas commentedThe problem with the implementation of explicitly specifying a group for a hook, is that it needs to be set in each call of that hook.
What if we take the first part of the hook_name as the groupname and require that the module that declares that group needs to specify it in the .info file. (to help with performance)
for example:
The Block module could declare the block group and hook_block* could live module.block.php
The Views module could declare the views group and hook_views* could live module.views.php
(we could even go as far as to move hook_boot to module.boot.php)
In addition, all that code never gets loaded when the respective module is not installed, as the declared group is not availble.
Also, it keeps the API everyone is used to straight ;)
(development machine not availble here, or i would've provided patch)
Comment #35
Crell CreditAttribution: Crell commentedThe problem with making it magic is that most hooks are not multi-word. So if you want to break up a large module, you're going to end up with an awful lot of files. Do we really need to put hook_menu(), hook_theme(), etc. all in different files? It also means we need to do more parsing of the hook that's being called to determine what files to try and include. That raises the CPU cost along a very critical path. I also don't see what the info file has to do with it then, since we'd just magically determine "Oh you're calling hook_foo_bar, I'll check for $module.foo.inc files while I'm at it."
It would still be better than what we have now, though. :-) So I'm open to it if the committers are. I don't think that the requirement for a group name in module_invoke_all() is that big of a deal, though. Generally a given hook is called from only a limited number of places, and all in one module. (And if they are not, that may be a sign that you should have a wrapping function of some kind to unify that lookup anyway.)
Comment #36
Crell CreditAttribution: Crell commentedHm, I just realized another problem. That effectively means one file per contrib module that declares hooks. Problem. hook_views_data(), hook_views_plugins(), etc. are not that big, and can/should continue to live in $module.views.inc. hook_views_default_views() is frickin' huge, and should be avoided whenever possible. The approach in #34 would push hook_views_default_views() into $module.views.inc as well, which we very much do not want.
Comment #37
catchI'd be very concerned about the file_exists() without the caching - we already call module_hook() about 1000 times on each page in HEAD.
Comment #38
catchHere's a patch which just implements the caching. I think we should get that right first, then add the optional group/includes stuff onto it - so that we can verify the performance impact in both cases somewhat independently.
Didn't run tests yet but seems to be working. It handles missing functions, and reduces module_hook() calls from 783 to 58 on a completely empty front page. This will be an exponentially bigger saving the more hooks are called, and the more modules installed on a site.
I removed some reset calls from module_implements() in module_load_all() - since nothing seems to break if I do that, there were no supporting comments and it removes three cases where the variable would be set on every page load (those update queries would cancel out the caching, and we don't want a soft and hard cache clear if at all possible). We'll find out if "nothing" really is nothing when the test bot runs.
I don't think this is too far off, so marking CNR for the bot, and for early-ish reviews.
webgrind screenshots attached, I'm getting bugs with screenshots and percentages for some reason, so measurements are in milliseconds. Looks like about 35ms saving - again this is on a very light page, add a few nodes, comments, or additional modules and the number of calls to module_implements() and module_hook() could increase by a factor of 10 or more.
Comment #39
catchComment #41
catchWhat about this one?
Comment #42
catchThis time we also rebuild the cache from scratch when a missing hook is found - this mean if a hook implementation changes from one to another, we ought to find the new implementation as well as not getting fatal errors from the old one. Completely new hook implementations won't be found until the variable is reset though - I've added that to update.php.
Comment #44
catchMoved some more logic around to save checking hooks twice if they're not cached.
Comment #46
catchLess broken version.
Comment #47
Anonymous (not verified) CreditAttribution: Anonymous commentedthe patch at #46 looks fine to me.
ran some benchmarks with all modules enabled, three nodes with two comments each, hitting the front page.
didn't see any improvement with the patch, but i guess that's to be expected. we're loading the same amount of code.
think we'll need this all in one go to test out what gains we can get.
Comment #48
pwolanin CreditAttribution: pwolanin commentedIt doesn't look to me like there is a ever a variable_set() once you generate a list of valid hooks.
Comment #49
catchWell it does variable_set() every time there's new hooks. The last time there's new hooks would presumably be the first time the hook cache is complete no?
Comment #50
pwolanin CreditAttribution: pwolanin commentedah, ok - missed that. I was looking for the variable_set() inside module_implements().
Comment #51
dropcube CreditAttribution: dropcube commentedsubscribing
Comment #52
moshe weitzman CreditAttribution: moshe weitzman commentedInteresting.
How many variable_set() calls are we expecting per page? This can be an expensive function on a mature site since it leads to a rebuild of all (crufty) variables with its many unserialize() calls. You might say that a site should keep its variables table clean but thats pretty hard and infrequent in practice.
Comment #53
catchHmm, both pwolanin's and moshe_work's comment suggest my code comments are crappy. Here's a summary of what the patch is doing, if I need to add some of this as phpdoc, I guess the header of module_implements_write_cache()?
In the first call to module_implements(), get the array of $hook[$module] from $conf. This starts off the $implements static variable with whatever's in there. Each time module_implements() is called, if we find any $hook that wasn't there before, we add that to $implements.
Then in drupal_page_footer(), we compare the value of $conf['module_implements'] - which is still the same as it was at the beginning of the request, with the &drupal_static('module_implements') static variable we built up during the request.
If, and only if, there are hooks found during the page request which weren't previously cached (array_diff_key() in module_implements_write_cache()), we do a single variable_set() with the expanded array of hooks. This process is only additive - and once it's initially set up, I reckon the only places likely to cause a variable_set() are administrative or CUD pages which have more rarely called hooks. hook_foo_delete() hook_foo_update(), hook_form_FORM_ID_alter() - but again, once these have been found once, we don't see a write again until both the cache has been cleared AND that specific operation is formed.
Additionally, there's a failsafe in there, so that if a module is disabled via the system table, or a cvs update removes a hook implementation, we start the cache building again with the hooks found during that request - this also gives the system a chance to find new hook implementations if they were added outside of the usual modules page / update.php cycle.
Comment #54
pwolanin CreditAttribution: pwolanin commentedThis looks and sounds like a pretty good implementation. The only nagging question I have is regarding multiple threads doing all these steps in parallel. I think the worse case is that we variable_set() (and consequent cache clearing) several times with the same data and incur that extra overhead, but likely on average that's much better than doing any kind of query (and better than using the lock API) since the write shoudl be an atomic operation.
Comment #55
CorniI CreditAttribution: CorniI commentedcool patch :)
a) imho, the variable would be better named module_implements_cache instead of module_implements.
b) I don't understand why we add an array_diff_key in the main code path, wouldn't setting a $GLOBAL['module_implements_cache_write'] = TRUE; and in module_implements_write_cache() an if(isset($GLOBAL['module_implements_cache_write']) be faster and nicer to read?
Especially as the $new_hooks array is not used later.
c) The line
is definitly wrong, the operator+= is misplaced, $cached_implementations is not used later anymore. For better readability, i'd prefer array_merge here anyways.
d) Are the benchmarks from #38 still valid for this patch? Or can you rerun the test with a warm cache (after addressing my feedback)?
Thanks
Corni
Comment #56
catcha) agreed. I started thinking about this with cache_set()/cache_get() then pwolanin suggested a variable in irc before I got to rolling a patch, but $cid was still on my mind.
b) I don't think a single array_diff_key() matters much, but wouldn't particularly object to a global there.
c) Yes we can just use $implementations
d) There's no benchmarks there, only profiling output, but with those changes they'd still be valid yeah.
Comment #57
CorniI CreditAttribution: CorniI commentedrerolled, per #56 and #57
verified that it works in xdebug, still gives a good performance improvement :)
Comment #59
CorniI CreditAttribution: CorniI commenteduh, testbot doesn't like me :(
i'm trying again, now without the array_merge as it is unneeded as catch told me.
Needs review for the testbot.
Comment #61
CorniI CreditAttribution: CorniI commentedI can't run the whole testsuite locally, but I picked some failures and wasn't able to reproduce them...
Comment #63
CorniI CreditAttribution: CorniI commentedtry something...
sorry for spamming everyone's inboxes!
Comment #65
Damien Tournoud CreditAttribution: Damien Tournoud commentedPlease do not use the variable table as a cache. This is not what it has been designed for, and makes clearing the cache a non trivial operation.
There is no point in trying to save one query, as this "query" will only happen if the website is using the standard DB cache.
Comment #66
CorniI CreditAttribution: CorniI commentedper #65 and chx's comments in irc, now again with cache_*(), also I hope i finally found the test failure source.
Comment #68
CorniI CreditAttribution: CorniI commentedFor testing purposes now with #576096: Port cache-install.inc to the new cache system.
Comment #69
CorniI CreditAttribution: CorniI commented#576096: Port cache-install.inc to the new cache system went in, so please consider the patch from #66 for reviewing/committing.
Thanks!
Comment #70
catchMissing a space.
This shouldn't be here ;).
Otherwise looks great.
I'm on crack. Are you, too?
Comment #71
CorniI CreditAttribution: CorniI commented...
Comment #72
pwolanin CreditAttribution: pwolanin commentedthe use of a global is a bit ugly. perhaps you can just set something in the $implementations array - e.g. a key starting with a # can never be a hook name.
Comment #73
CorniI CreditAttribution: CorniI commentedbest idea till now, thanks!
attached patch changes $GLOBALS to the static per #72.
Comment #74
Crell CreditAttribution: Crell commentedThe comment in module_implements() still says that we're using the variable table, when the line right after it is a cache_get() call.
Right now, module_hook() is just a trivial wrapper around function_exists(), is it not? Couldn't we save ourselves some function calls and just do a function_exists() ourselves? (The lazy-load mechanisms I was working on above would do more there, so that may be self-defeating of me to suggest, but I'll mention it anyway for completeness.)
Comment #75
pwolanin CreditAttribution: pwolanin commentedYeah, looking at http://api.drupal.org/api/function/module_hook/7 there is no reason to add that extra function call.
Comment #76
catch@Crell, yeah I was thinking any type of lazy load stuff probably needs to be encapsulated in module_hook() - at this point, assuming the cache is primed we're calling it once for ever real hook implementation (approx 40 times on HEAD front page, but I guess this can easily go up to a few hundred, so might well be worth skipping that function call).
See also #529194: module_hook() your time has come.
Comment #77
chx CreditAttribution: chx commentedif you are using drupal_static then you should drop the refresh parameter and use drupal_static_reset instead.
Comment #78
CorniI CreditAttribution: CorniI commentedwrt #548470: Use something other than a single underscore for hook namespacing module_hook instead of function_exists() has an advantage...
chx: doesn't work, because you'd have to do a drupal_static_reset + a cache_set('module_implements', array()); and this is an implementation detail i'd like to hide from the clients.
@Crell: removed the comment
I also replaced a comment and a cache_set with a better way.
Comment #79
catchCornil - can you roll again with -up, patch has funny arrows in it ;)
Comment #80
CorniI CreditAttribution: CorniI commentedargh sorry...
Comment #81
catchWhy don't we do a cache_clear_all($cid) instead of cache_set() when we need to rebuild the cache?
Comment #82
CorniI CreditAttribution: CorniI commentedIt doesn't really matter, imho it is fine this way. query-wise it would be delete+insert vs 2xupdate...
I think a cache_set is fine, but if you want, I can reroll with cache_clear_all :)
Comment #83
RobLoachSubscribing. The ability to have [mymodule].[othermodule].inc is what I loved about the the registry as it improved the developer experience.
Comment #84
chx CreditAttribution: chx commentedI think I saw Tab characters in that patch. Oh horror!
Comment #85
CorniI CreditAttribution: CorniI commented/me runs away frightened of the tabs ;)
no functional changes, just tabs=>spaces.
Comment #86
CorniI CreditAttribution: CorniI commentedFollowup per chx's request which is possible because of drupal_static's :)
#577950: API-cleanup: Factor module_implements $refresh parameter into it's own function
Comment #87
chx CreditAttribution: chx commentedRemoving
module_implements('', FALSE, TRUE);
frommodule_load_all
can't be good. Adrupal_static_reset
is necessary.Also, I am fairly sure that you want to write that cache only on GET to avoid bloating the hell out of cache with hooks only used when POSTing.
Comment #88
chx CreditAttribution: chx commentedSo we have a problem here that we can't really reset the static cache because we already loaded the cache implementations and this would neuter the db cache. So we need to be able to tell to module_implements that we are in bootstrap and when in bootstrap, make sure it does not static cache non-bootstrap hooks 'cos modules are not loaded and when bootstrap is finished then it can begin caching in earnest.
Comment #89
chx CreditAttribution: chx commentedOn further thinking, the bootstrap issue is moot. You would need to call a Drupal function which, in turn, calls module_invoke_all. Not much of Drupal is loaded at this point, not even theme.inc. If someone manually begins to include .inc files from includes then it's her own problem to get Drupal functioning after that. The statics are exposed enough to make that possible and that's good enough.
So it's okay to remove that but the GET/POST issue still stands.
Comment #90
CorniI CreditAttribution: CorniI commentedNow uses drupal_page_is_cacheable() to determine if we want to write the cache or not.
Comment #91
CorniI CreditAttribution: CorniI commentedper request of chx better comment for the addition
Comment #92
CorniI CreditAttribution: CorniI commentedtypo in comment...
Comment #93
CorniI CreditAttribution: CorniI commentedchanged comment per catch's request...
Comment #94
catchI'm (very) happy with this now. We keep the cache a manageable size, all the extra hooks we added to Drupal 7 won't cost us half as much, it's compatible with the lazy load stuff Crell wants to do to reduce memory usage on non-APC hosts, and it's a small amount of code with ~ equal number of comments to code. Should also have less DX side-effects than the registry since you should never get undefined function for a hook implementation.
So setting RTBC.
Comment #95
webchickLet's get some benchmarks for this.
Comment #96
catchVanilla install with one node on the front page:
ab -c1 -n500
HEAD:
Patch:
A bit faster but possibly margin of error - the main advantage here is that the cost of module_implements() is now based on how many hook implementations you actually have(i.e. more or less flat compared to most things on a site), not on how many possible hook implementations could possibly exist ever under any circumstances. So if we enable 50 modules it might go up 20% instead of 400%.
Also attached dreamy webgrind screenshots. You want to look at the right hand column for module_implements() to see where the impact is - numbers are percentage taken of the page request.
Comment #97
donquixote CreditAttribution: donquixote commentedMaybe the following could be useful to extend on this:
#519940: Performance: optimize building theme registry by using get_defined_functions() instead of function_exists()
The idea is that in some cases it is much faster to look at all defined functions, instead of looking at all possible module + hook (+ suffix) combinations. This is especially valid for _theme_process_registry, but could also be useful for module_implements() or module_invoke_all(), in some cases.
The other idea is that hooks don't need to be explicitly defined anywhere. Instead, we remember functions that could be a hook implementation, and wait for some code to ask for a hook with that name. This naive approach can make things a lot easier.
The downside:
This only works if either all candidate functions are already loaded at the moment when get_defined_functions() is called (mostly the case in D6), or if you have a list of function names available from elsewhere (defined via a config file or other means).
Btw, I think this is no different from the #93 patch, which uses module_hook / function_exists.
Which one is faster?
On my local test site the time for a full scan with get_defined_functions was around 50ms, with views and ubercart and a lot of other modules (measured with PHP microtime).
I imagine that in most situations where you need only one hook name, the #93 patch is faster than that, because it does only test the module + hook combinations with module_hook (= function_exists), for one specific hook name. Number of modules is smaller than number of defined functions, and there are fewer string operations involved for each iteration.
The get_defined_functions solution can be a lot faster if we want to find many different hooks at once, or if we allow different suffix combinations (with _alter_ or _preprocess_, or theme functions).
The idea would be to do one full scan on the modules page, and cache it in the DB.
Works for D6?
There is one problem both with the #93 patch and the get_defined_functions solution: It does not find hook implementations in include files. This is no problem for D7, but it can be a problem in D6, see #519940-#9 or #221964-#214 for a description of the problem and a possible solution.
Comment #98
sunsubscribing
Comment #99
catchLazy loading / includes now at #588766: Standardize lazy-loading of optional include files.
Comment #100
Dries CreditAttribution: Dries commentedI think this is a great patch, but some extra code comments would be appreciated. :-)
Comment #101
sunWhy? :)
Shouldn't this also happen in drupal_flush_all_caches() to clear any stale data?
@see module_implements_write_cache()
Could we rename that last argument to $reset? We have another issue in the queue that tries to find a sane pattern for reset functions/arguments, so it would be good to catch this instance when grepping for $reset.
This means we're writing the cache on every request that finds/invokes not-yet-known hook implementations, right?
Here, the comment could additionally explain the condition.
@see module_implements()
I don't understand where #write_cache is (ever) unset - it seems even to be contained in the cache?
This review is powered by Dreditor.
Comment #102
catch.
Yep, except for POST requests - this is a relatively cheap cache to build (compared to say the menu cache, which is an insert per router item visited (I think)), so I think the trade-off is fine.
Everything else looked valid to me - tried to add as many additional comments as possible.
Comment #103
sunFixed some text.
However, one question: What happens if my module loads an include containing further hook implementations in a POST request, so the cached hook implementations do not contain those implementations?
Comment #104
CorniI CreditAttribution: CorniI commented@sun: Just _fired_ hooks are recorded in the cache, not loaded hooks, so you're question is kinda weird wrt the POST part.
You're simply not expected to conditionally include hook implementations I think (wether it's POST or GET...).
Comment #105
catchIf it's a POST request, no new hook on that request will get cached - so it will just be as if there wasn't a cache, your module won't be singled out. If those hook implementations then get loaded in a non-POST request, the cache would kick in again for those hooks which have not yet been cached. In other words we only cache the implementations of each hook once, in non-POST requests.
Comment #106
Dries CreditAttribution: Dries commentedThis looks great now. Committed to CVS HEAD.
Comment #107
catchWe missed an obvious error in the patch. drupal_page_is_cacheable() is set to FALSE for all authenticated requests. Since we fetch from cache whenever, I believe I tested with a primed cache (and hence the benchmarks were still OK). This is a very quick fix which just uses the code from drupal_page_is_cacheable() instead of calling it. I don't have time to write it now, but it's possible we could add a very, very simple test which just visits a couple of pages, then does a cache_get() and does an assertTrue() on it.
Attaching webgrind screenshots which show the difference (module_hook() calls down from 700-odd to 60).
Comment #108
catchWith a test. I get 'base table or view not found' exception when running this, but thinking that might be a local issue.
Comment #110
catchcache_get() has 'cache' as default table, but cache_clear_all() doesn't, obviously :(
Test fails without the patch, passes with it.
Comment #111
pwolanin CreditAttribution: pwolanin commentedFixes the test exception
Comment #112
webchickThe old code here was much easier to understand. Please leave a comment why we're checking the request methods, and why we're checking for GET and HEAD and not also POST and others.
I'm on crack. Are you, too?
Comment #113
catchHowzat?
Comment #114
catchAdded explicit test for logged in user.
Comment #115
catchComment #116
webchickLooks good. Thanks for the extra tests there.
Committed to HEAD!
Comment #117
JacobSingh CreditAttribution: JacobSingh commentedThis caused a regression in:
#524728: Refactor install.php to allow Drupal to be installed from the command line.
Tries to access a cache before it is created.
PDOException: SQLSTATE[42S02]: Base table or view not found: 1146 Table 'dev7_drupal.cache' doesn't exist in /Users/jacob/work/github/drupal/includes/database/database.inc on line 1735
Comment #118
David_Rothstein CreditAttribution: David_Rothstein commentedHm, this also broke command line installs for me for a few days (although I was getting a different error message than Jacob), but magically, it seems to be working again. Anyone else experiencing this? The inability to run the installer via the command line seriously cut down on my Drupal productivity for a little while :)
Jacob debugged this pretty well at #524728-88: Refactor install.php to allow Drupal to be installed from the command line but we should really solve it here. It sounds to me like if the problem reoccurs (or if other people are still experiencing it now), the solution might be to remove the funny cache tweaking that is currently in the installer and replace it with a simple call to drupal_flush_all_caches() after the installation is over. We discussed this possibility previously in #577298: Blogs and forums don't get body fields when included in an installation profile as a possible way to future-proof the installer against weird caching bugs, and this might be a good example of that... In general, the installation of Drupal is so different from a regular running Drupal site that I'm not sure we ever really want to trust anything that gets put in the cache during that time period.
Comment #119
catchYes that seems like the best option to me, and was what we came up with when discussing with Jacob in irc. Note that my debugging suggests we still need the fake cach class, it's the 2007 commit which tries to switch to the 'real' cache early which is the problem - and this needs to be replaced with a cache flush at the very end.
Comment #120
catchHere's a patch which fixes command line installs. Improvements to the comment welcome, but it's better than the obscure comments from 2007 about clean urls check which are in there at the moment.
Comment #121
catchmissed a spot.
Comment #122
JacobSingh CreditAttribution: JacobSingh commentedIs that all it took? I thought I tried this and had problems... Anyway, nice one, it totally works.
I'm on crack. Are you, too?
Comment #123
catchBit too simple - it fixes cli installs but regresses back to the original bug from 2007.
Comment #124
David_Rothstein CreditAttribution: David_Rothstein commentedHuh, really? That's surprising, since the cache should be totally empty afterwards.
By the way, it looks like the patch missed removing the call to node_types_rebuild() - it removes the code comment only.
It's also strange that command line installs went back to working for me automatically but no one else :) I wonder if this is related to #561306: Installer corrupts cache when database connection is in settings.php in some way... the local script I use replaces settings.php, but maybe others don't? I can't wait until we have "drush install drupal" :) But either way, something is definitely broken here, and it seems like the drupal_flush_all_caches() solution should in theory be able to fix it.
Comment #125
catchI can't find the original issue right now, but here's what I get with the current patch:
Command line install works perfectly.
UI install works fine EXCEPT that when you hit the front page of your Drupal site, the sitename is 'Drupal' instead of your actual sitename. What's even more bizarre is that if you just refresh the page, you'll get the correct site name. i ran this several times, including rubbing my eyes, and that's what appears to happen. The refresh of that page should not clear any caches, so it looks like $conf itself is somehow corrupted outside of the variables cache.
Comment #126
David_Rothstein CreditAttribution: David_Rothstein commentedStrange indeed, but actually, I can reproduce that exact same problem (site name switching) without the patch as well.
So it doesn't appear to be caused by this patch... or fixed by it either, apparently.
Comment #127
moshe weitzman CreditAttribution: moshe weitzman commentedThis patch fixed CLI installs for me and they were certainly broken without it. Since the site_name bug happens without this patch too, this is RTBC.
FYI, I am about to commit an initial install command for drush that (re)installs core including the whole install profile. Handy for core devs and committers. Requires this patch.
Comment #128
catchAgreed with RTBC, the site name patch we can do as a followup issue when this goes in.
Comment #129
catchSomehow the node_types_rebuild() call never got deleted. No other changes.
Comment #130
David_Rothstein CreditAttribution: David_Rothstein commentedCertainly looks right, and should go in - but Moshe, you might want to avoid doing a CVS update any time soon :)... it turns out the CLI installs are broken for a different reason now also: #412730: Theme system should optionally report when a theme key is not found
Comment #131
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks!
Comment #132
David_Rothstein CreditAttribution: David_Rothstein commentedNote: The followup patch for the site name problem is at #605880: Settings and menus broken after install due to faulty caching. (Also see #615528: Incorrect code comment regarding DrupalFakeCache in the installer for another followup.)
Comment #133
Damien Tournoud CreditAttribution: Damien Tournoud commentedWhat happens if:
* the cache is not primed
* module_implements is called before DRUPAL_BOOTSTRAP_FULL
... module_implements() will prime the cache based on the bootstrap modules. #fail
Comment #134
catchThe only situation that would be a problem is when module_implements() is used to invoke exactly the same hook both before DRUPAL_BOOTSTRAP_FULL and afterwards. That would be a bug with the calling function - since by definition it wouldn't be a proper hook invocation for all modules, and it's also the reason we have bootstrap_invoke_all(). Do you have a specific example where such a thing happens and bootstrap_invoke_all() isn't an option?
Comment #135
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe point is: this is documented nowhere, and is a DX hell, because the function doesn't necessarily know it can be called in a non-fully-bootstrapped context, and as such shouldn't use module_implements() (or any function indirectly calling module_implements()).
Comment #136
catchWell we can probably prevent caching in that situation by setting a static if the first call to module_implements() is before DRUPAL_BOOTSTRAP_FULL with drupal_get_bootstrap_phase(), then prevent the cache being written if that's the case. Only needs to be checked once since we can't go backwards and unload modules so would have no overhead compared to removing the caching again.
If someone calls module_implements() in hook_boot() then no caching would ever happen, but tough luck I suppose.
Comment #137
CorniI CreditAttribution: CorniI commentedThis spin-off os probably related...
#578044: move module_invoke_all() out of the reach of hook_boot()
Comment #138
sun.core CreditAttribution: sun.core commentedI like the patch(es) that was committed, but Damien's concerns sound also worrying. However, can we please move this to another issue? This one was RTBC like 5 times already and this means there is not much room for quick discussion. Ideally, more or less copy #133 to a new critical issue and link to it here. Thanks.
Comment #141
c960657 CreditAttribution: c960657 commentedHere is a first stab at a D6 backport. Xdebug reports that the time spent in module_implements() is reduced by 75%, but I couldn't measure any difference inthe total page generation time using siege, so it may be Xdebug that skews the results. I'll leave the patch here if anybody wants to benchmark or continue work on it.
Just a small note about the patch: I moved the functionality from module_implements_write_cache() into module_implements(), because D6 does not have drupal_static(), so two functions cannot access each others static variables.
Comment #142
c960657 CreditAttribution: c960657 commentedReopening for D6. The patches are similar, so D6 should see a performance improvement similar to that in D7, unless the two versions differ in some way that affects this.
Comment #143
iamer CreditAttribution: iamer commentedI gave the D6 patch a spin on a test pressflow installation and I it have two problems :
* Apache error log on accessing update.php: PHP Fatal error: Call to undefined function cache_set() in /var/www/html/includes/module.inc on line 433
* All pages at the top : Notice: Undefined offset: 0 in /var/www/html/includes/path.inc on line 54
The second one probably because I have the path cache module enabled. Line 54 says $modules = module_implements('lookup_path');
Maybe it's getting called too early in bootstrap ?
Comment #144
catchNot a critical bug in D6.
Comment #145
mauritsl CreditAttribution: mauritsl commentedBenchmarked the patch from #141 with ab -c10 -n100 (anonymous).
Used /node of a somewhat basic Drupal 6.16 installation with 2 additional blocks (from views module), page and block cache disabled.
Without patch:
Requests per second: 4.98 [#/sec] (mean)
Time per request: 2008.131 [ms] (mean)
Time per request: 200.813 [ms] (mean, across all concurrent requests)
Transfer rate: 142.75 [Kbytes/sec] received
Patched:
Requests per second: 5.04 [#/sec] (mean)
Time per request: 1983.727 [ms] (mean)
Time per request: 198.373 [ms] (mean, across all concurrent requests)
Transfer rate: 144.51 [Kbytes/sec] received
That shows a 1.2% improvement.
Comment #146
c960657 CreditAttribution: c960657 commentedWhen building the module list, module_implements() loops over all installed modules, so I would expect the improvement to increase with the number of modules installed.
Comment #147
c960657 CreditAttribution: c960657 commentedUsing XDebug+Webgrind I get the following numbers for a Drupal site with 250 modules:
Without patch
Total self cost: 106
Total inclusive cost: 241
With patch
Total self cost: 8
Total inclusive cost: 19
Comment #148
ezar CreditAttribution: ezar commentedWorks fantastic! Please commit to current D6 dev!
Thanks c960657
eZar
Comment #149
c960657 CreditAttribution: c960657 commentedNew D6 patch that includes the fix from #836630: (Tests for) module_implements() caching is broken.
Comment #150
ezar CreditAttribution: ezar commentedworks fine here!
thanks
Comment #151
c960657 CreditAttribution: c960657 commentedWe need to bootstrap the cache (and thus the database layer) in update.php before flushing the module_implements cache.
Comment #153
c960657 CreditAttribution: c960657 commented#151: module-implements-cache-3.patch queued for re-testing.
Comment #155
catchJust needs D6 extension, this works fine for me.
Comment #156
jcmarco CreditAttribution: jcmarco commented#155 patch works fine but it breaks the drush updb command.
If this patch goes to core it would be needed to update drush commands/core/drupal/update_6.inc to add the same changes than in that patch
Comment #157
glennpratt CreditAttribution: glennpratt commentedRe-roll against 6.20
Comment #159
glennpratt CreditAttribution: glennpratt commentedWierd, applied fine for me...
Re-roll with CVS from DRUPAL-6.
Comment #160
glennpratt CreditAttribution: glennpratt commentedTest please...
Comment #162
glennpratt CreditAttribution: glennpratt commentedSorry, of course the tests don't work for D6. This is catch's patch from 155 re-rolled with git.
Running this patch on a production site for a few months with no issues. DX change having to clear cache when implementing new hooks, same as D7.
It provides a modest boost when running just a few modules, the boost becomes much larger on a site with many modules.
Simple ab with most of core + views + cck enabled.
ab -c5 -n100 http://d6.vbox.local/node/add/page
Comment #163
Jeremy CreditAttribution: Jeremy commentedSubscribe.
Comment #164
Fabianx CreditAttribution: Fabianx commentedSubscribe, this is cool ....
Comment #165
pwolanin CreditAttribution: pwolanin commentedLooks a bit odd - are we always refreshing the cache on every page footer?
Comment #166
catchLooks fine i think:
Would only refresh if $reset was true, and when $write_cache is TRUE, module_implements should still be checking if there's anything to actually write before doing so.
Just a note while I'd like to see the Pressflow merge get in, I'm not convinced about committing this to D6 since it does change behaviour in terms of adding/removing new hooks (and will require a drush patch for drush updatedb to work), however it's also good that the patch is sitting here for people to find it.
Comment #167
mikeytown2 CreditAttribution: mikeytown2 commentedSubscribe. Patch #162 applies to pressflow without any rejections.
Initial cachegrind shows an improvement of 150ms or so in the module_implements() function when using memcache. If the module_hook() check is removed, getting called 3,759 times only takes 21ms under total cumulative time. With that check in place It took 203ms of total cumulative time. I understand why that check is needed; but I think we could optimize this.
Patch below takes 32ms total cumulative time and will still verify that the function exists. Changes done is it only runs the foreach loop once per hook, via the new $verified static variable. This patch brings over a 300ms improvement on our test server compared to stock drupal.
What was the issue with drush, is it #156?
Comment #168
catchThe drush patch has been committed to 4.x but afaik isn't in a point release yet, this patch breaks drush updb without it though. I have not seen many calls to module_hook() left with the patch applied so interesting that those are still showing up for you. Typing from phone so did not review latest patch yet.
Comment #169
mikeytown2 CreditAttribution: mikeytown2 commentedGood to see that the drush patch has been committed
http://drupalcode.org/project/drush.git/blob/refs/heads/7.x-4.x:/command...
http://drupalcode.org/project/drush.git/commitdiff/c6d5531b32290681c8942...
Comment #170
glennpratt CreditAttribution: glennpratt commentedHere's an interdiff for 167 from 161.
Comment #171
seandunaway CreditAttribution: seandunaway commentedWow, this is awesome. Applies cleanly to pressflow 6.22.
Comment #172
rjbrown99 CreditAttribution: rjbrown99 commentedI tried #167 tonight and ran into issues where the queries to the url_alias table seemed to go away. Links reverted back on the main site as well, no more aliases - /my/alias was now simply /node/123. I'm using Pressflow and also have the path_alias_cache module enabled. One thing I noticed was a catch branch of Pressflow with this patch, and it has a follow on patch for path.inc here:
http://bazaar.launchpad.net/~catch-drupal/pressflow/pressflow_implements...
This is similar to comment #143 above. I have not re-tested with that fix applied nor did I debug why my aliases stopped working, but I thought I'd bring it up in the event it is more widely applicable to others.
Comment #173
glennpratt CreditAttribution: glennpratt commented@rjbrown99 - Yes, you should use the Pressflow branch for Pressflow, not this patch.
Comment #174
mikeytown2 CreditAttribution: mikeytown2 commentedgood to know about the issues with the path_alias_cache module; we disable it on our test servers.
Comment #175
mikeytown2 CreditAttribution: mikeytown2 commentedHere's a patch based off of #167 that keeps the behavior of module_implements() the same as before thus fixing the pressflow issue.
New patch uses
$implementations[$hook][] = $module;
Instead of$implementations[$hook][$module] = $module;
and uses a $key in the verification loop.Comment #176
mikeytown2 CreditAttribution: mikeytown2 commentedAnd for the curious here's an interdiff for 175 from 161.
Comment #177
mikeytown2 CreditAttribution: mikeytown2 commentedAnd for the very curious here's an interdiff for 175 from 167.
Comment #178
mikeytown2 CreditAttribution: mikeytown2 commentedRe-worked #175. I had a very odd case where cache_get() didn't exist yet (selective bootstrapping), so we need to check that the cache functions exist before using using them inside of module_implements(). I doubt anyone else will have this issue, but ya never know. What changed
Comment #179
mikeytown2 CreditAttribution: mikeytown2 commentedrenamed the above patch so the testbot will pick it up. This is the same as #178
Comment #180
Grayside CreditAttribution: Grayside commentedDrop this into some drush.inc file and you can use 'drush cc hook' in development. The kind of sites that this patch helps are the kind that make cc all a pain :)
Comment #181
donquixote CreditAttribution: donquixote commentedSomething here does not make sense (EDIT: talking about D7).
This means, if I call module_implements('foo') 10x, then the first loop will be called 1x, the second loop will be called 9x.
So, if $group is not empty, then module_load_include() will be repeated 9x in the same request, for the same module+hook, for no reason.
Am I missing something?
Why do we even bother with "Use the advanced drupal_static()" if we waste our time down there in the loop?
The only way this could make sense is if the statically cached $implementations can change over time, from something external. This is theoretically possible, given that drupal_static() is globally accessible, but we are not that bad, are we?
I can open a new issue, but maybe some feedback first..
Comment #182
donquixote CreditAttribution: donquixote commentedI should add, the D6 patch by mikeytown prevents this problem with a second static variable, $verified.
But I'd say this is not even necessary.
EDIT: Oh yes it is.
Comment #183
donquixote CreditAttribution: donquixote commentedNew issue: #2263365: Second loop in module_implements() being repeated for no reason.
Comment #184
dbehrman CreditAttribution: dbehrman commentedMy site hits this a few hundred times, and making this additional modification saves about 1/4 second on each page load.
Comment #186
dbehrman CreditAttribution: dbehrman commentedComment #187
mikeytown2 CreditAttribution: mikeytown2 commentedComment #189
Fabianx CreditAttribution: Fabianx commentedI think that is a D6 patch that cannot be tested, but ooh, it makes sense to also cache the verified part.
Comment #190
dbehrman CreditAttribution: dbehrman commentedHere's another one. It looks like verified can be completely removed instead of cached for the same effect. But also, thanks to the 'wild hack' in ctools features, there are some hooks that exist sometimes and don't exist other times. The problem is when they don't exist, and you call module_invoke(), it will return null. If the function did exist but returned nothing, you would at least get an empty array. So this patch includes a patch for that, also in the modules.inc file.
Comment #191
artkon CreditAttribution: artkon commentedUsing 191 but found a bug, module_invoke should return NULL when no implements otherwise it breaks update.php among others.
Comment #192
mikeytown2 CreditAttribution: mikeytown2 commentedComment #195
Fabianx CreditAttribution: Fabianx commentedThis issue is for 6.x, tests won't work there.