http://api.drupal.org/api/function/drupal_get_schema_versions/7
Get all possible functions, then loop through them, then multiply this by the number of modules installed. With the default profile, this equals 40,000 iterations, it must be well over 200,000 on some actual sites.
sun suggested using the registry, that saves a lot.
Even if we do #497118: Remove the registry (for functions), we could probably parse all .install files and find the update function signatures and it'd still be quicker than this.
HEAD:
3.53 [#/sec]
Patch:
4.39 [#/sec]
Comments
Comment #1
catchComment #2
moshe weitzman CreditAttribution: moshe weitzman commentedthis looks fine. do you want to go further and add some _multiple logic so we do one query for a bunch of modules?
Comment #3
sunWe could also skip
by scanning for '%%_update_%%' in the LIKE query condition.
Comment #4
yched CreditAttribution: yched commentedsilly nitpicking, but : is a placeholder needed for the '%update%' litteral ?
Comment #5
catchAdded multiple, tested with devel module.
However there's two concerns I have with the current approach - if you update a module's code and forget to run update.php, we want to remind you as soon as possible - this patch requires a registry_rebuild() before you'll start getting prompted. Also this is the same in Drupal 6 http://api.drupal.org/api/function/drupal_get_schema_versions/6 and we can't backport the current patch, so I'd like to look at a backportable version which parses .install files directly itself, and compare performance against the query. We've gone down from 30% of execution time to 0.6%, but even if parsing .install files isn't as efficient it'd be more robust.
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commentedAnother option is to refactor http://api.drupal.org/api/function/system_requirements/7
system_requirements() doesn't need to know all the update number available, but simply need to know the number of the top-level update function. What about creating a quick drupal_get_all_update_available() that would return a (module => top level function) array?
Comment #7
catchpatch.
Comment #8
catchHere's a version using preg_match_all() instead of the direct query on the registry table. Advantages are we don't need to wait until a registry_rebuild() to inform people the module they just upgraded has database updates pending, and it should be backportable to D6. Disadvantage is I'm terrible and regexp, so if you can fix the very basic one here, please go ahead. This now takes 1.3% of page execution time, so not quite as snappy but cutting down 29% of page execution time instead of 30% ain't bad.
I'd originally had a look at doing this with token_get_all() as the registry does, but that was as slow as going through get_defined_functions() (c. 36,000 calls to next()).
Main question is if we can rely on regular expressions here to get the correct list of functions or not, if not, #7 is still good for D7.
Comment #9
Damien Tournoud CreditAttribution: Damien Tournoud commentedWhat about something like this?
Comment #10
catchFixed up Damien's suggestion, 3.5% self, including time spent inside preg_match() it's over 7% though, whereas my last one is 1.8% incl.
Comment #11
Damien Tournoud CreditAttribution: Damien Tournoud commentedInteresting. I forgot to anchor the regexp, that should make it faster:
Another thing to consider: would simplifying the regexp as
"/^(.*)_update_(\d+)$/"
make it slower or faster?Comment #12
catchBoth lead to about 4.5% incl.
To keep the patch options updated, attached option 1 along with both webgrinds.
Comment #13
sunWeird idea: Can we skip the foreach entirely and do just one preg_match_all() instead?
Comment #14
Damien Tournoud CreditAttribution: Damien Tournoud commentedAnother option is to do a preg_grep() upfront:
Comment #15
Damien Tournoud CreditAttribution: Damien Tournoud commentedWe want the first regexp to be as cheap as possible, so maybe just filtering the function names that end with figures would be enough?
Comment #16
catchnot with that regexp we can't, matches is empty. Interesting idea though.
Comment #17
Damien Tournoud CreditAttribution: Damien Tournoud commentedHere is what I meant:
Comment #18
catchI crossposted, was replying to sun's, but nice update anyway. I think we may have a winner!
Comment #19
Damien Tournoud CreditAttribution: Damien Tournoud commentedOk, we need to document why we are doing that, then.
By the way, do we only have 65 update functions in core?
Comment #20
catchYeah we removed all the d5-6 upgrades so it's only 6-7 in there now. Not sure what to say for documentation, mind re-rolling with that?
Note I haven't profiled sun's implode() approach, but we're not going to get much better than 0.5% so if it's comparable would just be a matter of taste.
Comment #21
catchTeam effort at this point.
Comment #22
sunCombination of a few suggestions.
Comment #23
catch0.02% slower than Damien's but that's margin of error now.
Comment #24
sunMost probably, the more modules with updates are installed the more likely the implode() approach will win. ;) Because we invoke preg_match() for all hook_update_N-alike function names being defined. Contrary to that, 1x implode() and 1x preg_match_all is very, very cheap. :)
Anyway, awesome collaboration, gentlemen :)
Comment #25
jrchamp CreditAttribution: jrchamp commentedNice job all! I tend to agree that the preg_match_all will likely scale better.
Comment #26
Damien Tournoud CreditAttribution: Damien Tournoud commentedVery unlikely. I'm pretty certain that the preg_grep() / preg_match() approach scales better.
Edit: and it does use less memory.
Comment #27
catchShould be easy enough to swap the same function out on a heavily loaded D6 site and compare the two, but I don't have that set up locally at the moment so others are welcome.
Comment #28
sunEither way ;) Marking #18 as ready to go. :)
Comment #29
Dries CreditAttribution: Dries commentedLooks good, but can we have some code comments?
Comment #30
sunhah, you can't sleep, or why are you here, but not in IRC? ;)
At your service. (with some grammar + language help from dbabbage in IRC)
Comment #31
drewish CreditAttribution: drewish commenteda couple of small nits to pick but sun loves dropping into other people's issues and tossing off a few and kicking it back to CNW so i think he'll understand ;)
I think it'd be better to reference hook_update_N() rather than the more vague "module update functions".
Same goes here. I guess both comments should be reworked to make it clear that first we'll make two passes first with the looser regex looking for functions ending with a numeric digit then reducing it further to those matching the hook_update_N() pattern.
Other than that it seems good to go.
Comment #32
catchRe-rolled with comments.
Comment #33
drewish CreditAttribution: drewish commentedcomments look good to me. bumping back to previous status.
Comment #35
deekayen CreditAttribution: deekayen commentedtrying to get a new testing client working. it's not there yet.
Comment #37
catchComment #38
catchImproved code comment to explain /why/ we use preg_grep() following discussion with webchick in irc.
Comment #39
webchickHm. I'm still confused by this code, which might mean the the docs need a bit of improvements, or it might mean that this still needs work.
Curious. Why _\d+$ and not _update_\d+$? I understand why just "_update_" is a bad idea, but it seems odd that you're re-running another preg_match just below to check for something you could've caught the first time.
In fact, why not cut to the chase and just use $regexp to begin with?
I'm on crack. Are you, too?
Comment #40
Damien Tournoud CreditAttribution: Damien Tournoud commentedIn fact, we are changing the API here for no good reasons. All known modules (enabled or not) are updated in D6. Here we will only pick the enabled modules. I don't believe we need to specialize the inner regexp per module, let's just pick whatever is before the '_update_N' and take that as the module name, just as before.
Comment #41
drewish CreditAttribution: drewish commenteddmaz, what do you mean by "updated in D6"?
Comment #42
jrchamp CreditAttribution: jrchamp commented@Drewish, I think he means that whenever a module is "known" to the Drupal installation that it runs the associated updates whether that module is enabled or not. The difference being that module_list() won't return the disabled modules whereas drupal_get_schema_versions() could in theory be used on a module that isn't enabled but whose functions are defined. I'm not sure how that works, but it seems like a slight change in functionality. Though it's probably better if disabled modules don't get included anyway given that a broken module which is disabled could then fatal error the admin.
@webchick, #13 and #15 relate to your solution. They wanted to make it as small an impact as possible. However, I agree that matching with the additional literal '_update_' in addition to the '\d+$' would likely be ideal in minimizing false positives while hopefully not slowing the matching performance.
@Damz, were you thinking something like this?
Comment #43
jrchamp CreditAttribution: jrchamp commentedRolled a patch with the changes from #42 and a variable overwrite issue for $module.
- $regexp = '/^(' . implode('|', module_list()) . ')_update_(\d+)$/';
+ $regexp = '/^(.+)_update_(\d+)$/';
- foreach (preg_grep('/_\d+$/', $functions['user']) as $function) {
+ foreach (preg_grep('/_update_\d+$/', $functions['user']) as $function) {
- list(, $module, $number) = $matches;
- $updates[$module][] = $number;
+ list(, $module_name, $number) = $matches;
+ $updates[$module_name][] = $number;
Any objections?
Comment #44
jrchamp CreditAttribution: jrchamp commentedComment #46
jrchamp CreditAttribution: jrchamp commentedNot ideal, but this passes the tests. catch, does this still provide a benefit or does it defeat the purpose?
Comment #48
jrchamp CreditAttribution: jrchamp commentedThe tests still pass for me. Kicking the test bot.
Comment #49
catchThis is still faster, but not as fast as previous versions:
HEAD about 30%, patch about 14%. Well worth committing but not as dramatic as previous versions (although possibly they were dramatic because they broke things...).
Comment #50
alexanderpas CreditAttribution: alexanderpas commented- is that if statement inside the foreach "unneeded", or do the the two regexp used not return exactly the same?
- isn't a strpos and and two substr faster than the preg_match and the list?
Comment #51
jrchamp CreditAttribution: jrchamp commentedThe preg_match matches are subsets of the string, whereas the preg_grep matches are a subset of the array. The preg_grep operates on an array of items and returns the matching items. The preg_match operates on a single item and returns the substrings of the items.
The problem with strpos is that we would need to do strrpos and two substr's. The list() is a language construct and not a function so it doesn't incur the function overhead. It is possible that strrpos and two substr's would be faster than a function call which uses a regex, but I'd want to see some numbers before I'd believe it. The nice thing about using the same regex multiple times is that it should already exist preprocessed within the regex engine. If you're considering running the numbers, you might also want to try:
Comment #52
jrchamp CreditAttribution: jrchamp commentedcatch, this (with the change I proposed in #51) saves us a preg_match for each matched function. Of course, this would conflict with modules whose name contains "_update". Currently, that list consists of http://drupal.org/project/jquery_update and http://drupal.org/project/jquery_form_update. Of those, the only current function which is applicable to Drupal 7 is jquery_update_update_7000. Perhaps mfer could weigh in on this? The whole point would be a performance and readability gain, so I'm not all that excited about doing an array_pop() and an implode().
Comment #53
CorniI CreditAttribution: CorniI commentedbased on the patches above, I rolled a version which a) should maintain D6's behauvior and b) ist (profiled) faster than the patch by jrchamp on the admin index.
I query {system} for all modules which drupal has knowledge about. This is okay, because modules just uploaded, but never installed are not shown there. If a module wasn't ever installed, we don't have to update anything for it. That approach is slightly slower than the API change catch introduced.
Comment #54
catchThis seems like a decent approach. Cornil - could you post a screenshot from your profiler?
Comment #55
CorniI CreditAttribution: CorniI commentedI missed the duration of the patch request on the screenshot, it's 590ms.
I don't know how to make screenshots showing the whole thing, so just the importan sections here, sorry.
Comment #56
CorniI CreditAttribution: CorniI commentedwhitespace+80cols fix
Comment #57
chx CreditAttribution: chx commenteduse
$regexp = '/^(?P<module>' . implode('|', $modules) . ')_update_(?P<version>\d+)$/';
and then$updates[$matches['module']][] = $matches['version'];
way more pleasant to read isnt it?Comment #58
CorniI CreditAttribution: CorniI commentedit is, and removes the awkward list construct :)
Comment #59
chx CreditAttribution: chx commentedGood job! Benchmarks in #55.
Comment #60
jrchamp CreditAttribution: jrchamp commentedI am concerned that the preg_grep does not contain '_update_'. Does it really hurt the performance to be a little more specific? Thank you for coming up with a better solution for this one.
Comment #61
CorniI CreditAttribution: CorniI commentedYes, in that case the performance hit is measurable, because the regex needs to be applied to every array key string. The smaller, the better, and we filter out later the functions we need anyways.
Comment #62
Dries CreditAttribution: Dries commentedReviewed and committed to CVS HEAD. Thanks for the careful tweaking and performance testing!
Comment #63
catchSame code is in D6 too.
Comment #64
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis patch is broken. Install a clean HEAD: the status report says "Database updates: out of date".
Comment #65
CorniI CreditAttribution: CorniI commentedDries/webcick, can we get a rollback on this so I can work without time pressure on the issue?
thanks!
Comment #66
clemens.tolboomIt broken my tests #211182: Updates run in unpredictable order too :(
Comment #67
andypostsubscribe :(
Comment #68
jrchamp CreditAttribution: jrchamp commentedThis appears to resolve the installation issues, but is definitely a step backward in performance. The static isn't being reset after a new module is installed, and the system module behaves differently than all the others.
Comment #69
Dries CreditAttribution: Dries commentedSorry, I was on a plane yesterday. I'll wait for the test results on #68 and either go with #68 or rollback the previous patch.
Comment #70
Dries CreditAttribution: Dries commentedI committed the patch in #68 as the tests came back green. Thanks @jrchamp.
Marking this 'code needs work' so we can brainstorm on better solutions, and so we can work on better tests. The latter seems to be a requirement given that none of us reviewers spotted the brokenness in the previous patch.
Comment #71
jrchamp CreditAttribution: jrchamp commentedThis patch resolves the installation issues, maintains the performance gains in the admin, but may have unexpected issues with modules added after installation. Also, this doesn't add any tests, something that Dries requested in #70 (which is why I'm not marking this needs review).
Need some feedback here, as this may not be the best solution. For reference, the reason that we don't use the installed modules list is that the 'system' "module" is not in the system table during installation before it calls this function.
Comment #72
catchSince the last patch broke the tests at http://drupal.org/node/211182#comment-2073754 - and it looks like that already covers which updates to expect for which module, it'd make sense to consolidate the testing efforts between the two issues.
Comment #73
catchYep, #211182: Updates run in unpredictable order has copious tests which we don't need to duplicate, seeing if this still applies.
Comment #74
Crell CreditAttribution: Crell commentedSubscribing.
Comment #76
sun.core CreditAttribution: sun.core commented#71: schema_versions-521838-71.patch queued for re-testing.
Comment #77
Gábor Hojtsy@jrchamp: can you qualify those "unexpected issues" which "might" appear with the patch. That's not too clear.
Comment #78
jrchamp CreditAttribution: jrchamp commented#71 attempts to achieve the performance benefits of not re-caching the module updates unless we know that the list of modules is changing. The two locations where the list is refreshed as identified by the patch are required for a successful installation. However, my concern about unexpected issues are for situations where modules are loaded after the first time that this function executes. Any location which loads a module after the first time this function is run will need to clear the static for this function.
Rather, as the situation that would cause a problem for CVS HEAD and #71 is when a module is freshly loaded, it would probably be worthwhile to alter this function to set entries in the array which are "empty" even for modules which have no updates. As such, this function would have performance similar to #71 (assuming the module_list is already cached or will be cached during normal execution), but without the need to reset the static in multiple places within the codebase.
I have attached the patch, which assumes that module_list provides the list of "loaded modules" as the documentation implies. I did not get a chance to benchmark this, but did perform a couple fresh installations to verify basic functionality.
Comment #79
catch@jrchamp - it might be worth looking at using system_list() directly here instead of module_list() (that's where the data is got from, and also cached now) - would that help at all? Doesn't have the same bootstrap / fixed_list / normal craziness of module_list()
Comment #80
Gábor HojtsyHow does this improve performance? What I'm seeing is that by checking the return value a bit earlier, we save calling max(FALSE) and comparing the resulting 0 to another number. This looks like little code reorganization for barely noticeable performance benefits at best. Am I missing something?
Comment #81
jrchamp CreditAttribution: jrchamp commentedThe max(FALSE) check is really a bugfix that I noticed during testing. It prints an error to the screen when it is doing updates saying that the parameters to max() are incorrect.
The one small change is the following: if we assume that the modules loaded into memory (thus, the functions that would show up in the get_defined_functions() call) are returned by the module_list() / system_list() function call, then we can say "don't recache the updates array when you're checking to see if updates exist for a module which doesn't define any updates". The switch from isset() to empty() is to take into account the situation where the module either: !isset() || (is_array() && count() == 0)
Example:
If there was a module named 'mymodule' which had no updates, the updates array might look like this:
$updates = array(
'system' => array(7000, 7001, 7002),
'mymodule' => array(),
);
When we call drupal_get_schema_versions for 'mymodule', it will used the cached version of the updates array, rather than refreshing it simply because 'mymodule' didn't define a list of updates.
drupal_get_schema_versions('mymodule');
Honestly, I don't think that module_list() or system_list() are the functions that I want. I'm pretty sure that I want to be able to ask the static within drupal_load what modules it has already loaded, because it looks like system_list() does a query against the database to see what modules it thinks are loaded rather than giving me the list of modules whose module_update_number functions have already been loaded. Do either of you know where I can get that data? I realize that if module_load_all has already been run, then module_list will at least match the items in memory, but I wasn't sure if I could rely on it.
@catch: If system_list is what I want, you're saying that I should call: system_list('module_enabled') correct?
Comment #82
jrchamp CreditAttribution: jrchamp commented@Gábor Hojtsy - Does #81 make sense?
Comment #83
catch@jrchamp: system_list('module_enabled') ought to work yeah.
Comment #84
jrchamp CreditAttribution: jrchamp commented@catch: After comparing http://api.drupal.org/api/function/module_list/7 and http://api.drupal.org/api/function/system_list/7, it looks like module_list is going to be faster. After all, it's the only place in core that called system_list('module_enabled'), and doesn't have the drupal_static / cache_get calls.
Quick summary for others just getting here: where #68 (HEAD) plays it safe and only remembers which modules define update functions, #78 attempts to "remember" all modules which have been loaded so that modules which do not define update functions will not force a cache refresh on drupal_get_schema_versions(). module_list() gets called many times during a normal page load, so its output is going to be cached already. #78 is only really worthwhile if drupal_get_schema_versions() is refreshing its cache unnecessarily when using #68 (HEAD).
Comment #85
catch@jrchamp: are we sure that this function is only ever called after a full bootstrap, if so that sounds fine, if not system_list() guarantees that boostrap modules aren't returned but yeah there'd be no other reason to use it.
Comment #86
jrchamp CreditAttribution: jrchamp commented@catch: That's the problem. Because module_list relies on system_list, which pulls its data from the database, it only works after all modules have been loaded. Really, what I would want is for a function like the following to be called whenever a module is loaded into memory (10-20 calls of a very simple function):
function loaded_module_list($module = NULL) {
static $modules = array();
if (isset($module)) {
$modules[$module] = $module;
}
return $modules;
}
The nice thing is that we won't even have to have a reset parameter, because the active process can't "unlearn" the code it has loaded.
Does this seem valuable? Other suggestions?
Comment #87
grendzy CreditAttribution: grendzy commentedI tested #78 with all core modules enabled. There was no measurable performance difference on /admin, using `ab -C`. Is there something else I should be testing?
Comment #88
catchafaik there is no measurable performance improvement with the latest patch, the 30% was knocked down to 1 or 2%, this is just clean up now (and maybe saves a few cycles in the process, but nothing you could measure with ab). Because of that I'm downgrading from critical.
Comment #89
grendzy CreditAttribution: grendzy commentedcatch, thanks for clarifying. #78 seems like a sensible cleanup. It does result in fewer loops through get_defined_functions().
Comment #90
grendzy CreditAttribution: grendzy commentedas pointed out above, hook_update_dependencies() added tests so we should be covered.
Comment #91
webchickIt looks like Gábor's comments were addressed. The patch looks fairly harmless (which probably means it will break everything ;)).
Committed #78 to HEAD.
Comment #93
David_Rothstein CreditAttribution: David_Rothstein commentedInteresting followup issue here: #894530: Update system calls drupal_get_schema_versions() for lots of uninstalled modules, leading to a performance hit
Comment #94
seanburlington CreditAttribution: seanburlington commentedHi,
I've been looking into why the admin page on my clients site is so slow
This function seems to be the culprit - as far as I can see no patch has yet been provided for Drupal 6 - so here's mine
It's a very quick piece of code with the aim of making minimal changes
What it does it to cache an array of functions with the string 'update' in them
This is a much smaller list that all functions
So iterating over this list for every module is less painful.
I make no claim to this being anything like a perfect solution - but it performs bearably well on a site with 200 modules !
Comment #95
seanburlington CreditAttribution: seanburlington commentedbah! last patch was too quick and didn't work!
This once still has a speed up - but also works
iterate over 600 functions with _update_ in then instead of 7000 total user functions - for each module
Comment #96
drewish CreditAttribution: drewish commentedThere's some obvious code standards issues (tabs rather than spaces) and I wonder if it makes sense to just move that functionality into drupal_get_schema_versions() rather than having the _drupal_get_update_functions() helper function.
Comment #97
jrchamp CreditAttribution: jrchamp commented@seanburlington: Try this patch out. It is much faster on my system.
Comment #99
jrchamp CreditAttribution: jrchamp commentedI must have missed the Git memo. That might explain the other problems I was having. The newly diffed patch attached to this comment should work.
Comment #101
kenorb CreditAttribution: kenorb commented+1
Comment #102
jrchamp CreditAttribution: jrchamp commented#99: drupal6_schema_versions-521838-99.patch queued for re-testing.
Comment #104
jrchamp CreditAttribution: jrchamp commentedIt looks like the test bot wasn't applying patches back then.
Comment #105
mikeytown2 CreditAttribution: mikeytown2 commentedDid some benchmarks on our large site
D6 Core: 2112ms
Patch #104: 159ms
D7 backport (this patch): 10.7ms
Added link to this patch to the D6 Performance wiki as well.
Comment #106
BrockBoland CreditAttribution: BrockBoland commentedNeeds issue summary (partial in #84)
Comment #107
softescu CreditAttribution: softescu commentedWe have two different patches for D6 and D7. Patch #105 for D6 approach is different than patch #71 for D7 (summary at #78). I am going to recheck now to see if the patches still applies to D6 core and D7 core.
Comment #108
mikeytown2 CreditAttribution: mikeytown2 commented@softescu
#91 committed #78 which is based off of #71. #105 is a backport from whats in D7.