Regression: Unify and rewrite module_rebuild_cache() and system_theme_data()
dww - May 26, 2007 - 08:39
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | task |
| Priority: | normal |
| Assigned: | Dave Reid |
| Status: | closed |
| Issue tags: | Needs Documentation, Patch Spotlight, Performance |
Description
in working on http://drupal.org/node/146910, i noticed that module_rebuild_cache() and system_theme_data() do almost exactly the same thing, but have wildly different names. we should clean this up. a couple of options:
$modules = system_data('modules');
$themes = system_data('themes');$modules = system_module_data();
$themes = system_theme_data();

#1
This cleanup/unification would also be a good place to add the hook_alter_info() I'm proposing over at http://drupal.org/node/152926.
#2
subscribing. personally i like #1 better.
#3
maybe we should call it
drupal_system_data()to defend ourselves from system.module needing to implement hook_data() at some point in the future?#4
or maybe "system_get_data($type)"? too vague? maybe my original plan for "system_theme_data()" and "system_module_data()" (which would obviously share lots of the same helper functions) is the way to go? i'm too exhausted now to think clearly.... someone please tell me the right name(s) to use. ;)
#5
digging to the two functions i'm having a bit of trouble figuring out how to unify them into a single system_data() function. i think shared helper functions might be a better way to go.
#6
yeah, i agree. sad to say it, but i think s/module_rebuild_cache/system_module_data/ is about the best we can do here. there are a few things that could be shared, but not many. :( it's possible we could reorganize where all the code lives, etc, but that doesn't seem particularly urgent. i think i'm just going to put this on hold and worry about http://drupal.org/node/152926 first.
#7
I'm going to take this up as I'm also working on rewriting system_theme_data in #289369: Deleted modules are not removed from system table.
#8
Small request: Let's get a verb in that function name (
{scope}_{verb}_{thing_that's_being_verbed}()). I don't like either 1. or 2. for that reason, although I agree that unifying their names is a good move.#9
Here's my first effort on this. Summary of changes:
- module_rebuild_cache() has been moved from ./includes/module.inc to ./system/system.module for consistency
- module_rebuild_cache() has been renamed to system_get_module_data()
- system_theme_data has been renamed to system_get_theme_data()
- Created helper function _system_get_module_data
- Renamed helper function _system_get_theme_data
- Refactored system_get_files_database(&$files, $type) to system_get_files_database(&$files, $type, $update = FALSE)
- When $update = TRUE, the function will update the system table based on the information provided in the $files database (insert new records for new discovered modules/themes in filesystem, updates changed records, deletes records for modules/themes no longer found in the filesystem)
I tested install, update, module enabling/disabling/installing/deleting, theme switching, basically anything involving these functions. Tests all pass for me. The only thing I'm unsure of is if I should split the update functionality out of system_get_files_database() and make a new function called system_update_files_database().
#10
Here's my second revision. I split my original system_get_files_database into two functions. The original system_get_files_database and system_update_files_database. I updated the logic so only changes found in module or theme information will be actually updated into the database instead of blindly updating every single record every time. After testing compared to the original methods, the amount of insert/update/delete queries to the database were reduced to barely nothing with no side effects or issues:
Page: admin/build/modules
Task: viewing page, no changes being made
Original D7 code: 136 write queries
Patched D7 code: 0 write queries
Page: admin/build/modules
Task: after uninstalling and removing devel module from file system
Original: 121 write queries
Patched: 5 write queries
Page: admin/build/modules
Task: after installing devel module to file system
Original: 136 write queries
Patched: 5 write queries
Page: admin/build/themes
Task: viewing page, no changes being made
Original: 57 write queries
Patched: 0 write queries
I'll try to gather some data on read queries.
#11
Very cool, thanks. #10 sounds especially promising, though I haven't had a chance to look at the patch. Keep up the great work!
#12
subscribing
#13
Making a note to myself to keep an eye out for when #259412: Total module system revamp lands in case I need to re-roll patch.
#14
Backported to version 6.4 here: http://drupal.org/node/307756
#15
Rerolled and bumped for review.
#16
Re-rolled for HEAD, but the patch is causing a errors on install, so I need to take a look at what's going wrong.
#17
Can't test for D7, but my backport to D6 also caused errors, which I fixed. Maybe you can compare with my (working) D6 patch at http://drupal.org/node/307756
#18
Fixed some comment coding standards, will need to test if this still applies. Too tired right now...
#19
Re-rolled for head. I also did some actual ab performance testing. I like the results, and I bet you will too!
admin/build/modules before patch:
Requests per second: 0.96 [#/sec] (mean)
Time per request: 1042.654 [ms] (mean)
Time per request: 1042.654 [ms] (mean, across all concurrent requests)
Transfer rate: 34.40 [Kbytes/sec] received
admin/build/modules after patch:
Requests per second: 1.01 [#/sec] (mean)
Time per request: 988.747 [ms] (mean)
Time per request: 988.747 [ms] (mean, across all concurrent requests)
Transfer rate: 36.27 [Kbytes/sec] received
admin/build/themes before patch:
Requests per second: 3.48 [#/sec] (mean)
Time per request: 287.024 [ms] (mean)
Time per request: 287.024 [ms] (mean, across all concurrent requests)
Transfer rate: 71.66 [Kbytes/sec] received
admin/build/themes after patch:
Requests per second: 3.82 [#/sec] (mean)
Time per request: 261.681 [ms] (mean)
Time per request: 261.681 [ms] (mean, across all concurrent requests)
Transfer rate: 78.60 [Kbytes/sec] received
With only a bit more processing, we're getting a huge decrease in the hits to the database (see # 10) and a very useful performance improvement!
#20
The last submitted patch failed testing.
#21
subscribe. i got to looking at module_rebuild_cache when wondering how to make tests run faster.
its a beast, and we call it too often.
#22
The patch looks good by itself.
A couple of issues:
<?php+ // Need to make a safe, modifiable copy of the $files array since PHP
+ // automatically makes references to objects instead of copies.
+ $files = array();
+ foreach($files_updated as $key => $file) {
+ $files[$key] = clone $file;
+ $files[$key]->info = serialize($files[$key]->info);
+ }
?>
I don't buy it. If the only difference is serialize() the info key or not... we can easily do better then doubling memory usage.
<?php+ // Scan fields to find only the updated values.
+ $file->updated_fields = array();
+ foreach ($file as $key => $value) {
+ if (isset($files[$file->name]->$key) && $files[$file->name]->$key != $value) {
+ $file->updated_fields[$key] = $files[$file->name]->$key;
+ }
+ }
+
+ // Update the record.
+ if (count($file->updated_fields)) {
+ db_update('system')
+ ->fields($file->updated_fields)
+ ->condition('filename', $file->old_filename)
+ ->execute();
+ }
?>
It would probably be faster to just look if *at least one* field has changed, pass the whole $file to db_update, and let the database deal with it.
<?php+ // File is not found in filesystem, so delete record from the system table.
+ db_delete('system')
+ ->condition('filename', $file->filename)
+ ->execute();
+ }
?>
Two words: file system.
<?php+ // All remaining files are not in the system table, so we need to add them.
+ foreach($files as $file) {
?>
A space is missing after foreach.
<?php+ db_insert('system')
+ ->fields(array(
+ 'filename' => $file->filename,
+ 'name' => $file->name,
+ 'type' => $type,
+ 'owner' => isset($file->owner) ? $file->owner : '',
+ 'bootstrap' => isset($file->bootstrap) ? $file->bootstrap : 0,
+ 'info' => $file->info)
+ )
+ ->execute();
?>
This is badly formatted.
<?php+ // Set defaults for module info
?>
Dot missing at the end of the sentence.
<?php+ // Read info files for each theme
+ foreach ($modules as $key => $module) {
?>
I guess that's "Read info files for each module". Plus the final dot is missing.
<?php+ // Log the critical hooks implemented by this module.
+ $modules[$key]->bootstrap = 0;
+ foreach (bootstrap_hooks() as $hook) {
+ if (module_hook($modules[$key]->name, $hook)) {
+ $modules[$key]->bootstrap = 1;
+ break;
+ }
}
?>
Do we still need this? I thought we removed the notion of bootstrap module.
<?php+ // Set defaults for theme info
?>
Same remark as above.
#23
Awesome review DamZ! I think this was one of my first core patches (seeing as the bootstrap stuff is still there), and it obviously showed. :) I'll have to revisit this to get it back in tip-top shape.
#24
#25
#26
Subscribe.
#27
Anyone working on this? Looks like it would be a critical performance improvement...
#28
Re-roll... that was.. erm... fun... NOT ;)
About DamZ's review..
Changed that function quite a bit, removed whole additional $files array, beneath others things. There is some special casing now for the info attribute, but it's quite simple and the whole function is shorter.
Not sure about that. how would we do that? a break after the first mis-match?
Changed the other things...
One thing that I don't like is filepath vs. filename. We use filename in {system} but filepath everywhere else. Imho, that's quite confusing. and filepath is the proper name, because the path is included too. We should rename the column in {system} imho..
#29
From an API point of view, this looks like a nice clean-up.
-
@param $files_updatedshould be $files.- The following code looks a little funny:
+ system_get_files_database($themes, 'theme');+ system_update_files_database($themes, 'theme');
#30
- Changed $files_updated to $files
Yes it does, but it does make sense.. that is the workflow (same for themes):
1. Read the file system and looks what's there (_system_get_module_data())
2. Extend that information with things that are only in the database, for example, status and schema_version (drupal_get_files_database)
3. Update/Insert/Delete {system} based on the new information we have. (drupal_update_files_database())
Maybe we could combine step #2 and #3, but I tried and failed (resulted in the ~10th drupal re-install for that patch ;)) or rename step #2 to extend instead of get or something like that.
- Introduced a static cache for system_get_module_data(). It is currently called twice, and we did remove a call from the registry (#429132: Remove unecessary module_rebuild_cache() call from _registry_rebuild()). If we have a static cache, we can revert that change, because using the statically cached information is obviously faster. It does need a bit more memory, but we save that atleast on admin/build/modules because we don't have to load and parse all those .info files three times. The static cache can currently not be reset except of using drupal_static_reset() directly..
Some devel "benchmarks"
Without patch:
Executed 919 queries in 152.26 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 990.35 ms.Executed 919 queries in 170.06 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 1116.68 ms.
Executed 920 queries in 164.28 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 1034.99 ms.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.79 MB, PHP peak usage=27 MB.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.79 MB, PHP peak usage=27 MB.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.79 MB, PHP peak usage=27 MB.
Patch without static cache:
Executed 796 queries in 134.05 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 946.37 ms.Executed 798 queries in 135.79 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 931.35 ms.
Executed 798 queries in 137.66 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 919.3 ms.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.17 MB, PHP peak usage=25.5 MB.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.14 MB, PHP peak usage=25.5 MB.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.17 MB, PHP peak usage=25.75 MB.
Path with static cache and #429132: Remove unecessary module_rebuild_cache() call from _registry_rebuild() reverted:
Executed 796 queries in 137.8 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 880.64 ms.Executed 795 queries in 136.18 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 855.15 ms.
Executed 795 queries in 138.14 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 875.1 ms.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.56 MB, PHP peak usage=24.75 MB.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.56 MB, PHP peak usage=24.5 MB.
Memory used at: devel_boot()=0.78 MB, devel_shutdown()=15.56 MB, PHP peak usage=24.5 MB.
#31
The last submitted patch failed testing.
#32
Forgot to remove a debug call...
#33
Bad update.module fooled my :)
#34
Passed: 149 passes..
That's not exactly enough ;) I assume the tests don't like the static cache..
#35
I don't think I can get the static cache working without harming kittens, removed those things again. Maybe we can thing about this again after #481498: Update simpletest module to use drupal_static()
#36
Looks good. Committed to CVS HEAD. I'm marking this 'code needs work' because we'll want to update the upgrade documentation. Please mark as fixed once the handbook is updated.
#37
Thanks Berdir for picking this up where I left off. It was much needed.
#38
This was not actually marked "needs work" :)
#39
- // Set defaults for module info- $defaults = array(
- 'dependencies' => array(),
- 'dependents' => array(),
- 'description' => '',
- 'package' => 'Other',
- 'version' => NULL,
- 'php' => DRUPAL_MINIMUM_PHP,
- 'files' => array(),
- );
+ // Set defaults for module info.+ $defaults = array(
+ 'dependencies' => array(),
+ 'dependents' => array(),
+ 'version' => NULL,
'php' => DRUPAL_MINIMUM_PHP,
);
Notice: Undefined index: package in system_modules() (line 648 of modules/system/system.admin.inc).I'd guessed that this is a design change, but when I mentioned it on IRC Heine said it might have been an accidental regression.
Is it intended that modules must now set a package in their .info files, and the 'Other' package no longer exists?
#40
I'd say that's a bug. For themes, there were 2 defaults missing too, but these caused errors and I corrected them.
I will provide a fix tomorrow.
#41
Also, the committed patch contains
- $themes = _system_theme_data();- $modules = module_rebuild_cache();
+ $themes = _system_get_theme_data();
+ $modules = system_get_module_data();
Shouldn't _system_get_theme_data() be system_get_theme_data() ?
[edit: actually _system_get_theme_data() is called directly in a few other places - not sure that is intended ?]
#42
@yched/41: No not at all, since we don't want to load the cached data. That's why the original call is _system_theme_data() instead of system_theme_data().
#43
@yched
_system_get_theme_data() is called at a few places when the database shouldn't be updated. That function does just return what's on the file system while system_get_theme_data() does extend that with the additional information from the database and stores there what has changed.
IIRC, for themes, this lead to a bug when system_get_theme_data() disabled all themes when it is called in update.php
Edit: Maybe a different name would help to indicate this? Something like system_scan_theme_data() ?
#44
#45
Tested Berdir's latest patch, and it seems to do the trick.
#46
Committed. Thanks Berdir, and thanks for testing brianV.
#47
Added a short note about the renamed functions: http://drupal.org/node/224333#system_get_module_data
#48
The re-roll done in comment #28 broke #351487: Remove default values for stylesheet and scripts includes from system module. It looks like an 8-month-old, out-of-date patch was used to move code around instead of actually moving the up-to-date code to the new spots.
From a quick glance at the patch, I also see "comment_user_verification" is gone from the theme settings defaults.
What else was broken?
#49
Sorry for breaking all these things, you start to rely too much on tests if you have them ;) And it probably went in a bit too fast....
The attached patch re-adds comment_user_verifcation and removes mission and css/js stuff. This was all I could find.
#50
The last submitted patch failed testing.
#51
The script regressions part of the regression has already landed in #351487: Remove default values for stylesheet and scripts includes from system module.
Also marking #497442: Module uninstall leaves system table row as a duplicate of this issue.
#52
Re-roll with the remaining changes.
#53
Berdir, sorry about the conflicting patches! :-)
This patch is RTBC!
#54
Good catch, John. Committed to CVS HEAD. Thanks.
#55
Automatically closed -- issue fixed for 2 weeks with no activity.