This is a parallel issue to #340723: Make modules and installation profiles only require .info.yml files, and comes from the same discussion.
Despite the size of this patch, all it does is move all implementations of hook_theme and hook_menu to a $module.registry.inc file. These are "registry style" hooks; that is, they get called once in a blue moon, cached or saved to the database, and then never used again except during development or reinstalling a module or such. They are completely wasted bits. By moving them to a separate file, we no longer waste bits on every single page load. And thanks to the registry, all we have to do is tell the info file about the new code file and the rest is handled magically.
The structure here was already discussed with webchick, and she said she was OK with $module.registry.inc as a name format so that's what I used. Although each hook is fairly small, the total size spread out across all modules a typical site will have is non-small. It's also the "low-hanging fruit" toward making .module files nearly empty in most cases, which makes the bootstrap process enormously cheaper.
This is exactly what the D7 registry was written to enable. :-)
There are no doubt other hooks that can be easily moved out to this or other files, but this is a good first start. Baby steps...
I'm marking this as critical because it is 100% guaranteed to collide with something simply due to its size, so the sooner it gets in the better.
I hope the bot doesn't have a problem with it. If it doesn't, then there's really not much else to comment on as this is just moving code around in ways that have already been discussed. (*prays that he doesn't have to reroll this behemoth too many times.)
| Comment | File | Size | Author |
|---|---|---|---|
| #107 | apc_head.png | 49.95 KB | catch |
| #107 | apc_patch.png | 57.98 KB | catch |
| #106 | drupal-split-total2.patch | 849.84 KB | sun |
| #106 | registry.php.txt | 14.16 KB | sun |
| #106 | rebuild.php.txt | 473 bytes | sun |
Comments
Comment #1
Crell commentedAs support for why we're doing this, I point people toward the benchmarks for the limited splitting we did in Drupal 6: http://www.garfieldtech.com/blog/benchmark-page-split
This is the natural evolution of that work, via the registry.
Comment #3
swentel commentedYeah, go Crell! :) Subscribing to test this patch.
Comment #4
swentel commentedNote: install completely fails, wsod, looking into it now.
Comment #5
swentel commentedThis patch should make the testbot happy - at least for install - hope I didn't forgot any module.
Update still gives wsod, that's for later this day.
Edit: including menu.registry seemed to be needed because otherwhise, your initial menu will look very weird :)
Comment #6
catchNice. Will run some benchmarks later.
Comment #7
swentel commentedChanging status to get results from bot.
Comment #8
Crell commentedI knew I was going to forget one. Thanks, swentel.
Comment #9
moshe weitzman commentedLovely. This can be justified just by code cleanliness. Any benchmarked speed bonus is nice but not a prerequisite IMO.
Comment #10
Anonymous (not verified) commentedsubscribe.
Comment #11
Crell commentedEnough with the subscribing. Someone mark this RTBC before it breaks. :-)
Comment #12
swentel commented@crell update.php still gives a wsod, I'll try to look into it today - but if someone else catches it, be my guest :)
Comment #13
catchMarking to needs work for update.php
Comment #14
swentel commentedOk, did some investigation, _theme_build_registry in theme.maintenance.inc returns no theming functions at all with this patch. I'm off for two hours now, I have some time later, again, that shouldn't stop anyone else of course :)
Comment #15
panchoSorry, I didn't catch the update.php bug. :( But I see another point:
Do we really need menu.registry.inc in install.php?
I threw it out and my install worked well.
If there should be a reason to include it though, the $module_list array keys should be consistent:
instead of:
Comment #16
panchoDidn't mean to change status.
Comment #17
swentel commented@pancho: I made the code more consistent. If you leave menu out, the install will be ok, but links to Main menu, Navigation, Secondary menu will be in your root navigation tree.
Pach attached fixes update.php and also a small typo in profile.info.
Comment #18
panchoCool! You caught that little bastard! :)
However, wouldn't it be cleaner to add
require_once DRUPAL_ROOT . '/modules/system/system.registry.inc';to update.php (line 650) instead of havinginclude_once DRUPAL_ROOT . '/modules/system/system.registry.inc';in _module_implements_maintenance()?We obviously need it only for update.php and not in other cases of maintenance mode. It seems to work here.
Concerning menu.registry.inc in install.php:
I'm not sure if I got what you mean, but on my freshly installed site, everyting seems to be fine, though I had menu_registry thrown out of install.php. Maybe you or someone else can check that?
Comment #19
swentel commentedMoving include to update.php, makes sense and still works :)
Someone else should test play with leaving out menu.registry from install.php and see what happens.
There is a png attachment which shows what I get.
Comment #20
damien tournoud commentedHum. We have system_registry and a menu_registry modules, now?
This needs a comment at the minimum. But a discussion would be better. Why not moving the theme function defined in
system_theme()somewhere else? For example: automatically add them in theme.inc, where it would make full sense? The less we keep of that absurd system module, the better.This looks like a bug in the module installation code.
Comment #21
damien tournoud commentedOk, this is linked to the fact that module_implements() has a special case in maintenance mode in which it doesn't use the registry.
The special case for the menu module is just wrong: this can potentially break *any* module hook_enable(), including contrib ones if installed in an installation profile.
What we would need is to change the
if (defined('MAINTENANCE_MODE'))check in module_implements() to something likeif (defined('MAINTENANCE_MODE') && drupal_get_bootstrap_phase() < DRUPAL_BOOTSTRAP_FULL), and insure that the registry is fully built as soon as we enter the BOOTSTRAP_FULL mode (ie. we will probably need to add a registry_rebuild() in drupal_system_install().Comment #22
swentel commentedSimply adding following line in install.php on line 37 also fixes install, anyway, do install & update need more than the theming functions defined in system_theme() ?
Comment #23
damien tournoud commented@swentel: In #21, I was discussing the menu.registry.inc case.
Comment #24
Crell commentedAs long as the registry is active theme functions should be able to live anywhere, shouldn't they? We only need to special case theme functions that are needed pre-database. I'd actually prefer to not see more stuff in theme.inc, as that gets loaded and parsed on every page load.
Comment #25
recidive commentedThose two pseudo modules system_registry and menu_registry look like hacks IMO. Are they really needed?
Also
// $Id: $should be// $Id$on new files.Other than that, the patch looks nice.
Comment #27
dropcube commentedAlso would be good to rename those "registry style" hooks following this pattern: {module}_register_{items}. For example:
{module}_register_blocks() for blocks
{module}_register_menu_items() for menu items
{module}_register_permissions() for permissions
{module}_register_themes() for theme hooks
...
this would add consistent name conventions and would differentiate those "registry style" hooks from other hooks.
Issue: #357321: Rename hook_permission() into hook_permission_info() for consistency
Comment #28
Crell commentedThere's already a building standard, IMO, for using the _info() suffix for such hooks. I proposed that on the dev list months ago, and the only disagreement was on just how much caching was appropriate for such hooks. I'll follow-up on the other thread momentarily, but for this thread I was very very specifically JUST trying to move code around, since I figured that would be easier. Apparently it isn't quite that easy thanks to the silliness that is the installer. Someone really should get back to figuring that out...
Comment #29
Anonymous (not verified) commentedin discussion of catch's performance talk, bjaspan asked whether using an opcode cache would mean that we wouldn't gain any real memory reduction from the registry.
we need to test this, because memory reduction is one of the justifications for the registry and for this patch. if we don't get any/a very small gain, we need to know this.
attached is a simple bash script to generate some files to be included, and two files that load them. one loads them all, one half. put this in a webroot in your dev environment and run it to get a simple way to test loading lots of files.
i've been testing these scripts with ab, and watching memory usage of apache processes with and without APC, and even with APC there is a large growth in the memory size of apache child processes. this would seem to indicate that reducing the number of files loaded will be a gain even with an opcode cache, but i'm sure we can refine this test.
Comment #30
bjaspan commentedsubscribe
Comment #31
Anonymous (not verified) commentedtagging with Registry
Comment #32
sunThis is one of those patches we will likely get much faster in when we
1) start with one prototype implementation
2) get that committed
3) roll sub-sequent patches (in this issue) until all of core is converted (perhaps 2-3 modules at a time)
Otherwise, this beast of a patch needs to be _carefully_ re-rolled all over the time, ensuring that no other commit changed any of those hook implementations.
Comment #33
Crell commentedI believe the only issue was with system.module, so I'll agree with sun in #32. Time for module-at-a-time patches. Although like DBTNG and the original menu split I'd prefer to do them in separate issues rather than all in one issue. Sounds to me like a new "community initiative" page. :-) Or maybe just a tag...
Comment #34
sunSo let's start with the key module.
- Added hook_perm(). Equally invoked on two pages only (admin and admin/user/permissions).
- Only added manual inclusion of system.registry.inc in install.php and update.php. Tested installation manually and could not replicate that previously mentioned menu error (was most likely a bug caused somewhere else).
Comment #35
sun- system.registry.inc needs to be manually included, because install.php and update.php rely on the path 'batch', and we can't rely on the registry during installation and upgrades.
- FWIW, by appending
drupal_set_message('IN');to system.registry.inc you will see that the registry works and the message is only output on pages that invoke one of the registry hooks. (oh wow! ;)Comment #36
Crell commentedHm, well, that was easy... botsnack!
I'm not entirely convinced that hook_perm() is appropriate as a registry hook, but I'll defer to Dries and webchick on that one. (Registry or info hook are called and cached somehow, which is different than "just kinda rare".) It's not like we can't move it again later, right? :-)
Let's get this in before someone else touches those hooks, then we can move on to other modules.
Comment #37
sunBefore someone assumes this could be set back to needs work... ;)
I'm not sure either, but I tend to prefer #34 though.
Comment #38
sunbtw: Equally fun: #467606: Move system's trigger functionality away
Comment #39
chx commentedWe really need to benchmark the memory usage. Shared hosts and dedicated servers alike use a code cache so ...if we make the life of a code cache harder then that's not good.
Comment #40
catchVia devel. There may be better ways to do this.
To me it looks like it makes 0 difference with APC. Without APC there's a very small consistent memory improvement on the front page.
On admin/build/modules the memory usage varies between requests and I can't get useful information - included below though.
Given this is just a couple of hooks the numbers seem about right to me. I think we need to proceed with this, and then do this again (or ideally something a bit more robust if someone knows how) after 5-6 patches have landed.
Restarted apache before and after applying the patch (and obviously to switch APC on and off).
Comment #41
dries commentedTo me, this looks like a micro-optimization. It would be great if we could illustrate the memory usage improvement before adding new rules and complexity.
Comment #42
catchDries. It's a micro-optimisation because we're only moving system_menu() and system_theme() for an initial patch. As far as I can see this saves us in the region of 100k of memory when APC isn't enabled.
Total file size parsed (via PQP) is 1.59MB vs. 1.57Mb with the patch applied.
Both PQP and devel report about 100k memory saved.
If we trust those numbers, then we're getting 100kb of memory saving for 20kb of code moved and it ought to add up very quickly. The patch tested is 30k, Crell's first patch on this issue was 225k. Looks to me like just by moving hook_menu() and hook_theme() for core modules we'd save in the region of 3-500kb. With 50-100 contrib modules installed (assuming they all conformed to the pattern), then that easily begins to add up to several MB of memory, and just by moving hook_menu() and hook_theme().
There's also the trigger patch which sun posted, hook_perm() a few posts above, and probably most CUD and some other hooks could be moved before we start to see too much trade off in terms of file inclusion.
The only way to test this is to try it. Either we roll and commit 5-10 similar patches quickly so that they can be maintained and rolled back individually, or someone will have to roll another monster patch, and we hope it doesn't break in time to test it and commit it. I'll try to test either, but I'm not volunteering to roll the 250k patch.
If we don't commit this, or it turns out not to have these gains, then I think we need to reconsider the registry entirely. The complexity is there, not in cut and pasting code around.
Comment #43
dries commentedYou're right. I reconsider my position. :)
Comment #44
catchOK so if you're happy committing this. I'll do a similar test once we've committed 4-5 similar patches. Then we'll 1. find out if those numbers were right in the first place 2. see if it actually adds up to anything worth having.
4-5 patches should be easy to revert if it doesn't look good, but it's probably enough to be comfortable going ahead with more conversions if we can get better verification of the gains.
Comment #45
sunComment #46
Crell commentedJust a reminder that I did benchmark the Drupal 6 page split work last year: http://www.garfieldtech.com/blog/benchmark-page-split
This issue is exactly the same as that one; we're just moving code around to make it conditionally included. For those who didn't read the article above, moving page callbacks and many theme callbacks out of the .module file got us a 20% performance boost in D6 compared to had we not done so, and earlier benchmarks in the issue queue showed no impact on opcode-cache-using sites either way. If we can slice off our big definition hooks, that can easily get us another 20% boost, I suspect. If we go even farther and move, say, node hooks off to separate files, we can get even more of a boost.
One of the GHOP projects I mentored last year did a performance analysis of Drupal core, and it identified our two most expensive phases as the theme system (complex and lots of stack calls) and bootstrap (most of it being in the phase where we load all .module files). Code weight is our easiest optimization point right now, and we desperately need optimization points. :-)
I suppose Dries doesn't need any more convincing after #42, but I just wanted to make it clear that this *has* been benchmarked before, repeatedly, and IMO has been clearly shown to be an "easy win" for performance.
Comment #47
sunI reconsidered. I want to make it one shot. But only if you let me... :P
Comment #48
sunThere are things that should only be done once. ;)
Constantly re-rolling something that can be done by a machine is not the type of job we should have to deal with.
Put attached registry.php into the root directory of D7 and invoke it, i.e. http://yoursite/registry.php. Afterwards create a patch.
Now I want to see some benchmarks.
Both attached patches need to be applied to make this work.
Comment #49
sunSorry - since I only later found out that I could also auto-add files to CVS and invoked that unconditionally, field_sql_storage.registry.inc was mistakenly added to the .info file.
New patch without that. However, lessons learned: Except of one module, all modules in core implement registry functions.
Comment #51
catchAll core modules enabled (thanks for the reminder sun).
79 files included.
Front page and node/n are the same for file includes. We make the default files loaded in HEAD c. 800k smaller. This equates to just over 500k memory savings.
Very small increase in requests per second - might be margin of error, certainly not less.
HEAD:
2.07 MB total file size
Memory used at: devel_boot()=4.56 MB, devel_shutdown()=26.85 MB, PHP peak usage=27.75 MB.
3.48 [#/sec]
Patch:
1.99 MB total file size.
Memory used at: devel_boot()=5.14 MB, devel_shutdown()=26.32 MB, PHP peak usage=27.25 MB.
3.51 [#/sec]
I also did a quick poll in #drupal, asking people to use PQP to get no. files + total file size for some D6 sites with lots of contrib:
Michelle's site:
366 files 23.93mb
beeradb's site:
28.04mb, 224 files. 106 modules
agentrickard's site:
20.94 MB 180 files. This is 82 modules.
Moonshine's site:
site with 91 active modules (+ devel and pqp) 16.15MB 264 files
If we're dealing with c.2mb with just core, the multiplying that by 10 or 15 means we could end up with savings of around 5-7.5MB on sites with lots of contrib. Additionally, we're only removing a small percentage of core (800kb out of 2mb), whereas in contrib it's likely that hook_menu() and similar are a higher proportion overall (since contrib generally doesn't have files like system.module or common.inc to worry about with a few exceptions).
Comment #52
Crell commentedLet's also note that we're talking about just the registry hooks at the moment. There are plenty of other hooks and functions that could be moved out of .module, too, for further savings. It sounds to me like it's definitely worth it.
Comment #53
catchproper status.
Comment #54
Anonymous (not verified) commentedCatch asked me to post the details of my PQP test to this issue queue.
I'm hitting the home page and I get these results:
418 Total Files
5.48 MB Total Size
I have 155 active modules.
Comment #55
Anonymous (not verified) commentedForm defaults overwrote Crell's changes. oops!
Comment #56
sunThe results of these benchmarks (thanks, catch!) made me a bit disappointed, but also made me "play" and think about the entire challenge.
What I did in the past hours is trial & error only, to get a feeling of the maximum limits we can achieve here.
If you want to skip this evaluation, read on where it says "BREAK". If you don't skip, bear in mind that you need to reverse the logic and meaning of everything that is said after you've read the evaluation.
Now, to test the maximum performance increase, I tried a different approach:
- Enable all core modules + admin_menu + devel + pqp.
- Don't load all modules by default.
- Only load the module that "owns" the active menu item.
- Move hook_boot, hook_init, hook_help, hook_exit into $module.bootstrap.inc files.
So essentially, and this is utopian (!), I wanted to load the bare minimum only. To achieve this, I had to additionally put some drupal_function_exists() resp. module_invoke() [which still seems to be the proper function to me] calls into those $module.bootstrap.inc files and shuffle a few other functions around where required.
To start with, system.module and filter.module (primarily because of
filter_xss_bad_protocol()) are required on all pages. Thus, I couldn't completely removemodule_load_all()in_drupal_bootstrap_full(). I alteredmenu_execute_active_handler()to determine the owning module of a menu router item via preg_match(), since we do not seem to store that info yet.Afterwards, I quickly realized that almost all modules were still loaded on all pages, because of *_alter hooks. So, we are loading many modules for hooks that test whether they may apply, but most often, they do not. Conclusion: Move them, so we load those, but not the modules.
That made an improvement, but one thing we really have to be careful about when doing benchmarks is that there must not be any PHP notice/warning during the request. Otherwise, watchdog() along with error handling is triggered, which invokes and loads at least dblog.module (more, if all core modules are enabled).
However, some modules were still being loaded without any obvious reason. To identify from where and why they were loaded, I put the following directly into .module files:
That uncovered that most modules were loaded because of
- arbitrary info hooks, such as hook_node_info() or hook_field_info()
- hook_elements()
I originally assumed that those would be cached already (because they end in _info() ?), but they are not. To eliminate those:
Obviously, ^^ that's wrong. Should be $module.registry.inc in reality (if they were cached).
Again, this approach is not feasible, because it requires too many drupal_function_exists() calls to load files for required functions.
Effectively, we want to reverse what I did here.
BREAK
The maximum performance gain with all core modules being enabled (but not really loaded) is:
Note that 12 MB is a magic number. IIRC, shared hosts use that default PHP memory_limit setting.
Crell mentioned a master plan in IRC that I wasn't aware of yet, but makes perfectly sense:
So, effectively, the stuff that remains in my .module files is cruft we can move into includes. Everything else is more often needed. This is where you need to reverse the thinking if you read the above.
What we want in .module files is:
- hook_boot()
- hook_init()
- hook_exit()
- hook_help()
- hook_*alter()
What we still don't want to have in .module files is:
- hook_elements(); we can cache that info.
- hook_*info(); should be cached already, probably a bug.
Those should live in $module.registry.inc instead.
As of now, we still have a few exceptions:
- system.module: ships with many essential functions, but also many that are not needed on regular pages (e.g. trigger/actions)
- filter.module: is a bloat of its own. I don't want to release D7 without cleaning up that mess. The entire module should not have to be loaded on most pages.
- Locale module is like system.module. Note that Translation module is not.
Additionally, in terms of file size, we unconditionally load:
- common.inc: 143 KB
- theme.inc: 67 KB
- menu.inc: 90 KB
- file.inc: 68 KB
- form.inc: 107 KB (that would be a very easy one)
as well as (which don't make sense to me, since they're infrequently needed):
- pager.inc: 17 KB
- tablesort.inc: 9 KB
- mail.inc: 17 KB
- actions.inc: 12 KB
If you do the math, then you'll see that we're needlessly loading 530 KB on all pages that may only need 10% of that stuff. So there is another 25% performance increase (in addition to the 30%) only waiting for us. (On that note, core-committer note: We introduced this fine registry for D7. Let's not release with a half-baked registry. That would make no sense at all. It's about properly introducing features, not deadlines.)
The last paragraph and math becomes more important if you consider Drupal sites that do not have, for example, any publicly facing forms. Ultimately, we have to find a mean usage of functions and hooks throughout all Drupal sites, while not hindering "the more advanced" ones too much.
Overall conclusion: This is what we want to do. Starting with registry hooks and other, rarely used hooks (such as actions). It follows the principle of what I am trying to do in http://drupal.org/project/js for AJAX/AHAH callbacks that only need certain functions and/or data, but in a much cleaner way and for all requests Drupal serves.
I'm appending a list of issues that we badly need to tackle, because they disturb the whole functionality:
- drupal_function_exists() does not use drupal_load() - bad for debugging, very inconsistent.
- hook_node_info(), hook_field_info() is not cached.
- node_invoke('form') is called on all pages for administrative users due to menu_tree_page_data('management').
- hook_forms() is invoked for all forms, loads entire modules, possibly having no effect at all.
- Stuff like node_types_rebuild() does not have to be loaded all the time:
Comment #57
Anonymous (not verified) commentedawesome work sun! i had a similar 'wtf' moment when i realised just how much code drupal loads for every request.
"- hook_init()
- hook_exit()
- hook_help()
- hook_*alter()"
why do we need these hooks in .module files? they're all called after the database is loaded, so their implementations can be loaded via the registry.
i thought the idea was that we'd only leave code that wasn't going to be loaded via the form, theme or module system, but needed be available in .module files.
for modules that don't have any such code, that means no .module file.
the other issue that needs to be tackled is #340723: Make modules and installation profiles only require .info.yml files, which is nastier than it looks, and makes me wish that drupal's system table was thrown away and replaced two new tables - module and theme.
Comment #58
catch@justinrandell: I think the only reason to leave those files in, is to avoid the extra file include - on the assumption that both the .module and the .inc are going to get loaded on most requests anyway. But it's probably a detail we can sort out down the line.
Comment #59
sunTo understand why we want to leave certain, essentials hooks in .module files, I took the time to prepare a readable backtrace of what is being called from where and why.
I intentionally chose a simple, administrative settings page, because that highlights a page that - basically - involves 1 module only. We are loading countless instead. I believe that this is where we should start to look at what should be moved where and why -- and which hook invocations we should simply eliminate (by caching data of _info hooks), and which hooks are essential to all pages (given that one would think that a simple settings page like this shouldn't require them).
Comment #60
Anonymous (not verified) commentedi think we're talking at cross-purposes a bit here, even though we have the same goal.
to restate some assumptions:
1. if a module has code that needs to be loaded unconditionally on every request, it should go in a .module
2. anything that can be loaded by the registry for theme, form or module hooks shouldn't live in .module
3. hopefully, this will mean many modules will have no .module, and the rest will have small .module
Comment #61
sunLet's see what the bot thinks about this.
Also attached:
- registry.php: Fairly completely rewritten to do all of core in one shot. Contains the current proposal for .module splits into .inc files.
- rebuild.php: Required if you want to run registry.php manually and play with it. Rebuilds the registry.
Comment #63
sunWhile I'm trying to get the bot pass: Please have a look at registry.php and the hook => target associations in there. I think they're pretty solid already, so please let me know if I missed something.
Another point up for debate is: registry.php optionally allows us to re-order hooks while moving them. Enabled for a few hook sets already.
Comment #64
mikey_p commentedA few small things (I haven't even got to running tests yet).
The menus are out of place and the navigation menu is showing up with edit links for each menu. See: http://img.skitch.com/20090522-e8a67k6g6485ty69pna8rqpuhw.png
After clearing the cache I still get an extra menu with management and user menu items in the right place: http://img.skitch.com/20090522-kut468frrua4fy89m69ceuuc4h.png
Comment #66
webchickHere's a summary of what is changed by registry.php:
module.registry.inc:
- hook_menu()
- hook_theme()
- hook_flush_caches()
module.actions.inc:
- hook_action_info()
- All action-related functions
- hook_mail()
module.user.inc:
- hook_perm()
- hook_user_after_update()
- hook_user_cancel()
- hook_user_categories()
- hook_user_form()
- hook_user_login()
- hook_user_insert()
- hook_user_operations()
- (various user operation callbacks)
- hook_user_register()
- hook_user_submit()
- hook_user_update()
- hook_user_timezone() # system only
- hook_user_validate()
module.block.inc
- hook_block_list()
- hook_block_configure()
- hook_block_save()
module.node.inc
- hook_node_access_records()
- hook_node_delete()
- hook_node_delete_revision()
- hook_node_insert()
- hook_node_prepare()
- hook_node_presave()
- hook_node_type()
- hook_node_update()
- hook_node_validate()
module.comment.inc
- hook_comment_delete()
- hook_comment_insert()
- hook_comment_publish()
- hook_comment_unpublish()
- hook_comment_update()
- hook_comment_validate()
- hook_comment_view()
module.taxonomy.inc
- hook_taxonomy() // @todo Lonely left over, should be (and is) hook_taxonomy_vocabulary_delete() instead.
- hook_taxonomy_term_insert()
- hook_taxonomy_term_update()
- hook_taxonomy_term_delete()
- hook_taxonomy_vocabulary_insert()
- hook_taxonomy_vocabulary_update()
module.search.inc
- hook_node_update_index()
- hook_ranking()
- hook_search()
- hook_search_preprocess()
- hook_search_page()
- hook_update_index()
module.cron.inc
- hook_cron()
module.xmlrpc.inc
- hook_xmlrpc()
- (various Blog API functions specific to blogapi.module)
module.form.inc
- hook_form_FORM_ID_alter()
- (NOT hook_form_alter())
- Do manually afterwards: All form handler functions, including _validate(), _submit(), _after_build(), etc.
In general, this is what is NOT moved:
- hook_init(), hook_boot(), hook_exit()
- hook_X_load()
- hook_X_view()
- API functions that are called directly, such as node_load() or user_cancel().
- hook_link(), required for rendering nodes and comments
- hook_help()
- hook_page_alter(), hook_js_alter(), hook_*_alter(), usually invoked on all pages.
- Everything else that is invoked on most pages.
- hook_*_info() - we need to made them cached. When this is done, they will be moved to module.registry.inc.
- hook_elements() - when it will be cached. When this is done, they will be moved to module.registry.inc.
Comment #67
webchickSo my biggest concern with this patch is balancing the performance improvements (btw, let's see some benchmarks here in the issue, please. I saw you post them on IRC...) with the developer experience aspect. What we want is to establish an easily-identifyable pattern so that module developers always know where to put their hooks. If module developers are confused about what goes where, they'll just opt for sticking everything in .module since that's what they've always done and we won't see the performance improvements replicate throughout contrib.
For the most part, this looks pretty straight-forward:
- If it has to do with loading/viewing, leave it in .module.
- If it's an API function called elsewhere, leave it in .module (this was the same rule as D6)
- If it's a X_alter_X function, leave it in .module.
- If it's a "low-level" function called on every page like hook_init(), hook_boot(), and hook_exit(), leave it in .module.
The other ones though are a bit weird. hook_link() is not intuitive. hook_help() is not intuitive. hook_X_info() is really not intuitive, and I'm tempted to mark this as postponed until those are movable to X.registry.inc.
The other thing that worries me is that this list represents A LOT of possible .inc files to keep in your head as a module developer:
It *almost* maps to a "use a .inc file for each module's hooks you're implementing" except that cron, form, and xmlrpc don't map like that. And admin, pages, and registry are more "meta" file names.
Let's spend some time thinking about developer experience and how we might establish some easier patterns/rules for developers that they can apply to any module they write.
Comment #68
sunThose changes to system.module (only the last, specific to it) - we leave out.
Comment #70
sunwebchick was so kind to enlighten me: We have this nice list on http://api.drupal.org/api/group/hooks/7 already (which I didn't realize we have).
Those 7 fails are due to #469768: PASS.. Once that is in, this will pass. Attached is the new registry.php script that skips this new error_test.module.
Further, I realized that - due to the awesome design of the registry - the registry.php script can be applied at any time to an entire Drupal site, it works recursively, and will thereby also auto-convert any contrib modules that may exist. That aside, catch suggest to also commit this script to Deadwood/Coder module to automate this during module upgrades.
Comment #71
sunBetter title.
Also, on the issues webchick raised:
That said, my development box is a bit broken currently, so I can't do proper benchmarks. I hope someone can take that job over.
Also, I'm re-stating the list of issues we need to tackle independently:
Any takers for these? Let me know here or in IRC, and I'll replace/add issue links in this list.
Comment #72
lilou commentedAdd tag.
Comment #73
Crell commentedThoughts in no special order:
I would put hook_actions_info() in the registry.inc file. If it's not registry-style (return an associative array, has an alter hook, gets cached and never called again) then it shouldn't be called _info(). If it is registry-style, then it belongs in $module.registry.inc. I can deal with it being called $module.info.inc, too, if that's the consensus.
$module.pages.inc and $module.admin.inc were established for D6 because that minimized the amount of code we had to load. Those are all page callbacks (and the theme functions for them), or forms that are used as a page callback. On any given page load, one and only one of those should ever be needed except in very strange circumstances. Arguably we could have gone as far as putting every callback in its own file, which would be the absolute minimum amount of code loaded, but we decided that would make the DX WTF factor way too high. We settled on "user-facing pages" and "admin-facing pages" as the split pattern to use, hence the names. I still believe that is a good split to have for those functions. It's based on the common usage pattern of the code, primarily.
Also, putting all forms in one big file I would disagree with. Forms that are used as a page callback (the majority of them) should go in the page callback files, as they are now.
Alter hooks are an odd case. Some will be called on nearly every page, like hook_form_alter() (since most pages have at least a search form). However, hook_form_$form_id_alter() will only be called in very targeted circumstances; that's one key reason it should be encouraged over the generic hook whenever possible.
hook_menu_alter() and hook_theme_registry_alter() (or whatever it's called, it's an odd name) will also only be called on rebuild; the same holds true for any info-style hook's alter hook. Those can go in the $module.registry.inc file, since that's the only time they'd ever be called.
hook_forms() is not a registry-style hook. It's more of a form-mapping-alter hook. It's kinda weird like that. I don't know that we can make it called only on selective pages without changing its semantics so that it's more registry-esque. That's definitely a separate issue from this one.
Haven't we killed hook_help() yet in favor of the advanced_help model? Please? With sugar on top? I never even use hook_help() it's so horridly designed.
I am not really concerned about the performance difference. We know based on past experience from D6 that this has the potential to be a huge difference, even if we do only some of them. Whether it's 19% or 21% difference doesn't really matter; eliminating even just the info hooks from .module files would be a big win. We already have the benchmarks to tell us that. And please, for the love of god let's not postpone this issue while we refactor other hooks. Again, we can get a big boost right now but just splitting off hook_menu(). We don't need to rewrite all of core in one patch if we know the convention we're working toward.
Having an automated split tool in coder/deadwood gets a big +1 from me. It also helps maintain a pattern, which makes it easier to learn where stuff is when looking at a new module.
I've no idea what hook_mail() is doing in $module.actions.inc...
Form _validate() and _submit() functions should be in the same file as the form they relate to, as they will only be needed if the form building function has already been called anyway.
Otherwise I see no major problems with the breakdown that sun's script generates.
I share webchick's concern about DX WTF of having more files. However, we have a choice: We have a very small number of files for developers to keep track of and a huge amount of wasted CPU cycles and memory, or we minimize the CPU cycles and memory Drupal wastes but have more files to juggle. The nice thing about the registry is that you can decide differently in different modules, because there is no forced organization whatsoever. The registry allows you to have one 20,000 line file or 20 1,000 line files or 1,000 20 line files, depending on what makes sense for your module. All three of those will run; they may just not run optimally. In fact I would recommend that developers start a module in all one file, and then break it up later when it's nearing completion.
I also don't think it's quite as bad as webchick suggests it could be; lots of modules have no node hooks. Very very few have comment hooks. Most do not use actions, or cron. There's less than a half-dozen that implement search, I suspect. I believe the most common indrect functions, based on totally informal checking of my brain from working with lots of modules, are page callbacks/page forms (already adequately split out), theme functions (should live alongside the code that calls them, wherever that is), and form_alter hooks. I tend to also use CCK hooks a fair bit when writing formatters, which if they aren't registry-style-cached now damned well should be.
Comment #74
catchJust to reiterate, the baseline performance of Drupal 7 in PHP is considerably worse than it is in Drupal 6, and there's no other way to make this particular 20% gain. So while we should try to find a good pattern, it's easy to move one or two hooks around once the initial patch is in.
hook_link() needs to go and die #451272: Rename or remove or do *something* with hook_link()
hook_help() needs to go and die #299050: Help System - Master Patch
If we release Drupal 7 with those two still existing then it will overshadow the rest of the release. Please let's not postpone this on either of those (if this gets postponed on the help patch, which is postponed on another issue which doesn't even have a patch at the moment, then we're shooting ourselves in the face).
hook_X_info() - yes we should make them consistent/cached. Is there an issue for that yet?
I also agree with Crell that the number of modules implementing all the various hooks here is likely to be extremely small - hence they'll end up with a handful of includes at most. And with an automated script provided it's easy to sort out anyway.
If no one else does I'll try to run benchmarks by the end of tomorrow, but won't have much time today. We need to benchmark with all core modules enabled, with and without APC + check number of files included and memory usage. If it really is 20% then just hitting a page with devel running ought to be enough.
Comment #75
sun#469768: PASS. went in, so this should pass.
Btw, #310467: Slimmer hook_theme() could eliminate the first 2 hunks in this patch that need to be applied manually. Can I get some reviews/feedback over there?
Note that we still have to solve the Navigation menu issue. I have no clue what happens there, but I can only guess that it must be some code that does not properly use the registry.
Comment #76
sunAlso, one amendment: Since we have this script now and "re-rolling" this patch from scratch takes about 5 minutes, we can discuss a valid pattern for moving hooks and functions pretty lengthy. And I would really like us to do so. The script allows us to think about this split at a higher level before moving any stuff around. Bear in mind that we (more or less) lose CVS history for all functions that are moved. I appreciate that webchick raised this issue. I would like to see more feedback like Crell's - and we can work on fixing all related issues in the meanwhile.
Comment #77
robloachThis is made with a large amount of awesomesauce. I've been pushing some modules in Drupal 6 to use modulename.apimodulename.inc, and this absolutely fixes it for core in Drupal 7. Subscribing and looking forward to seeing the benchmarks. Way to go!
Comment #78
catchBenchmarks on the latest patch, all core modules enabled. 10 nodes on front page with taxonomy terms and path aliases. In all cases restarted apache then hit the front page a few times to ensure caches were fresh.
Notably there's a very small improvement with APC - as long as we don't make things worse with opcode caches I think we're fine there. Without APC it's about 3mb memory saved. Devel says anything up to 30% improvement in page execution times, ab is a bit more conservative at 6-8% - although I only benchmarked node pages for this - we'd probably be loading even less code on user pages for example.
So looks to me like a decent improvement, and closes about 40% of the current gap n PHP performance between D6 and D7.
Comment #79
cburschkaSubscribe.
Summing up, it seems we have three kinds of functions:
- very likely to be called on any page load (eg. hook_form_alter, hook_init)
- only called on a small set of pages (eg. theme_hook, page_callback)
- cached and therefore only called on cache reset (eg. hook_menu)
A pattern that focuses exclusively on performance would put the first kind all into .module. If they're going to be loaded anyway, including a separate code file may cost time instead of saving it. The second kind would be grouped into sets most likely to be called together (eg. hook_form, hook_form_validate, hook_form_submit, theme_hook_form). The third kind would be in one separate file, since most registry rebuilds are run together.
This pattern might not be ideal for DX, however - developers might appreciate a more semantic approach of where to put functions, like $module.forms.inc, or $module.alter.inc or $module.theme.inc.
As Crell stated, we have some tradeoff between semantic refactoring and performance refactoring.
(As a matter of personal taste, I rely on the API function reference and rarely care where functions actually live - also, slow sites have given me far more trouble than APIs. So I tend more toward the "performance" route. But that's opinion.)
Comment #80
sunAs a matter of fact: I was stuck finding functions in HEAD a few times already. You no longer have hook_menu() where you can lookup in which file a function lives. However, the performance increase (by far) outweighs this issue, and we should rather think about helping developers to predict the location of a function (by using a grokable pattern), but also, since we have the registry in place, I think we could add an on-site widget for developers (admin_menu, perhaps) to lookup functions.
Since Arancaytar mixed things a bit up (possibly due to my reversed analysis), I'll try to revamp the proposal.
Comment #81
Crell commentedI'd also lean toward performance over semantic organization, especially for core. Code is executed far more often than it is read by a human.
The difference between devel and ab's reports are, I believe, because they report different things. devel reports the amount of time Drupal is executing (minus a little at the beginning and end); ab reports the time it takes the request to be send, processed, and retrieved. So it's going to be testing a much longer process, and this code refactoring has no impact at all on the HTTP request or response time.
I'm a bit surprised that APC is reporting no memory benefit... but then, APC's memory usage is a sliver of what an uncached site is so as long as we break even on opcode caches we're still coming out ahead. I'd say the benchmarks absolutely conclude that we need to do this.
Comment #82
swentel commentedok, also 10 nodes on front, some taxonomy, 50 users and all core modules except upload which gives a lot of notices. Don't have APC though right now, ab with 100 requests.
/node without patch
3.02 [#/sec]
Page execution time was 327.31 ms. Executed 79 queries in 29.46 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.73 MB, PHP peak usage=11 MB.
/node with patch
3.29 [#/sec]
Page execution time was 286.23 ms. Executed 79 queries in 27.67 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9.51 MB, PHP peak usage=10 MB.
/node/1 without patch
3.31 [#/sec]
Page execution time was 281.03 ms. Executed 45 queries in 27.85 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.56 MB, PHP peak usage=10.75 MB.
/node/1 with patch
3.61 [#/sec]
Page execution time was 261.56 ms. Executed 45 queries in 33.77 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9.35 MB, PHP peak usage=9.75 MB.
/user/1 without patch
3.29 [#/sec]
Page execution time was 271.65 ms. Executed 31 queries in 19.4 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.21 MB, PHP peak usage=10.5 MB
/user/1 with patch
3.88 [#/sec]
Page execution time was 237.97 ms. Executed 31 queries in 17.74 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9 MB, PHP peak usage=9.25 MB.
/admin/settings/site-information without patch
3.53 [#/sec]
Page execution time was 353.78 ms. Executed 29 queries in 67.14 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.91 MB, PHP peak usage=11.25 MB.
/admin/settings/site-information With patch
3.89 [#/sec]
Page execution time was 280.88 ms. Executed 30 queries in 43.02 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9.7 MB, PHP peak usage=10 MB.
Comment #83
catchI agree with Arancaytar and Crell on performance vs. DX. Trying to squeeze cycles out of an overloaded server, memory limit errors - these are terrible developer experience. For that matter, waiting ages for pages to load is bad user experience. Having to grep a folder instead of one file for a function is really, really low in the scale of hard to grok things about Drupal.
Also since the only reason we're doing this is /for/ performance, there's no point doing some weird hybrid version based on what we think might make sense semantically - what makes sense is loading less code.
Comment #84
sunSo far, I really like what I see here on my box.
The pattern gets a bit hairy with Node API, i.e. custom node type hooks... :( I'd consider them all in $module.node.inc, but if you think about it:
- hook_view could also work in $module.pages.inc
- hook_form|_validate|_submit could also work in $module.form.inc
- hook_insert|_update|_delete is what definitely belongs into $module.node.inc
I still prefer $module.node.inc for all. Just wanted to mention that we have one edge case in core. (the only one I was able to determine thus far)
To be continued...
Comment #85
sunEat this.
Comment #87
cburschkaI'd support $module.node.inc. It also sounds better than form.inc and pages.inc from the performance aspect.
When you open a node for editing, you call _load and _form (etc.). When submitting, you call _form again (FAPI rebuilds a form before executing), then _validate, then _view or _submit depending on if you previewed. The page you land on then uses _load and _view and others. It's a group whose members are frquently called together.
I'll patch, test and see if I can sniff out what testbot didn't like.
Comment #88
cburschkaThese are the tests in question:
These are the failures I was able to replicate (35 out of 38):
Comment #89
sunThanks for listing the failing tests, Arancaytar! Let's fix those issues in #471278: Prepare module split and continue the high-level discussion here.
Comment #90
sunNow the patch in #471278: Prepare module split passed (of course). Though field attach/form tests should still fail with the split patch, and I have little idea how to fix them.
I know that this is an unfortunate situation, because #471278: Prepare module split will always pass without the split patch. However, I absolutely do not want to mix any other code changes into this monster patch. IMHO, the .module split patch should be 100% automated, so we can be sure that nothing in those functions is altered. Otherwise, this would mean that someone would have to review those 1.17 MB, which is utopian.
Comment #91
Crell commentedThe D6 page split work was done by hand, module by module. Not the fastest method but it can work if you're doing it piecemeal, which was what I was hoping to do after kick-starting it with the easy registry hooks, since monster patches suck. :-)
Comment #92
dries commentedTo be honest, this patch doesn't sit well with me ... but I'm obviously happy to discuss this more. However, given that my gut has been nagging about this for a while, I figured I'd speak up.
Comment #93
JohnForsythe commentedThoughts:
Use a single .module file for development work.
Create a tool that automates the splitting of the .module file.
Perhaps this splitting happens automatically when you install the module.
Additionally, this splitting could be turned on or off, depending on if you need it. Sort of like disabling css and js aggregation when you're doing development work.
Comment #94
andypostThe most performance is gained if all php-files are joined in one + APC (eAccelerator) it brings near x10 - x20 times
But for common case - shared hosting is much better hold them separated (system.module hold a lot of forms that rarely used - actions, perfomance .... but this one loaded at every request)
ZF use this technique on production - all files joined + op-code cache
here example from russian forum - http://forum.dklab.ru/comments/nablas/49OptimiziruemZagruzkuPhp-kodaV22R...
Comment #95
Crell commentedDries:
Removing rarely used code can have a huge performance boost: http://www.garfieldtech.com/blog/benchmark-page-split
Even with that work in Drupal 6, Drupal 6 is substantially more memory hungry than Drupal 5. My normal web host handled multiple Drupal 5 sites without issue, but keeps choking on Drupal 6 sites unless we move up to their higher-end accounts exclusively for memory issues. When doing a full cache flush, it's not unusual for me to get an out of memory limit at 64 MB with Drupal 6.
Many larger modules have already run into this issue: Views and Panels have their own lazy-load mechanisms with magic file naming (that introduce more developer overhead than the registry does, IMO) and Ubercart keeps blowing out my memory limit, but isn't structured in such a way that it can easily have its own manual split.
The last heavy-duty profiling I am aware of identified two areas of Drupal as the biggest performance sinks, bootstrapping (that is, loading all that code) and the theme system: #196907: GHOP Issue #4: Profiling of D6, viewing the standard page after installation, create a story and view the Admin index (That was after the page split work went in.)
No, we have to acknowledge that all big traffic sites should have APC installed. :-) That doesn't mean they will. And even a not-big-traffic site can have a huge memory footprint simply because they have a lot of large modules installed to support complex functionality. I have built a number of sites that are really not high traffic but because of the functionality required have terrible memory issues, simply due to code weight and large cache arrays. That right there prices Drupal out of an awful lot of web hosts, and makes us uncompetitive in the "cheap hosting" arena; the market that is an easy "gateway drug" to bring more people into Drupal.
The "tremendous performance improvements" from the registry are what we're trying to get in this issue. Of course they've not materialized yet, we've not done the work necessary to get it. :-)
So, the problem: Drupal is simply too damned big. Loading all code on bootstrap is simply not sustainable, and as Drupal grows the problem is only going to continue to get worse. It is already a serious problem for people running large non-Chaos modules (Ubercart is my easy example here). APC is not going to be enabled by default in PHP (the PHP team has been very clear about this, IMO), so we cannot rely on it. "If you want to run Views, Panels, Ubercart, and location then you need a VPS, even if you're not high-traffic" is not an answer.
The solution: Load less code. Really, it's that simple. If we used more classes then we could rely more heavily on PHP's autoload, but there are very good reasons why we do not in most places. So we do it ourselves. Fortunately most of Drupal is called indirectly so that's fairly easy, and has already been implemented. All that's left to do is leverage it.
Look at the benchmarks in #78 again. It's a 6-8% performance boost in overall page delivery time as reported by ab. It is a 30% performance boost in Drupal execution time as reported by devel. There's an enormous amount of time burned in Apache initialization, PHP initialization, HTTP traffic, etc. that we can do next to nothing about here, so that's a red herring. Devel is the measuring tool used for the benchmarks linked at the top of this comment, so the apples to apples comparison is 30% reduction in what we're actually trying to measure. Even if we were only talking 8%, though, we are doing so badly in performance right now that I'd take it. There's also a ~3MB memory savings. And that's just core.
I am not dismissing the DX question. I do, however, believe that it is being over-exaggerated. For one, as has been repeatedly stated there is no mandatory requirement that you break up your module. If we take the registry to its logical conclusion, really-small modules could even collapse their .install files into the .module file and we'd be back to one module, one file. That still works 100% correctly with the registry. You can split up your code later before releasing it, and even break it out along different lines than core does if your use case legitimately calls for it.
Two, it's only a DX problem if you're stupid with it. Has the page split in D6 caused a DX problem? I'm not aware of one, but it was an enormous memory boost. Is moving hook_menu, hook_theme, hook_node_info, and a few other "big-arse arrays" to a separate file, which is then included once in a blue moon, a DX problem? I do not see how.
As I noted in #73, the typical module will likely not have that many hooks in it, so will not have a dozen files. And if we have a common convention for organization that people can learn (which this thread is trying to establish), finding a given function should not be difficult. As a worst case scenario, even the free IDEs have a "jump to function definition" operation. (Yes, not all developers use an IDE. However, I would not be surprised if the IDE using percentage of the developer population is substantially higher than the percentage of sites running an opcode cache. No, I have no data to back that up it's just a gut feeling.)
Modern themes already contain far more files than a module is ever likely to. A half-dozen template files and over a dozen CSS files is normal for the sites we build... and that's because we're very very frugal with template files and rely very heavily on CSS. :-)
Removing the need for manual registry of lazy-loaded code is also a DX boon. I prefer the menu and theme hooks not requiring me to specify a file and path for every entry. I'd love to get rid of the path and parent keys in the views_plugins() hook. That also removes the need for magic file naming, while preserving lazy-load capability.
If we want Drupal to continue to run without APC, then this effort is necessary. No one has come forward with another alternative to reduce code weight... because the only solution to code weight is "load less code", which is exactly what we're doing here.
Comment #96
sunComment #97
Anonymous (not verified) commentedwas writing a reply, but now i mainly just want to say "what Crell said".
memory usage is a big issue - its really the elephant in the room when trying to scale out drupal. the apache children are just so fat on a big drupal site, you need more RAM/boxes to serve a given amount of traffic than just about any other comparable framework/CMS out there. its definitely a big X on the wrong side of the ledger when considering drupal for big sites - all other things being equal (yeah, that's a big assumption), it costs more to run drupal than most of its competitors.
i share Dries concern about how we split the code, but i think we can do it in two stages.
first, get rid of as much of the code that is currently loaded for every single full bootstrap by stripping back or eliminating .module files. we get an immediate memory win for all full bootstraps, and its an extremely simple rule.
second, we profile common paths, find the next low-hanging code-load bloat, and work out the best way to fix that, and wash-rinse-repeat.
Comment #98
moshe weitzman commented@Crell - in order to convince Dries (and others like me who are mostly nodding with him), you have to address his data concern. He said 6-8% improvement under optimal conditions, and 1-2% under real world conditions (not sure if this is page speed or memory).
I'll agree that code registry has so far been disappointing. I think I even said "tremendous performance improvement" when it was under review. The killer patch which made it worthwhile got hung up on some really hard problems - #259412: Total module system revamp. Parts of that patch have been incorporated though so it is not so bad.
The registry adds a lot of complexity to the code, and mostly benefits those sites without an opcode cache. That is to say, it benefits folks who can't possibly care much about performance since they are ignoring a 9x speedup (don't quibble - it is the biggest optimization by far). There is only so much you can do to help the ignorant. Please let's not get into the poverty argument - VPS hosting is so cheap now. Managed hosting is a bit more, but often doable.
Comment #99
andypostand at the third step make joined file with common part of bootstrap-code to load this once not as many files
I glad to try join all core files in one and test but code-registry is mistery for me
Comment #100
catchOn the DX issue, I use vi from terminal for everything. If I'm not sure where a function lives, I do grep -r "function function_name" * from the root of my Drupal install, then vi /foo/bar/bar.module - if it's a huge module like user.module I then have to search within vim for the function itself or scroll down 4000 lines. If instead I can grep -r "function user_node_load" * and go to modules/user/user.node.inc and see all my lovely node related functions right there - that doesn't seem any worse to me at all (not necessarily a lot better, but no worse).
On speaking to sun in irc he suggested we do one more re-roll and benchmark that which makes sense to me. Also if anyone knows how to measure memory usage of apache processes (as opposed to what PHP measures itself), now would be a good time to speak up - I still feel like there should be a small memory gain for APC users if the APC cache itself doesn't have to deal with a lot of unused files.
Also a note that we should move hook_requirements() and probably hook_schema() into their own includes so huge .install dont' get loaded on admin.
Comment #101
sunTwo additional facts that may not be clear to everyone yet:
Comment #102
mikey_p commentedI'll try to run some tests to show Apache memory usage soon. I usually use eAccelerator so I'll run them with and without it enabled.
One additional point (that checking Apache memory usage should show) is that potential memory savings may allow a higher max clients, and ultimately higher concurrency on the same webserver (assuming the database is up to the load).
This patch needs to be evaluated "system as a whole" with more than what
ab2 -c1 -n500will reveal.Comment #103
sunAttached is the latest version of registry.php.
All further fixes that were moved into #471278: Prepare module split have been commented out (and remarked in the script).
Comment #105
kbahey commentedSubscribe.
Comment #106
sunIn the meantime, registry.php contains a list of @todos -- bugs in Drupal core, some of them unrelated to this patch, but only uncovered due to this patch.
Comment #107
catchDid some more testing, with APC only this time, all core modules enabled.
ab -c1 n1000 - benchmarks are the same, but also looked at APC usage and apache processes via top.
I've attached apc.php screenshots - these are from restarting apache, running the benchmarks, looking at apc.php. Obviously as more pages are visited on the site, the patched memory usage is going to increase as the includes get picked up - but if you have files which are very rarely or never used (say actions hooks on some sites) - then it's likely there'd be some long term savings, especially with contrib added in.
To me this looks like about a 1M memory saving with APC, at least in the 'just started up' state. If you double or treble the number of modules installed, then I reckon we're looking at 2-5M per process (given the bulk of code in core isn't contained in hooks, and lot of code in contrib is).
Comment #109
kbahey commentedIf you are running php as mod_php in Apache, you will see the first process to handle a request with inflated memory size, because it is the one that loaded and parsed the initial scripts. Other processes will be smaller because they fetch the cached op codes from shared memory.
A more accurate memory figure would be what the performance module (part of devel) would say.
You can select either summary or detail, in the database, and see a before and after to compare.
Comment #110
bjaspan commentedWrong. I thought that too until I actual tried to verify it. The opcode cache is NOT shared across processes. Every Apache process builds up its own copy. They all end up using the same amount of memory after they've each compiled more or less all the files. I verified this with APC, XCache, and eAccelerator.
Comment #111
catchIf that's the case, then judging by my tests, I think we'll see small memory improvements in APC with this even after apache processes have been running for a while - because dead or very rarely used code isn't going to be loaded at all in the case of individual processes.
Comment #112
bjaspan commentedYes, I think that's true: we'll see small memory usage improvements since very rarely used code won't "ever" be loaded (Apache processes die after a max # of requests so "ever" is a reasonable approximation).
The fact that opcode caches do not share memory like everyone (including me) seems to assume they do does not seem relevant to this issue. Of course, it is extremely relevant to Drupal hosting in general and I sure would like to take the time to dig into the PHP internals to find out why the memory is not being shared...
Comment #113
killes@www.drop.org commentedI am really wondering what advantages this patch would bring me as a person hosting a Drupal site:
1) the idea is to use less memory per process
2) yet, I have to cater for it to use the full amount of memory because if an admin works the site, all code still can be included.
3) From that follows that I can't use php.ini to restrict the available memory because admins shouldn't get WSODs
4) The only use case that I can see is that you force admins to e.g. use admin.mysite.com and normal users stay on mysite.com and configure both sites differently in apache/ php.ini.
5) Isn't that just a fringe use case?
Comment #114
Crell commented1) Less memory and less CPU.
2) Not true. As of Drupal 6, Drupal never loads its entire code base at once. Page callback files are loaded individually as needed and there is no instance in which they are all loaded at once. This patch extends that same logic to other callbacks and hooks. Unless you go well out of your way, there should never be a case when all of the code available to Drupal is called.
3) As #2 is not true, this statement does not logically follow.
4) This statement does not apply in the slightest as #2 is not true and neither is #3.
5) Every single Drupal 6 and later site in existence has a hook_menu, hook_theme, hook_node_info, and various other "call once and don't call again" hooks. That is, at minimum, more code that we do not need to load on most page loads on every single Drupal 7 site if we simply leverage the registry properly. Hardly fringe use case.
6) Even if there are still a few pages that peak at high memory usage, reducing the footprint of a typical page load improves performance and allows more responses to be handled by a single web server. That's still a net win even if we still spike to 64 MB on the modules page, for instance. (Although that page would benefit from this split.)
Comment #115
killes@www.drop.org commentedLarry, you look at the individual page request. I am looking at an apache proess over time.
Comment #116
gerhard killesreiter commentedLarry told me that he considers the non-opcode case.
I don't think we should optimize for the non-opcode case.
Comment #117
owen barton commentedSubscribe
Comment #118
sunI suggest to create a new issue that reverts the page callback split from D6 and moves all functions back into .module files.
Comment #119
webchickEr. What?
Comment #120
sunWhat's so unclear? Dries, moshe, as well as killes don't see any benefit in this split.
If we do not split, then the existing split does not make any sense, too.
And if we do not split, then the registry can be reverted.
The good news: Makes many issues in the queue obsolete.
Comment #121
webchickThen that simply means more work needs to be done to either sway their opinions, or come up with a compromise. Cold, hard data like benchmarks helps. Then we can get out of "what if" land and into "this is what actually happens" land.
Also, on a more personal note... sun, I don't understand this ugly anger from you the last day or so. It feels like you are shutting down discussion everywhere you go, and yet this very discussion is the critical piece for our peer review process to function. If there's something specific bothering you, I'm on IRC all hours that I'm awake, and always open for a chat.
Comment #122
Crell commentedDries, moshe, and killes are wrong. We have benchmarks to prove it. They are linked earlier in this issue.
Ignoring the non-opcode cache case is probably the absolute best way to ensure that Drupal becomes a niche product ignored by the majority of web developers.
Comment #123
sunContrary to your usual habits, just saying that they are wrong and closing comments won't work this time.
...
Comment #124
catchWell I agree with sun here:
The non-APC benchmarks already on this issue show decent memory and page execution savings - apparently not enough of an improvement though for Dries to think it's worth doing the split (and to be honest I wasn't particulary impressed by the results I got in testing - although seems some people got slightly better results than me).
The APC improvements are at best marginal, and may well not make any measurable difference to memory usage on real sites at all.
If we only care about optimizing of the opcode case, then there's no point in doing this, or the registry, or the page split - so we should revert all those changes and just have .module files. If we care about the non-opcode case, but aren't convinced enough by the data provided that it's a sufficient gain to make the change, then we should still revert the registry and the page split.
@webchick there's plenty of benchmarks already on this issue, from different people with different configurations - what other benchmarks do we need?
Comment #125
andypostAs I pointed in #94 there's a big performance degradation if modules splited in small peaces and opcode cache used - mostly caused by memory fragmentation.
So I think better focus on producing 2-tarballs -one joined into .modules only (for opcode) and another for non-opcode is splited as it now.
Comment #126
CalebD commentedUnfortunately it seems like optimizing for one case directly impacts and degrades performance on the other case. There's a lot to factor in here.
We can go forward with the split which will decrease performance in environments that use an opcode cache. It will also save on some memory. How much memory is saved comes down to how much of Drupal's code is loaded over the average lifespan of an Apache child process. Looks like the benchmarks above are placing it at a couple to several megs per process (depends on contrib)
However, going forward with the split will definitely help cases where an opcode cache isn't being used. This can most often be seen in situations where PHP is being invoked through mod_suphp or mod_fastcgi (or straight CGI if you're crazy :P). Personally, the host I'm on has PHP 4 and PHP 5 side-by-side using mod_php for PHP 4 and mod_suphp for PHP 5. Since PHP 4 is a piece of crap and more and more modules require PHP 5, the site I manage is using mod_suphp and would stand to benefit from the split.
I also use mod_suphp on my personal dev server because it reduces the annoyances of having to dance around permissions.
From the developer experience perspective, I'm definitely fine with splitting modules apart as it creates additional organization, but that's just a personal preference.
Running with andypost's idea, in an ideal world we'd be able to do both. I'm not very familiar with the packaging system that goes into creating module tarballs, but it seems like it would be possible to have a module be distributed as it is today but with the addition of a modulename.optimized or something. It would basically just be all the code containing module files concatenated into one file and would be loaded on systems that are using an opcode cache (which we'd have to detect).
That said, I'm just thinking big here. I'm not in-tune enough with the various core systems (registry, module loading routines, etc.) to know if that's possible.
Comment #127
sunImproving opcode cache case as well via registry: I'm not sure whether it would work, but technically, the registry makes Drupal know of all its own code and functions. If it's true that opcode caches work faster with one huge LOB instead of many included files, then I could imagine that production sites - where code rarely changes - could use something like this:
Comment #128
Crell commentedThat depends greatly on the code being run. I don't read Russian so I don't know what their test code is.
There is a cost of a disk hit for every file loaded, true. However, OS level caching ameliorates a great deal of that on modern systems. That also has to be balanced against the time spend loading code that you don't need.
Opcode cache performance is also heavily dependent on whether you use static vs. dynamic includes. Every time Rasmus gives a presentation he whines about people using conditional includes because it doesn't play nice with opcode caches. Well, every single include in Drupal save the bootstrap include in index.php is already dynamic. We're never going to be properly optimized for opcode caches as long as Drupal is a flexible, pluggable system. (And I certainly hope that never changes. :-) )
All of the benchmarks we've run on this actual patch, in all its various iterations, say the same thing:
1) It does help non-opcode cache situations by a notable amount. How notable varies a bit, but it's always considerable.
2) It doesn't really help or hurt opcode caches one way or another.
So on the performance front, I really don't see any argument against moving forward.
Which leaves only the DX question, which I addressed in #95. Short version: It's not the issue people seem to think, and given that some IDEs choke on system.module, content.module, and other multi-thousand-line files (and my brain certainly does) I can see a sensible and predicable organization of code as a DX plus, not a minus.
We can argue all day about whether people *should* run opcode caches on a VPS (although some VPSes do not support opcode caches), and whether people *should* hire a separate sysadmin to run their VPS (many VPSes are unmanaged, which is not acceptable for a great many people). That doesn't change the fact that the vast majority of sites *don't*. A personal blog that wants a calendar and a panels-based layout can break a shared host at this point. I'm running all my D6 sites on 64 MB memory now. This is not acceptable. We either address it or cede the shared hosting market to other projects.
Comment #129
Crell commentedHm, massive cross-posting. Even if we wanted to have a "compiled" version of Drupal for opcode caches, we would still need a sensible split of code in the first place. That makes this patch a prerequisite for that.
Comment #130
kbahey commentedWhatever we do, we can't leave the non-op-code cache case untended to. We have been progressively using more memory. Until recently we could run core in the default 8MB of php. Now 16MB is required. Add a few of the popular contrib and that is not enough.
If we want Drupal to continue to grow, we can't forget about shared hosting.
Personally, I stopped using shared hosting years ago, and all of our clients are on dedicated and VPS, and all have op code caches. But we still need to strike a balance between that and shared hosts.
Op code caches (e.g. APC) still issue lstat() for each file to load, and the split hurts here because there are more files to check, even if the compiled code is in memory. Run strace on a php process at some point to see that in action. Or ping me offline and I will fetch the output for you.
The registry and code split still benefits the shared hosting case, by reducing memory consumption, e.g for non admin pages. So it is not a mutually exclusive case of either/or.
Going forward, I like the idea of "compiling Drupal" into a big BLOB that makes op-code caches happier. Sun's snippet above are promising. It can also be a script that does that. chx had something to that effect earlier.
This way, we keep shared hosting sort of under control, while those who have op code cache can go further with some extra steps.
Comment #131
JohnForsythe commentedTo reprise what I wrote in #93,
It seems like this splitting of module files could be automated, and selectively turned on or off. Include comments in the code that indicate where a split point should occur, and the name of the file to split it to.
Then, on installation, if splitting is enabled, the file can be automatically parsed for split points, and split out into the required parts.
Splitting for those who need it, no splitting for those who don't.
Comment #132
kbahey commented@JohnForsythe
I like that! And it makes writing modules simpler, we all still write everything in a single .module. (but perhaps less readable, because of the sheer size of each file?)
Challenge is that a new hook_menu() will have to be created by the splitting script, to say which callback is in which file.
But all in all seems doable.
Comment #133
Crell commentedAn automated "split compiler" will never get things right in all cases, and when it doesn't you're on your own to figure out how to do it manually. Which of course makes your system impossible to upgrade. That also puts the hard work of reorganizing code onto the people who don't have an opcode cache... who are going to look at Drupal and say "WTF? I need to run a compile script on this command line thingie before I can use it? Let me go try out that new WordPress 2.8 instead..."
Also, I hold that a single 5000 line file is worse DX than 5 1000 line files.
An automated "merge compiler" is an order of magnitude easier to write, since it's just concatenating files and stripping a few PHP open tags. It also then puts the extra step of work onto just those people that are running an opcode cache, and are running a high-enough traffic site that it actually matters (since even without a massive merge, an opcode cache is faster than without), who are almost by definition going to be able to run a merge script to "compile" Drupal before running it and then mess with their .htaccess file.
There are a number of projects that have a "merge everything" compile step as an option. I don't know a single project that does it backward.
But before we have a merge script, we need to have Drupal split up sanely so that it will run on shared hosts at all. Now can we please stop re-debating the same issue all over again? We had this discussion a year ago when the registry first went in. We already committed to this approach. We already have benchmarks to say that it either helps or does nothing to hurt. Anything we can do later to further optimize opcode caches is just gravy, and belongs in another issue.
Comment #134
Nick Lewis commented+1 for for moving hooks that are only called when new things are installed, and when cache is rebuilt. Double plus one. As for anything that goes further, don't make me recite a quote about premature optimization. Hardware is a lot cheeper than drupal devs.
Comment #135
andypostLet's continue this issue with splitting for shared hosts.
Suppose we should file another issue for merge.
Now #459980: Rework admin/settings/performance so we can put merge functionality into this settings or leave for contrib
Comment #136
sunReminder: Properly splitting the code with the registry - and thereby removing all hard-coded filepath inclusions - is the only way to apply any kind of dynamic merging (or splitting) later on.
Comment #137
catchI thought more about sun's comment in #118, and I've opened an issue to examine what the consequences of that would actually look like, whether there's other alternatives etc. #497118: Remove the registry (for functions). fwiw despite the fact I came up with some reasonable arguments on the issue, overall I'd rather see this one marked fixed. What I don't want to see is this one in limbo like it is now without a wider discussion of why we ever committed the registry in the first place if we're not going to use it as it was intended.
Comment #138
smk-ka commentedCiting Crell:
I would agree, if it were a logical split, ie. from an OO point of view it would make totally sense. However, the proposed approach splits modules at rather arbitrary points. In #101 sun wrote we "move all CUD in CRUD to include files". From a developer's POV, this *is* arbitrary. As much as I am for better performance, IMO this approach will make DX worse.
Comment #139
sun#497118: Remove the registry (for functions) is about to happen.
Comment #140
wismoyo commentedregistry-split.patch queued for re-testing.
Comment #141
axyjo commentedResetting status.