This is a parallel issue to #340723: Make modules and installation profiles only require .info.yml files, and comes from the same discussion.

Despite the size of this patch, all it does is move all implementations of hook_theme and hook_menu to a $module.registry.inc file. These are "registry style" hooks; that is, they get called once in a blue moon, cached or saved to the database, and then never used again except during development or reinstalling a module or such. They are completely wasted bits. By moving them to a separate file, we no longer waste bits on every single page load. And thanks to the registry, all we have to do is tell the info file about the new code file and the rest is handled magically.

The structure here was already discussed with webchick, and she said she was OK with $module.registry.inc as a name format so that's what I used. Although each hook is fairly small, the total size spread out across all modules a typical site will have is non-small. It's also the "low-hanging fruit" toward making .module files nearly empty in most cases, which makes the bootstrap process enormously cheaper.

This is exactly what the D7 registry was written to enable. :-)

There are no doubt other hooks that can be easily moved out to this or other files, but this is a good first start. Baby steps...

I'm marking this as critical because it is 100% guaranteed to collide with something simply due to its size, so the sooner it gets in the better.

I hope the bot doesn't have a problem with it. If it doesn't, then there's really not much else to comment on as this is just moving code around in ways that have already been discussed. (*prays that he doesn't have to reroll this behemoth too many times.)

Comments

Crell’s picture

As support for why we're doing this, I point people toward the benchmarks for the limited splitting we did in Drupal 6: http://www.garfieldtech.com/blog/benchmark-page-split

This is the natural evolution of that work, via the registry.

Status: Needs review » Needs work

The last submitted patch failed testing.

swentel’s picture

Yeah, go Crell! :) Subscribing to test this patch.

swentel’s picture

Note: install completely fails, wsod, looking into it now.

swentel’s picture

StatusFileSize
new224.15 KB

This patch should make the testbot happy - at least for install - hope I didn't forgot any module.
Update still gives wsod, that's for later this day.

Edit: including menu.registry seemed to be needed because otherwhise, your initial menu will look very weird :)

catch’s picture

Nice. Will run some benchmarks later.

swentel’s picture

Status: Needs work » Needs review

Changing status to get results from bot.

Crell’s picture

I knew I was going to forget one. Thanks, swentel.

moshe weitzman’s picture

Lovely. This can be justified just by code cleanliness. Any benchmarked speed bonus is nice but not a prerequisite IMO.

Anonymous’s picture

subscribe.

Crell’s picture

Enough with the subscribing. Someone mark this RTBC before it breaks. :-)

swentel’s picture

@crell update.php still gives a wsod, I'll try to look into it today - but if someone else catches it, be my guest :)

catch’s picture

Status: Needs review » Needs work

Marking to needs work for update.php

swentel’s picture

Ok, did some investigation, _theme_build_registry in theme.maintenance.inc returns no theming functions at all with this patch. I'm off for two hours now, I have some time later, again, that shouldn't stop anyone else of course :)

pancho’s picture

Status: Needs work » Needs review

Sorry, I didn't catch the update.php bug. :( But I see another point:

Do we really need menu.registry.inc in install.php?
I threw it out and my install worked well.

If there should be a reason to include it though, the $module_list array keys should be consistent:

+  $module_list['menu_registry']['filename'] = 'modules/menu/menu.registry.inc';
+  drupal_load('module', 'menu_registry');

instead of:

+  $module_list['menu']['filename'] = 'modules/menu/menu.registry.inc';
+  drupal_load('module', 'menu');
pancho’s picture

Status: Needs review » Needs work

Didn't mean to change status.

swentel’s picture

Status: Needs work » Needs review
StatusFileSize
new224.83 KB

@pancho: I made the code more consistent. If you leave menu out, the install will be ok, but links to Main menu, Navigation, Secondary menu will be in your root navigation tree.

Pach attached fixes update.php and also a small typo in profile.info.

pancho’s picture

Cool! You caught that little bastard! :)
However, wouldn't it be cleaner to add require_once DRUPAL_ROOT . '/modules/system/system.registry.inc'; to update.php (line 650) instead of having include_once DRUPAL_ROOT . '/modules/system/system.registry.inc'; in _module_implements_maintenance()?
We obviously need it only for update.php and not in other cases of maintenance mode. It seems to work here.

Concerning menu.registry.inc in install.php:
I'm not sure if I got what you mean, but on my freshly installed site, everyting seems to be fine, though I had menu_registry thrown out of install.php. Maybe you or someone else can check that?

swentel’s picture

StatusFileSize
new11.95 KB
new224.79 KB

Moving include to update.php, makes sense and still works :)
Someone else should test play with leaving out menu.registry from install.php and see what happens.
There is a png attachment which shows what I get.

damien tournoud’s picture

+  drupal_load('module', 'system_registry');
+  drupal_load('module', 'menu_registry');

Hum. We have system_registry and a menu_registry modules, now?

+require_once DRUPAL_ROOT . '/modules/system/system.registry.inc';

This needs a comment at the minimum. But a discussion would be better. Why not moving the theme function defined in system_theme() somewhere else? For example: automatically add them in theme.inc, where it would make full sense? The less we keep of that absurd system module, the better.

@pancho: I made the code more consistent. If you leave menu out, the install will be ok, but links to Main menu, Navigation, Secondary menu will be in your root navigation tree.

This looks like a bug in the module installation code.

damien tournoud’s picture

@pancho: I made the code more consistent. If you leave menu out, the install will be ok, but links to Main menu, Navigation, Secondary menu will be in your root navigation tree.

This looks like a bug in the module installation code.

Ok, this is linked to the fact that module_implements() has a special case in maintenance mode in which it doesn't use the registry.

The special case for the menu module is just wrong: this can potentially break *any* module hook_enable(), including contrib ones if installed in an installation profile.

What we would need is to change the if (defined('MAINTENANCE_MODE')) check in module_implements() to something like if (defined('MAINTENANCE_MODE') && drupal_get_bootstrap_phase() < DRUPAL_BOOTSTRAP_FULL), and insure that the registry is fully built as soon as we enter the BOOTSTRAP_FULL mode (ie. we will probably need to add a registry_rebuild() in drupal_system_install().

swentel’s picture

Simply adding following line in install.php on line 37 also fixes install, anyway, do install & update need more than the theming functions defined in system_theme() ?

  require_once DRUPAL_ROOT . '/modules/system/system.registry.inc';
damien tournoud’s picture

@swentel: In #21, I was discussing the menu.registry.inc case.

Crell’s picture

As long as the registry is active theme functions should be able to live anywhere, shouldn't they? We only need to special case theme functions that are needed pre-database. I'd actually prefer to not see more stuff in theme.inc, as that gets loaded and parsed on every page load.

recidive’s picture

Those two pseudo modules system_registry and menu_registry look like hacks IMO. Are they really needed?

Also // $Id: $ should be // $Id$ on new files.

Other than that, the patch looks nice.

Status: Needs review » Needs work

The last submitted patch failed testing.

dropcube’s picture

Also would be good to rename those "registry style" hooks following this pattern: {module}_register_{items}. For example:

{module}_register_blocks() for blocks
{module}_register_menu_items() for menu items
{module}_register_permissions() for permissions
{module}_register_themes() for theme hooks
...

this would add consistent name conventions and would differentiate those "registry style" hooks from other hooks.

Issue: #357321: Rename hook_permission() into hook_permission_info() for consistency

Crell’s picture

There's already a building standard, IMO, for using the _info() suffix for such hooks. I proposed that on the dev list months ago, and the only disagreement was on just how much caching was appropriate for such hooks. I'll follow-up on the other thread momentarily, but for this thread I was very very specifically JUST trying to move code around, since I figured that would be easier. Apparently it isn't quite that easy thanks to the silliness that is the installer. Someone really should get back to figuring that out...

Anonymous’s picture

StatusFileSize
new504 bytes

in discussion of catch's performance talk, bjaspan asked whether using an opcode cache would mean that we wouldn't gain any real memory reduction from the registry.

we need to test this, because memory reduction is one of the justifications for the registry and for this patch. if we don't get any/a very small gain, we need to know this.

attached is a simple bash script to generate some files to be included, and two files that load them. one loads them all, one half. put this in a webroot in your dev environment and run it to get a simple way to test loading lots of files.

i've been testing these scripts with ab, and watching memory usage of apache processes with and without APC, and even with APC there is a large growth in the memory size of apache child processes. this would seem to indicate that reducing the number of files loaded will be a gain even with an opcode cache, but i'm sure we can refine this test.

bjaspan’s picture

subscribe

Anonymous’s picture

Issue tags: +Registry

tagging with Registry

sun’s picture

This is one of those patches we will likely get much faster in when we

1) start with one prototype implementation

2) get that committed

3) roll sub-sequent patches (in this issue) until all of core is converted (perhaps 2-3 modules at a time)

Otherwise, this beast of a patch needs to be _carefully_ re-rolled all over the time, ensuring that no other commit changed any of those hook implementations.

Crell’s picture

I believe the only issue was with system.module, so I'll agree with sun in #32. Time for module-at-a-time patches. Although like DBTNG and the original menu split I'd prefer to do them in separate issues rather than all in one issue. Sounds to me like a new "community initiative" page. :-) Or maybe just a tag...

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new33.21 KB

So let's start with the key module.

- Added hook_perm(). Equally invoked on two pages only (admin and admin/user/permissions).

- Only added manual inclusion of system.registry.inc in install.php and update.php. Tested installation manually and could not replicate that previously mentioned menu error (was most likely a bug caused somewhere else).

sun’s picture

- system.registry.inc needs to be manually included, because install.php and update.php rely on the path 'batch', and we can't rely on the registry during installation and upgrades.

- FWIW, by appending drupal_set_message('IN'); to system.registry.inc you will see that the registry works and the message is only output on pages that invoke one of the registry hooks. (oh wow! ;)

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Hm, well, that was easy... botsnack!

I'm not entirely convinced that hook_perm() is appropriate as a registry hook, but I'll defer to Dries and webchick on that one. (Registry or info hook are called and cached somehow, which is different than "just kinda rare".) It's not like we can't move it again later, right? :-)

Let's get this in before someone else touches those hooks, then we can move on to other modules.

sun’s picture

Before someone assumes this could be set back to needs work... ;)

I'm not sure either, but I tend to prefer #34 though.

sun’s picture

chx’s picture

Status: Reviewed & tested by the community » Needs review

We really need to benchmark the memory usage. Shared hosts and dedicated servers alike use a code cache so ...if we make the life of a code cache harder then that's not good.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Via devel. There may be better ways to do this.

To me it looks like it makes 0 difference with APC. Without APC there's a very small consistent memory improvement on the front page.

On admin/build/modules the memory usage varies between requests and I can't get useful information - included below though.

Given this is just a couple of hooks the numbers seem about right to me. I think we need to proceed with this, and then do this again (or ideally something a bit more robust if someone knows how) after 5-6 patches have landed.

Restarted apache before and after applying the patch (and obviously to switch APC on and off).

With APC - identical results:
HEAD and patch - front page:
Memory used at: devel_boot()=1.5 MB, devel_shutdown()=5.75 MB, PHP peak usage=6.5 MB.

HEAD and patch on /admin/build/modules:
Memory used at: devel_boot()=1.64 MB, devel_shutdown()=12.09 MB, PHP peak usage=20 MB.

Without APC:

Front page:
HEAD:
61 files included
devel:Memory used at: devel_boot()=3.45 MB, devel_shutdown()=19.96 MB, PHP peak usage=20.75 MB.
PQP: 20.93MB

Patch:
61 files included
Memory used at: devel_boot()=3.45 MB, devel_shutdown()=19.86 MB, PHP peak usage=20.75 MB.
PQP:  20.83MB


/admin/build/modules - not measurable at this point  since it's all over the place:
HEAD:
Memory used at: devel_boot()=3.48 MB, devel_shutdown()=27.94 MB (up to 29.18 MB), PHP peak usage=36.75 MB.
64 files included

Patch:
65 files included
Memory used at: devel_boot()=3.48 MB, devel_shutdown()=27.76 MB, PHP peak usage=35.75 MB.
Memory used at: devel_boot()=3.48 MB, devel_shutdown()=27.93 MB, PHP peak usage=36.5 MB.
dries’s picture

Status: Reviewed & tested by the community » Needs work

To me, this looks like a micro-optimization. It would be great if we could illustrate the memory usage improvement before adding new rules and complexity.

catch’s picture

Status: Needs work » Needs review

Dries. It's a micro-optimisation because we're only moving system_menu() and system_theme() for an initial patch. As far as I can see this saves us in the region of 100k of memory when APC isn't enabled.

Total file size parsed (via PQP) is 1.59MB vs. 1.57Mb with the patch applied.

Both PQP and devel report about 100k memory saved.

If we trust those numbers, then we're getting 100kb of memory saving for 20kb of code moved and it ought to add up very quickly. The patch tested is 30k, Crell's first patch on this issue was 225k. Looks to me like just by moving hook_menu() and hook_theme() for core modules we'd save in the region of 3-500kb. With 50-100 contrib modules installed (assuming they all conformed to the pattern), then that easily begins to add up to several MB of memory, and just by moving hook_menu() and hook_theme().

There's also the trigger patch which sun posted, hook_perm() a few posts above, and probably most CUD and some other hooks could be moved before we start to see too much trade off in terms of file inclusion.

The only way to test this is to try it. Either we roll and commit 5-10 similar patches quickly so that they can be maintained and rolled back individually, or someone will have to roll another monster patch, and we hope it doesn't break in time to test it and commit it. I'll try to test either, but I'm not volunteering to roll the 250k patch.

If we don't commit this, or it turns out not to have these gains, then I think we need to reconsider the registry entirely. The complexity is there, not in cut and pasting code around.

dries’s picture

You're right. I reconsider my position. :)

catch’s picture

Status: Needs review » Reviewed & tested by the community

OK so if you're happy committing this. I'll do a similar test once we've committed 4-5 similar patches. Then we'll 1. find out if those numbers were right in the first place 2. see if it actually adds up to anything worth having.

4-5 patches should be easy to revert if it doesn't look good, but it's probably enough to be comfortable going ahead with more conversions if we can get better verification of the gains.

sun’s picture

Assigned: Crell » sun
Status: Reviewed & tested by the community » Needs work
Crell’s picture

Assigned: sun » Crell
Status: Needs work » Reviewed & tested by the community

Just a reminder that I did benchmark the Drupal 6 page split work last year: http://www.garfieldtech.com/blog/benchmark-page-split

This issue is exactly the same as that one; we're just moving code around to make it conditionally included. For those who didn't read the article above, moving page callbacks and many theme callbacks out of the .module file got us a 20% performance boost in D6 compared to had we not done so, and earlier benchmarks in the issue queue showed no impact on opcode-cache-using sites either way. If we can slice off our big definition hooks, that can easily get us another 20% boost, I suspect. If we go even farther and move, say, node hooks off to separate files, we can get even more of a boost.

One of the GHOP projects I mentored last year did a performance analysis of Drupal core, and it identified our two most expensive phases as the theme system (complex and lots of stack calls) and bootstrap (most of it being in the phase where we load all .module files). Code weight is our easiest optimization point right now, and we desperately need optimization points. :-)

I suppose Dries doesn't need any more convincing after #42, but I just wanted to make it clear that this *has* been benchmarked before, repeatedly, and IMO has been clearly shown to be an "easy win" for performance.

sun’s picture

I reconsidered. I want to make it one shot. But only if you let me... :P

sun’s picture

Assigned: Crell » sun
Status: Reviewed & tested by the community » Needs review
StatusFileSize
new3.01 KB
new1.24 KB
new275.45 KB

There are things that should only be done once. ;)

Constantly re-rolling something that can be done by a machine is not the type of job we should have to deal with.

Put attached registry.php into the root directory of D7 and invoke it, i.e. http://yoursite/registry.php. Afterwards create a patch.

Now I want to see some benchmarks.

Both attached patches need to be applied to make this work.

sun’s picture

StatusFileSize
new274.82 KB

Sorry - since I only later found out that I could also auto-add files to CVS and invoked that unconditionally, field_sql_storage.registry.inc was mistakenly added to the .info file.

New patch without that. However, lessons learned: Except of one module, all modules in core implement registry functions.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review

All core modules enabled (thanks for the reminder sun).

79 files included.

Front page and node/n are the same for file includes. We make the default files loaded in HEAD c. 800k smaller. This equates to just over 500k memory savings.

Very small increase in requests per second - might be margin of error, certainly not less.

HEAD:
2.07 MB total file size
Memory used at: devel_boot()=4.56 MB, devel_shutdown()=26.85 MB, PHP peak usage=27.75 MB.
3.48 [#/sec]

Patch:
1.99 MB total file size.
Memory used at: devel_boot()=5.14 MB, devel_shutdown()=26.32 MB, PHP peak usage=27.25 MB.
3.51 [#/sec]

I also did a quick poll in #drupal, asking people to use PQP to get no. files + total file size for some D6 sites with lots of contrib:

Michelle's site:
366 files 23.93mb

beeradb's site:
28.04mb, 224 files. 106 modules

agentrickard's site:
20.94 MB 180 files. This is 82 modules.

Moonshine's site:
site with 91 active modules (+ devel and pqp) 16.15MB 264 files

If we're dealing with c.2mb with just core, the multiplying that by 10 or 15 means we could end up with savings of around 5-7.5MB on sites with lots of contrib. Additionally, we're only removing a small percentage of core (800kb out of 2mb), whereas in contrib it's likely that hook_menu() and similar are a higher proportion overall (since contrib generally doesn't have files like system.module or common.inc to worry about with a few exceptions).

Crell’s picture

Let's also note that we're talking about just the registry hooks at the moment. There are plenty of other hooks and functions that could be moved out of .module, too, for further savings. It sounds to me like it's definitely worth it.

catch’s picture

Status: Needs review » Needs work

proper status.

Anonymous’s picture

Status: Needs work » Needs review

Catch asked me to post the details of my PQP test to this issue queue.

I'm hitting the home page and I get these results:

418 Total Files
5.48 MB Total Size

I have 155 active modules.

Anonymous’s picture

Status: Needs review » Needs work

Form defaults overwrote Crell's changes. oops!

sun’s picture

The results of these benchmarks (thanks, catch!) made me a bit disappointed, but also made me "play" and think about the entire challenge.

What I did in the past hours is trial & error only, to get a feeling of the maximum limits we can achieve here.

If you want to skip this evaluation, read on where it says "BREAK". If you don't skip, bear in mind that you need to reverse the logic and meaning of everything that is said after you've read the evaluation.

Now, to test the maximum performance increase, I tried a different approach:

- Enable all core modules + admin_menu + devel + pqp.
- Don't load all modules by default.
- Only load the module that "owns" the active menu item.
- Move hook_boot, hook_init, hook_help, hook_exit into $module.bootstrap.inc files.

So essentially, and this is utopian (!), I wanted to load the bare minimum only. To achieve this, I had to additionally put some drupal_function_exists() resp. module_invoke() [which still seems to be the proper function to me] calls into those $module.bootstrap.inc files and shuffle a few other functions around where required.

$preg_hooks = array(
  'boot',
  'init',
  'help',
  'exit',
);
$filename_target = '@name.bootstrap.inc';

To start with, system.module and filter.module (primarily because of filter_xss_bad_protocol()) are required on all pages. Thus, I couldn't completely remove module_load_all() in _drupal_bootstrap_full(). I altered menu_execute_active_handler() to determine the owning module of a menu router item via preg_match(), since we do not seem to store that info yet.

Afterwards, I quickly realized that almost all modules were still loaded on all pages, because of *_alter hooks. So, we are loading many modules for hooks that test whether they may apply, but most often, they do not. Conclusion: Move them, so we load those, but not the modules.

$preg_hooks = array(
  '\w+_alter',
);
$filename_target = '@name.alter.inc';

That made an improvement, but one thing we really have to be careful about when doing benchmarks is that there must not be any PHP notice/warning during the request. Otherwise, watchdog() along with error handling is triggered, which invokes and loads at least dblog.module (more, if all core modules are enabled).

However, some modules were still being loaded without any obvious reason. To identify from where and why they were loaded, I put the following directly into .module files:

echo "<pre>"; var_dump(debug_backtrace()); echo "</pre>\n";

That uncovered that most modules were loaded because of

- arbitrary info hooks, such as hook_node_info() or hook_field_info()
- hook_elements()

I originally assumed that those would be cached already (because they end in _info() ?), but they are not. To eliminate those:

$preg_hooks = array(
  '\w+_info',
  'elements',
);
$filename_target = '@name.bootstrap.inc';

Obviously, ^^ that's wrong. Should be $module.registry.inc in reality (if they were cached).

Again, this approach is not feasible, because it requires too many drupal_function_exists() calls to load files for required functions.

Effectively, we want to reverse what I did here.

BREAK

The maximum performance gain with all core modules being enabled (but not really loaded) is:

Plain HEAD:  2.09 MB Total size, 894.180 ms Execution Time, 17.02 MB Memory Used, 75 Files
Hacked HEAD: 1.55 MB Total size, 603.790 ms Execution Time, 11.95 MB Memory Used, 80 Files

Note that 12 MB is a magic number. IIRC, shared hosts use that default PHP memory_limit setting.

Crell mentioned a master plan in IRC that I wasn't aware of yet, but makes perfectly sense:

The plan is to continue to load .module files by default, but not require the presence of a .module. Anything that is called indirectly can be moved out of .module... where the optimal place to put it is, that's still up for debate.
Functions that will be called directly.
node_load() should be in node.module.
node_node_load() can be in node.nodes.inc
That means many many many modules won't need a .module file.

So, effectively, the stuff that remains in my .module files is cruft we can move into includes. Everything else is more often needed. This is where you need to reverse the thinking if you read the above.

What we want in .module files is:

- hook_boot()
- hook_init()
- hook_exit()
- hook_help()
- hook_*alter()

What we still don't want to have in .module files is:

- hook_elements(); we can cache that info.
- hook_*info(); should be cached already, probably a bug.

Those should live in $module.registry.inc instead.

As of now, we still have a few exceptions:

- system.module: ships with many essential functions, but also many that are not needed on regular pages (e.g. trigger/actions)

- filter.module: is a bloat of its own. I don't want to release D7 without cleaning up that mess. The entire module should not have to be loaded on most pages.

- Locale module is like system.module. Note that Translation module is not.

Additionally, in terms of file size, we unconditionally load:

- common.inc: 143 KB
- theme.inc: 67 KB
- menu.inc: 90 KB
- file.inc: 68 KB
- form.inc: 107 KB (that would be a very easy one)

as well as (which don't make sense to me, since they're infrequently needed):

- pager.inc: 17 KB
- tablesort.inc: 9 KB
- mail.inc: 17 KB
- actions.inc: 12 KB

If you do the math, then you'll see that we're needlessly loading 530 KB on all pages that may only need 10% of that stuff. So there is another 25% performance increase (in addition to the 30%) only waiting for us. (On that note, core-committer note: We introduced this fine registry for D7. Let's not release with a half-baked registry. That would make no sense at all. It's about properly introducing features, not deadlines.)

The last paragraph and math becomes more important if you consider Drupal sites that do not have, for example, any publicly facing forms. Ultimately, we have to find a mean usage of functions and hooks throughout all Drupal sites, while not hindering "the more advanced" ones too much.

Overall conclusion: This is what we want to do. Starting with registry hooks and other, rarely used hooks (such as actions). It follows the principle of what I am trying to do in http://drupal.org/project/js for AJAX/AHAH callbacks that only need certain functions and/or data, but in a much cleaner way and for all requests Drupal serves.

I'm appending a list of issues that we badly need to tackle, because they disturb the whole functionality:

- drupal_function_exists() does not use drupal_load() - bad for debugging, very inconsistent.
- hook_node_info(), hook_field_info() is not cached.
- node_invoke('form') is called on all pages for administrative users due to menu_tree_page_data('management').
- hook_forms() is invoked for all forms, loads entire modules, possibly having no effect at all.
- Stuff like node_types_rebuild() does not have to be loaded all the time:

@@ -4242,7 +4244,7 @@ function drupal_flush_all_caches() {
   // Rebuild content types, menu will be rebuilt as well.
-  node_types_rebuild();
+  module_invoke('node', 'types_rebuild');
Anonymous’s picture

awesome work sun! i had a similar 'wtf' moment when i realised just how much code drupal loads for every request.

"- hook_init()
- hook_exit()
- hook_help()
- hook_*alter()"

why do we need these hooks in .module files? they're all called after the database is loaded, so their implementations can be loaded via the registry.

i thought the idea was that we'd only leave code that wasn't going to be loaded via the form, theme or module system, but needed be available in .module files.

for modules that don't have any such code, that means no .module file.

the other issue that needs to be tackled is #340723: Make modules and installation profiles only require .info.yml files, which is nastier than it looks, and makes me wish that drupal's system table was thrown away and replaced two new tables - module and theme.

catch’s picture

@justinrandell: I think the only reason to leave those files in, is to avoid the extra file include - on the assumption that both the .module and the .inc are going to get loaded on most requests anyway. But it's probably a detail we can sort out down the line.

sun’s picture

StatusFileSize
new3.8 KB

To understand why we want to leave certain, essentials hooks in .module files, I took the time to prepare a readable backtrace of what is being called from where and why.

I intentionally chose a simple, administrative settings page, because that highlights a page that - basically - involves 1 module only. We are loading countless instead. I believe that this is where we should start to look at what should be moved where and why -- and which hook invocations we should simply eliminate (by caching data of _info hooks), and which hooks are essential to all pages (given that one would think that a simple settings page like this shouldn't require them).

Anonymous’s picture

i think we're talking at cross-purposes a bit here, even though we have the same goal.

to restate some assumptions:

1. if a module has code that needs to be loaded unconditionally on every request, it should go in a .module
2. anything that can be loaded by the registry for theme, form or module hooks shouldn't live in .module
3. hopefully, this will mean many modules will have no .module, and the rest will have small .module

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new661.33 KB
new466 bytes
new7.73 KB

Let's see what the bot thinks about this.

Also attached:

- registry.php: Fairly completely rewritten to do all of core in one shot. Contains the current proposal for .module splits into .inc files.

- rebuild.php: Required if you want to run registry.php manually and play with it. Rebuilds the registry.

Status: Needs review » Needs work

The last submitted patch failed testing.

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new664.79 KB

While I'm trying to get the bot pass: Please have a look at registry.php and the hook => target associations in there. I think they're pretty solid already, so please let me know if I missed something.

Another point up for debate is: registry.php optionally allows us to re-order hooks while moving them. Enabled for a few hook sets already.

mikey_p’s picture

A few small things (I haven't even got to running tests yet).

The menus are out of place and the navigation menu is showing up with edit links for each menu. See: http://img.skitch.com/20090522-e8a67k6g6485ty69pna8rqpuhw.png

After clearing the cache I still get an extra menu with management and user menu items in the right place: http://img.skitch.com/20090522-kut468frrua4fy89m69ceuuc4h.png

Status: Needs review » Needs work

The last submitted patch failed testing.

webchick’s picture

Here's a summary of what is changed by registry.php:

module.registry.inc:
- hook_menu()
- hook_theme()
- hook_flush_caches()

module.actions.inc:
- hook_action_info()
- All action-related functions
- hook_mail()

module.user.inc:
- hook_perm()
- hook_user_after_update()
- hook_user_cancel()
- hook_user_categories()
- hook_user_form()
- hook_user_login()
- hook_user_insert()
- hook_user_operations()
- (various user operation callbacks)
- hook_user_register()
- hook_user_submit()
- hook_user_update()
- hook_user_timezone() # system only
- hook_user_validate()

module.block.inc
- hook_block_list()
- hook_block_configure()
- hook_block_save()

module.node.inc
- hook_node_access_records()
- hook_node_delete()
- hook_node_delete_revision()
- hook_node_insert()
- hook_node_prepare()
- hook_node_presave()
- hook_node_type()
- hook_node_update()
- hook_node_validate()

module.comment.inc
- hook_comment_delete()
- hook_comment_insert()
- hook_comment_publish()
- hook_comment_unpublish()
- hook_comment_update()
- hook_comment_validate()
- hook_comment_view()

module.taxonomy.inc
- hook_taxonomy() // @todo Lonely left over, should be (and is) hook_taxonomy_vocabulary_delete() instead.
- hook_taxonomy_term_insert()
- hook_taxonomy_term_update()
- hook_taxonomy_term_delete()
- hook_taxonomy_vocabulary_insert()
- hook_taxonomy_vocabulary_update()

module.search.inc
- hook_node_update_index()
- hook_ranking()
- hook_search()
- hook_search_preprocess()
- hook_search_page()
- hook_update_index()

module.cron.inc
- hook_cron()

module.xmlrpc.inc
- hook_xmlrpc()
- (various Blog API functions specific to blogapi.module)

module.form.inc
- hook_form_FORM_ID_alter()
- (NOT hook_form_alter())
- Do manually afterwards: All form handler functions, including _validate(), _submit(), _after_build(), etc.

In general, this is what is NOT moved:

- hook_init(), hook_boot(), hook_exit()
- hook_X_load()
- hook_X_view()
- API functions that are called directly, such as node_load() or user_cancel().
- hook_link(), required for rendering nodes and comments
- hook_help()
- hook_page_alter(), hook_js_alter(), hook_*_alter(), usually invoked on all pages.
- Everything else that is invoked on most pages.
- hook_*_info() - we need to made them cached. When this is done, they will be moved to module.registry.inc.
- hook_elements() - when it will be cached. When this is done, they will be moved to module.registry.inc.

webchick’s picture

So my biggest concern with this patch is balancing the performance improvements (btw, let's see some benchmarks here in the issue, please. I saw you post them on IRC...) with the developer experience aspect. What we want is to establish an easily-identifyable pattern so that module developers always know where to put their hooks. If module developers are confused about what goes where, they'll just opt for sticking everything in .module since that's what they've always done and we won't see the performance improvements replicate throughout contrib.

For the most part, this looks pretty straight-forward:

- If it has to do with loading/viewing, leave it in .module.
- If it's an API function called elsewhere, leave it in .module (this was the same rule as D6)
- If it's a X_alter_X function, leave it in .module.
- If it's a "low-level" function called on every page like hook_init(), hook_boot(), and hook_exit(), leave it in .module.

The other ones though are a bit weird. hook_link() is not intuitive. hook_help() is not intuitive. hook_X_info() is really not intuitive, and I'm tempted to mark this as postponed until those are movable to X.registry.inc.

The other thing that worries me is that this list represents A LOT of possible .inc files to keep in your head as a module developer:

  1. actions
  2. admin
  3. block
  4. cron
  5. form
  6. node
  7. pages
  8. registry
  9. search
  10. user
  11. taxonomy
  12. xmlrpc

It *almost* maps to a "use a .inc file for each module's hooks you're implementing" except that cron, form, and xmlrpc don't map like that. And admin, pages, and registry are more "meta" file names.

Let's spend some time thinking about developer experience and how we might establish some easier patterns/rules for developers that they can apply to any module they write.

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new660.07 KB

Those changes to system.module (only the last, specific to it) - we leave out.

Status: Needs review » Needs work

The last submitted patch failed testing.

sun’s picture

StatusFileSize
new7.56 KB

webchick was so kind to enlighten me: We have this nice list on http://api.drupal.org/api/group/hooks/7 already (which I didn't realize we have).

Those 7 fails are due to #469768: PASS.. Once that is in, this will pass. Attached is the new registry.php script that skips this new error_test.module.

Further, I realized that - due to the awesome design of the registry - the registry.php script can be applied at any time to an entire Drupal site, it works recursively, and will thereby also auto-convert any contrib modules that may exist. That aside, catch suggest to also commit this script to Deadwood/Coder module to automate this during module upgrades.

sun’s picture

Title: Move registry hooks away » Performance: Split .module files by moving hooks

Better title.

Also, on the issues webchick raised:

  • It is important that we take these optimizations serious. This is not micro-optimization - I'm expecting a performance increase of approx. 20% by this patch. We simply load too much code on all requests without needing it at all.

    That said, my development box is a bit broken currently, so I can't do proper benchmarks. I hope someone can take that job over.

  • Often asked: What do others do? - Exactly the same. I would say that many developers coming from phpBB, vBulletin, Joomla and others are already used to code that's splitted into functionality.
  • hook_link() belongs to loading/viewing/rendering of something, so I think it matches the pattern for leaving it in .module files.
  • hook_help() is a bit hard. If Help module is disabled on sites, then we load many code on those sites that is not used at all. OTOH, if it is enabled, then we are more or less doubling the amount of include files loaded on all pages. That's a very tough issue.
  • $module.registry.inc might not be a good name at all. $module.info.inc might be easier to grasp - less "meta", because actually, all hooks in there should really be suffixed with _info().
  • includes/form.inc, btw, is the second largest include file we load (after common.inc). I wonder whether we really need everything in there on all requests.
  • After working on this, $module.pages.inc and $module.admin.inc do not look right to me. Because they both serve the same purpose, but in different areas. $module.pages-admin.inc would be more logical (and also group them nicely together). In reality, those most often map to menu callbacks, so I think to continue the logic, they would have to be called $module.menu.inc and $module.menu-admin.inc. That way, menu handlers would also be more in line with $module.form.inc, which equally contains form handlers.
  • This is not the end. As stated in #56, we can do a lot more after this patch. Specifically, and in correlation with #469698: Establish naming convention for form IDs, we should move all form handler functions into $module.form.inc. Form API uses drupal_function_exists() for all handlers already, so those can effectively live whereever we want (but just not in the .module file).

Also, I'm re-stating the list of issues we need to tackle independently:

  • drupal_function_exists() does not use drupal_load() - bad for debugging, very inconsistent.
  • hook_node_info(), hook_field_info() is not cached.
  • node_invoke('form') is called on all pages for administrative users due to menu_tree_page_data('management').
  • hook_forms() could be moved into $module.form.inc, but is invoked for system_theme_form, and thus, loaded on all pages, which makes no sense at all.
  • node_types_rebuild() and related functions should live either in node.registry.inc or in node.node.inc.

Any takers for these? Let me know here or in IRC, and I'll replace/add issue links in this list.

lilou’s picture

Issue tags: +Performance

Add tag.

Crell’s picture

Thoughts in no special order:

I would put hook_actions_info() in the registry.inc file. If it's not registry-style (return an associative array, has an alter hook, gets cached and never called again) then it shouldn't be called _info(). If it is registry-style, then it belongs in $module.registry.inc. I can deal with it being called $module.info.inc, too, if that's the consensus.

$module.pages.inc and $module.admin.inc were established for D6 because that minimized the amount of code we had to load. Those are all page callbacks (and the theme functions for them), or forms that are used as a page callback. On any given page load, one and only one of those should ever be needed except in very strange circumstances. Arguably we could have gone as far as putting every callback in its own file, which would be the absolute minimum amount of code loaded, but we decided that would make the DX WTF factor way too high. We settled on "user-facing pages" and "admin-facing pages" as the split pattern to use, hence the names. I still believe that is a good split to have for those functions. It's based on the common usage pattern of the code, primarily.

Also, putting all forms in one big file I would disagree with. Forms that are used as a page callback (the majority of them) should go in the page callback files, as they are now.

Alter hooks are an odd case. Some will be called on nearly every page, like hook_form_alter() (since most pages have at least a search form). However, hook_form_$form_id_alter() will only be called in very targeted circumstances; that's one key reason it should be encouraged over the generic hook whenever possible.

hook_menu_alter() and hook_theme_registry_alter() (or whatever it's called, it's an odd name) will also only be called on rebuild; the same holds true for any info-style hook's alter hook. Those can go in the $module.registry.inc file, since that's the only time they'd ever be called.

hook_forms() is not a registry-style hook. It's more of a form-mapping-alter hook. It's kinda weird like that. I don't know that we can make it called only on selective pages without changing its semantics so that it's more registry-esque. That's definitely a separate issue from this one.

Haven't we killed hook_help() yet in favor of the advanced_help model? Please? With sugar on top? I never even use hook_help() it's so horridly designed.

I am not really concerned about the performance difference. We know based on past experience from D6 that this has the potential to be a huge difference, even if we do only some of them. Whether it's 19% or 21% difference doesn't really matter; eliminating even just the info hooks from .module files would be a big win. We already have the benchmarks to tell us that. And please, for the love of god let's not postpone this issue while we refactor other hooks. Again, we can get a big boost right now but just splitting off hook_menu(). We don't need to rewrite all of core in one patch if we know the convention we're working toward.

Having an automated split tool in coder/deadwood gets a big +1 from me. It also helps maintain a pattern, which makes it easier to learn where stuff is when looking at a new module.

I've no idea what hook_mail() is doing in $module.actions.inc...

Form _validate() and _submit() functions should be in the same file as the form they relate to, as they will only be needed if the form building function has already been called anyway.

Otherwise I see no major problems with the breakdown that sun's script generates.

I share webchick's concern about DX WTF of having more files. However, we have a choice: We have a very small number of files for developers to keep track of and a huge amount of wasted CPU cycles and memory, or we minimize the CPU cycles and memory Drupal wastes but have more files to juggle. The nice thing about the registry is that you can decide differently in different modules, because there is no forced organization whatsoever. The registry allows you to have one 20,000 line file or 20 1,000 line files or 1,000 20 line files, depending on what makes sense for your module. All three of those will run; they may just not run optimally. In fact I would recommend that developers start a module in all one file, and then break it up later when it's nearing completion.

I also don't think it's quite as bad as webchick suggests it could be; lots of modules have no node hooks. Very very few have comment hooks. Most do not use actions, or cron. There's less than a half-dozen that implement search, I suspect. I believe the most common indrect functions, based on totally informal checking of my brain from working with lots of modules, are page callbacks/page forms (already adequately split out), theme functions (should live alongside the code that calls them, wherever that is), and form_alter hooks. I tend to also use CCK hooks a fair bit when writing formatters, which if they aren't registry-style-cached now damned well should be.

catch’s picture

Just to reiterate, the baseline performance of Drupal 7 in PHP is considerably worse than it is in Drupal 6, and there's no other way to make this particular 20% gain. So while we should try to find a good pattern, it's easy to move one or two hooks around once the initial patch is in.

hook_link() needs to go and die #451272: Rename or remove or do *something* with hook_link()

hook_help() needs to go and die #299050: Help System - Master Patch

If we release Drupal 7 with those two still existing then it will overshadow the rest of the release. Please let's not postpone this on either of those (if this gets postponed on the help patch, which is postponed on another issue which doesn't even have a patch at the moment, then we're shooting ourselves in the face).

hook_X_info() - yes we should make them consistent/cached. Is there an issue for that yet?

I also agree with Crell that the number of modules implementing all the various hooks here is likely to be extremely small - hence they'll end up with a handful of includes at most. And with an automated script provided it's easy to sort out anyway.

If no one else does I'll try to run benchmarks by the end of tomorrow, but won't have much time today. We need to benchmark with all core modules enabled, with and without APC + check number of files included and memory usage. If it really is 20% then just hitting a page with devel running ought to be enough.

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new658.09 KB

#469768: PASS. went in, so this should pass.

Btw, #310467: Slimmer hook_theme() could eliminate the first 2 hunks in this patch that need to be applied manually. Can I get some reviews/feedback over there?

Note that we still have to solve the Navigation menu issue. I have no clue what happens there, but I can only guess that it must be some code that does not properly use the registry.

sun’s picture

Also, one amendment: Since we have this script now and "re-rolling" this patch from scratch takes about 5 minutes, we can discuss a valid pattern for moving hooks and functions pretty lengthy. And I would really like us to do so. The script allows us to think about this split at a higher level before moving any stuff around. Bear in mind that we (more or less) lose CVS history for all functions that are moved. I appreciate that webchick raised this issue. I would like to see more feedback like Crell's - and we can work on fixing all related issues in the meanwhile.

robloach’s picture

This is made with a large amount of awesomesauce. I've been pushing some modules in Drupal 6 to use modulename.apimodulename.inc, and this absolutely fixes it for core in Drupal 7. Subscribing and looking forward to seeing the benchmarks. Way to go!

catch’s picture

Benchmarks on the latest patch, all core modules enabled. 10 nodes on front page with taxonomy terms and path aliases. In all cases restarted apache then hit the front page a few times to ensure caches were fresh.

Notably there's a very small improvement with APC - as long as we don't make things worse with opcode caches I think we're fine there. Without APC it's about 3mb memory saved. Devel says anything up to 30% improvement in page execution times, ab is a bit more conservative at 6-8% - although I only benchmarked node pages for this - we'd probably be loading even less code on user pages for example.

No APC.
HEAD: 
2.88 [#/sec]
Executed 106 queries in 45 milliseconds.Page execution time was 455.5 ms.
Memory used at: devel_boot()=2.68 MB, devel_shutdown()=25.06 MB, PHP peak usage=25.75 MB.

Patch.
3.06 [#/sec]
Executed 106 queries in 44.25 milliseconds.  Page execution time was 394.02 ms.
Memory used at: devel_boot()=2.67 MB, devel_shutdown()=22.47 MB, PHP peak usage=23.25 MB.

With APC:
HEAD:
5.51 [#/sec]
Executed 106 queries in 56.05 milliseconds.  Page execution time was 241.29 ms.
Memory used at: devel_boot()=0.71 MB, devel_shutdown()=5.76 MB, PHP peak usage=6.5 MB.

Patch.
5.56 [#/sec]
Executed 106 queries in 60.41 milliseconds. Page execution time was 279.38 ms.
Memory used at: devel_boot()=0.75 MB, devel_shutdown()=5.7 MB, PHP peak usage=6.5 MB.

node/1
HEAD
3.53 [#/sec]
Executed 50 queries in 23.01 milliseconds. Page execution time was 311.08 ms.
Memory used at: devel_boot()=2.68 MB, devel_shutdown()=24.57 MB, PHP peak usage=25.25 MB.


Patch:
3.8 [#/sec]
Executed 51 queries in 23.96 milliseconds.  Page execution time was 280.2 m
Memory used at: devel_boot()=2.68 MB, devel_shutdown()=21.98 MB, PHP peak usage=22.75 MB.


To illustrate just how important this is - the Drupal 6 site my D7 benchmarking site is upgraded from:
DRUPAL 6 no APC:
node/1
4.68 [#/sec]
Executed 53 queries in 24.88 milliseconds. Page execution time was 249.94 ms.
Memory used at: devel_init()=2.24 MB, devel_shutdown()=19.42 MB.

So looks to me like a decent improvement, and closes about 40% of the current gap n PHP performance between D6 and D7.

cburschka’s picture

Subscribe.

Summing up, it seems we have three kinds of functions:

- very likely to be called on any page load (eg. hook_form_alter, hook_init)
- only called on a small set of pages (eg. theme_hook, page_callback)
- cached and therefore only called on cache reset (eg. hook_menu)

A pattern that focuses exclusively on performance would put the first kind all into .module. If they're going to be loaded anyway, including a separate code file may cost time instead of saving it. The second kind would be grouped into sets most likely to be called together (eg. hook_form, hook_form_validate, hook_form_submit, theme_hook_form). The third kind would be in one separate file, since most registry rebuilds are run together.

This pattern might not be ideal for DX, however - developers might appreciate a more semantic approach of where to put functions, like $module.forms.inc, or $module.alter.inc or $module.theme.inc.

As Crell stated, we have some tradeoff between semantic refactoring and performance refactoring.

(As a matter of personal taste, I rely on the API function reference and rarely care where functions actually live - also, slow sites have given me far more trouble than APIs. So I tend more toward the "performance" route. But that's opinion.)

sun’s picture

As a matter of fact: I was stuck finding functions in HEAD a few times already. You no longer have hook_menu() where you can lookup in which file a function lives. However, the performance increase (by far) outweighs this issue, and we should rather think about helping developers to predict the location of a function (by using a grokable pattern), but also, since we have the registry in place, I think we could add an on-site widget for developers (admin_menu, perhaps) to lookup functions.

Since Arancaytar mixed things a bit up (possibly due to my reversed analysis), I'll try to revamp the proposal.

Crell’s picture

I'd also lean toward performance over semantic organization, especially for core. Code is executed far more often than it is read by a human.

The difference between devel and ab's reports are, I believe, because they report different things. devel reports the amount of time Drupal is executing (minus a little at the beginning and end); ab reports the time it takes the request to be send, processed, and retrieved. So it's going to be testing a much longer process, and this code refactoring has no impact at all on the HTTP request or response time.

I'm a bit surprised that APC is reporting no memory benefit... but then, APC's memory usage is a sliver of what an uncached site is so as long as we break even on opcode caches we're still coming out ahead. I'd say the benchmarks absolutely conclude that we need to do this.

swentel’s picture

ok, also 10 nodes on front, some taxonomy, 50 users and all core modules except upload which gives a lot of notices. Don't have APC though right now, ab with 100 requests.

/node without patch
3.02 [#/sec]
Page execution time was 327.31 ms. Executed 79 queries in 29.46 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.73 MB, PHP peak usage=11 MB.

/node with patch
3.29 [#/sec]
Page execution time was 286.23 ms. Executed 79 queries in 27.67 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9.51 MB, PHP peak usage=10 MB.

/node/1 without patch
3.31 [#/sec]
Page execution time was 281.03 ms. Executed 45 queries in 27.85 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.56 MB, PHP peak usage=10.75 MB.

/node/1 with patch
3.61 [#/sec]
Page execution time was 261.56 ms. Executed 45 queries in 33.77 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9.35 MB, PHP peak usage=9.75 MB.

/user/1 without patch
3.29 [#/sec]
Page execution time was 271.65 ms. Executed 31 queries in 19.4 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.21 MB, PHP peak usage=10.5 MB

/user/1 with patch
3.88 [#/sec]
Page execution time was 237.97 ms. Executed 31 queries in 17.74 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9 MB, PHP peak usage=9.25 MB.

/admin/settings/site-information without patch
3.53 [#/sec]
Page execution time was 353.78 ms. Executed 29 queries in 67.14 milliseconds.
Memory used at: devel_boot()=1.3 MB, devel_shutdown()=10.91 MB, PHP peak usage=11.25 MB.

/admin/settings/site-information With patch
3.89 [#/sec]
Page execution time was 280.88 ms. Executed 30 queries in 43.02 milliseconds.
Memory used at: devel_boot()=1.31 MB, devel_shutdown()=9.7 MB, PHP peak usage=10 MB.

catch’s picture

I agree with Arancaytar and Crell on performance vs. DX. Trying to squeeze cycles out of an overloaded server, memory limit errors - these are terrible developer experience. For that matter, waiting ages for pages to load is bad user experience. Having to grep a folder instead of one file for a function is really, really low in the scale of hard to grok things about Drupal.

Also since the only reason we're doing this is /for/ performance, there's no point doing some weird hybrid version based on what we think might make sense semantically - what makes sense is loading less code.

sun’s picture

So far, I really like what I see here on my box.

The pattern gets a bit hairy with Node API, i.e. custom node type hooks... :( I'd consider them all in $module.node.inc, but if you think about it:

- hook_view could also work in $module.pages.inc

- hook_form|_validate|_submit could also work in $module.form.inc

- hook_insert|_update|_delete is what definitely belongs into $module.node.inc

I still prefer $module.node.inc for all. Just wanted to mention that we have one edge case in core. (the only one I was able to determine thus far)

To be continued...

sun’s picture

StatusFileSize
new1.17 MB

Eat this.

Status: Needs review » Needs work

The last submitted patch failed testing.

cburschka’s picture

I'd support $module.node.inc. It also sounds better than form.inc and pages.inc from the performance aspect.

When you open a node for editing, you call _load and _form (etc.). When submitting, you call _form again (FAPI rebuilds a form before executing), then _validate, then _view or _submit depending on if you previewed. The page you land on then uses _load and _view and others. It's a group whose members are frquently called together.

I'll patch, test and see if I can sniff out what testbot didn't like.

cburschka’s picture

These are the tests in question:

Advanced search form	32	1	0
Bike shed	17	1	0
Block availability	50	3	0
Block functionality	144	3	0
Field form tests	114	21	2
Trigger content (node) actions	423	9	6

These are the failures I was able to replicate (35 out of 38):

Fail       Other      field.test                     975    Entity was created
Fail       Other      field.test                     1064    The response is an object
Fail       Other      field.test                     1065    Response status is true
Fail       Browser    field.test                     1034    Parsed page successfully.
Fail       Browser    field.test                     1034    Widget 0 is displayed and has the right value
Fail       Browser    field.test                     1035    Parsed page successfully.
Fail       Browser    field.test                     1035    Widget 0 has the right weight
Fail       Browser    field.test                     1034    Parsed page successfully.
Fail       Browser    field.test                     1034    Widget 1 is displayed and has the right value
Fail       Browser    field.test                     1035    Parsed page successfully.
Fail       Browser    field.test                     1035    Widget 1 has the right weight
Fail       Browser    field.test                     1034    Parsed page successfully.
Fail       Browser    field.test                     1034    Widget 2 is displayed and has the right value
Fail       Browser    field.test                     1035    Parsed page successfully.
Fail       Browser    field.test                     1035    Widget 2 has the right weight
Fail       Other      field.test                     1039    Widgets are displayed in the correct order
Fail       Browser    field.test                     1040    Parsed page successfully.
Fail       Browser    field.test                     1040    New widget is displayed
Fail       Browser    field.test                     1041    Parsed page successfully.
Fail       Browser    field.test                     1041    New widget has the right weight
Fail       Browser    field.test                     1042    Parsed page successfully.
Fail       Other      search.test                    286    Article node is not found with POST query and type:article.
Fail       Other      search.test                    223    Help text is displayed when search returns no results.
Fail       Other      search.test                    419    "Your search yielded no results" found
Fail       Other      search.test                    423    "Your search yielded no results" found
Fail       Other      search.test                    429    "Your search yielded no results" found
Fail       Other      trigger.test                   45    Make sure the page has actually been created
Fail       Other      trigger.test                   48    Make sure the publish post action fired.
Fail       Other      trigger.test                   45    Make sure the page has actually been created
Fail       Other      trigger.test                   45    Make sure the page has actually been created
Fail       Other      trigger.test                   48    Make sure the make post sticky action fired.
Fail       Other      trigger.test                   45    Make sure the page has actually been created
Fail       Other      trigger.test                   45    Make sure the page has actually been created
Fail       Other      trigger.test                   48    Make sure the promote post to front page action fired.
Fail       Other      trigger.test                   45    Make sure the page has actually been created
sun’s picture

Thanks for listing the failing tests, Arancaytar! Let's fix those issues in #471278: Prepare module split and continue the high-level discussion here.

sun’s picture

Now the patch in #471278: Prepare module split passed (of course). Though field attach/form tests should still fail with the split patch, and I have little idea how to fix them.

I know that this is an unfortunate situation, because #471278: Prepare module split will always pass without the split patch. However, I absolutely do not want to mix any other code changes into this monster patch. IMHO, the .module split patch should be 100% automated, so we can be sure that nothing in those functions is altered. Otherwise, this would mean that someone would have to review those 1.17 MB, which is utopian.

Crell’s picture

The D6 page split work was done by hand, module by module. Not the fastest method but it can work if you're doing it piecemeal, which was what I was hoping to do after kick-starting it with the easy registry hooks, since monster patches suck. :-)

dries’s picture

To be honest, this patch doesn't sit well with me ... but I'm obviously happy to discuss this more. However, given that my gut has been nagging about this for a while, I figured I'd speak up.

  1. We only see a 6-8% improvement in optimal conditions. You only get that if you don't have APC enabled and when you can't cache the page. So all things considered, for most real-life scenarios, it is probably only going to speed up the site by 1-2% across page views. Plus, we have to acknowledge that all big traffic sites have APC enabled. Unlike settings up reverse proxies and whatnot, enabling APC is easy.
  2. When we worked on the registry patch, we were promised "tremendous performance improvements" but so far these have not materialized yet. I don't have proof for this but if you subtract the extra overhead _introduced_ by the registry from the 6-8% gain, this might be a null operation. See #325665: Cache registry lookups and #298600: DBTNG: make module_implements work regardless of bootstrap phase for examples of overhead introduced by the registry. I realize my comment might tick people off, but I do think we need to be critical about the extra complexity (being) introduced by the registry, and weigh that with the gains. I can't help but feel like we introduced complexity, and now we're scrambling to truly take advantage of it. Yes, I think extensive splitting is scrambling (but normal intuitive splitting is not) ...
  3. This patch takes an incremental step to more splitting, but it sounds like follow-up patches would split things even more. Personally, I disagree that performance is more important than developer experience in this case. I work on core almost every day, and I don't like the idea of having to deal with lots of files and scattered functions -- frankly, it makes my Drupal life less fun. It's not like it is a bad API hidden in dark corner that only frustrates me once in a while, no, this is going to affect all my Drupal work. There is a sane limit to how you should break a module's functions. My personal preference is 1 files for small modules, 2-3 files for medium sized modules, and 3-6 files for large modules. The proposal in this issue seems to take things beyond the sane limits, and it might only be a start. Some of the DX comments in this issue seem to brush this off as minor but I disagree. I'll be affected by this every day and I want Drupal work to be fun and intuitive.
  4. To be frank, there might be more important performance patches to work on. Some of the other performance patches can give us a bigger bang for our buck, and they do not degrade the developer experience.
JohnForsythe’s picture

Thoughts:

Use a single .module file for development work.
Create a tool that automates the splitting of the .module file.

Perhaps this splitting happens automatically when you install the module.

Additionally, this splitting could be turned on or off, depending on if you need it. Sort of like disabling css and js aggregation when you're doing development work.

andypost’s picture

The most performance is gained if all php-files are joined in one + APC (eAccelerator) it brings near x10 - x20 times

But for common case - shared hosting is much better hold them separated (system.module hold a lot of forms that rarely used - actions, perfomance .... but this one loaded at every request)

ZF use this technique on production - all files joined + op-code cache

here example from russian forum - http://forum.dklab.ru/comments/nablas/49OptimiziruemZagruzkuPhp-kodaV22R...

# ab -n 100 -c 10

Clean PHP :
- separate files loaded:
Requests per second: 3.69 [#/sec] (mean)

- joined 5-Mb:
Requests per second: 4.85 [#/sec] (mean)

xCache
- separate files loaded:
Requests per second: 9.41 [#/sec] (mean)

- joined 5-Mb:
Requests per second: 34.54 [#/sec] (mean)

eAccelerator
- separate files loaded:
Requests per second: 14.04 [#/sec] (mean)

- joined 5-Mb:
Requests per second: 42.86 [#/sec] (mean)

Crell’s picture

Dries:

Removing rarely used code can have a huge performance boost: http://www.garfieldtech.com/blog/benchmark-page-split

Even with that work in Drupal 6, Drupal 6 is substantially more memory hungry than Drupal 5. My normal web host handled multiple Drupal 5 sites without issue, but keeps choking on Drupal 6 sites unless we move up to their higher-end accounts exclusively for memory issues. When doing a full cache flush, it's not unusual for me to get an out of memory limit at 64 MB with Drupal 6.

Many larger modules have already run into this issue: Views and Panels have their own lazy-load mechanisms with magic file naming (that introduce more developer overhead than the registry does, IMO) and Ubercart keeps blowing out my memory limit, but isn't structured in such a way that it can easily have its own manual split.

The last heavy-duty profiling I am aware of identified two areas of Drupal as the biggest performance sinks, bootstrapping (that is, loading all that code) and the theme system: #196907: GHOP Issue #4: Profiling of D6, viewing the standard page after installation, create a story and view the Admin index (That was after the page split work went in.)

Plus, we have to acknowledge that all big traffic sites have APC enabled. Unlike settings up reverse proxies and whatnot, enabling APC is easy.

No, we have to acknowledge that all big traffic sites should have APC installed. :-) That doesn't mean they will. And even a not-big-traffic site can have a huge memory footprint simply because they have a lot of large modules installed to support complex functionality. I have built a number of sites that are really not high traffic but because of the functionality required have terrible memory issues, simply due to code weight and large cache arrays. That right there prices Drupal out of an awful lot of web hosts, and makes us uncompetitive in the "cheap hosting" arena; the market that is an easy "gateway drug" to bring more people into Drupal.

The "tremendous performance improvements" from the registry are what we're trying to get in this issue. Of course they've not materialized yet, we've not done the work necessary to get it. :-)

So, the problem: Drupal is simply too damned big. Loading all code on bootstrap is simply not sustainable, and as Drupal grows the problem is only going to continue to get worse. It is already a serious problem for people running large non-Chaos modules (Ubercart is my easy example here). APC is not going to be enabled by default in PHP (the PHP team has been very clear about this, IMO), so we cannot rely on it. "If you want to run Views, Panels, Ubercart, and location then you need a VPS, even if you're not high-traffic" is not an answer.

The solution: Load less code. Really, it's that simple. If we used more classes then we could rely more heavily on PHP's autoload, but there are very good reasons why we do not in most places. So we do it ourselves. Fortunately most of Drupal is called indirectly so that's fairly easy, and has already been implemented. All that's left to do is leverage it.

Look at the benchmarks in #78 again. It's a 6-8% performance boost in overall page delivery time as reported by ab. It is a 30% performance boost in Drupal execution time as reported by devel. There's an enormous amount of time burned in Apache initialization, PHP initialization, HTTP traffic, etc. that we can do next to nothing about here, so that's a red herring. Devel is the measuring tool used for the benchmarks linked at the top of this comment, so the apples to apples comparison is 30% reduction in what we're actually trying to measure. Even if we were only talking 8%, though, we are doing so badly in performance right now that I'd take it. There's also a ~3MB memory savings. And that's just core.

I am not dismissing the DX question. I do, however, believe that it is being over-exaggerated. For one, as has been repeatedly stated there is no mandatory requirement that you break up your module. If we take the registry to its logical conclusion, really-small modules could even collapse their .install files into the .module file and we'd be back to one module, one file. That still works 100% correctly with the registry. You can split up your code later before releasing it, and even break it out along different lines than core does if your use case legitimately calls for it.

Two, it's only a DX problem if you're stupid with it. Has the page split in D6 caused a DX problem? I'm not aware of one, but it was an enormous memory boost. Is moving hook_menu, hook_theme, hook_node_info, and a few other "big-arse arrays" to a separate file, which is then included once in a blue moon, a DX problem? I do not see how.

As I noted in #73, the typical module will likely not have that many hooks in it, so will not have a dozen files. And if we have a common convention for organization that people can learn (which this thread is trying to establish), finding a given function should not be difficult. As a worst case scenario, even the free IDEs have a "jump to function definition" operation. (Yes, not all developers use an IDE. However, I would not be surprised if the IDE using percentage of the developer population is substantially higher than the percentage of sites running an opcode cache. No, I have no data to back that up it's just a gut feeling.)

Modern themes already contain far more files than a module is ever likely to. A half-dozen template files and over a dozen CSS files is normal for the sites we build... and that's because we're very very frugal with template files and rely very heavily on CSS. :-)

Removing the need for manual registry of lazy-loaded code is also a DX boon. I prefer the menu and theme hooks not requiring me to specify a file and path for every entry. I'd love to get rid of the path and parent keys in the views_plugins() hook. That also removes the need for magic file naming, while preserving lazy-load capability.

If we want Drupal to continue to run without APC, then this effort is necessary. No one has come forward with another alternative to reduce code weight... because the only solution to code weight is "load less code", which is exactly what we're doing here.

sun’s picture

Status: Needs work » Needs review
  1. Not all hosts have an opcode cache. One of our goals is to bring more Drupal installations to the masses. My own benchmark results on my development box have been vastly different (between 20-30%) and the overall memory consumption decreased by 4 MB (24%) on most pages. Crell mentioned that ab is testing the full time for an entire HTTP request, while Devel and other tools measure the processing of PHP.
  2. The registry is the successor of the menu page callback registry and theme registry we already introduced in D6. #310467: Slimmer hook_theme() removes that remaining duplicate registry for theme functions, so developers no longer need to define 'file' in hook_theme().
  3. The last patch is no longer incremental. It also moves functions that follow no naming pattern, which the script could not match. Btw, we have no benchmarks on that one yet. Only theme functions are remaining - though I could improve the script to allow to move them as well (into the file of their corresponding content builder functions).
  4. I would be cautious with a statement that having everything in one .module file makes work with Drupal better. For example, it is absolutely no fun to work in bloated modules like node.module and user.module. Most often you need to grep those 100 kB files to find a function anyway. I would counter-argue: Separating functions into includes that follow a logical pattern can also improve DX with Drupal.
  5. After all, I'm still happy that we really discuss this in a broader scope now. We wouldn't have if we had moved registry-style functions only. And, if the resulting decision should be to ignore the performance on hosts that don't use an opcode cache, then I would argue that we have to move functions in $module.admin.inc and $module.pages.inc back into their respective module files as well. Currently, those partially moved functionalities are the most confusing thing to me when doing Drupal development work.
Anonymous’s picture

Status: Needs review » Needs work

was writing a reply, but now i mainly just want to say "what Crell said".

memory usage is a big issue - its really the elephant in the room when trying to scale out drupal. the apache children are just so fat on a big drupal site, you need more RAM/boxes to serve a given amount of traffic than just about any other comparable framework/CMS out there. its definitely a big X on the wrong side of the ledger when considering drupal for big sites - all other things being equal (yeah, that's a big assumption), it costs more to run drupal than most of its competitors.

i share Dries concern about how we split the code, but i think we can do it in two stages.

first, get rid of as much of the code that is currently loaded for every single full bootstrap by stripping back or eliminating .module files. we get an immediate memory win for all full bootstraps, and its an extremely simple rule.

second, we profile common paths, find the next low-hanging code-load bloat, and work out the best way to fix that, and wash-rinse-repeat.

moshe weitzman’s picture

@Crell - in order to convince Dries (and others like me who are mostly nodding with him), you have to address his data concern. He said 6-8% improvement under optimal conditions, and 1-2% under real world conditions (not sure if this is page speed or memory).

I'll agree that code registry has so far been disappointing. I think I even said "tremendous performance improvement" when it was under review. The killer patch which made it worthwhile got hung up on some really hard problems - #259412: Total module system revamp. Parts of that patch have been incorporated though so it is not so bad.

The registry adds a lot of complexity to the code, and mostly benefits those sites without an opcode cache. That is to say, it benefits folks who can't possibly care much about performance since they are ignoring a 9x speedup (don't quibble - it is the biggest optimization by far). There is only so much you can do to help the ignorant. Please let's not get into the poverty argument - VPS hosting is so cheap now. Managed hosting is a bit more, but often doable.

andypost’s picture

and at the third step make joined file with common part of bootstrap-code to load this once not as many files

I glad to try join all core files in one and test but code-registry is mistery for me

catch’s picture

On the DX issue, I use vi from terminal for everything. If I'm not sure where a function lives, I do grep -r "function function_name" * from the root of my Drupal install, then vi /foo/bar/bar.module - if it's a huge module like user.module I then have to search within vim for the function itself or scroll down 4000 lines. If instead I can grep -r "function user_node_load" * and go to modules/user/user.node.inc and see all my lovely node related functions right there - that doesn't seem any worse to me at all (not necessarily a lot better, but no worse).

On speaking to sun in irc he suggested we do one more re-roll and benchmark that which makes sense to me. Also if anyone knows how to measure memory usage of apache processes (as opposed to what PHP measures itself), now would be a good time to speak up - I still feel like there should be a small memory gain for APC users if the APC cache itself doesn't have to deal with a lot of unused files.

Also a note that we should move hook_requirements() and probably hook_schema() into their own includes so huge .install dont' get loaded on admin.

sun’s picture

Two additional facts that may not be clear to everyone yet:

  1. This .module split (currently) moves all CUD in CRUD to include files. This means: On regular page views, we load the same amount of files as we do now. Only when something "happens", we load more (but only the appropriate code for "something").
  2. Opcode caches usually support garbage collection. Whenever a site does not need certain hooks, and those hooks are moved away, the opcode cache will flush that garbage over time. Trivial examples are triggers/actions and XMLRPC implementations.
mikey_p’s picture

I'll try to run some tests to show Apache memory usage soon. I usually use eAccelerator so I'll run them with and without it enabled.

One additional point (that checking Apache memory usage should show) is that potential memory savings may allow a higher max clients, and ultimately higher concurrency on the same webserver (assuming the database is up to the load).

This patch needs to be evaluated "system as a whole" with more than what ab2 -c1 -n500 will reveal.

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new13.67 KB
new959.86 KB

Attached is the latest version of registry.php.

All further fixes that were moved into #471278: Prepare module split have been commented out (and remarked in the script).

Status: Needs review » Needs work

The last submitted patch failed testing.

kbahey’s picture

Subscribe.

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new473 bytes
new14.16 KB
new849.84 KB

In the meantime, registry.php contains a list of @todos -- bugs in Drupal core, some of them unrelated to this patch, but only uncovered due to this patch.

catch’s picture

StatusFileSize
new57.98 KB
new49.95 KB

Did some more testing, with APC only this time, all core modules enabled.

ab -c1 n1000 - benchmarks are the same, but also looked at APC usage and apache processes via top.

With APC:
HEAD:

Apache processes (from top) - virt 249m, 29M res, 16m shr

(one process is res 56m, shr 25m, rest same as above)


Patch:

Apache processes virt 249m, res  28m, shr 14m

(one process is res 50, shr 22, rest same as above)

I've attached apc.php screenshots - these are from restarting apache, running the benchmarks, looking at apc.php. Obviously as more pages are visited on the site, the patched memory usage is going to increase as the includes get picked up - but if you have files which are very rarely or never used (say actions hooks on some sites) - then it's likely there'd be some long term savings, especially with contrib added in.

To me this looks like about a 1M memory saving with APC, at least in the 'just started up' state. If you double or treble the number of modules installed, then I reckon we're looking at 2-5M per process (given the bulk of code in core isn't contained in hooks, and lot of code in contrib is).

Status: Needs review » Needs work

The last submitted patch failed testing.

kbahey’s picture

If you are running php as mod_php in Apache, you will see the first process to handle a request with inflated memory size, because it is the one that loaded and parsed the initial scripts. Other processes will be smaller because they fetch the cached op codes from shared memory.

A more accurate memory figure would be what the performance module (part of devel) would say.

You can select either summary or detail, in the database, and see a before and after to compare.

bjaspan’s picture

If you are running php as mod_php in Apache ... Other processes will be smaller because they fetch the cached op codes from shared memory.

Wrong. I thought that too until I actual tried to verify it. The opcode cache is NOT shared across processes. Every Apache process builds up its own copy. They all end up using the same amount of memory after they've each compiled more or less all the files. I verified this with APC, XCache, and eAccelerator.

catch’s picture

If that's the case, then judging by my tests, I think we'll see small memory improvements in APC with this even after apache processes have been running for a while - because dead or very rarely used code isn't going to be loaded at all in the case of individual processes.

bjaspan’s picture

Yes, I think that's true: we'll see small memory usage improvements since very rarely used code won't "ever" be loaded (Apache processes die after a max # of requests so "ever" is a reasonable approximation).

The fact that opcode caches do not share memory like everyone (including me) seems to assume they do does not seem relevant to this issue. Of course, it is extremely relevant to Drupal hosting in general and I sure would like to take the time to dig into the PHP internals to find out why the memory is not being shared...

killes@www.drop.org’s picture

I am really wondering what advantages this patch would bring me as a person hosting a Drupal site:

1) the idea is to use less memory per process
2) yet, I have to cater for it to use the full amount of memory because if an admin works the site, all code still can be included.
3) From that follows that I can't use php.ini to restrict the available memory because admins shouldn't get WSODs
4) The only use case that I can see is that you force admins to e.g. use admin.mysite.com and normal users stay on mysite.com and configure both sites differently in apache/ php.ini.
5) Isn't that just a fringe use case?

Crell’s picture

1) Less memory and less CPU.

2) Not true. As of Drupal 6, Drupal never loads its entire code base at once. Page callback files are loaded individually as needed and there is no instance in which they are all loaded at once. This patch extends that same logic to other callbacks and hooks. Unless you go well out of your way, there should never be a case when all of the code available to Drupal is called.

3) As #2 is not true, this statement does not logically follow.

4) This statement does not apply in the slightest as #2 is not true and neither is #3.

5) Every single Drupal 6 and later site in existence has a hook_menu, hook_theme, hook_node_info, and various other "call once and don't call again" hooks. That is, at minimum, more code that we do not need to load on most page loads on every single Drupal 7 site if we simply leverage the registry properly. Hardly fringe use case.

6) Even if there are still a few pages that peak at high memory usage, reducing the footprint of a typical page load improves performance and allows more responses to be handled by a single web server. That's still a net win even if we still spike to 64 MB on the modules page, for instance. (Although that page would benefit from this split.)

killes@www.drop.org’s picture

Larry, you look at the individual page request. I am looking at an apache proess over time.

gerhard killesreiter’s picture

Larry told me that he considers the non-opcode case.

I don't think we should optimize for the non-opcode case.

owen barton’s picture

Subscribe

sun’s picture

Status: Needs work » Closed (won't fix)

I suggest to create a new issue that reverts the page callback split from D6 and moves all functions back into .module files.

webchick’s picture

Status: Closed (won't fix) » Needs work

Er. What?

sun’s picture

What's so unclear? Dries, moshe, as well as killes don't see any benefit in this split.

If we do not split, then the existing split does not make any sense, too.

And if we do not split, then the registry can be reverted.

The good news: Makes many issues in the queue obsolete.

webchick’s picture

Then that simply means more work needs to be done to either sway their opinions, or come up with a compromise. Cold, hard data like benchmarks helps. Then we can get out of "what if" land and into "this is what actually happens" land.

Also, on a more personal note... sun, I don't understand this ugly anger from you the last day or so. It feels like you are shutting down discussion everywhere you go, and yet this very discussion is the critical piece for our peer review process to function. If there's something specific bothering you, I'm on IRC all hours that I'm awake, and always open for a chat.

Crell’s picture

Dries, moshe, and killes are wrong. We have benchmarks to prove it. They are linked earlier in this issue.

Ignoring the non-opcode cache case is probably the absolute best way to ensure that Drupal becomes a niche product ignored by the majority of web developers.

sun’s picture

Contrary to your usual habits, just saying that they are wrong and closing comments won't work this time.

...and yet this very discussion is the critical piece for our peer review process to function. ...

catch’s picture

Well I agree with sun here:

The non-APC benchmarks already on this issue show decent memory and page execution savings - apparently not enough of an improvement though for Dries to think it's worth doing the split (and to be honest I wasn't particulary impressed by the results I got in testing - although seems some people got slightly better results than me).

The APC improvements are at best marginal, and may well not make any measurable difference to memory usage on real sites at all.

If we only care about optimizing of the opcode case, then there's no point in doing this, or the registry, or the page split - so we should revert all those changes and just have .module files. If we care about the non-opcode case, but aren't convinced enough by the data provided that it's a sufficient gain to make the change, then we should still revert the registry and the page split.

@webchick there's plenty of benchmarks already on this issue, from different people with different configurations - what other benchmarks do we need?

andypost’s picture

As I pointed in #94 there's a big performance degradation if modules splited in small peaces and opcode cache used - mostly caused by memory fragmentation.

So I think better focus on producing 2-tarballs -one joined into .modules only (for opcode) and another for non-opcode is splited as it now.

CalebD’s picture

Unfortunately it seems like optimizing for one case directly impacts and degrades performance on the other case. There's a lot to factor in here.

We can go forward with the split which will decrease performance in environments that use an opcode cache. It will also save on some memory. How much memory is saved comes down to how much of Drupal's code is loaded over the average lifespan of an Apache child process. Looks like the benchmarks above are placing it at a couple to several megs per process (depends on contrib)

However, going forward with the split will definitely help cases where an opcode cache isn't being used. This can most often be seen in situations where PHP is being invoked through mod_suphp or mod_fastcgi (or straight CGI if you're crazy :P). Personally, the host I'm on has PHP 4 and PHP 5 side-by-side using mod_php for PHP 4 and mod_suphp for PHP 5. Since PHP 4 is a piece of crap and more and more modules require PHP 5, the site I manage is using mod_suphp and would stand to benefit from the split.

I also use mod_suphp on my personal dev server because it reduces the annoyances of having to dance around permissions.

From the developer experience perspective, I'm definitely fine with splitting modules apart as it creates additional organization, but that's just a personal preference.

Running with andypost's idea, in an ideal world we'd be able to do both. I'm not very familiar with the packaging system that goes into creating module tarballs, but it seems like it would be possible to have a module be distributed as it is today but with the addition of a modulename.optimized or something. It would basically just be all the code containing module files concatenated into one file and would be loaded on systems that are using an opcode cache (which we'd have to detect).

That said, I'm just thinking big here. I'm not in-tune enough with the various core systems (registry, module loading routines, etc.) to know if that's possible.

sun’s picture

Improving opcode cache case as well via registry: I'm not sure whether it would work, but technically, the registry makes Drupal know of all its own code and functions. If it's true that opcode caches work faster with one huge LOB instead of many included files, then I could imagine that production sites - where code rarely changes - could use something like this:

/**
 * Implementation of hook_registry_alter().
 */
function apc_registry_alter() {
  $files = db_query("SELECT DISTINCT filename FROM {registry_file}");
  $lob = 'opindex.php';
  array_unshift($files, 'index.php');
  foreach ($files as $file) {
    file_put_contents(DRUPAL_ROOT . '/' . $lob, file_get_contents(DRUPAL_ROOT . '/' . $file), FILE_APPEND);
  }
  db_query("UPDATE {registry} SET filename = 'opindex.php'");
}
<IfModule mod_rewrite.c>
  # Rewrite URLs of the form 'x' to the form 'opindex.php?q=x'.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^(.*)$ opindex.php?q=$1 [L,QSA]
</IfModule>
Crell’s picture

That depends greatly on the code being run. I don't read Russian so I don't know what their test code is.

There is a cost of a disk hit for every file loaded, true. However, OS level caching ameliorates a great deal of that on modern systems. That also has to be balanced against the time spend loading code that you don't need.

Opcode cache performance is also heavily dependent on whether you use static vs. dynamic includes. Every time Rasmus gives a presentation he whines about people using conditional includes because it doesn't play nice with opcode caches. Well, every single include in Drupal save the bootstrap include in index.php is already dynamic. We're never going to be properly optimized for opcode caches as long as Drupal is a flexible, pluggable system. (And I certainly hope that never changes. :-) )

All of the benchmarks we've run on this actual patch, in all its various iterations, say the same thing:

1) It does help non-opcode cache situations by a notable amount. How notable varies a bit, but it's always considerable.

2) It doesn't really help or hurt opcode caches one way or another.

So on the performance front, I really don't see any argument against moving forward.

Which leaves only the DX question, which I addressed in #95. Short version: It's not the issue people seem to think, and given that some IDEs choke on system.module, content.module, and other multi-thousand-line files (and my brain certainly does) I can see a sensible and predicable organization of code as a DX plus, not a minus.

We can argue all day about whether people *should* run opcode caches on a VPS (although some VPSes do not support opcode caches), and whether people *should* hire a separate sysadmin to run their VPS (many VPSes are unmanaged, which is not acceptable for a great many people). That doesn't change the fact that the vast majority of sites *don't*. A personal blog that wants a calendar and a panels-based layout can break a shared host at this point. I'm running all my D6 sites on 64 MB memory now. This is not acceptable. We either address it or cede the shared hosting market to other projects.

Crell’s picture

Hm, massive cross-posting. Even if we wanted to have a "compiled" version of Drupal for opcode caches, we would still need a sensible split of code in the first place. That makes this patch a prerequisite for that.

kbahey’s picture

Whatever we do, we can't leave the non-op-code cache case untended to. We have been progressively using more memory. Until recently we could run core in the default 8MB of php. Now 16MB is required. Add a few of the popular contrib and that is not enough.

If we want Drupal to continue to grow, we can't forget about shared hosting.

Personally, I stopped using shared hosting years ago, and all of our clients are on dedicated and VPS, and all have op code caches. But we still need to strike a balance between that and shared hosts.

Op code caches (e.g. APC) still issue lstat() for each file to load, and the split hurts here because there are more files to check, even if the compiled code is in memory. Run strace on a php process at some point to see that in action. Or ping me offline and I will fetch the output for you.

The registry and code split still benefits the shared hosting case, by reducing memory consumption, e.g for non admin pages. So it is not a mutually exclusive case of either/or.

Going forward, I like the idea of "compiling Drupal" into a big BLOB that makes op-code caches happier. Sun's snippet above are promising. It can also be a script that does that. chx had something to that effect earlier.

This way, we keep shared hosting sort of under control, while those who have op code cache can go further with some extra steps.

JohnForsythe’s picture

To reprise what I wrote in #93,

It seems like this splitting of module files could be automated, and selectively turned on or off. Include comments in the code that indicate where a split point should occur, and the name of the file to split it to.

Then, on installation, if splitting is enabled, the file can be automatically parsed for split points, and split out into the required parts.

Splitting for those who need it, no splitting for those who don't.

kbahey’s picture

@JohnForsythe

I like that! And it makes writing modules simpler, we all still write everything in a single .module. (but perhaps less readable, because of the sheer size of each file?)

Challenge is that a new hook_menu() will have to be created by the splitting script, to say which callback is in which file.

But all in all seems doable.

Crell’s picture

An automated "split compiler" will never get things right in all cases, and when it doesn't you're on your own to figure out how to do it manually. Which of course makes your system impossible to upgrade. That also puts the hard work of reorganizing code onto the people who don't have an opcode cache... who are going to look at Drupal and say "WTF? I need to run a compile script on this command line thingie before I can use it? Let me go try out that new WordPress 2.8 instead..."

Also, I hold that a single 5000 line file is worse DX than 5 1000 line files.

An automated "merge compiler" is an order of magnitude easier to write, since it's just concatenating files and stripping a few PHP open tags. It also then puts the extra step of work onto just those people that are running an opcode cache, and are running a high-enough traffic site that it actually matters (since even without a massive merge, an opcode cache is faster than without), who are almost by definition going to be able to run a merge script to "compile" Drupal before running it and then mess with their .htaccess file.

There are a number of projects that have a "merge everything" compile step as an option. I don't know a single project that does it backward.

But before we have a merge script, we need to have Drupal split up sanely so that it will run on shared hosts at all. Now can we please stop re-debating the same issue all over again? We had this discussion a year ago when the registry first went in. We already committed to this approach. We already have benchmarks to say that it either helps or does nothing to hurt. Anything we can do later to further optimize opcode caches is just gravy, and belongs in another issue.

Nick Lewis’s picture

+1 for for moving hooks that are only called when new things are installed, and when cache is rebuilt. Double plus one. As for anything that goes further, don't make me recite a quote about premature optimization. Hardware is a lot cheeper than drupal devs.

andypost’s picture

Let's continue this issue with splitting for shared hosts.

Suppose we should file another issue for merge.

Now #459980: Rework admin/settings/performance so we can put merge functionality into this settings or leave for contrib

sun’s picture

Reminder: Properly splitting the code with the registry - and thereby removing all hard-coded filepath inclusions - is the only way to apply any kind of dynamic merging (or splitting) later on.

catch’s picture

I thought more about sun's comment in #118, and I've opened an issue to examine what the consequences of that would actually look like, whether there's other alternatives etc. #497118: Remove the registry (for functions). fwiw despite the fact I came up with some reasonable arguments on the issue, overall I'd rather see this one marked fixed. What I don't want to see is this one in limbo like it is now without a wider discussion of why we ever committed the registry in the first place if we're not going to use it as it was intended.

smk-ka’s picture

Citing Crell:

Also, I hold that a single 5000 line file is /worse/ DX than 5 1000
line files.

I would agree, if it were a logical split, ie. from an OO point of view it would make totally sense. However, the proposed approach splits modules at rather arbitrary points. In #101 sun wrote we "move all CUD in CRUD to include files". From a developer's POV, this *is* arbitrary. As much as I am for better performance, IMO this approach will make DX worse.

sun’s picture

Status: Needs work » Closed (won't fix)
wismoyo’s picture

Status: Closed (won't fix) » Needs review

registry-split.patch queued for re-testing.

axyjo’s picture

Status: Needs review » Closed (won't fix)

Resetting status.