This is a follow-up to this issue and somewhat to this one.

This patch adds another field to menu items, includes. The "includes" field specifies an extra file to be loaded when the system executes that page handler. The intent is to allow module authors to shove page callbacks off into their own files -- either one file for all callbacks for the module or multiple files split by task at the module author's discretion -- and then include that file only if those page handlers will actually be used. Merlin and webchick found (second issue above) that many modules, particularly in core, are very page handler heavy in terms of the shear number of lines of code they have, the majority of which is never used.

This patch also splits up the system, user, block, and comment modules to show how it can be used, mostly along the lines that merlin and webchick suggested (again, second issue above). I had to move a few things around that were actually used elsewhere after all, and I did not do anything to split out "API" functions; it's just page handlers and "other" in .module. Even with that simplistic split, though, we save a lot of bytes:

block.module: 14956
block.pages.inc: 15662
user.module: 28781
user.pages.inc: 89190
system.module: 31052
system.pages.inc: 66896
comment.module: 19330
comment.pages.inc: 57746

That's a total of 229,494 bytes that we don't have to load on every uncached page load, just for those 4 modules. Nifty.

The degenerate case if no extra file is specified is that there's just the .module file that has everything, just like we do now. How a module author chooses to break up code beyond that is up to him. (One .pages.inc file, a pages.inc and pages.admin.inc file, a views.pages.inc and views.pages.ui.inc and views.pages.themewizard.inc file, whatever.) That can/should be handled on a case by case basis. Once this is in we can see about breaking up the rest of core along similar lines.

Other notes:

  • The includes key takes the full path to a file. That's because we do have places in core where a menu hook in module A points to a page callback in module B (eg, the Admin/Content Management path). Some contribs do so as well, I think. Pushing the path definition out to hook_menu() gives us the most flexibility there.
  • The key is currently called "includes" but takes only a single file. That's because I was originally intending to make it an array so that one could have multiple includes per-menu callback. That would allow the menu item to define, for instance, a library file that was needed from some other module (eg, token) or core (eg, tablesort). It would also allow other modules to menu_alter in their own additions, such as the form_alters they need for that particular page's form in a very runtime-efficient manner since there's no extra lookup process. However, as I write this merlinofchaos and chx are in IRC telling me that's a stupid idea for various reasons. :-) If it ends up staying a single file restriction then the key will probably be renamed to "file" or something ("include" being reserved word).
  • When we split up the rest of core, I suggest that there are some page callbacks we leave in the main .module file to avoid the extra disk hit, such as node_view. Again, that's something to determine case-by-case.
  • Please don't be scared at the size of the patch file. It's 98% just cutting and pasting code around. :-) The actual new code added to the menu system is less than 10 lines, I think.

Comments

chx’s picture

Let's stay simple for now. I am absolutely against making this an array but not now. Let's not make this issue even more clouded by hammering everything in. Small moves (tells the man who has written the kitchen sink...).

dries’s picture

Also take a look at the locale clean-up patch -- it could take advantage of this patch.

The main motivation of this patch is performance, IMO, so let's much sure to run some benchmarks.

Crell’s picture

StatusFileSize
new516.25 KB

One of these days I'll learn how to use CVS properly. *sigh* This one should actually include the comment.pages.inc file, and moves a few small functions from comment.pages.inc back to comment.module since they're needed elsewhere.

Crell’s picture

StatusFileSize
new516.58 KB

And this one should have a working admin/settings page!

merlinofchaos’s picture

Simple benchmarks:

Timing:

Without APC, without patch. Drupal HEAD front page, 10 nodes, menu block. 10 reloads, time is ms from devel.module
697.36
701.08
696.91
694.29
694.51
693.44
694.14
712.12
694.59
693.39

Average: 697.18 ms
Median: 694.55 ms

Without APC, with patch. Drupal HEAD front page, 10 nodes, menu block. 10 reloads, time is ms from devel.module
632.03
635.61
632.31
634.34
631.92
631.11
631.37
630.79
633.11
641.22

Average: 633.381 ms
Median: 632.17 ms

Savings using average: 9.15%
Savings using median: 8.98%

Without APC, without patch. Drupal HEAD 'create content' page. 5 reloads, time is ms from devel.module

379.94
375.25
375.34
379.4
374.92

Average: 376.97 ms
median: 375.34 ms

Without APC, with patch. Drupal HEAD 'create content' page. 5 reloads, time is ms from devel.module

309.57
308.25
309.73
310.21
310.41

Average: 309.63 ms
Median: 309.73 ms

Savings using average: 17.86%
Savings using median: 17.48%

With APC on, the page times were close enough to identical to not bother posting.

merlinofchaos’s picture

My conclusions from the statistics above:

The savings is static; meaning the biggest gains will be seen on the pages doing the least work. This is not at all surprising, but the statistics confirm that.

On a page that basically does very little, we see an 18% performance increase from splitting up just 4 modules. There is still significant work to be done to node.module, taxonomy.module and perhaps filter.module for just the core installation.

For sites not using APC, this is a major benefit. Not all sites can use APC.

dries’s picture

These savings are significant which makes this patch really worth pursuing.

Do we want guidelines how these files need to be split? And do we want to programatically enforce (some of) these guidelines? I'm not saying we should, but it might help to get consistency and predictability across all modules. Food for thought.

dman’s picture

:(
The metrics are not as impressive as I'd have hoped.
Funny thing is, I'd built my module exactly like this over a year ago, specifically shoving the admin/* options off into their own little include to avoid them coming in for every page load.

Looks like it wasn't really worth the effort. Premature optimization :)

Pity, it sounds like a clever idea. Managing this bifurcated code however is probably more trouble than the win it may produce.

moshe weitzman’s picture

subscribe

moshe weitzman’s picture

My gut doesn't is a bit worried about this direction. I think that modules use each other in lots of neat ways right now. And we'll start finding in our callbacks that certain functions are not available when we want them. And then modules will start loading each others includes and it is gonna get messy.

Also, it is currently really simple and easy to write a small include script which does anything you want just by including bootstrap.php and drupal_bootstrap(), and then your own logic. This script introduces a dependency on menu system. This risk would be mitigated if we had a single function call like "load all" that setup the current environment for us.

moshe weitzman’s picture

Perhaps we introduce a new bootstrap phase DRUPAL_BOOTSTRAP_NORMAL which loads the includes as suggested in this patch. This is the standard bootstrap. Any script that needs access to unloaded stuff can do a DRUPAL_BOOTSTRAP_FULL in order to get it.

This will require some changes from this patch. Core needs to know how to load everything. I think the way toward that end is to standardize on a few include names and situations when they are loaded.

kbahey’s picture

We need to also be careful about overcomplicating stuff.

Our modules used to be a single file (plus a .mysql file) that had everything in it. Then we introduced .install, then .info. Some modules already have .inc.

With this patch, we can overcomplicate things and have 7 includes for a module, ...etc.

Having admin and other makes sense. Anything over that is overkill, at least initially. Think of the entry level developers, ...etc.

This is not to say that I am against the patch. No, I am for it.

But as Dries said we need guidelines, and for those we can say start by having admin and non-admin incs only, and introduce others only as needed, and only a few advanced use cases should do that.

Then we see how it goes for one release cycle and revise the guidelines and/or the code as necessary.

One other thing that this patch should benefit is memory utilization. It is no longer possible to run Drupal with a few modules in 8M like we used to be able to. This patch should help in this direction on shared hosts.

merlinofchaos’s picture

See. I actually disagree with the fears of making things overly complicated.

Now, this may be the environ I come from -- and Dries is pretty familiar with that environ as well -- where things are broken up MUCH more granularly.

I find it very difficult to organize the code in .module files; and to find code once a module gets beyond a certain point. I think it's actually *very* good to split things up logically; it becomes a great deal more clear.

Moshe: It's very important that 'API' functions remain available, or that there are ways for API to become available. It may be as simple as modules exposing a function that loads in appropriate includes (this is probably optimal) such as Views has been doing with the 'views_load_cache()' function, or it may be wrappers that auto-load things as well (this can create extra performance problems though when files try to be included too often).

Since this particular patch only includes pages, the guidelines are fairly simple:

Your pages go in a modulename.pages.inc -- if your module provides a lot of pages and there is an easy way to split them obviously, do that. The only obvious split for existing core modules I can think of (and 99% of the modules I know of) is to separate the admin pages from the regular pages. But even that split isn't very useful for core modules. The only one I can think of that provides significant non-admin pages is...node.module.

dman: I think these numbers are quite significant. I think it's important to remember that what we have in this patch is a highly incomplete solution. It doesn't address all of the modules that are in use, nor does it take the next step and separate out hooks and such, nor does it remove theme_* functions to other files (or templates) which is work that also needs to be done.

I found an 18% savings on a page that essentially does nothing; we know from Rasmus' talk (and my own profiling which agrees with his numbers) that 50% of a simple page is taken up by loading modules. That means that, with a little more work, we could probably turn that 18% into 35%, simple ensuring that modules are as trim as possible. We might be able to further enhance things so that the core include/ files are slimmer too. I think this patch is a big step but it is not the whole trip, and we have to keep in mind that seeing a 35% to 40% load time reduction will take a lot more effort.

I also think that what I didn't look into was the amount of memory Drupal uses per load, and that's another thing we should be looking at. Drupal uses too much memory, and reducing the amount of code loaded will also help that problem.

BTW, I want to throw in my vote that 'includes' become 'page file' which matches the syntax used in other, similar patches that allow registration of something to include a filename, and be singular. I don't see much benefit to multiple-include-at-registration, it only complicates matters. Also, I think we should make pathing consistent with the theme registry, where the path adding is automatic but another variable can be used to override the guessing.

merlinofchaos’s picture

Memory usage, via devel:

Prior to patch:
* Memory used at devel_init(): 6.27 MB
* Memory used at devel_shutdown(): 6.49 MB

With patch:
* Memory used at devel_init(): 4.74 MB
* Memory used at devel_shutdown(): 4.97 MB

I think we can get that down a whole lot farther without adding significant complexity.

moshe weitzman’s picture

Earl said:

It may be as simple as modules exposing a function that loads in appropriate includes (this is probably optimal) such as Views has been doing with the 'views_load_cache()' function

Right. Thus all your API functions are going to have to live in .module, right?

Perhaps the menu part of this patch isn't needed at all? Couldn't modules organize themselves so that less code is included on every request? I can see some benefit to the menu system auto-loading an admin include but thats about it.

Assuming we proceed, I like Earl's suggestion about having menu system guess the right path. We could go further and have it guess the right include file. For example, a callback function named aggregator_page_sources would trigger an automatic load attempt of page.inc in the block directory. 'page' comes from the second part of the callback function name.same for aggregator_admin_settings(). if the guessed filename does not exist, it is not loaded (obviously).

wim leers’s picture

It seems to me that together with the revamped menu system (and also, but in lesser degree, FAPI 3) will make Drupal 6 much, much faster than Drupal 5. Throw in JS aggregation and then Drupal 6 will blow Drupal 5's performance away :)

Subscribing!

Crell’s picture

merlin is right about the bootstrap issue. The only way to make the bootstrap process considerably faster for non-APC sites is to simply have less code to stat/compile on each load. I wasn't really thinking about memory use as much, but if trimming 200 KB from the source gives us a ~1.5 MB reduction in memory usage, that sounds like a huge win, too. And remember, this is just from 4 modules.

The "oops, that other module was badly factored" problem moshe mentioned is one of the reasons I think multiple includes are important. It lets module A's author say "well bah, module B's author is hiding too much from me, but I can grab that without having to go modify his module."

I'm also not wild about the auto-guessing. I want to keep the degenerate case one big module file so that smaller modules don't become overly complex and over-factored. I also make no claim of knowing exactly how a module "should" be broken up. That's something a module author should be deciding. If we don't make it mandatory, then we have an extra stat or function_exists() call on every page load, which is a waste of cycles. It also imposes even more restrictions on function names, something I'm not wild about doing.

moshe weitzman’s picture

@Crell - You are perpetuating a myth about file_exists(). It is very fast - we have benchmarked this many times. Bust out your profiler and take a look at 1000 requests or so. The OS caches this. Anyway, I proposed adding exactly one file_exists() call. That a drop in the bucket compared to our current number. Further, noone is imposing function names. These are conventions that a module author can choose to use. That's how Rails works, and I rather like it. Conventions are powerful. We can agree to disagree on this though.

My earlier point stands - we can decrease the amount of code loaded on each page without touching menu system. Just requires some refactoring into include files.

moshe weitzman’s picture

@Crell - You are perpetuating a myth about file_exists(). It is very fast - we have benchmarked this many times. Bust out your profiler and take a look at 1000 requests or so. The OS caches this. Anyway, I proposed adding exactly one file_exists() call. That a drop in the bucket compared to our current number. Further, noone is imposing function names. These are conventions that a module author can choose to use. That's how Rails works, and I rather like it. Conventions are powerful. We can agree to disagree on this though.

My earlier point stands - we can decrease the amount of code loaded on each page without touching menu system. Just requires some refactoring into include files.

chx’s picture

Let me spin Moshe's idea further for a bit! If we impose a naming proposal modulename_includename_whatever_you_want then module_invoke can easily check for includename.inc besides the module. So modules can still call each other, there is very, very little additional code needed but we still gain what we wanted. What say you?

chx’s picture

StatusFileSize
new975 bytes

I can't help loving this idea.

dries’s picture

- Wim: as it stands, I don't think Drupal 6 is faster than Drupal 5. I might be wrong though.

- all: what do to others think about chx's proposals.

fago’s picture

I really like this idea. It's cleaner than the 'include' menu property and much more powerful.
If modules have to use module_invoke it would enable us to even put the API of an module in an include and it will be only loaded if used! really great! That could really save us a lot of memory!

However I think the current patch misses a module_exists() check or disabled modules might be also invoked.

Crell’s picture

I must be missing something in chx's patch, because I don't see how it does anything for non-hooks. The bulk of the modules in question are page callbacks, not hook implementations. That's what we're trying to split off. How exactly can one split page handlers off into a separate file without doing something in the menu system, since the menu system is the page dispatcher?

Also, chx's patch seems to work only for hooks that have an underscore in their name. Most hooks do not.

I'm not against having auto-loading hooks, if we can do it without increasing disk IO too much, but that's separate from page handlers, which are not hooks. That should end up in a separate issue if desired. The goal here is to split off page handlers/callbacks, since in any given page load the vast majority are never used. (Presumably only one is ever used.)

Ways that I can see page callbacks factored out:

  1. Callbacks explicitly specify what file(s) to load, if any. This is the method used in this patch. Upside is flexibility, since module authors can factor out their code however is best for their use case. Downside is that, unless we allow multiple includes, it's possible a module author can over-factor and hide a useful functions from another module that needs it.
  2. Put all callbacks in a modulename.pages or modulename.pages.inc file. Upside is that module authors don't need to specify an include file explicitly. Downside is module authors can't split up their includes beyond one file, and the menu system would then need to know which module provided the callback to know what file to include. it also still allows module authors to over-factor and hide functions from other modules. There's also the problem that some modules use page handlers from other modules. (Eg, node.module calls a system.module callback.)
  3. Specify some cascading naming system for auto-including. Eg, if you name your page callback mymodule_foo_bar_dostuff() the menu system will try to load mymodule/mymodule.foo.inc and mymodule/mymodule.foo.bar.inc and mymodule.foo.bar.dostuff.inc before calling the page callback. Upside is no changes are needed to hook_menu(). Downside is module authors need to rename their functions in order to take advantage of it, and it still allows module authors to inadvertently hide functions that other modules could need. If more than one of those files exists, it could also mean an extra disk hit to load the file.
  4. Page callback is always stored function-per-file. So a page callback mymodule_foo_bar_dostuff() would always live in mymodule/mymodule_foo_bar_doostuff.inc. Upside is the file gets really really small. Downside is that we then up with a lot of files in some modules.

I favor the flexibility of option 1, as it give module authors the most flexibility and retains one-big-file as the degenerate case. It also is most able to allow module authors to work around another module author's bad factoring.

Hm, something else that just occurred to me, too. Half our page callbacks are drupal_get_form callbacks, and the "page handler" is really the form definition function(s). But those also need to be available for drupal_execute(), which could be called at any time. That's a pickle. I'm not sure how we want to deal with that one.

dman’s picture

I like the concept. It has a feel-good code factor to it.
I'd specifically be happy to see the admin chunks split from the display chunks of module code so pageloads are cheaper.
However I wouldn't want to have to know so much about the feature I'm calling in someone elses module to know I should have pre-loaded their dependancies. This is probably an issue further beyond the current 'callback' refactoring proposal.

The mem stats merlinofchaos posted are very significant, as we know that mem problems have been plaguing shared-host users for a while.

It can easily get complex, so this discussion on conventions is very important.
Like I say. it's a feel-good approach, so I'm keen.

chx’s picture

Some notes about the patch. If you look at the dates of the posts it was written in ten minutes. It's a proof of a concept, showing how easy to code it. It is specifically written for non-hook invokes as hooks are handled by http://drupal.org/node/116165 . So what you do is, say module_invoke('modulename', 'includename_whatever_you_want'). Moving this function out of module_invoke into to a separate function and getting menu_execute_active_handler to use it is peanuts. Becuase of 'by module' admin pages we already considered storing the defining module in the menu table...

Crell’s picture

StatusFileSize
new21.12 KB

So after talking with chx I realized what it was he was talking about, so here's a new version of the patch.

New logic:

  • We introduce a new function, drupal_include(), which takes a module and a component (function name minus the leading module name). It will then try to load the file $module/$module.[part of component before the first _ ].inc.
  • menu_rebuild() manually invokes each menu hook so that it can track which items are coming from which module just long enough to assign the module to the item. If it's already set in hook_menu(), the explicit module takes precedence.
  • menu_execute_active_handler(), drupal_retrieve_form(), module_invoke(), and module_invoke_all() now use drupal_include() to ensure that whatever they're about to call is actually loaded if it isn't already.

What this gives us is the following functionality:

  • A module foo that defines a menu item thing/stuff with callback foo_bar_baz() will automatically trigger an include of foo/foo.bar.inc when called with no other intervention by the developer.
  • A module fake that defines a menu item that also uses foo_bar_baz() can simply add a 'module'=>'foo' key to the corresponding menu item, which will also trigger an include of foo/foo.bar.inc.
  • If a form foo_admin_bar is defined in foo_forms(), even if it has no parameters, calling drupal_get_form() or drupal_execute(), as a page handler or otherwise, will trigger an include of foo/foo.admin.inc. The hook_forms() declaration is necessary in order to know what module the form comes from.
  • Calling module_invoke('foo', 'bar_baz') will try to load foo/foo.bar.inc as well, so we can even separate out API functions if we really really want to.

The attached patch demonstrates most of the above with some minor factoring out of system.module, but nowhere near as much as the original patch did. It also renames system_main_admin_page() to system_admin_main_page() so that it can live in the "admin" include file. More will be factored out later once the logic is nailed down, but that can be done in multi-patch stages I think.

One thing that kinda bugs me is the fact that we end up with pages, forms, and api calls that share a "section" in the same file. I'm not sure I like that. What I'm considering is adding a third parameter to drupal_include() that specifies the "type": page, form, or inc. So you'd have system.admin.page for pages and system.admin.form for form callbacks. In the vast majority of cases only one or the other would ever be loaded. Thoughts?

I'm open to also having a "file" key if, for some reason, a module author wanted to store a page callback or form in a file that doesn't follow the magic naming convention. Thoughts?

profix898’s picture

I'm not convinced that this is the right way. In your example of 'system_admin_theme_settings' the .inc file is supposed to be 'module/admin.inc', BUT ...
1. the usual naming pattern nowadays is module/module_admin.inc and this patch would result in heavy renaming in cvs (what actually is not possible, we will loose the whole history of our files in cvs) and
2. you can't have 2 different admin includes - e.g. module_admin_theme.inc and module_admin_block.inc. Thats IMO a big -1 on this approach.

I think we shouldnt try to 'guess' the include file from the function name or we would need to try different include files on each call of this function to solve the problem of function names with multiple underscores in it.

kbahey’s picture

How many extra includes are we talking about for a typical page load in that latest patch?

Despite assurances that this a fast operation, I am concerned. We need benchmarks.

Crell’s picture

@ kbahey: It's a theoretical maximum of one include for the page callback and one include per form on the page (although as I've said before the most common forms and page callbacks should probably be left in the main .module file to avoid the extra disk hit). For the module_invoke() API calls it's variable. It will depend heavily on how module authors factor out their code. I personally probably won't do so; that's more of a convenient side-effect than the main thrust of this patch. In actual practice, I expect this to be 1-2 extra includes on a typical page, but it will depend heavily on individual module authors and on how we break up core.

@ profix: I deliberately used the $module.$section.inc naming setup rather than just $section.inc specifically because it's common practice to include the module name in the file anyway. merlin had suggested the double-dot convention. I have no special preference between that and and underscore if that's what people would prefer.

Regarding not having multiple includes, I wanted multiple includes myself but chx and merlin shot that down with a vengeance. I believe the new theme system, though, has its own including mechanism where necessary.

chx’s picture

I seriously hate repeating myself. I have not shown down anything with a vengeance. I said "small moves" -- not now, please. We (will) have enough problems without implementing everything in one patch.

Crell’s picture

I stand corrected, then.

Other thoughts on this approach or the code therein?

chx’s picture

I am not entitled to comment much on my own idea, aren't I :) ? There is no need for a system update at this point, HEAD->HEAD updates are unsupported.

fago’s picture

I think this comment is wrong?

+ *   is available, call drupal_include('system', 'admin_theme_settings').  This function will then
+ *   include system/admin.inc if it exists.

It would include system/system.admin.inc, wouldn't it ?

Furthermore it would be neat, if it would be possible to just use drupal_include($function) - and the right include will be included..
If we do that, one could use drupal_include('system_admin') to include a file - which is quite intuitive if the file is also called 'system_admin'.inc. As an affect we could also remove all the 'module' properties from the menu and form code as it it could do just drupal_include($callback). Furthermore we could stay using module_invoke_all there.

The problem with that is - that there might be conflicting module names like node, node_foo, .. - perhaps drupal_include could try through all possible module combinations until it finds a file (or not..)? Usually there shouldn't be much possible module names, so the behavior won't change much in comparison to the proposed one.

What do you think about that?

Crell’s picture

Yeah, that comment is wrong. I forgot to update it. :-) I'll do that in the next patch. Thanks.

As for a simpler syntax, the problem is, as you noted, introspecting module names from strings is hard. Take, for instance, the function eek_ack_urk_ugh_bah. (And there are modules that have function names that long.) The logic for finding that introspectively would be as follows:

if function already exists, stop.
if eek/eek_ack.inc exists, include it and stop.
if eek_ack/eek_ack_urk.inc exists, include it and stop.
if eek_ack_urk/eek_ack_urk_ugh.inc exists, include it and stop.

That means that if module eek_ack exists and has a library urk, it works fine. If, however, module eek then adds its own section library called ack, eek_ack's additional libraries are now blocked. That effectively blocks parent modules from implementing a library with the same name as some other module they have no control over for fear of breaking those sub-modules. And of course since drupal_include() would be called on every single page load, we have to be very careful with performance. (There's a drupal_get_path() call in each one of those, for instance, so it's not just three if-exists calls.) While we could go through the entire process and include all of them until the function exists, that means potentially including extra files we have no use for.

One option, I suppose, is to say "if that happens, it's your problem" and include the "file" property as I suggested above. That's the "out" for sub-modules if they get caught in that. The problem is that would only work for menu items and forms (and forms would then still need a hook_forms() implementation in which to specify the file property), not for API calls since there's no place then to actually declare the API function. And I still want to add in the $type property so that forms and pages don't have to live in the same file.

What do people think of that? Sub-modules have potential name collisions but we let them "out" for pages and forms with explicit declaration, and for API calls "just be careful"? I'm iffy.

fago’s picture

I think, people shouldn't have to worry about this much.
What about using module_list() to find matching module names? That would minimize the possible matching module names..

if function already exists, stop.
if eek is a module and eek/eek_ack.inc exists, include it and stop.
if eek_ack is a module and eek_ack/eek_ack_urk.inc exists, include it and stop.

or we go through the whole list of modules and determine if the function name begins with the module name..
or [...]
I don't know what's faster, but I think there are ways to get this done efficient - I agree that performance is important here.

For the possibility of overlapping include names, I think they are not possible as drupal_include only uses the substring until the first '_' - so there is now way to include node_admin_bla.inc for the node module.
In the end we would have some more logic in drupal_include(), but things are simpler for anyone using it.

moshe weitzman’s picture

I don't understand the confusion about module names. If everyone uses module_invoke(), there is no ambiguity. Thats what was in chx' original patch.

I haven't read every message in this thread closely - sorry. Though I wish some people were more terse. Brevity is beautiful.

fago’s picture

sry for confusing you..
We were talking about the possibility of doing drupal_include($module_include) as well as drupal_include($function) instead of drupal_include($module, $include). That would make identifying the right module for menu and form callbacks unneeded.

sime’s picture

With some guidance from Crell and merlinofchaos, I've started doing some splitting of ecommerce files and I'd like to offer some feedback.

I would love to see more code split up into .page.inc, .form.inc and .admin.inc and so on. The obvious benefit being a smaller load. I also agree with merlinofchaos (#13), ie. "context delineated" code is much easier to learn from.

To address the problem of functions-on-demand, why not a much simpler approach?:

  drupal_include_all($module);

The called module implements a hook:

function foo_include_all() {
  include_once(drupal_get_path('module','foo') .'/foo.admin.inc');
  include_once(drupal_get_path('module','foo') .'/foo.form.inc');
}  

Sure, you could extend this to:

  drupal_include_all($module, $type = 'form'); // or whatever suits you

but I can't see how adding that complexity is very useful. Because:

  1. The .module files should still contain the most useful functions
  2. Many modules may have no more than 1 additional .inc
  3. Arguably the .page.inc and .admin.inc (that way I see them being used) would rarely be of interest to other modules.
chx’s picture

@moshe, I suggested moving my code out to a separate function. But then again, I suggested storing module names for menus so there we do not have a problem.

forms are a problem. but even that does not mandate writing so many words that was written in this issue, that's true, I can't follow. This issue is getting to the point where I just close up, sum up where we are and open another one. If forms are a problem then either use hook_forms are use two underscores after the module name, there is no other solution. What other problems we have in short?

Crell’s picture

Category: feature » task
StatusFileSize
new134.73 KB

OK, after sleeping on this for a while, I don't think I want to play with magic naming at this point and opaque APIs. As others have said, incremental steps. We can think about magic later. At this point, hooks and API functions should just stay in the .module file. That should still get us a sizable performance boost for D6.

Attached patch is essentially the same as the first one, except it now uses two keys: file and 'file path'. They work essentially the same as file and path from hook_theme(), but I didn't use 'path' here to avoid confusion with the page path. I have also split up just system.module for demonstration purposes, specifically into 3 logical groupings. In all system.module is about 40% its previous size.

node and user modules also had their menu hooks tweaked slightly to handle the split, which also demonstrates the out-of-module handling.

Once this goes in, I'll start breaking up other core modules one at a time.

profix898’s picture

I didnt have time for a review, but I took a quick look at the code. The concept of 'file, filepath' is very flexible and should help us to easily split up larger .module files. +1 on the idea! Its also generic enough (in contrast to the 'magical' patch above) that we can convert modules already broken into smaller pieces (e.g. views, cck, ... in contrib) to use the new concept without the need to reorganize all the code. There are various modules which dont fit 100% into the '.forms.inc/.amin.inc/etc.' concept. So I'm really thankful that we still have the chance to organize the code ourselves and therefore optimize it for a particular module. I also like that the 'file' field inherits from the parent items and the 'filepath' field can be omitted for callbacks within the same module. I will try to provide a full review as time permits.

However supporting this patch doesnt mean that we should stop thinking about a solution of inter-module API calls.

pwolanin’s picture

FYI, a version of this patch: http://drupal.org/node/137767 should go in soon (I hope) which will require a complete overhall of the menu system code you have.

Crell’s picture

@pwolanin: Yeah, I know I'm going to have to rewrite it again once that goes in. I don't want to have to rewrite it AGAIN beyond that, however. What do you think of the approach and code methodology? Good/bad/ugly?

Given the CPU and memory savings this offers, I don't think we can afford to not implement something like this in Drupal 6 if we want to get any serious performance boost. I'm wary of marking my own issue critical, though. :-)

pwolanin’s picture

@Crell - I think it would be very useful to have a standardized method for putting callbacks in separate files like this. Set it critical if you think it is- someone else can change it back if they disagree.

However, I think that not too much should be separated out. Admin pages, yes, certainly. Page callbacks or other functions that might be re-used by other modules might better stay in the main module file. Some suggestions/guidelines need to be documented too.

Crell’s picture

Priority: Normal » Critical

OK, I'm going to risk it and call this critical, but it still can't proceed until the multi-menu patch lands. (I'd rather rewrite this patch again than force you and chx to rewrite yours again. Yours is bigger. :-) )

And the "file path" key is intended for exactly that use. If you want to us a page callback that's in another module, you just specify the other module's path as the file path. See the patch in node and user modules, which do that. Utility and API functions at this stage should not be factored out. That's still a huge performance boost, though.

pwolanin’s picture

One thing to make sure of it that the file path is rebuilt when menu_rebuild is called (I'll look at the patch in more detail after the rewrite for multiple menus). Also, hard coding of the path should not be accepted, since otherwise everything will break if the module is moved.

Crell’s picture

Yes, the current version of this patch simply adds a field to the menu table that gets rebuild along with the rest of it in menu_rebuild(). The path, if specified, is arbitrary but the only sane way to do it is drupal_get_path('module', 'foo'). It's up to the module developer to not be an idiot about that. :-)

Is there anything I can do to help get the multi-menu patch in? This issue is rather stalled until that goes in.

pwolanin’s picture

I think there is an alternative for filenames: the filename can be set as something relative to the module file.

So, if you set in something like hook_menu:
"module" => "foo", "filename" => "foo_admin.inc"
OR
"module" => "foo", "filename" => "includes/foo_admin.inc"

Then when the system module (via menu.inc) needs to include a file it can do:
include_once(drupal_get_path('module', $item->module) .'/'. $item->filename);

Note, I think it's useful in this context to be able to set "module" in hook_menu - obviously it could default to the module implementing the hook.

Crell’s picture

pwlanin: That's sort of what the second version did with the magic naming. I decided it was better for now to go back to explicit modularization. It gives the module author complete control over how to organize things. The default value of file path is the root of the current module, which is determined introspectively in menu_rebuild().

$items['foo'] = array(
'file' => 'foo.pages.inc',
);
$items['foo'] = array(
'file' => 'system.pages.inc',
'file path' => drupal_get_path('module', 'system'),
);
$items['foo'] = array(
'file' => 'foo.pages.inc',
'file path' => drupal_get_path('module', 'foo') . '/pages',
);

This also, I believe, parallels the functionality in the new theming system, so it's lower mental overhead.

profix898’s picture

@Crell: That's sort of what the second version did with the magic naming
No, its not. As you explicitly specify the module, there is no 'magic' in pwolanin's idea! The only difference to your last patch is, that the developer doesnt need to call drupal_get_path('module', $item->module) himself. We would still have "complete control over how to organize things". We should eliminate this source of trouble where module developers can hardcode the full path (/modules/mymodule/...) instead of using drupal_get_path(). No more no less.

Crell’s picture

StatusFileSize
new11.76 KB

And here we are again, rerolled for the new menu code from chx and pwolanin. I only split off two callbacks this time (it's really boring cutting and pasting functions for each patch :-) ), but it's enough to show how it all works.

I've left the API as before, with file and "file path" keys rather than also adding a module key. The reason is simple: This parallels the theme registry structure, which means less mental overhead.

Are we getting close? Please? :-)

pwolanin’s picture

Minor comment- to be more similar to other .inc files, why not name system.admin.inc as system_admin.inc? For example, node module has content_types.inc.

Also, it seems that even for this initial example, more should go in the .inc file, like function system_admin_menu_block()?

Crell’s picture

merlin suggested the double-dot version for clarity, and because you then have the module name itself clearly defined. I personally have no strong preference, and will defer to whatever Dries/Steven/Gabor prefer here. (The code doesn't enforce a format; it's strictly convention.)

I left out splitting off more handlers for now simply to save time. system.module is still in flux, and spending an hour moving stuff around again for every patch is not a fun way to spend a Saturday night. :-) Once the mechanism is in I will start splitting up modules more completely. Again, I will defer to the committers for how much they want split in the mechanism patch. (See #41 for a more complete breakdown of system.module.)

owen barton’s picture

Subscribing - great patch :)

dries’s picture

+    if (! $item->access) {
+      return MENU_ACCESS_DENIED;
+    }
+    else {
+      if ($item->file) {
+        include_once($item->file);
+      }
+      return call_user_func_array($item->page_callback, $item->page_arguments);
+    }

Please turn this around:

if ($item->access) { // instead of !$item->access
}
else {
}

Other than that, this looks ready. (And yes, let us use dots, please -- it's consistent with database.mysql.inc etc.)

Crell’s picture

StatusFileSize
new11.4 KB

Reversed that code now is. Ready Dries says this patch is, but set RTBC myself I will not. </Yoda>

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me, though it probably means I'll have to do a manual merge with this patch to get more of the menu system working: http://drupal.org/node/145058

dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed! :)

dries’s picture

Status: Fixed » Needs work

Marking this 'code needs work' until this has been properly documented in the handbook. Thanks guys! :-)

pwolanin’s picture

Oops - missed this one and another is system module I think.

  $items['node/add'] = array(
    'title' => 'Create content',
    'page callback' => 'system_admin_menu_block_page',
    'access callback' => '_node_add_access',
    'weight' => 1,
  );

I can add this to my menu.inc patch: http://drupal.org/node/145058, or do you want a separate patch?

pwolanin’s picture

StatusFileSize
new654 bytes

here's the patch for node module alone

dww’s picture

Category: task » bug

node/add dies with a white screen of death after this patch. next page refresh shows the following error:

warning: call_user_func_array() [function.call-user-func-array]: First argumented is expected to be a valid callback, 'system_admin_menu_block_page' was given in /Users/wright/drupal/cvs/6/core/includes/menu.inc on line 313.

pwolanin’s picture

Status: Needs work » Reviewed & tested by the community

@dww - see patch above.

dww’s picture

ah, ok, it wasn't clear that patch was for this problem. sorry about that. ;) just tested the patch and can confirm it fixes the bug. RTBC, indeed. thanks.

Crell’s picture

D'Oh! Figures I'd forget one. Thanks pwolanin.

Documentation here: http://drupal.org/node/146172

Wiki page to coordinate the refactoring effort here: http://groups.drupal.org/node/4179

dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks for the patch and the documentation.

drewish’s picture

Status: Fixed » Active

this patch causes problems on MySQL 5 because it uses a default value for the file field preventing the table from being created.

drewish’s picture

Status: Active » Needs review
StatusFileSize
new1.04 KB

here's a patch for the .install file. it looks like the insert query always supplies a default value.

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Odd that it didn't turn up for me, since I run MySQL 5 at home. Patch applies fine and then Drupal installs cleanly. Thanks.

pwolanin’s picture

I think the MySQL 5 issue depends on how you run it- the error only occurs in STRICT mode, which is only on by default in some installs/OS's.

also, here: http://drupal.org/node/144765#comment-248367

Moshe says that require_once() is preferred to include_once() - it that worth changing? (I can add it to this patch: http://drupal.org/node/145058)

Crell’s picture

As written if the file doesn't exist, require_once() would give a fatal error on the require while include_once() would give a warning on the include and then I believe a warning on the call_user_func_array() call. Both are serious breakage from the user-perspective, but shouldn't be possible unless a module developer forgets a file.

The "cleanest" solution would be a function_exists() call and some pre-defined error message, but that's extra cycles on every page request and we usually don't try and protect against sloppy module authors. So I have no real preference between include and require. Go ahead and roll into the other menu patch if you feel it would be wise.

pwolanin’s picture

Status: Reviewed & tested by the community » Fixed

ok, added "require_once" to the patch here: http://drupal.org/node/145058

That patch deals with lots of critical menu functionality- reviews please!

pwolanin’s picture

Status: Fixed » Reviewed & tested by the community

setting back to RTBC- the patch to fix the .install still awaits.

dries’s picture

Status: Reviewed & tested by the community » Fixed

I've committed the .install fix.

Anonymous’s picture

Status: Fixed » Closed (fixed)
adventurer’s picture

Category: bug » support
Priority: Critical » Minor
Status: Closed (fixed) » Active

Perhaps I missunderstand something, but after going through the files in 5.1 and following the entire bootstrap process, I came to the conclusion that the greatest amount of memory, and page load time comes when "all" the modules are loaded so that drupal can create a map that links the URL to the FUNCTION that needs to be executed.

I started a post off in the forums about this, and was lead over to this project page.

I am very curious, or would like feedback (please understand I am new to drupal, and not an expert...just giving input here to both learn what is going on, as well as figure out if this would be possible, and reasonable to help Drupal be faster and use less resources) on this concept.

Rather then loading every single module that is "enabled" so that drupal can do it's thing and map the URLS to the functions, why couldn't the "enabling" of the module, or the "installing" of the module (ie...madule_name.install) take care of this...permanently. In this way there would be NO NEED to load any module other then module(s) needed to perform the work for the selected url.

Let me explain what I mean.

WHen a person goes the admin area, and to the module section and selects to "install" or "enable" a module, the module would have code in the module_name.install file that not only created the tables and so forth as they do now, but would also write to a very specific and dynamic table that is purely for the purpose of mapping the url to it's function as well as listing any other module, taxonomy, action, and so forth that it depends on.

Then during the bootstrap process, rather then loading up each and every module that is enable and installed, Drupal would simply go to the table look up the drupal path, select the appropriate function, and load all files (or the modules) that the main function being called would need.

Now I know that the way I explained leaves out a lot, like well, you wont always know what files need to be included so that wont work. I am simply asking for you all to look at the "concept"....it can be worked on from there.

But having one table that has the url and the callback function ALREADY mapped, would eliminate the need to load every module. Instead Drupal would know already exactly what function to call, what file it is in, what the path is, and what other modules it must load as it loads the main callback function as we would have already defined all this as an array in our .install file.

array(

'path' => 'mymodule',
'mapped to' => 'function_to_be_called'
'depends on' => array(
'second_module',
'third_module',
'taxonomy',
'included_file_name',
),
'whatever else we need in this table' => 'whatever'
);

Something of this nature would exist in the install file to write to the mapping table.

Then Drupal would only need to lookup the path and the function and whatever the callback depends on.

Now only the needed module, plus all modules that the callback needs to perform its operation is loaded, and nothing else is...very lightweight, very memory friendly, and allows for a much fast framework.

If menu.module is enabled, as the admin makes changes, or custamizations to the menu, again, it would be wrote to this dynamic table so that the new menu is already mapped out and no need to load every module. I have not played with this yet, I was just simply running through the code to see what is being done.

ALso since the menu hook is part of what determines access rights and such, these also could be written to the table so it can do a quick check to see if the person has access before it even loads the rest of the table, if not, no need to load the table at all, just display access denied as it is currently setup.

If the user has rights, do a querry for the url / function / dependencies )other modules it depends on....nodes etc.) then call the callack function, process the data, send it to the theme to be themed, return it back to the browser to be displayed.

I just want to know "why" this COULD WORK, not the reasons it can not work....or ways to MAKE IT WORK, rather then ways to not make it work. Eliminating extra modules being loaded would be a huge increase in performance since your not carrying all the extra weight around with you....it's all in a table that is changed as you enable / disable a module or when the system admin comes in and changes the menu itmes up a bit.

I just do not see a reason why Drupal MUST load every module to create the map, when the map could be created when the modules themselves are installed/enabled.

Secondly,
The table could be a "starting point". What I mean is drupal could come in an access the table, then IF it did not find what it needed, and ONLY IF it did not find what it needed, it then continues the normal bootsrap process by loading every module and creating an on the fly map for itself as it does now.....in other words it would default back to the normal bootstrap process if it does not find everything it needs.

Again, perhaps I missunderstand the bootstrapping a bit and just need to be educated...interested to see what you all have to say. Thanks.

Crell’s picture

Category: support » feature
Priority: Minor » Critical
Status: Active » Closed (fixed)

@adventurer:

1) Please do not reopen issues that are long-since closed. That only destroys the archive of how things came to be and makes continued discussion more difficult.

2) Please do not propose new features in threads that are already talking about a different feature. That only makes it harder for the developers to keep track of what they're working on.

3) Please do not change the information on the issue. This is neither minor nor a support request.

4) In short, please do not post in this issue or any other issue that has been closed. They are closed, meaning stop using them.

As to your suggestion itself, the problem is that modules can call other modules at any time in a dozen ways. That makes pre-calculating the required files quite difficult unless we have a good dynamic load-when-used system. See #21, #24, and #27 in this very issue for a proposed and rejected way of doing that. Also see the very first post, where I was originally planning to have multiple includes via menu_alter() but that was nixed in order to keep thing simple for now. It may be revisited later.

Again, this is not the proper place to bring up such an issue. It should be its own, new thread. Hijacking an issue thread is just as if not more impolite than hijacking an email thread, for many of the same reasons. Thank you.

adventurer’s picture

Crell,

I appolagize and didn't realize how things were setup here. I am sorry, feel free to close this.

sinasalek’s picture

Will be any back port for drupal 5?

dww’s picture

@sinasalek: No. This is a major API change to the menu system. Drupal 5 (and Drupal 6, at this point) are stable releases where no new features or API changes are added.