Closed (duplicate)
Project:
Drupal core
Version:
7.x-dev
Component:
base system
Priority:
Normal
Category:
Task
Assigned:
Reporter:
Created:
5 Feb 2007 at 18:07 UTC
Updated:
17 Feb 2008 at 04:14 UTC
Jump to comment: Most recent file
As I outlined on my post on the development list, I believe this work can ultimately:
1) reduce Drupal's footprint (memory use, time spent loading code)
2) increase the speed of theming.
Right now the current patch doesn't address theming. Also, as per chx, it's still handling the node_invoke hooks like normal hooks -- we both think they need their own system. (in fact, I think this patch breaks node hooks for multiple-node-types-per-module) so that will need to be addressed.
I need to put this down for the rest of the day but I'm putting up the patch for discussion and in case anybody wants to work on it.
| Comment | File | Size | Author |
|---|---|---|---|
| #11 | module_registry_1.patch | 45.13 KB | chx |
| #10 | module_registry_0.patch | 35.19 KB | chx |
| #7 | module_registry.patch | 41.81 KB | chx |
| #5 | registry_0.patch | 17.5 KB | chx |
| registry.patch | 20.14 KB | merlinofchaos |
Comments
Comment #1
merlinofchaos commentedHere is my dev post (and all the replies)
http://lists.drupal.org/archives/development/2007-01/msg00574.html
Comment #2
webchicksubscribing. This has been on my radar for awhie too.
Comment #3
merlinofchaos commentedchx' reply has particularly relevant info:
http://lists.drupal.org/archives/development/2007-01/msg00584.html
Comment #4
chx commentednor access nor form is a hook , node module code needs change.
Comment #5
chx commentedI did node changes.
Comment #6
chx commentedI moved a lot of things from module.inc to a new module_admin.inc because module.inc is loaded even for cached pages in the DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE phase. I have beaten the install system until it works with the new system, at the end of the day this mostly meant changing
module_invoketomodule_invoke_hook, while at it I changed a few more variable functions to a module_invoke_hook -- less code is better.All in all, Drupal installs, when you enable a new module, its install runs. I tested book, story and page creation and display and worked.
Comment #7
chx commentedAnd the patch.
Comment #8
kbahey commentedWhat performance gain do we expect from this? (low, medium, high)?
Comment #9
asimmonds commentedAttempting a fresh install with the patch in #7, I get:
Fatal error: Call to undefined function cache_get() in ./includes/module.inc on line 364
Comment #10
chx commentedI runned a million installs -- but , then my settings.php was filled in. Sorry! The new system does not rely on module_list to invoke things but on the registry so in the install.inc we need to fill in the registry cache with just enough information so that form API operates -- the elements hook. This let me remove the fixed_list from module_list because that's pretty much useless now. Yay, less code! I got warnings similar to http://drupal.org/node/131942 this, I fixed them.
Comment #11
chx commentedSize of patch tells me I forgot module_admin.inc
Comment #12
dmitrig01 commentedI benchmarked it. The results are not worth posting. They vary so much. Somtimes its faster, sometimes it's slower
Comment #13
moshe weitzman commentedsubscribe
Comment #14
dries commentedI assume we've yet to take advantage of this patch to make Drupal faster. I think it has potential, but it would be great if we could try to evaluate/estimate that potential.
The code looks good but I haven't tested it. My only gripe are the module_invoke_all_refN functions -- these are somewhat ugly. Can't we get rid of those? And if so, at what cost?
I'm going to let this patch develop a bit more ...
Comment #15
chx commentedThe function signature needs to contain a reference marker -- so in normal circumstances you can't get rid of module_invoke_all_refN . We could change to passing around an array of parameters and using extract EXTR_REFR but I doubt that'd be nice. The module_invoke_all_refN are well hidden from the user, so I do not see that as a big problem.
The big thing here is the 'file' key which is yet unused but it'll let us make Drupal smaller we all know that even with a code cache -- it takes a lot of time to load all Drupal code.
Comment #16
merlinofchaos commentedMy basic estimate is that between this patch and a similar patch to the menu system that allows hook_menu() to specify a file for menu hooks, we could break modules up into 2 or 3 files and reduce minimal Drupal load (of modules) by about 75%.
To take a wild stab at the kind of setup I would go with:
For a module named foo, I would structure it like this:
chx had suggested to me a level1/level2 type of thing, but the nodeapi case is what breaks it for me. Drupal's front page would then end up loading level2 on every module that implements nodeapi. Sadly that's most of them. But if we go with a setup like the above, even the front page would still have a drastic reduction in the amount of code invoked.
I have a brunch engagement today and after that, I am going to try to apply chx' current patch and break up at least some of the Drupal modules (at least the ones that are enabled by default) and do some benchmarking. However, I can state this with relative certainty:
Rasmus put up a slide showing that 59% of a Drupal page load comes during drupal_bootstrap. Now, this is only when you don't have something like APC running, but *even with* APC running I find drupal_bootstrap still takes 5-10% of our code time.
Let's say a page takes 250ms to load. 50% of that (to make my numbers easy) would be 125ms. If we can reduce that by 50% (Note that this is less than my estimate of 75% code reduction, to be conservative) then we can reduce that to roughly 60ms. 60ms is a 20% savings out of a 250ms page load. That's not just significant, that's major. If a page is doing very, very little, I think the savings would be bigger. A really, really great example of a page that does very little: for example, the private files system. Ideally, we want that page run time to be as low as possible. WE want to load core; we want to load system module's file system; and we want to load just enough module code to do whatever validation needs to be done on the file. And NOTHING else.
As to the public facing API, I would recommend this strategy:
Right now we have module_invoke; there is absolutely no reason that we couldn't just register our API as hooks and require everything to use module_invoke for the API.
All calls not from node.module to 'node_view' would then become module_invoke('node', 'view'). However, that does leave us a problem with references, unfortunately, so we must be careful about this. THis goes back to chx' discussion on the reference marker.
Comment #17
Crell commentedI'm all for splitting up files so that we don't load code we don't need, and I think merlin's breakdown is a not-unreasonable starting point. However, let's not forget that the volume of code is not the only problem with a full load. Hitting the disk for a file is very expensive. If we decrease the amount of code that has to be parsed per page load by 20%, but at the same time increase the number of files that need to be stat-ed on each load by 20%, that's a net loss. It's better to load a bit more code than you need and keep the file count down. Where the exact trade off point is, only benchmarks will really say for sure.
And yes, I am talking about sites without an opcode cache. Those are the sites that would benefit most from reducing the load time, since they have the stat/compile cycle on every load. opcode cached sites will still get a performance boost from less code, but normal sites will benefit from less code and less stat/compile. I recommend we target those sites when benchmarking this.
Comment #18
merlinofchaos commentedHopefully my proposal will add very little actual statting, because it won't stat files unless they're called upon.
include_once is very smart and won't stat a file. So I don't think this adds any additional stats at all.
Now, simply adding more files does add more stats, but the idea here is that only a handfull of additional files are loaded, since most of the files are meant to be 'specific purpose'. At worst, I think you'll get roughly 1 extra stat call, per module, which is much, much smaller than the amount of time spent loading and compiling the code. On a heavily used site, the stat information will be cached by the filesystem anyhow, because it will always be the same files statted over and over. In my experience, the problems with file stat occur mostly when the files you need to stat change over time.
Comment #19
merlinofchaos commentedWe don't necessarily need a patch to let the menu callback choose. We chould actually force the issue pretty easily, like so:
However, there are two problems with that:
1) right now I don't think the menu module actually keeps module information.
2) It won't work with forms or things like forms, where often drupal_get_form is the actual callback and then it calls through to a function. However, the solution there MIGHT be to try to create a generic callback pattern where all Drupal callbacks follow the same pattern: Either a string (the function) or an array which includes the filename and function -- this is similar to how these functions work naturally if using objects.
Comment #20
Crell commentedIn one of the menu issue threads, which I cannot find right now of course, I had suggested instead of having an $admin flag on a menu item having an include parameter, which was an array of files to load for that callback. It would look something vaguely like this:
somefile.inc would then include all of the functions involved in that form. The upside is that other modules could add to the include array in their hook_menu_alter if necessary. The downside is that it's non-automated and therefore different modules could potentially split things up differently. I suppose that could be a plus or a minus depending on how you look at it. Views UI and the Views Theme Wizard get renamed to .inc and otherwise don't really change, which is good, but some module authors could over-modularize their modules and balloon the stat calls.
I also think that foo_node.inc or foo.node would be preferable to foo_node.module, simply for readability. (Yes, foo.node. I did go there. :-) ).
Comment #21
merlinofchaos commentedI like that, though the actual syntax I have in my mind is:
Because it's easy enough for menu.inc to extrapolate the rest. Like the theme registry, it could also have the rarely used 'callback path' for those rare instances that your ifle won't be in your module directory. If we think that should even be allowed.
Comment #22
Crell commentedI can see one use for allowing out-of-module includes: Shared libraries and API modules. Eg, only load token module if the callback is going to care; only load some piece of XML-RPC if we're going to be doing an XML-RPC call. That would also require that the callback be allowed to specify multiple files to load (eg, the file where the form is defined and the XML-RPC library).
If we need both out-of-module includes and multiple includes, then I don't really like splitting the file and path into separate properties. Matching them up becomes problematic. Having the drupal_get_path() call in hook_menu is also better for performance since hook_menu is now called only rarely while anything in the menu dispatcher would be called on every single page load.
Of course, again we need to then be careful about splitting things too far and ballooning our stat calls. It also raises the question of whether this sort of optimization is something Drupal should be doing or the developer should be doing.
Since the degenerate case is just one big .module file, we should avoid stat-as-fallback to avoid excess IO and excess function_exists() calls. (They're probably not very expensive, but they add up.) My feeling is that we should let Drupal do the "easy" stuff (e.g., .node files) and let the developer explicitly say otherwise (e.g., put admin forms in a separate file). We don't want to have too many naming rules, or else we limit our flexibility in the future and end up with a sites directory that resembles a .jar file. :-)
Comment #23
merlinofchaos commentedCrell: I don't think a module should be loading another module's file as part of the callback. I think that's obscure and would lead to confusion. At the moment, I can't think of anything really valid, and for the edge cases that are, your callback can include the extraneous file directly. The only reason that theme functions are different was so that system.module could load theme functions in /includes on behalf of /include files. However, page callbacks shouldn't *ever* be directly in the /include, IMO, and if they are, they can use the format locale.module uses currently I think. (Which, honestly, we could do now, but it's a little ugly on a really wide-scale basis, IMO).
Also, the drupal_get_path would only happen on hook_menu -- it'd just happen that whatever calls module_invoke('menu') calls it, not the module itself, and doesn't use a function_exists() at all. That cleans up the syntax, and in the actual menu_execute_active_handler() call, the functionality is identical whether module_get_path() is invoked by your implementation of hook_menu or our implementation of menu_build_menu.
Comment #24
Crell commentedI agree that page callbacks themselves shouldn't be in /includes. I was talking more about shared libraries being there that would only sometimes be needed. pager.inc, tablesort.inc, xmlrpc.inc, etc. Presumably the page callback would know if it needs those.
If that's overdoing it, then yes we probably don't need to specify a path. We may still need multiple includes, however, so that value-add modules can value-add their includes as well. Eg, only include their form_alter_thisform() (assuming that patch goes in) function and additional validate and submit handlers for the page that has that form. I think that's a not unreasonable use case, if modules are going to be spinning out some forms to separate files anyway. Again, though, I don't know what the stat impact will be.
I'm just brainstorming at this point. :-)
Comment #25
webchickI'm going to attempt to summarize a 2 hour conversation in IRC last night:
My main concern was that forcing code separation would vastly increase the learning curve for new developers to create modules. While splitting code along .install files lines make sense, as these are cleanly separated as "db manipulation code," separating hooks out from .module based on their frequency of calling strikes me as error prone and confusing. Merlinofchaos assured me that this would be a purely *optional* optimization, for clean code separation and that helped a lot...
Merlinofchaos and myself proceeded to experiment by taking the functions of block, system, user, and comment module, and grouping them according to the method outlined in this issue:
http://drupal.pastebin.us/24562 (block)
http://drupal.pastebin.us/24563 (system)
http://drupal.pastebin.us/24569 (user)
http://drupal.pastebin.us/24575 (comment)
Page callback code, menu access callbacks, form handlers and such take up a *tremendous* proportion of the code in a module file. Separating these out instantly cuts the size of a module file by at least a half, often times more like 3/4 or more.
Some modules also have a disproportionate amount of 'admin' callback code vs. 'normal' page callback code. It might make sense to separate these out further into page-admin.incs or so.
My only concern is I would like to keep all hook implementations together (apart from _install and _update_N hooks, which go in .install files, and _load, _insert, _form, etc. node hooks in module.node.inc files... this way you can instantly tell before installing the module that a module provides its own node type(s)). Splitting them apart too much makes it difficult to tell what a module actually does unless you look in 4 different spots.
Comment #26
merlinofchaos commentedMostly I agree with webchick, though I see some very plausible benefits to splitting out 'block' code as well. I also see some plausible snags to doing that, so that's primarily just theory.
I do believe that the next step here need to be to modify this patch to support some form of 'callback file' system in hook_menu so that we can accomplish this goal easily, and then actually set up our system like this and do some benchmarking, both with and without APC.
Comment #27
Crell commentedFor what I was talking about with splitting menu handlers out into separate conditional files, see this issue: http://drupal.org/node/140218
Comment #28
chx commentedNeeds reroll and if someone calls a module_invoke on something which is not a registered a hook then we need to fall back to the current behaviour.
Comment #29
catchModule splits are in, but more to this than that, so bumping to D7.
Comment #30
fgmsubscribe
Comment #31
sinasalek commentedsubscribe
Comment #32
Crell commentedI think this thread has been supplanted by this one: http://drupal.org/node/221964