The main purpose of the registry and the D6 page split is to reduce memory usage so Drupal can run on shared hosts, however this has come at the cost of a lot of complexity (which only a handful of people actually understand), and it adds overhead to sites running APC.

Here's the Drupal front page, no nodes, with a primed registry cache and APC enabled:

Executed 31 queries in 40.87 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 126.73 ms.

Memory used at: devel_boot()=0.63 MB, devel_shutdown()=3.83 MB, PHP peak usage=4 MB.

Here it is immediately after TRUNCATE cache_registry;

Executed 63 queries in 604.18 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 1557.05 ms.

Memory used at: devel_boot()=0.58 MB, devel_shutdown()=3.83 MB, PHP peak usage=4.25 MB.

Benchmarks in the big split issue show there aren't dramatic improvements for non-opcode hosts either. Baseline Drupal core memory usage with all core modules enabled is well over 20MB with or without the patch - and it's the 16-24MB limit which really kills people on crappy shared hosts.

http://drupal.org/node/345118#comment-1608988
http://drupal.org/node/345118#comment-1619048
(many more benchmarks on the issue but these were easy to find because I did them - note I'm one of the primary proponents of that patch on that issue but I'm starting to change my mind).

Obviously if we remove the registry and page split, we will see worse memory usage on shared hosts. However, I think there's two ways around that:

We're already discussing adding file caching/boost to core - this would reduce overall memory usage on high traffic sites when apache processes don't have to hit PHP at all, not to mention it helps with database resources as well. I don't think file caching / boost would be more complex than the registry.

We could also make the default tarballs produced by the packaging script use bjaspan's much maligned autoload.sh script (see #445044: Try to remove the field autoload "feature"). This would massively decrease memory usage on shared hosting, since it not only deals with hooks, but also API functions (field_attach_load(), everything in common.inc etc.). That's likely to be the absolute best memory gain for sites which absolutely can't run APC, and doesn't add any complexity/overhead whatsoever to core itself.

I think we need to seriously discuss this, and try to compare the following scenarios:

1. HEAD

2. Registry + sun's split patch.

3. Core minus the registry and page split (just .module files - although I think we can compromise a bit and have module.registry.inc manually included since those really are loaded rarely, and split .install into .install and .update).

4. Core run through autoload.sh - note that you'd be able to swap in autoload.sh'd and non-autoload/sh'd file systems quite happily, because without the registry, Drupal won't need to care where the functions are actually located.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Status: Active » Needs work
FileSize
3.34 KB

Step one, run autoload.pl on all of core, and compare the results to HEAD and HEAD + page split.

Here's barry's script from http://drupal.org/node/445044#comment-1661472

Note, I'm not suggesting we ever ask anyone to run this manually, I'm suggesting that if it turns out to be the most efficient method, then we add it to the packaging script - if you download a tarball, you get the memory savings for free. I you're unlucky enough to be both downloading tarballs and trying to make patches from them then we provide non-autoload tarballs (either the -dev snapshots or an extra link somewhere).

catch’s picture

whoops, Barry posted a PHP version of the same script in the next comment, that's probably a better starting point for testing.

catch’s picture

OK so I did a very basic test by reverting http://drupal.org/node/445044#comment-1669250 to see how much difference it made:

With autoload:
Memory used at: devel_boot()=2.57 MB, devel_shutdown()=20.88 MB, PHP peak usage=21.25 MB.

Without autoload:
Memory used at: devel_boot()=2.57 MB, devel_shutdown()=20.75 MB, PHP peak usage=21.5 MB.

Not very useful. Could use someone with some regexp-fu to make a proof of concept which does a few more files like that, then we can compare to the module split.

catch’s picture

So looking a bit more at how the field API autoload used to work, to test this properly here's what I think we'd need:

add proper @defgroup in pretty much all core modules. Then have a script which does the following:

Finds every defgroup in every file, and pull those into a separate file per defgroup - generating node.access.inc, node.crud.inc, probably have @defgroup node_pages and @defgroup node_admin etc. etc.

Takes those files and creates a single node.autoload.inc for them, and then has node.module require that file.

That way core wouldn't do anything except have proper defgroups, all the nastiness would just be in the packaging script. Organizing .incs by defgroups would be a not too horrible situation for unlucky people who are messing about with their file system after downloading the 'optimized for shared hosts' tarball - since there's some logical grouping there.

Just to make it doubly, doubly clear, I don't have the (any) regexp ability to even make a proof of concept for this, so if you do, and you're interested in properly testing out the four options presented above, help appreciated.

catch’s picture

Here's a start on #3. I moved node.admin.inc, node.pages.inc, user.admin.inc, user.pages.inc and system.admin.inc back into the .module files - didn't make any other changes, did a registry_rebuild() and Drupal still worked.

For memory purposes this should give us a pretty good idea of where reversing the page split would take us to without APC- those three modules are the bulk of the code that was moved around in the first place.

Front page:
HEAD:
Memory used at: devel_boot()=2.56 MB, devel_shutdown()=18.74 MB, PHP peak usage=19 MB. - 

Revert patch:
Memory used at: devel_boot()=2.56 MB, devel_shutdown()=21 MB, PHP peak usage=21.25 MB.

I make that about 10% memory - 2MB out of 20M and a 250k patch.

Sun's patch at http://drupal.org/node/345118#comment-1670598 is 900K and gave about 5MB savings.

So the absolute best case with the page split + core fully optimized for the registry is about 7-8MB memory savings assuming just core - but from 25-30MB per process, down to about 20MB - i.e. a high baseline of memory usage - there's absolutely no way we can get core's base footprint below 16MB with the registry as it is now (maybe on some configurations, but not across the board). With 30-40 contrib modules installed, we might see another 5-7MB of savings (consider that system.module is the size of ten regular contrib modules just by itself). This impacts overall memory usage and the chance of fatal memory errors, but again, it doesn't lower our absolute baseline.

Assuming the autoload/defgroup based approach would let us move more code than the registry, we might squeeze 10MB maximum out of core, maybe a bit more (but if contrib has defgroups, you'd get automated savings there from the packaging script too with no other work from contrib maintainers required). Much the same effect, possibly more drastic savings (in return for some ugliness), but minus the overhead of the registry.

If the numbers turns out to be right, I think a decent compromise would be this:

# registry/info hooks go into module.registry.inc - we add a small bit of magic somewhere to load this file whenever a registry (theme, module) is being rebuilt. This is going to save a few thousand lines of code and would offset the baseline memory usage of removing the page split, but very simple to code.
# .install files are split into .install and .update - so that the installer doesn't have to include hundreds of update functions that will never be used.
# When you download a tarball, by default you get a split version generated using a version of barry's script. If you checkout from CVS you get what's in CVS, and we make it possible to download a 'clean' tarball as well.

bjaspan’s picture

subscribe

sun’s picture

subscribing

But -1 for removing the registry.

CorniI’s picture

subscribe

Crell’s picture

Subscribing and very -1. The page split in D6 was measured and found to be a 20% performance boost and 25% memory reduction (when not using an opcode cache), on a Drupal version that was overall slower than its predecessor. I have yet to see anyone question those benchmarks. Rolling that back gives us what advantage?

chx’s picture

This is not going to be absolutely on topic rather than just some thoughts.

I added node_rewrite_sql so many years ago. Just to show, this was the regexp originally
$query = preg_replace('/(SELECT.*)('.$nid_alias.'\.)?nid(.*FROM)/AUs', '\1'. $nid_to_select .'\3', $query);

That's not that bad. But it was broken and the fix brought us 4-5 years of fighting with DISTINCT. We (I?) probably should have done much smarter much earlier than trying to fix a horribly complex regexp.

We made two fundamental mistakes with the menu system. One, the ancient storage model we were using as recently as Drupal 5.0 was created for approx a hundred entries and that really did not bode well when Drupal sites exploded to having 1e6-1e7 magnitude pages. Two, the menu tasks were a great idea and the small problems with default tasks grew to a colossal !#$%^ when the seemingly small change of secondary tasks got introduced. To recap, when you are on user/1/edit and you navigate on to user/1/edit/foocategory then you to be able to navigate back so there will be two tabs, one being user/1/edit the other being the default category which point to the same page while storing this in a structure which is keyed by paths. This is only being fixed right now.

What of the registry. There are already and will be enough posts about the speeds and the gains and the other drawbacks, I am trying to give a higher level view. I wish I had such concrete problems as above -- I do not. In general, however, the registry works marvelous when your Drupal is booted up and repeatedly serves the same pages over and over -- which, let's face it, is what it does most of the time. However, when you install, update or test or very early in the bootstrap, we are facing problems from time to time. I am truly wondering whether we spend a lot of effort getting things working when there are not enough gains. From a recent issue it was clear that even Dries had absolutely no clue what the registry does or how it works. This is worrying too. It might have been a temporary lapse on his part but color me worried.

So, as the person who worked second most on the registry: +ε (never mess with a maths teacher :p ). I am not +1 but I am not -1 either rather I am very cautiously positive towards removing and I want to see a lot of numbers before I decide.

catch’s picture

Here's a bit more work on some numbers.

I took the latest from sun's page split patch, cut out everything not relating to system.module, and also the system.module hunks themselves - so this ends up with the same system.foo.inc files we get from the split patch, but all the functions left in system.module

Hitting the front page still works - because sun did a good job on that and we don't call any of the functions moved out on a normal request - i.e. no cannot redeclare errors or anything.

Then - I went through the 'removed' functions in system.module itself and rather than removing the split out functions, made them all autoloaders like

/**
 * Autoloader.
 */
function system_menu() {
  require_once 'system.build.inc';
  return _system_menu();
}

This is pretty close to what Barry's autoload.sh did, except a manual version just to get some quick numbers. The idea is to simulate option #4 from the opening post.

Baseline memory usage for HEAD is:
Memory used at: devel_boot()=2.6 MB, devel_shutdown()=18.61 MB, PHP peak usage=19 MB
- same as #5 which is good, means my laptop's not completely unreliable for this.

manual autoload.sh
Memory used at: devel_boot()=2.56 MB, devel_shutdown()=18.26 MB, PHP peak usage=18.5 MB.
About 500k of memory savings from system.module itself from around a 100k patch.

This is almost exactly the same amount of savings we get from the split patch itself (a 600K patch on that issue led to about 3MB memory savings vs a 100k patch for 500k - more than close enough to get an idea - works about about 500k for every 50kb of code moved) - it'd be a little bit less effective because we're leaving 4 lines per function in the .module file, but that has to be weighed against a few things:

1. API functions can't be moved around using the registry yet (functions like node_load(), field_*() aren't called on every page but need to be available on every page). defgroups in core and some version of autoload.sh in the packaging script could do this though - hence it being in field API in the first place. This means we'd at least get the same memory savings overall if not better.

2. The overhead of the registry itself.
On a front page load with no content, _registry_check_code() is the fourth most expensive function - called 12 times and with a self cost of 30ms (35ms total includes two cache_get() and require once, but most of the time is spent inside the function).

drupal_function_exists() also takes up 4ms of the page request (in self time), called 133 times. Both drupal_function_exists() and _registry_check_code() have to maintain static caches which take up at least some memory - this memory is going to be used whether your server runs APC or not - but you don't get any trade-off if you're running APC.

I don't personally think the autoload.sh in packaging scripts is a particularly nice idea - but it's a way to do the same sorts of memory and page execution time savings for non-APC hosts without any overhead (performance or code maintenance) in core - and it doesn't penalize sites running APC at all, which the registry does going by the numbers above.

Crell’s picture

If we re-introduce front-loading of files, that would reduce the _registry_check_code() calls, I suspect. That way the code we'll need will be available already, so we don't need to lazy-load it.

catch’s picture

Bit more data.

On admin/build/modules we do a full theme/modules/menu rebuild - on my system this equates to 55mb memory usage, 2.5s page execution time and about 700 queries.

I tried removing the registry_rebuild() and it only took off 1mb of memory - about about 80 queries - so not responsible for the worst of that - however that means our core baseline memory usage is 55MB, not 28MB - meaning any shared host is going to need at least that much in php.ini to avoid out of memory errors assuming they /ever/ visit admin/build/modules.

However, while the registry doesn't make that page much worse, it doesn't help there either - I checked node.pages.inc user.pages.inc and node.admin.inc by putting dpm(debug_backtrace()) at the top of each file and they're all loaded on admin/build/modules - this is due to the theme registry rebuild.

Removing the drupal_theme_rebuild() drops memory usage by 4MB with no APC, with APC it drops us 500k. This means that 3.5MB of the memory usage on admin/build/modules is due to extra files being included by the theme registry - this is probably most of the savings that the page split got us - meaning that in terms of absolute peak memory usage it's currently not reducing anything much at all (in fact it adds 1MB memory usage to that page and removes very little).

Just to summarize - there's two places where memory usage is important:

1. The maximum memory limit required to serve a page on your site, to avoid out of memory errors. If admin/build/modules remains the most memory intensive page on most sites then currently the registry is having no effect there or a small negative one.

2. The average memory used per apache process (which affects how many concurrent requests you can serve on the whole server) - on non-opcode hosts the registry obviously makes a decent improvement - as mentioned above, potentially bringing 28MB down to around 20MB (and roughly equivalant page execution time savings). However I think if we were to test an APC host with and without the registry it'd have a small but measurable negative impact there.

Crell’s picture

The registry should alleviate the need to have drupal_theme_rebuild() parse all files. It can just scan through the registry table for functions named theme_* rather than trying to parse files and do function_exists() or whatever it does.

28 MB to 20 MB is a non-small savings. If that's a consistent percentage difference rather than a constant difference, that's huge.

If the improvement on a non-APC site is large enough and the penalty on an APC site is small enough, I still think it's worth it. We have to keep Drupal functional on shared hosts, and sites running APC are more likely to be able to make custom tweaks or throw hardware at a problem than shared hosts are. (Naturally "large enough" and "small enough" are highly subject terms subject to interpretation.) The hit on APC sites seems to be mostly on cache-rebuild type operations, though, not on normal page views, isn't it?

catch’s picture

Well it's worse on cache rebuilds but there's a clear overhead on normal page views as well.

webgrind tells me that 7.23% of a request is spent in _registry_check_code() and 1.01% is spent in drupal_function_exists() - this is on the front page with no content at all. Add them up and that's 8.24% of a request.

In http://drupal.org/node/345118#comment-1619048 I got about 10% improvement in page execution times. Let's say the page split already gives us 20% compared to without based on Crell's benchmarks from last year. And we'll call the 8.5% 10% because a lot of pages will have more hooks fired than the front page with no content.

Here's a few scenarios - these are page execution times rather than memory usage, numbers are going to be different server to server and contrib module folder to contrib module folder but I think this is reasonable:

Non-APC site, HEAD.
20% improvement, 10% penalty = 10% improvement.

Non-APC site HEAD + page split latest patch:
30% improvement, 10% penalty = 20% improvement.

Non-APC site, no registry, but keeping the page split (i.e. we'd add the files argument back to hook_menu, or do some kind of naming convention thingy):
20% improvement, 0% penalty = 20% improvement.

APC site:
0% improvement, 10% penalty = 10% penalty

APC site without registry:
0% improvement - 0% penalty = 0%.

catch’s picture

FileSize
76.82 KB

webgrind screenshot. Again this is front page with one node, times are percentages.

andypost’s picture

On admin/build/modules we do a full theme/modules/menu rebuild - on my system this equates to 55mb memory usage, 2.5s page execution time and about 700 queries.

Is this without contrib?

This one kill most of shared hosts or we should manually ini_set('memory_limit', 268435456); or something to handle core+contrib

Suppose here is a memory leak as I dig into registry.inc - there's no load of all files only parse but maybe loaded resources stay in memory?

As I pointed before - I prefer 2 packages. At this moment I think that $menu['file'] is more flexible way to join all module's parts in one LOB. But maybe I'm wrong.

catch’s picture

That's without contrib, and the registry isn't responsible for it - most of it's in menu_rebuild() as far as I can see (add an early return to menu_rebuild() and hit the page for quick and dirty testing).

andypost’s picture

As I see menu_rebuild() invoke module_implements('menu') so it loads all modules in memory and IMO deprecates registry parsing that should works without loading files in memory :(

catch’s picture

When you need to get hook_menu() for all modules, you need to actually execute hook_menu. The memory usage is there on opcode sites as well, so it's not the loading of the files which is using the memory there. Either way this is an issue in itself so I've opened #499614: menu_rebuild() uses lots of memory.

andypost’s picture

my stats for D7 + all modules enables + devel (admin/build/modules)

eaccelerator enabled:
Memory used at: devel_boot()=0.27 MB, devel_shutdown()=6.67 MB, PHP peak usage=7.25 MB.

eaccelerator disabled:
Memory used at: devel_boot()=1.82 MB, devel_shutdown()=21.63 MB, PHP peak usage=22.25 MB.

with return; at menu_rebuild

eaccelerator enabled:
first run Memory used at: devel_boot()=1.64 MB, devel_shutdown()=18.57 MB, PHP peak usage=19.25 MB.
next run and others Memory used at: devel_boot()=0.26 MB, devel_shutdown()=2.86 MB, PHP peak usage=3.5 MB.

eaccelerator disabled:
Memory used at: devel_boot()=1.83 MB, devel_shutdown()=17.78 MB, PHP peak usage=18.25 MB.

Forget to say - this is a windows box (php 5.2.9-2 + apache 2.0.63)

RobLoach’s picture

The registry would help contrib a lot and I find that it actually helps the developer experience. For any hooks for a contrib module, you can stick them in mymodule.contribmodulename.inc, and the registry takes care of including the file when appropriate. It also seperates the code nicely so you know which hook is being called where. I'm sticking a big -1 for removing this. It's one of the great additions to Drupal 7 and I don't want to see it go away.

kbahey’s picture

Subscribe.

moshe weitzman’s picture

I don't know if registry should go or stay but i recoil at shipping 2 versions of drupal or shipping with code that splits your drupal for you (or unsplits).

mfer’s picture

So, I see some positives and negatives of the registry.

Positives:

  • Improved DX for separating code out. (see #22)
  • Better for memory on shared hosts where a large part of our install base is.

Negatives:

  • There is a hit to sites running APC.

A lot of drupal sites are running on shared hosts and we need to keep that up. It seems this is a place where we can't make everyone happy.

Would it be possible to make the registry system plugable so we can ship it with a system that's optimized for shared hosts and have a contrib module providing a system better for apc setups?

catch’s picture

Discussed trying to make the registry (somewhat) optional for hosts with APC. The basic idea is have a variable which you set in settings.php, then in core, we don't fetch registry caches and use a 'dumb' version of module_implements(), then in contrib, you'd have a script which agglomerates everything into one big file (or alternatively a module which does require_once() on everything in files[] in hook_boot()).

Here's a very early draft of the first bit, however, the combination of a dumb module_implements() plus the overhead we have with drupal_function_exists()/_registry_check_code() obviously makes things slower out of the box, so we'd need the other half to see if it's viable at all.

module_implements_no_registry() is more or less a copy and paste of the D6 version. It's possible we'd need alternate versions of module_hook() and/or drupal_function_exists() too.

Damien Tournoud’s picture

If we are seriously considering alternatives to the registry, I can only suggest we replace it with dumb class wrappers. Those wrappers could be loaded based on naming conventions, or a hook_autoload_info() as defined by the autoload module.

I think it doesn't make any sense to consider putting everything back in one file. The performance difference would be negligible, while it obviously harm DX and the readability of the code.

catch’s picture

@Damien, it's my understanding that a single big file saves a lot of time from dynamic includes and also having a single opcode to consult - but I've not tested it - that's something which could only be done in contrib though - but we'd still need a dump module_implements() to be able to do it.

Crell’s picture

The time saved by loading fewer files is very OS dependent. On most modern OSes that gets cached anyway so it's not as big a difference as it used to be. (Which hosts are running a "modern OS", of course, is difficult to say.)

Even if we consider not leveraging the registry for functions, PHP's native support for class overloading makes still using it there a 100% no-brainer to me. Bringing back front-loading of files, which the registry had originally, can then avoid nearly all autoload triggers.

I am still -1 on pseudo-classes. Rather, there are plenty of places where we could/should use a class anyway for its own reasons, and the cleaner autoloading is a nice bonus.

I really feel it important to reiterate that even with Drupal 6 *not* loading every file in a module on every page load (page callback files, most views code, etc.) it's already a horrific memory hog. "Load it all and be done with it" is simply not an option on a non-opcode server, and probably not even on an opcode-cache-using server.

catch’s picture

Crell, I don't see how it'd make any difference on a server running an opcode cache? Because even trying to push things very far in the registry's favour on the module split issue, I could barely get measurable savings, and that was immediately after an apache restart before loading any admin pages etc. Note that any savings would only be in the size of the APC cache itself (if I don't ever load module.bot.inc on a site without bot.module installed), not for apache processes. However, with lots of modules installed, both the size of registry entries, and the number of drupal_function_exists() calls are going to go up considerably (not that a dumb module_implements() wouldn't have to foreach through bigger arrays too, but it's precisely this sort of data I wish we had available, and we currently have zero on that front).

Could you point me to where the front-loading was removed? That seems worth looking into.

I agree on pseudo-classes though - if we're going to have hacks, lets just do autoload.sh in the packaging script and relegate the hacks to tarballs rather than adding ugliness to the actual code base.

Crell’s picture

I honestly do not know the memory architecture of APC well enough to say; just that I've heard mixed things about how it allocates and reports memory between processes, which is enough for me to say that there may be an impact, or may not. I do not actually know. If the actual answer is that "memory-wise APC is fine no matter what we do", I'm cool with that. For a non-APC site, though, our memory issues are critical.

Front loading was removed in the module_implements() refactoring patch that chx and, I think, justinrandall were working on a few months ago. I don't have an issue number off hand. Instead the registry caches what files to include for each hook implementation and just loads those, then iterates just those function names rather than every $module_$hook possibility. While an improvement, I think it is actually bad to remove the front loading because we have a lot of indirect code that is not hooks.

catch’s picture

FileSize
1.74 KB

Forgot to post the patch - this adds back the foreaching over every module/hook possibility in module_implements (when you set a variable), but doesn't front load the files - it's just for comparison/testing purposes, not a proposal - at the moment it adds quite a bit of overhead to core - so I don't see how removing the module_implements() changes chx and justin randell worked on would make things any faster.

edit: obviously it doesn't make the registry optional, but it removes one of the registry cache_set()/cache_get() queries.

catch’s picture

FileSize
317.52 KB

And this is why - webgrind screenshot of the front page with that patch applied and $conf['registry_disable'] = FALSE. drupal_function_exists() shows up as the third most expensive function (since we're doing drupal_function_exists() for every module/hook combination) - and most of that is in self time.

Crell’s picture

catch, that webgrind screen is with the patch in #32 applied, which rolls back the module_implements() caching? Sounds like having that is still good, but that doesn't mean the front caching wouldn't be bad. They're not mutually exclusive, and the module_implements() caching is only applicable for hooks. It doesn't help with callbacks, classes, or theme functions.

catch’s picture

That's right on the #32 patch. Do you have a link to the patch where the 'front caching' patch is, I think I remember it, but need to actually see what was removed.

Crell’s picture

The front caching was part of the original registry patch, committed on 6 May 2008. It wasn't a separate issue.

If you are gunning to try and reimplement it, it would involve recording every file that gets included in _registry_check_code() and then on exit recording that list to the cache table, keyed by the menu router item. If it hasn't changed, do nothing. If it has, build up an ever-increasing list of files to include. The net result is that we probably end up including a bit more code than we need on a given page request, but after the first request or three we almost never have to include anything after the initial load.

To get a proper test out of it, you'll need lots of theme functions, callback functions, and classes, since hook implementations are now handled by module_implements(). (I fear what will happen without front caching when we add Views to the mix, with its complex class hierarchy.) Exactly how those two will interact I am not entirely sure. chx or justinrandell could say better.

catch’s picture

FileSize
944 bytes

Needs benchmarking / profiling.

Wondering if this would speed drupal_function_exists() up a bit.

catch’s picture

Split last patch to a new issue #523358: Optimize drupal_function_exists().

JacobSingh’s picture

Very much in support of this. Will review this week.

Anonymous’s picture

here are the droids you're looking for: #325665: Cache registry lookups.

at the time this went in, it sped up drupal, as shown by catch here:

http://drupal.org/node/325665#comment-1102578

the main issue with pre-loading code is just that its hard to figure out how to cache it. a menu-router based key is hard, because sometimes a simple node/$nid page follows a cache clear, and therefore loads a bunch of extra code to rebuild caches.

catch’s picture

Ahh, /that/ issue. Well yes that one cut out a lot of database queries, so reverting that would be a nightmare. If we accept my 8% numbers here as correct (which we shouldn't since they're likely to be absolutely a bit higher, if potentially relatively lower compared to D6 module.inc, and it's a very hard thing to quantify except possibly for going back in time to the original registry patch and trying to benchmark before and after that), then reverting that page takes us to well over 10%, closer to 15% in terms of cost.

I don't know about trying to have both the path based and current HEAD caches together - might be interesting to try - but that's an extra database query on every page in addition to more memory usage just to tread water.

Crell’s picture

I had been working on a patch to flag certain files as pre-cacheable and some not, to avoid the issue mentioned in #40. When the alternate caching went in and the front-caching was ripped out, I won't fixed it as it did not apply anymore at the time. I still think that's a useful approach, though, at minimum for class autoloads as those will never be caught by the module_implements() cache.

Anonymous’s picture

Subscribe

catch’s picture

The drupal_function_exists() patch is in, which actually cuts a fair bit out of the page execution time issues with it I'd found before, not all, but some. Also just posted a patch which needs testing/discussion at #532010: Use require instead of require_once in _registry_check_code() for registry_check_code(). Given those, and the fact that Berdir found the old module_implements() taking around 8% of execution time on a D6 site (with 100+ active modules though), I'm less sure about the PHP overhead arguments here. There's still plenty of angles which need investigating though, but I'm not going to have much time over the next week or so to do so.

Anonymous’s picture

Crell, i'm interested in helping with adding the front-load back if we can find a way to make it sane.

do you have any (even broken or half-baked) code to share?

Owen Barton’s picture

Subscribe

pwolanin’s picture

my thoughts:

remove the registry except for the bit that auto-loads classes/interfaces. Probably can use a simple regex instead of tokenizing.

consider whether there are benefits to retaining a registry (built on the fly?) of which modules implement which hooks.

pwolanin’s picture

In e-mail Jacob suggests we shoudl jst rm the auto-loading too for simplicity. I think removing auto-loading would require some reworking of
simpletest. Otherwise I think it's minor either way. We are currently trying to use drupal variable to hold a class name in some
factory methods. They can always be key => value with class and filename.

Crell’s picture

There is no simplicity to be gained by removing the class autoloading. If anything, there's complexity to be added because then we have to manually track not only class location but class hierarchy. That is, we need to do drupal_class_exists() and such like we do now for functions, only it will be uglier because we have to deal with class parentage. Views is forced to do that now, and it's quite ugly and prone to fatal errors if you're not careful. If anything, we need to bring back the front-loading of classes to make it faster/cleaner, not remove it.

Unless of course you want to just load all of Drupal's code on every single page load. Class autoloading is a PHP language feature intended to solve this exact problem (performance and memory). To not make use of it is quite simply short-sighted and naive.

Really, I put this out to anyone suggesting we remove the registry, rather than try to further optimize it: Find me another approach that gives us at least a 10% improvement in performance and manages to keep Drupal's memory usage low enough that Drupal doesn't mandate the use of an opcode cache for a reasonably complex site. In the past year, no one has come forward with a way to address our rather drastic memory issues other than "load less code". If you have a better idea, speak now or forever hold your peace.

And "just use an opcode cache" is not an answer.

JacobSingh’s picture

This is part of an email I sent to an Acquia list describing what a scenario which happened yesterday:

The only way to make the current registry smart enough to not bail on normal developer behavior is to trap all require errors and undefined function errors and try to reload / compensate as they happen.

Another example of registry naiveté we found the other day:

We had a class called Palette which used to be a standalone class. It was in palette.inc which was referenced in themebuilder_compiler.info like this:

file[] = palette.inc

Okay, that's kinda handy, because we never have to include it, even if a bit opaque, but then, Katherine made a check in:

A inc/colorset.inc
M palette.inc
M themebuilder_compiler.info

She changed the signature of
- class Palette {
+ class Palette extends Colorset

and in themebuilder_compiler.info
+ file[] = inc/colorset.inc

Okay, so this works great with the registry, and palette.inc does not have to include inc/colorset.inc. However, it means that until the registry gets rebuilt, I get a fatal error on every page as palette.inc is included on every page, and the class it is extending does not exist in the included files.

So I can't even go to admin/structure/modules to reset it. For anyone not interested in editing the registry table in MySQL, this means a re-install.

So the only way to solve this is:

a). Build a recovery script which stays outside of the registry which can reload it in some intelligent way.
b). Change the error handler to catch these errors, know every possible include related error and rebuild the registry, and then try again. This is also problematic because a handling the error will screw up the page flow requiring us to essentially resubmit the request which can cause data integrity issues.

The problem of loading too much code is endemic in plugin based architectures. I don't see how it can be avoided because plugins need to be available to respond to events. The only option would be voluntary disclosure of which events(hooks) the plugin wants to implement, and run-time inclusion. But even that has performance implications AFAIK. It's also a pain the ass. What I mean is, in the info file: hooks[] = nodeapi;

The clear path forward I think is backwards.

I'm also against the autoloading facility. I know it makes some things simpler and I like it for performance reasons, but at the cost of complexity for incoming developers, and this is NOT worth it. Drupal is already a pain to learn because there are so many anonymous array structures not connected to error handling and control based on naming schemes. If I use type instead of #type, I get some bizzare error 5000 lines of code later, etc..

We need less mystery meat APIs, not more behind the scenes voodoo, and until it is very clear how it finds the files to include, and it actually works without hosing the site, I think it hurts more than helps.
----

To respond to Crell's last point about an autoloader being there for this purpose. I agree, and I also think it should only be used for classes, and I think classes should maintain namespacing with directories and files (ala PEAR) to make this simpler. For functions, it's just not practical IMO.

Don't get me wrong, I'm all for the cause of making Drupal faster and more accessible to small hosts. But I think what we've got now is untenable for module development and has no obvious path to viability.

If someone can answer Crell's challenge, I think it would be great. I for one don't claim to.

-J

catch’s picture

If you look at my rough calculations in #15, the most performant and simplest method for even non-APC sites is the D6 page split - since it has most of the memory savings, without any of the overhead of the registry. If you take the gains from sun's 900k moving code around patch, and subtract the overhead of the registry, you pretty much end up back at D6 in terms of page execution time.

Damien Tournoud’s picture

It would be naive at this point to think we could simply rollback the registry, especially without drafting a way forward first. The registry does two things:

  1. lazy-loading of files, via drupal_function_exists(), and
  2. auto-loading of class and interface hierarchies, via the SPL autoloader.

Those are two radically different things. While I happen to think we could easily live without (1), (2) is very important for us to go forward.

One way of accomodating that, is to go back to the hook based approach that Crell drafted in the autoload module. Something like that could work:

/**
 * Implementation of hook_autoload_info().
 */
function example_autoload_info() {
  return array(
    'functions' => array(
      'example_admin_*' => 'example.admin.inc',
      'example_page_*' => 'example.page.inc',
    ),
    'classes' => array(
      'ExampleClass' => 'example.classes.inc',
      'ExampleInterface' => 'example.classes.inc',
    ),
  );
}

Note that it cannot be based on the information in the .info, or we go back to the issue described in length by Jacob in #50. And no, we don't want to parse all .info files at each request. No, we don't.

catch’s picture

For me it's drupal_function_exists() which looks to be the most expensive - probably because we simply have more hooks than classes of course - micro-benchmarks show it to be 3 times more expensive than function_exists() and that's when doing function_exists('count') vs. drupal_function_exists('count') (i.e. no actual autoloading going on).

Something like Damien's draft looks OK to me, my main interest would be less stuff to do just to check if a function is loaded, which most are 90% of the time anyway, at least in HEAD, and smaller cache entries (and less of them).

Crell’s picture

Making the registry more fault-tolerant I am all for. There's already an issue to help address Jacob's issue: #532010: Use require instead of require_once in _registry_check_code(). I think an "if that fails for some reason, rescan the registry and try again once" is a reasonable attempt to make. chx has also proposed a rescan.php file (or similar) that would rebuild all of Drupal's semi-caches; menu, theme, registry, etc. There's some sort of lock down logic needed there, of course, for security but I think that is solvable. A flag in settings.php would work as well.

The naive name==directory class pseudo-namespace structure taken by PEAR or ZF is actually entirely incompatible with Drupal's approach to modules. I've had that debate with some of the more "traditional" framework folks, too, who want to make that a de jure standard for PHP 5.3 with namespaces.

While we could replace the registry with a manual hook a la the autoload module, that is an enormous burden on module developers. With the registry, as a worst case you kick Drupal really hard in the right place and it re-finds all of your code. With a manual approach, you have to maintain a by-hand index of every piece of code in your module, and we have yet another gigantic static cache array we have to reserve memory for. Even if we only did that for classes, think of a module like Views. That would be a horrifically large hook to have to maintain.

Plus, such a naive approach would result in dozens of chained autoloads on deep class nesting. Imagine class A extends class B extends class C extends class D. If you create a new instance of A, that will trigger an autoload on A, which will in turn trigger an autoload on B, which in turn triggers an autoload on C, which in turn triggers an autoload on D. With proper caching and front-loading in the registry (which I still feel we need to re-add for this reason), we can eliminate all four autoload triggers or at least cut it down to one. I am not confident that an entirely manual approach will be able to do that.

Crell’s picture

Sorry, correction, this is the issue I meant: #540402: Use include_once instead of require_once in registry

chx, I am reopening that issue. It's a perfectly straightforward and simple fix that doesn't break a recovery script.

Frando’s picture

#534594: [meta] system info cache writing and registry rebuilding is broken without a full bootstrap has a recovery script that works for rebuilding the registry in situations like described by Jacob in #50.

JacobSingh’s picture

I don't understand how you using include instead of require is going to help the situation I described.

I don't think there is a way to "autofix" this. Also, using include instead of require is faulty. Say I upgrade my site and it is a busy site. People are submitting forms all the time. If I cause a registry problem, which causes registry to somehow know it is broken and then rebuild itself, it will cause the form submission which triggered this to re-submit itself I suppose. This could cause data integrity issues. Further, regsitry rebuilding take a long time. So does that mean you will lock all users out of the site while rebuilding happens? Or will we get into a race condition where 100 processes are all trying to rebuild it at once.

I agree w/ Crell that DamZ's proposal is too much of a burden on developers.

Here's another idea:

Keep Drupal using easy to understand PHP with a path of include statements you can follow.

Implement the page split and other performance improvements

Define what our benchmark is for $10 hosting explicitly and test it. Maybe you can run a site with 1 admin and up to 10 concurrent authenticated users who get logged off after 10 min. Anything more than that, and you need more h/w. Is that so bad?

The answer really is "get an opcode cache" IMO, and you can get one on shared hosting now for about $20-$25/mo. Hell, I bet it's a good business to get into offering it for $15 and someone will do it.

In conclusion:

Scaling is important, being fast under all conditions for all price points is impossible. This does not increase our scalability (because has little / no benefits over a proper APC cache), hurts deployment (because new files / removed / moved files can not be taken care of safely) and destroys the developer experience (for the obvious frustrations myself, everyone on my team, and others in the community have voiced).

Can we focus on making it optional at least? So that it can be turned off for sites in development, sites with APC and sites which have to do hot deployments in mission critical environments can disable it?

Crell’s picture

Jacob, user count is not the issue. A site with one single user per hour that has Views, CCK, a couple of multimedia modules, Panels, and Ubertcart installed (which is not at all an unusual setup, even for a low-traffic site) is going to crash and burn from hitting the PHP memory limit. It's going to spend more time just loading and parsing code it never uses than it will on database traffic, making a single page load for a single user longer than it has any right to. That is the problem. Yes, I have run into that already in Drupal 6.

You may think that's not an issue, but it is. If the low-end and mid-range markets can't use Drupal, they'll find an alternative (WordPress, Joomla, Plone, SharePoint... We do have competition). And then rather than switching to Drupal when they have the budget for a dedicated server and opcode cache, they'll stay with those alternative systems and put resources into that project instead. We'll have lost users and developers forever. I do not want Drupal to be a high-end-niche platform. Those are the platforms that die.

An an opcode cache is not a silver bullet, either. Even on a well-tricked-out dedicated box, if you burn 50 MB per page request on a 2 GB server, you can serve a maximum of 40 requests at once. (Actually less since the OS, DB server, Apache itself, etc. will eat up memory as well.)

However, per-hook-manual-registry (like we have in Drupal 6 for theme and menu systems) is no better than one giant autoload hook. You still have to manually specify every function or class in your module. You're just specifying it in 50 places instead of one. This is still not an improvement.

Even if we went with the otherwise stupid plan of "make everything OO so that we can use PHP autoload", we'd still need the registry to find classes in the first place.

Switching to include_once() is to make the system more fault-tolerant so that it doesn't WSOD. That's separate from a self-correcting system, which as you correctly note is non-trivial to write but is still potentially valuable to have if we can get it right.

catch’s picture

in my testing, I've seen no evidence at all that the registry will avoid a site hitting the PHP memory limit - we use over 32MB in D7 with the registry on certain pages, and that's just core. The 2-6MB savings on 30-40 don't go anywhere near far enough to actually make a difference to PHP memory limits - the most expensive pages (like modules page submit), have to do menu and other rebuilds all at the same time, including the registry, where it's unlikely to make much of a difference at all.

So that only leaves actual performance improvements as a tangible benefit. While the registry can shave 28MB per apache process down to about 24MB with sun's module split patch - APC shaves the same page down to about 6-8MB in my testing - the page execution time, in the benchmarks I've done, look like a no-op once registry overhead is taken into account, compared to what we already have in D6.

I don't see what opcode caches not being a silver bullet have to do with this - there's no memory reduction for opcode caches at all here, and if anything, having more files to include is likely to lead to more problems. I've seen references elsewhere to more memory fragmentation the more files you have, and opcode caches being more efficient the less opcodes they have to maintain. There's plenty of articles about performance issues on APC with require_once as well, the usage of which we'll be increasing, see http://www.techyouruniverse.com/software/php-performance-tip-require-ver... for just one example.

I think we should explore keeping only the class autoloading, and having module_implements() maintain it's own cache for hooks after doing something like:

$hook = $module . '_node_insert';
if (function_exists($hook) || (file_exists($module.node.inc) && require_once "$module.node.inc"  && function_exists($hook))) {
}

to check if the hook implementation exists in the first place.

Then have module_implements() maintain a cache of hook implementations it finds rather than the tokenizing. That would save looping through 100 modules to find one hook implementation on every request, which is as far as I can see the only benefit to sites on remotely decent hosting at all.

Oh and here's that $15/month shared hosting with APC - http://www.dreamhost.com/hosting-vps.html According to webchick, Dreamhost also has a 96MB memory limit by default on shared hosting plans.

chx’s picture

I am now more for removal. Class autoloading is not that hard, just have a hook in .module that tells about your classes. There are not many. The whole of includes has 48 classes including DBTNG, file etc.

Crell’s picture

Core has 48. Views alone adds a few dozen. My understanding is Panels wants to move in the direction of Views and use more classes. There are plenty of other contrib modules that use classes. Do you really think Views needs another 500-line hook? It has plenty already. :-)

Anonymous’s picture

From what I've read, the general feeling seems to be that the page split and interface/class registry is a good idea, but that having a registry for functions make development quite unpleasant (at least that's my own experience.)

Attached is a patch that removes the function portion of the registry (and restores the 'file' => 'whatever.inc' mechanism previously used in hook_menu. It leaves the autoload for interfaces and class, as well as leaves the page split.

Anonymous’s picture

Status: Needs work » Needs review
pwolanin’s picture

Status: Needs review » Needs work

Looks like a great start - my only thought was that ideally we'll be able to do away with tokenizing the files in favor of something faster.

catch’s picture

Status: Needs work » Needs review

Let's see what the test bot says.

catch’s picture

Title: Remove the registry and page split » Remove the registry (for functions)

annnd fix the title.

Status: Needs review » Needs work

The last submitted patch failed testing.

RobLoach’s picture

Is it just me, or does this look ridiculous....

function drupal_function_exists($function) {
  return function_exists($function);
}

I assume there will be follow up patches?

Frando’s picture

So what exactly does it buy us to remove the registry (for functions)? I don't think the "drupal_function_exists() is more expensive than function_exists()" can really outweigh the potential performace increase for non-opcode sites. Thing is, all of this stuff is really hard to benchmark, especially without contrib. The registry will have a *much* bigger effect with an actual site that uses lots of contrib modules. Think of a common site that has ubercart, views and panels - here we can really save a lot of code parsing (e.g. just think of all the ubercart code that is only needed in the cart section of a site).

I'm all for adding a recovery script to core, that's why I opened #534594: [meta] system info cache writing and registry rebuilding is broken without a full bootstrap. With something like this added, I don't see why we should remove the registry - we're discussing benchmarks vs. benchmarks, but unfortunately withoug actual, proper benchmarks. Such benchmarks are admittetly hard to do, especially as we don't have many contrib modules for D7 yet. But still, if the reason for removal is the claim "On a D7 site, the penalty by drupal_function_exists() vs. function_exists() outweighs the potential memory savings by parsing a lot less code", then I am very unsure if it's wise to proceed with removing the registry.

catch’s picture

@Frando, as much as performance, there's the fragility of relying on a cache to know if you have files or functions available - which means adding new hooks, removing them etc. can cause your site to break from being completely unrecoverable to simply silently failing to add the new hook. Also, IMO /any/ overhead we add to sites running on $15-20/month hosting (low-end VPS/high-end shared hosting) to make things easier for $5/month hosting - unless it was an absolute magic bullet in terms of performance and memory (i.e. 50% savings instead of 10%), is too much.

catch’s picture

Status: Needs work » Needs review
FileSize
125.38 KB

This doesn't fix all the fails, but it does fix an exception on nearly every page.

mfer’s picture

For reference, after talking in IRC we want to keep class/interface autoloading. You can read a simple example of this at http://www.brandonsavage.net/making-life-better-with-the-spl-autoloader/ or see it in a drupal module with a hook at http://drupal.org/project/autoload

This is a feature in php5 that we should use.

So, the efforts to remove the registry should focus on functions at this point. If anything, change how the classes are defined and/or how the information is stored.

catch’s picture

Also, since we at the moment have a lot less classes than hook implementations in core (and in contrib too), any overhead of autoloading a class is likely to be very, very tiny - since we'll have small cache entries and not be doing it dozens of times on a page. Also, we should do here what we didn't do when committing the registry, benchmark it very, very carefully.

Status: Needs review » Needs work

The last submitted patch failed testing.

David_Rothstein’s picture

But still, if the reason for removal is the claim "On a D7 site, the penalty by drupal_function_exists() vs. function_exists() outweighs the potential memory savings by parsing a lot less code", then I am very unsure if it's wise to proceed with removing the registry.

Given that the patch removes the registry but leaves the page split intact, it certainly seems like it shouldn't increase the amount of code parsing/memory usage at all (or at least if it did, it could be easily rewritten not to do so). Or am I not understanding something fundamental here?

Agreed with @pwolanin that it would be nice to remove the tokenizing -- though it's a small point in the grand scheme of things, it makes the Drupal 7 installation painfully slow to endure. I guess that would mean exploring the other options mentioned above for class/interface autoloading.

@catch: I think your latest version of the patch has some other unrelated changes mixed in by mistake?

Frando’s picture

@David_Rothstein: Well, up to now, we haven't optimized Drupal core to actually make use of the registry in terms of parse savings. If we did, we would e.g. move away all the registry style hooks. And contrib would do the same. That's where the parse savings would come from. Crell did some very compelling benchmarks a while a go on Drupal 6 (see http://www.garfieldtech.com/blog/benchmark-page-split).

pwolanin’s picture

@Frando - there is an easy solution - move all the hooks to something like modulename.hooks.inc.

module_invoke_all() can load this file if it can't find your hook implementation - this is a trivial way to get uneeded code (e.g. hook_menu, hook_actions, etc) out of .module.

catch’s picture

The most recent benchmarks are on #345118: Performance: Split .module files by moving hooks. They're not bad, 10-20% on page execution time and memory iirc, but 1. they won't stop you hitting memory limits (which is the only place you really notice high memory usage on shared hosting, no-one's running real benchmarks on a real shared host) 2. they're nowhere near the gains you get from installing an opcode cache - which doesn't require the 4-5 people who actually understand the registry to maintain it indefinitely, with all the fragility and special casing it's introduced.

It also wouldn't be too difficult to keep the module.hooks.inc idea for hook_menu(), hook_theme(), hook_entity_info() and other cached info hooks, and look for that when rebuilding those caches.

David_Rothstein’s picture

Even simpler than modulename.hooks.inc, anyone is always free to do something like the following, right?

/**
 * Implement hook_menu().
 */
function mymodule_menu() {
  require_once 'somefile.inc';
  return _mymodule_menu();
}

(where _mymodule_menu() is defined in somefile.inc and contains a massive block of code)

Ironically, that is the pattern currently used by the registry system itself....

function registry_rebuild() {
  require_once DRUPAL_ROOT . '/includes/registry.inc';
  _registry_rebuild();
}

Although certainly the idea of auto-loading a modulename.hooks.inc file is pretty simple too, and worth exploring.

catch’s picture

Status: Needs work » Needs review
FileSize
83.44 KB

Slightly less broken patch:

obvious fails in case anyone wants to jump in:

Batch API within simpletest is broken (I can run tests via the UI though)
Some forms look to be broken in simpletest, but not in the UI.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

HEAD changed from under me and I won't have time to work on this more for a day or two. Here's where I got up to:

module_invoke_all('boot') fails because bootstrap_invoke_all() was removed, along with the 'bootstrap' column in the system table and relevant code from module_list().

This would be an obvious case for retaining some sort of module_implements() cache here, so we fetch that cache here, instead of querying the system table on most requests, but alternatively we need to revert those changes to bootstrap and module.inc.

JacobSingh’s picture

The only way to speed that up is to start using standard naming conventions for classes and their locations.

See http://drupal.org/node/395472#comment-1719782 and the following comments for a rather brutal investigation of the matter.

Perhaps a lib directory in each module, which used upercase filenames (to match the classname) would be an efficient way to handle it.

-J

sun’s picture

  1. The registry allows to split of AND dynamically (automagically) load integration code for other modules that may not be installed. Currently, contrib modules either need to implement their own API layer to allow third-party modules to integrate with their API without negatively affecting the site performance of many other sites that do not have that particular module installed and thus don't need to load the code at all (such as hook_panels|ctools_include_directory() or I think I've heard from from $module.rules.inc). The registry empowers Drupal developers to advance on Drupal's modularity (and do it properly).

    This is a major issue for contributed modules and most probably one of the primary reasons for why we have weak cross-module integrations and so many totally needless "glue" modules that only implement a module's API on behalf of another module.

    If we consider to drop the registry, then we have to ensure this highly valuable feature through a different mechanism.

    A $module.hooks.inc is no solution, because that file would be loaded on 100% of all requests.

  2. We certainly can achieve a vast decrease in memory consumption with the registry by ensuring that files are never loaded but only read to scan for functions when the registry is rebuilt. I think that reducing the overall memory consumption on production sites is the registry's primary purpose (where code changes occasionally only).

    Considering the time spent in PHP, my benchmarks for the total module split resulted in a 20-30% decrease of both execution time and memory consumption (on APC) -- with all Drupal core modules enabled only. Common Drupal sites have a lot more modules + code to load.

  3. We still have a bunch of patches in the queue that advance on the registry (such as making theme() fully leveraging it, too). Removing it now is like removing fields in core now, just because the implementation is incomplete.
  4. If we remove the function registry, then I see little point in keeping the D6 page split either. The purpose of both is identical.
Crell’s picture

Even if we only use the registry for classes, it's worth still keeping the function index in place. It's already written, and a contrib may be able to leverage it. We just need to remove the drupal_ from function_exists().

Class-per-file is a very stupid way to organize PHP code. We also need to account for deeply nested class hierarchies with front-caching, which can avoid a huge number of autoload lookups.

Anonymous’s picture

The function index is already written, but it's also already a massive headache to try and develop around. Attempting to keep it around if we aren't going to use it just adds unneccessary overhead to the resource requirements for Drupal (memory, database usage and processor usage). It becomes even more pointless since we can't guarantee it to be accurate (since that was one reason that it managed to keep breaking installations.)

Also, I'm inclined to agree with the class per file approach. Really any naming convention. We already have some naming conventions in Drupal. If we actually do care about performance then it might pay for us to take advantage of these / draft new ones. I'm inclined to believe that using naming conventions to our advantage would produce much more efficient, robust, and readable code than a registry can.

xxxxxx.hooks.inc, xxxxxx.hookname.inc, xxxxxx.forms.inc, xxxxxx.admin.inc, xxxxxx.classname.inc, etc...

pwolanin’s picture

we could put classes to file relationships in the module .info file to avoid having to parse them out of the source. etc - there are many options. I think that's far easier to get people to accept than an arbitrary file naming convention - especially if that means we need to deal with stupid CVS file move/rename limitations.

@sun you APi can be public if callers use module_invoke_all() or module_invoke() - as suggested by catch, that would let us stuff rarely-called hooks outside the .module file. The point is that hooks called frequently would stay in .module.

catch’s picture

@sun

#1, as we found in the module split patch, you can't put any _load() hooks in an include file anyway, since that causes all your other hooks for that module to be loaded on every request - so hook_taxonomy_term_load() goes in .module, and hook_taxonomy_term_every_else() goes in module.taxonomy.inc - so you only get nice clean modular separation of code for 75% of CRUD hooks.

Also, we already have

if (module_exists('taxonomy')) {
 require_once 'module.taxonomy.inc';
}

as a pattern. Which I've seen in revision_moderation.module and elsewhere. I don't think not having to do that is sufficient reason to keep the registry.

#2 I couldn't measure any difference in APC at all (maybe 500k, but too much variation to be reliable)

#3 Fields in core provides a feature which can be used by all Drupal sites, the registry provides a performance boost for sites running on poorly optimized hosting plans which have a lot of contrib modules and authenticated user traffic. These are not comparable subsystems. The reason for removing it, is to avoid shipping and having to maintain something for the next few years which should never have been committed in the first place. It's a shame the performance issues (some of which we admittedly have workarounds committed or in the queue for) and fragility was only really flagged up in the past 6 months or so instead of last year, but better to have this discussion now than in another six months.

#4. The page split doesn't have add any overhead to sites running an opcode cache, and doesn't result in site breakage. But as mentioned much earlier in this issue, if we really, really cared about optimizing especially for shared hosting, we'd work on splitting tarballs so we could move API functions to includes as well as hooks, and move everything back to .module in core. I found out the other day that all the code and hook invocations for deleting node revisions is in node.pages.inc, it makes sense from a performance perspective, but it's crap API and code organization.

webchick’s picture

Another thing you can't do anymore w/ the registry, I just found out...

Remove a module you have enabled from the filesystem: KABOOM. Recommended practice? No. But it's going to happen while you're upgrading a module if you follow the steps outlined in UPGRADE.txt. Better hope to $higher_power that someone doesn't ping your site in between your deleting the old files and putting the new ones in place. :P

JacobSingh’s picture

@webchick: Amazing you haven't noticed that you can't remove mods till now!
#498412: No way to access site if modules are deleted or disabled without using form
This is especially relevant when people want to move mods from sites/all to sites/$site or whatever

Also, unrelated but equally nasty: If you make a typo in hook_menu (try MENU_NORMAL), your site is completely screwed unless you edit database tables. I'll make an issue for this if no one has one on hand.

@catch:
1 class per file: Okay to disagree, but also I think it's pretty naive to call it "incredibly stupid" when it is the convention used by almost every programming language on the planet (and by many or more popular PHP frameworks). Doesn't mean we have to do it, but I think randomly naming a file based on the "general feeling" of the classes inside as we do now is equally or more stupid.

@pwolanin: Defining classes and their relationship in the info file sounds like trouble to me, but open to seeing what benefit it gives us.

@everyone:
DO we need a class or function registry? The class registry is not going to buy us very much the way Drupal is now. There is NO WAY to make an efficient autoloader without a filenaming syntax unless you are able to load the definitions from a config file somewhere (which still has overhead). Seriously, how would anything we build be any different from the current registry?

You can't recover, right? Or you have to load and parse files to make sure you aren't out of date. I'm much more in favor of making a hooks.inc file which can include its own libs inside function scope as needed upon invocation and a pages.inc when your menu handler gets called if we want to cut down on kb loaded.

Best,
Jacob

Anonymous’s picture

JacobSingh: I agree with the you on this. I don't really believe that it has a place in Drupal, at least not as anything more than a profanity. I just didn't feel like fighting any backlash. (It's very tiring.)

Personally I do not see what was wrong with the way we were developing before the registry. My personal belief: If we're worried about shared hosting, there is more to worry about than just memory. There is also processor usage and database usage. ;) The registry managed to slighly lower the first by multiplying the load of the last two. I'm not sure why we need everything to autoload, etc... I'd much rather use 'require file.name.inc' than to invoke the registry so that it can check a static array before it makes a database query that, if all went well (and many times it doesn't) will end up running 'require file.name.inc' The cure seems many magnatudes of times worse than the disease.

Personally I would stick to one of two directions:
1) Use naming conventions to make an EXTREMELY low overhead / dumb loader.
2) Page split and use "include 'file.name.inc'; _filename_form();" like we have been.

catch’s picture

@JacobSingh: I think you're confusing me with Crell. One class per file does sound like a pain even for .test files though.

casey’s picture

An alternative:

  • fully remove registry
  • keep file definitions in .info files
  • By default include all files at bootstrap; developers don't need to bother about function/class dependencies and have all freedom in code/api organization, but is bad performance)
  • Create a pluggable "compiler" that generates a optimized version of your drupal installation (store in cache dir)
    • sites on servers with opcode caching could use a compiler that combines all files into one file.
    • sites on limited webservers could use a compiler that split's code into several files (have a look at that dcc.module I posted in #363891: Optimize performance using code generation).
mfer’s picture

@JacobSingh - At the very least we need a class/interface autoloading system. Here's an example to illustrate. Currently Views is OOP with quite a few classes and most of those are extensions of other classes. There are a number of modules that extend views that all use classes. In Views 2 there is a system that tracks all the classes and their dependencies to load them and work in PHP4. This system already requires us to define stuff and does a lot of work PHP can do for us.

If we use the PHP5 spl autoloading system it should be less code for developers and faster (it's C code rather than PHP).

Sure, this could end up in views or ctools so those modules could use it. But, I expect more OOP with drupal now that PHP5 is required. I have already heard people talk about it. It would be good to provide a generic system that all modules can use.

At the very least this could be a hook (something like hook_autoload or hook_classes). If views defines 20 classes it would only need to be a 26 line hook (including the header comment block). Even if we remove the registry something like this could be very useful.

@casey - I like the idea of having a compiler that generates one file for opcode caches. Removing the requires will save a little on performance. But, that's not going to happen for D7. The code freeze is almost upon us.

We can't include all files at bootstrap on shared hosts. That would use a lot of memory and would be a problem. More users run drupal on shared hosts than anywhere else. We need to keep shared hosts in mind.

Anonymous’s picture

Anyone for creating a new thread / patch to handle autoloading interfaces / classes? There have been several ideas here that might be worth discussion in the new thread. This would leave no reason to keep the registry. If someone is willing split off the autoloading in to a new thread I'll start working on a patch that removes all reminants of the registry. (Let's see if I can break EVEN MORE tests this time...)

catch’s picture

Status: Needs work » Needs review
FileSize
89.21 KB

Should fix a few more test fails. I'm getting errors in simpletest, which I'm not getting when clicking around - wondering if there's problems with batch API or simpletest itself somewhere.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

SimpleTest's own tests are failing, which probably explains why this fails quite so spectacularly, anyone wants to pick this up that's probably the place to start.

pwolanin’s picture

@JacobSingh
re:

There is NO WAY to make an efficient autoloader without a filenaming syntax unless you are able to load the definitions from a config file somewhere (which still has overhead).

I thought this is more or less what I was suggesting with putting class to file relationships in the .info file. We already incur the overhead of parsing those files, so hopefully this would not be more more.

Crell’s picture

Requiring a declaration of all classes in the info file is worse than requiring a declaration of all files, IMO. It's more manual work unless you're doing the naive class-per-file approach, which we need to get away from. While it's true that right now most modules do not have any classes, I do not expect that to remain the case indefinitely and plenty of modules will have many classes. (Think Views, views plugins, and if merlin converts Panels to be more Views-ish has he's mentioned in the past any Panels-related modules.)

That also doesn't address classes in Drupal core, of which there are a couple now and will be more as time goes on.

pwolanin’s picture

@Crell - listing a class in the .info file is a one-time cost. So, I'd personally rather add a slight cost to DX and have a stupid-simple class auto-loader with no file parser versus the alternative of actually parsing files.

Crell’s picture

Parsing is not a regular cost. It only happens on total rebuild, which is an extremely rare operation. The parse cost itself has no impact no normal page activity.

pwolanin’s picture

Discussing with Jacob - we handled variable class names in a factory function in the Solr suite like:

    list($module, $filepath, $class) = variable_get('apachesolr_service_class', array('apachesolr', 'Drupal_Apache_Solr_Service.php', 'Drupal_Apache_Solr_Service'));
    include_once(drupal_get_path('module', $module) .'/'. $filepath);
   $instance = new $class;

Which is a little ugly, but not hard.

mfer’s picture

@pwolanin listing a class in a hook makes more sense than a .info file. In you rename a class you need to do a total rebuild. This is one of the complaints about the current system listed above.

Crell’s picture

A separate variable for each class is seriously broken, and means we don't get the ability to use PHP autoload. If we must do it manually, the autoload module already has this question solved. I still disagree with dropping back that far, but if we get rid of automation entirely then that's the way to do it. It's like a 2 hour job to roll and commit that patch.

pwolanin’s picture

@Crell - the variable is for classes we can override in a factory function or the like (simple cases). In other cases, you'd just include_once

mfer’s picture

@pwolanin a much cleaner way is to register all the classes with the spl autoloader and than use a variable to store a class name. Either the default or what is overriding it. The way the spl autoloader works every module can have it's own autoloading function. But, it would be better if we centralized it.

Crell’s picture

What mfer said. "What class to use here" and "where can I find this class" are by design (of PHP) separate questions. They should not be merged needlessly. A module should just say new $class, and then trust that the underlying system (registry, autoload module, whatever it is) will load the class if necessary. You then don't need to explicitly include_once() anything. PHP handles that for you.

catch’s picture

I'm very unlikely to have time to work on this in any depth before freeze - maybe if someone can debug why simpletest and batch are failing I might be able to work on loose-ends, but I'll be moving continent most of next week.

Could someone also open a new issue for class autoloading? That's not on the table for this patch afaik, so it'd make things easier to track.

pwolanin’s picture

ok, yes - let's just figure out why tests fail.

Crell’s picture

catch: If this patch removes the registry entirely, then it also removes autoloading and we need another solution. If it keeps the registry for classes and ignores it for functions, then nothing more needs to be done. If the registry doesn't get removed by freeze, then adding another autoload mechanism is redundant and I can't see it getting in until this is removed, but then we're right down to the wire for freeze. So since we seem to agree that we at least need a standard autoload mechanism of some kind, I'm not sure how to proceed here in this issue or another.

webchick’s picture

There's a lot of back and forth here about the ultimate way to do class registration/autoloading (which is something that basically everyone agrees is needed). Each suggested approach (files-per-classes, a "registry"-style hook, some sort of relationship metadata in .info files, etc.) has its pros and cons, and more importantly its DX cost.

The existing registry's auto-registration feature is quite nice... except when it breaks. :P By removing functions from the equation, we make the window of breakage far smaller than it otherwise would be, and we can re-evaluate at that point (after benchmarks, etc.) whether it makes sense to go further. Though I tend to think that, especially at this late stage in the game, we should be aiming to make the least invasive changes possible unless deemed absolutely necessary.

catch’s picture

@Crell, current patch keeps autoloading for classes - hence Drupal not blowing up when you install it. Just a note that I'm pretty sure once all my function_exists() || drupal_function_exists() patches are in, there'll be no significant performance impact from removing the registry, however I'm still extremely concerned about the fragility (and would at least like a chance to benchmark things without it to see if I'm right).

pwolanin’s picture

a first pass at removing drupal_function_exists()

chx’s picture

Status: Needs work » Needs review

Bot, what do you think?

Status: Needs review » Needs work

The last submitted patch failed testing.

webchick’s picture

I'm pretty sure the bot just gave you the finger! :)

catch’s picture

500 less exceptions than my last one, nice work pwolanin!

pwolanin’s picture

Huh, well - I didn't put it to needs review cuase I knew it was still failing massively.

Somehow the right modules/paths are not available in simpletest - trying to debug...

chx’s picture

Assigned: Unassigned » chx

Let me drive this home.

chx’s picture

Status: Needs work » Needs review
FileSize
122.65 KB

And what about this?

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
124.36 KB

Another round. Stream wrappers were registered after the ini_set('error_log', file_directory_path() . '/error.log'); call for simpletest. Bad, bad move.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
131.98 KB

Let's see the failures now.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
147.17 KB

This fixes node display by adding field.default.inc and re-adds the bootstrap column we removed to fix bootstrap calls. I have added watchdog to bootstrap_hooks otherwise what's the point in having watchdog in bootstrap.inc? system_test calls watchdog() in hook_boot and fails otherwise.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
146.06 KB

Bah!

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
154.1 KB

Fixed image stuff and mimetypes.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
160.15 KB

I really wanted to get further but I am unable to remain awake any longer. There is not much left: two aggregator tests, simpletest functionaly, Translation export, Token replacement. Role permissions, module API.

Status: Needs review » Needs work

The last submitted patch failed testing.

karschsp’s picture

subscribe

Crell’s picture

1) Why rewrite the parser? It works now, it just gets more data than we'll be using. Rewriting the parser means using one that we haven't been debugging for the past year.

2) It occurs to me that we never did get around to un-disaster-area-ing module_list(). It looks like this patch adds yet another obscure boolean parameter to what is already a horrifically evil and obtuse function. Is that really necessary? If we're going to touch that function it should get better, not worse.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
161.86 KB

Fixed conflict in menu.module in last patch due to LOCAL_ACTION change.

Status: Needs review » Needs work

The last submitted patch failed testing.

mattyoung’s picture

subscribe

chx’s picture

Working on the last fails.

pwolanin’s picture

FileSize
171.29 KB

chx is having trouble posting to d.o - here's his latest patch.

Note - some code is commented out due to critical bug #535564: D7UX IA: actionable top level content item

pwolanin’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
171.29 KB

oops, wrong file maybe

Status: Needs review » Needs work

The last submitted patch failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
171.33 KB

oops, this patch contain already-committed fixes for debug code in token. That caused a conflict.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
170.08 KB

Now?

chx’s picture

FileSize
170.08 KB

A little bit refined. If this passes, I would like to see a speedy commit.

chx’s picture

FileSize
170.11 KB

Sorry that was the wrong patch :/

chx’s picture

FileSize
171.64 KB

tobiasb points out that xmlrpc() needs a restore too.

catch’s picture

   function testUserPermissionChanges() {
+  return;

Still broken test or just left in?

Did some simple benchmarks - ab -c1 -n1000, one node, front page, APC enabled. No measurable difference.

Concurrency Level:      1
Time taken for tests:   57.043 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6306000 bytes
HTML transferred:       5819000 bytes
Requests per second:    17.53 [#/sec] (mean)
Time per request:       57.043 [ms] (mean)
Time per request:       57.043 [ms] (mean, across all concurrent requests)
Transfer rate:          107.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    45   57  10.2     53      82
Waiting:       45   57  10.2     53      82
Total:         45   57  10.2     53      83
Concurrency Level:      1
Time taken for tests:   57.212 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6853000 bytes
HTML transferred:       6366000 bytes
Requests per second:    17.48 [#/sec] (mean)
Time per request:       57.212 [ms] (mean)
Time per request:       57.212 [ms] (mean, across all concurrent requests)
Transfer rate:          116.97 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    51   57   7.0     54      93
Waiting:       50   57   6.9     54      91
Total:         51   57   7.0     54      93
pwolanin’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
173.54 KB

some clean up of doxygen, plus remove ~4 unneeded bare calls to function_exists() that were replacement for drupal_function_exists where it was used to load files contained the desired function. Also, one user test had a return; added at the beginning, which is probably not desired.

As long as this passes all tests, I think it's good to go.

pwolanin’s picture

Seems like the # of tests is a little lower than I see for other patches since some of the registry tests were removed.

webchick’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs documentation

Ok. After quite a bit of drama related to missing commit messages (ugh), I got this committed. Great team work on this, folks.

There is probably documentation in the upgrade guide that needs to be amended as a result of this.

pwolanin’s picture

Status: Needs work » Fixed
Issue tags: -Needs documentation
pwolanin’s picture

mcrittenden’s picture

Discussion on how this affects AJAX behaviors in .inc at #559526: All AJAX Behaviors Broken for Forms in .inc files

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

dww’s picture

Note to interested parties, the patch that was finally committed here broke drupal_bootstrap() in a very important way. See #606146: DRUPAL_BOOTSTRAP_VARIABLES needlessly loads all bootstrap modules for more.