On every page request, we run the following two queries:
SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 AND bootstrap = 1 ORDER BY weight ASC, filename ASC
SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC
However we only ever update that information on module_enable() and module_disable().
So, attached patch adds module_list_boot() and module_list_enabled() functions, which get their information from a central function which caches both all enabled and also bootstrap modules in an associative array.
With database caching, this saves one query on every request - since we swap two direct queries for a single cache_get(). It also removes one db hit on cached pages if using a non-database caching backend.
Comments
Comment #1
damien tournoud commentedCould we:
- move this cache to a separate bin (I suggest 'cache_bootstrap', that way we could move variable cache there too)
- add a (very short) expiration time (1s?)
This would allow a site to consider moving this cache to a local storage (APC like).
This review is powered by Dreditor.
Comment #2
catchI've been planning to open a separate issue for a 'cache_local' bin - although cache_bootstrap is a better name.
Potential candidates include this one, variables, locale, module_implements - anything needed on every request.
Not sure about 1s expiration though - doesn't that mean a write per-second when using default database storage, that's an unacceptable hit.
Comment #3
moshe weitzman commentedMy ultimate goal here is to get the list of all enabled modules and enabled bootstrap modules to be a variable that can be overridden using $conf. That way, a dev environment can enable devel and it is disabled in prod. I tried a patch like this once but the variables were not available early enough (I think thats been fixed since). A small benefit of a variables approach is that we don't introduce a new bin
Anyway, thats food for thought in case someone wants to extend this patch. Until then, this looks very useful.
Comment #4
catchI was thinking about putting bootstrap into a variable once this was running, hadn't thought about all modules though, that's a nice idea.
Before seeing moshe's post I had this idea:
Do one query of the system table.
SELECT name, filename, type FROM system WHERE status = 1 ORDER BY weight ASC, filename ASCTake that array and do a single foreach, which generates the following four arrays - to be used for the following purposes:
1. Bootstrap modules
2. Enabled modules
3. http://api.drupal.org/api/function/list_themes/7
4. http://api.drupal.org/api/function/drupal_get_filename/7
Then have module_list(), list_themes() and drupal_get_filename() all use that one function to get what they need.
That would take us from four system table queries per page down to one (I nearly always see one from drupal_get_filename(), the rest are required on any normal request). Once thats done, we can then look at cache_get()/cache_set() for that array or using it to populate a variable.
Comment #5
catchHere's a start on #4, it should allow us to do #3 with a bit of tweaking too if we want.
What it does:
# Centralizes all system module queries called on a single page into one function - this is currently module_list() * 2, and list_themes()
# Moves the static priming of module_list()/drupal_get_filename() to that central function to save a second foreach (and because that doesn't make any more sense to have in module_list()) - this is done for themes as well as modules now, not sure if that improves anything yet, but neither can it do any harm.
This should cut out two queries for every request in HEAD. While doing this, I realised a variable for modules would also require us to store their filenames somewhere due to the drupal_get_filename() issue.
Comment #7
catchNice, wasn't sure if it would, but this also kills the one stray query from drupal_get_filename() we do every page in HEAD too, so that's 4 queries down to 1 for this group of functions on every request, without adding a new cache apart from the drupal_static().
Applied #607244: Decrease performance impact of contextual links and created a test admin user so I wasn't getting contextual links.
Empty front page, default profile:
HEAD:
Executed 40 queries in 23.09 milliseconds.
Patch:
Executed 37 queries in 21.24 milliseconds.
(query time not at all consistent between page views, just pasted for reference).
While here, I also noticed that 16 of those queries are from cache_get() - so actual required database hits to serve an uncached page is ~20 with this patch applied.
Then I benchmarked the same page as an anonymous user, with page caching disabled.
HEAD:
Patch:
The drupal_get_filename() and module_list() come out as 0.5-1.5ms each on my system (although there's also some PDO overhead executing queries), so that's as expected.
Comment #8
catchThis one fixes the installer for me.
Comment #9
moshe weitzman commentedFYI, I wrote a follow-up patch at #625444: Override enabled module/theme list dynamically using variable override in settings.php. Lets get this one in first.
Comment #10
sunWhy doesn't it take an optional argument to filter the returned list?
The function name is a bit awkward.
1) drupal_system_list() would be much more clean and would also open the door for complementing this function with a "rebuild" function someday (and not having two functions that just differ in a "re").
2) However, and I think that this has been mentioned elsewhere already, I think that we definitely need to start to remove the drupal_ prefix and start to move mission-critical functionality from system.module into includes/system.inc, potentially moving a lot of module.inc in there, too. So, someday, system.module becomes an ordinary module, but system.inc contains the mission critical full bootstrap functionality. Long story short: Considering all of this, I thought about just plain system_list(). This would also help me along to find a proper name in #624848: Allow to retrieve a list of modules defining a certain .info file property.
This should be a !isset().
I'm on crack. Are you, too?
Comment #11
catchRenamed to system_list(), added required argument since I can't see any case where you'd indiscriminately want the three arrays - and we have other API functions for getting the full contents of this table, which this doesn't and can't replace. Dropped the default from drupal_static() and changed to isset().
Also ropped two no longer pertinent indexes from {system}, added a new one so that the new query doesn't cause a filesort.
Comment #12
catchQuery with this patch:
Both the bootstrap and modules queries in head also come out at const/ "Using where".
Comment #13
catchtha_sun noticed a phpdoc typo.
Comment #14
sunAwesome.
Comment #15
catchsun asked in irc about the index- whether we additionally need to include 'type'.
I said no, then double checked, then had to ask David Strauss to understand why it's a no.
Here's an EXPLAIN with the index in the current patch:
And one with the index on type:
And here's a pasted irc conversation from #drupal with David Strauss, apologies for code tags but that's the easiest way not to have handles eaten by the HTML filter:
Comment #16
dries commentedCan we find a better name for 'system lists'? It is rather abstract, and unless you are familiar with Drupal bootstrap internals, it might not make a ton of sense.
Second, I noticed that the original query still referenced 'throttle' (see issue description at the top). I'm pretty sure we removed that field from core. Are you sure your benchmarks are valid? It sounds like something might have gone wrong?
Comment #17
catchI copy and pasted from D6, but the queries are identical in HEAD apart from the throttle column.
I'm not really expecting anyone to use system_lists() - it should really only be used via module_list() or list_themes() anyway. If anyone has a better idea for a name (I originally had drupal_build_system_lists() but that's no less abstract), would be great.
Comment #18
sunI think Dries meant the PHPDoc summary only, which indeed sounds a bit strange ;)
Build a list of bootstrap modules and enabled modules and themes.
Comment #19
catchAhh. Re-rolled with sun's suggestion.
Comment #20
damien tournoud commentedThis looks like a really nice cleanup. It introduces more consistency between the three modes of interaction with the system table ('bootstrap', 'module', 'theme').
Is there really a reason for the IN ('module', 'theme') condition? What can be in that table anyway?
I would like to know what is the purpose of this file_exists().
By the way, drupal_get_filename() could take advantage of the shared list too.
The drupal_static_reset() should be the job of module_list(TRUE).
I'm on crack. Are you, too?
Comment #21
dries commentedYes, I meant the phpDoc only. Thanks for the re-roll!
I'll wait for a reply to DamZ comment in #20 before proceeding.
Comment #22
catchI was under the apparently mistaken impression that theme engines and etc. appear in the system table, but it looks like not, so removed.
This was simply copied from module_list() but I think we can get away without it, so removed that too.
It could, and I considered adding that in, but with the optimization here, we don't cause any queries with drupal_get_filename() on most requests anyway, I'd like to look at converting that in a followup maybe, but it's not necessary here yet, and adds additional complexity to the patch.
It should, but it can't - because the static is reset in both calls to module_load_all() - which would cause it to rebuild the list from the {system} table twice each request, which is precisely what this patch tries to avoid. See #222109: module_list() should be rewritten as 2 functions: module_list() and module_list_bootstrap() and inline comments in drupal_get_schema() as well. I don't want to litter core with multi-line inline comments everywhere we reset this static - there's not one place to do that, so have not added additional documentation, since there's already an outstanding issue, but we could do that if we wanted.
Comment #23
sunEven more awesome.
Comment #24
ajayg commented@catch,
in comment #4 you have mentioned "we can then look at cache_get()/cache_set() for that array". Is that still the plan? Or you think it is not necessary anymore? I haven't noticed that in most recent patch.
By caching wouldn't we get additional benefit of saving several "file_exists" php calls (via drupal_get_filename) , in subsequent page reads which will further reduce disk access and improve performance?
I understand the patch so far has been focusing on reducing queries for single page call, but I think there are further potential benefits of subsequent calls if we persists the built list.
Comment #25
dries commentedClean, simple to understand patch. Committed to CVS HEAD. Great job, catch.
We can follow-up on #24 in separate patches if necessary.
Comment #26
catchI've opened #626688: Add caching for system_list() and #626690: Use system_list() for drupal_get_filename() for the caching and drupal_get_filename() followups.
Comment #27
damien tournoud commentedThis broke core. We need a revert.
Comment #28
yched commented"broke core" = See #626866: Test bot broken - .
The green test report in #22 above mentioned 359 passes.
Comment #29
damien tournoud commentedI disabled the test bot.
Comment #30
yched commentedHere's a revert patch.
Comment #31
yched commentedEr, mixed up with some patch I'm working on.
Here's a revert patch.
Comment #32
sunStrange thing is... installing and working in HEAD manually works just fine. Only running tests fails. During database setup in setUp(), the system seems to report 1) modules that do not exist in the testing site yet, and 2) an empty list of themes.
Comment #33
sunThis patch fixes the bug for me.
Also moved a stale comment to the proper location.
Comment #34
sunAlso removing drupal_static_resets, which are duplicate now.
Comment #35
dries commentedI committed the patch in #34. We can re-enable the test bot now! :)
Comment #36
sunlist_themes() needs all themes.
Comment #37
dries commentedCommitted as it looks like it fixes the tests for me.
Comment #38
sunAnd I forgot to update the PHPDoc accordingly.
I also want to prepare this thingy for #624848: Allow to retrieve a list of modules defining a certain .info file property.
Comment #39
catchLast one looks good.
Also, sorry for breaking HEAD.
Comment #40
dries commentedCommitted to CVS HEAD. Thanks!
Comment #41
catchSorry to re-open yet again but this is ridiculous. While I can see the utility in a generic function, making it as generic as this caused a 50% slowdown with normal page caching. At least 25% of that overall cost was in the change to loading all records from the system table as opposed to just enabled - that took the query from 1.4ms to over 2ms according to devel (100 reqs/second is 5ms per request) - there are much fewer disabled themes than modules. I missed the list_themes() requirement in the initial patch that went in, but fixes overcompensated for that whereas we should've taken a different direction altogether when that was realised. HEAD is working now so let's get this fixed up for the last time.
Attached patch doesn't fix everything here, hence CNW, but demonstrates the problem:
ab -c1 -n1000 http://d7.7/
HEAD:
Patch:
What I suggest we do is:
1. Special case the bootstrap list when serving a cached page which is what I've done here - we need a solid check for "I'm going to serve the page from cache now" - I thought this was drupal_page_get_cache(TRUE) but that takes #/sec down from 80 to 60 so obviously isn't doing what we want here. For now drupal_page_is_cacheable() is good enough.
2. Cache the bootstrap modules separately once #627338: Add a cache_bootstrap bin is in - so cache_get() only has to fetch and unserialize a very small array instead of potentially four arrays 50-100 items long.
3. Don't ever keep an array of all disabled modules in a static variable that's built on every request. We have http://api.drupal.org/api/function/system_get_info/7 for that sort of thing, I have no idea why that was added here.
4. Consider using a separate query for list_themes() to avoid query bloat on uncached pages (so that we don't load the entire contents of the system table into memory then chuck 50% of it away). While that gives us two queries on uncached pages again (still down from four but not ideal), we can then cache that resultant list for uncached pages too, in cache_bootstrap(), so that high performance sites are able to skip it entirely as well.
That ought to give us the quickest possible response for both cached and uncached pages, without one impacting the other too much. The drupal_page_is_cacheable() check will fail if a page is cacheable, but not served from the cache (basically cache misses), but it'd still be optimized compared to HEAD a week or so ago, so of all the places to let a query split through that's probably the least worst - and it'll be cache_get() by the time we're done here anyway.
Comment #42
catchAlright with this version of the patch:
normal cache 98 #/sec
HEAD: 48 #/sec
D6: 91 #/sec
So actually faster than D6 out of the box and twice as fast as current HEAD.
We have two queries if you serve a cached page - because we don't seem to have any way at this point to know if we're serving from cache or not, so that gets duplicated, but still an improvement for logged-in users there too.
The patch is also almost 10% faster without page caching - as far as I can see due to removing disabled modules from the equation.
Patch:
HEAD:
Additionally, I profiled a cached page view, and this is the only remaining query required to serve a page with the normal page cache (assuming you disable IP blocking in $conf) - so sites running non database caching backends will now be able to serve pages without touching the database.
Comment #43
catchNow using the cache_bootstrap bin.
Comment #44
peterx commentedSpecial case Ajax? If a site has a million Ajax requests, where do they fit in to the bootstrap in D7? Do the 14680 tests for this change include an Ajax call to make sure we are not doing more than necessary for Ajax? I tried testing in D6 with basic Ajax, not Ahah because Ahah might use themes, and I ended up with inconclusive results because I do not think I was using an optimum Ajax process. RPC and RSS are other areas that might be a special cases.
Comment #45
dries commentedCommitted the patch in #43. We can talk more about #44 if desired, but it feels like it belongs in another issue. Thanks catch, good job (as always)!