Coming from #769226-154: Optimize JS/CSS aggregation for front-end performance and DX and also identified the need for this in contrib just recently, we want to
1) make system_list('module_enabled') return full module records instead of just module names. system_list() is only invoked by module_list(), and hopefully/likely not in contrib yet, so this is an API change, but a small one.
2) due to 1), system_get_info() can re-use that information to build module/theme information. I've added a static cache, because of the unserializing of all the info data.
3) Ideally, I'd additionally like to change system_get_info()'s $type parameter to be consistent with system_list(), i.e., 'module_enabled' and 'theme', and eventually add 'module' as an option. That would be a giant help for #624848: Allow to retrieve a list of modules defining a certain .info file property
Comment | File | Size | Author |
---|---|---|---|
#47 | drupal.system-list-get-info.47.patch | 6.08 KB | sun |
#22 | drupal.system-list-get-info.22.patch | 4.82 KB | David_Rothstein |
#9 | drupal.system-list-get-info.9.patch | 4.54 KB | sun |
#8 | drupal.system-list-get-info.8.patch | 6.49 KB | sun |
#3 | drupal.system-list-get-info.3.patch | 3.3 KB | sun |
Comments
Comment #2
bleen CreditAttribution: bleen commentedsubscribing
Comment #3
sunOdd. The enabled module list can be empty? Very odd. Anyway, let's just cope with that.
Comment #4
sunSince this blocks the critical issue #769226: Optimize JS/CSS aggregation for front-end performance and DX, bumping to critical.
Comment #5
bleen CreditAttribution: bleen commentedI've been looking at this patch a bunch today (actually the first patch - I was trying to figure out how $list could every be empty also but I got nuthin) and everything looks good ... I'd give it a RTBC, but we should probably get another look or two
Are you going to do #3 (from the original post) in a followup?
Comment #6
sunDoing 3) would mean to make system_get_info() support $type = 'module_enabled', and to make system_list() support $type = 'module'.
I'd love to do that. Also in this patch, if possible/agreed on.
Comment #7
David_Rothstein CreditAttribution: David_Rothstein commentedI like this overall.
Any reason we can't just do the unserialize() inside system_list(), right when the data is loaded?
Minor - don't think we should bother with the outer parentheses there.
I had to read this a couple times (and then confirm by looking at the code) before I got it :) How about something like "The name of a module or theme whose information will be returned. If omitted, all records for the provided $type will be returned."
Comment #8
sun1) You're right. Didn't even notice that. That's even a bug fix, since I'd expect system_get_info() to work identically to module_list() and list_themes(), resp. system_list(). :)
2) The problem is that normal requests do not need to unserialize that data at all. Thus, the unserialize() would be needless for most requests. However, I'd also be happy to simply remove the static cache. Unserializing this data more than once might be needless, but then again, it's not the end of the world, performance-wise.
3) I know it's not (yet) in the coding standards, but I really believe that all ternary operations should be wrapped in parenthesis. That is, because the code itself is shorthand logic and it's not always entirely clear or directly visible what the resulting value will be, especially in combination with = variable assignment. I especially can't stand code like the following, because it presumes a very certain behavior of PHP internals:
Anyway, not to be discussed here. Just wanted to mention that I'll keep the parenthesis for IMO very good reasons. :)
4) Thanks, incorporated!
Additionally incorporated #6.
Comment #9
sunHm. This whole "module_enabled" thing does not make sense.
Let's simplify:
- system_list() retrieves all of $type + statically caches that.
- module_list() only takes over enabled items.
- system_get_info() uses system_list(), and is able to return $only_enabled.
=> Zero API change. Well, almost - system_list()'s "module_enabled" changes into "module".
Comment #10
sunComment #11
David_Rothstein CreditAttribution: David_Rothstein commentedThis makes a lot of sense in terms of consistency, but the side effect is that we are now storing (in the cache) all data for all modules in the filesystem, including e.g. the 40-50 hidden modules used by Simpletest. Because it's cached, that's not too bad, but it's still nonzero work to cache_get() a very large record on each page request, plus store the resultant giant array in the static cache where it takes up memory. Is it definitely worth it?
If the unserialize() moved to system_list(), it would not happen on every request, only very rare ones - since system_list() uses a database cache. Whereas right now, for themes it is kind of silly that system_list() caches the info in a "still-serialized" state, but then http://api.drupal.org/api/function/list_themes/7 unserializes that on every request. Your patch adds more of the same to system_get_info(), so indeed not a performance disaster, just perpetuating some odd behavior in a new place, that would be lovely to fix instead :)
Maybe the order of the last two parameters should be switched? (I would think getting info about a particular module would be a lot more common than wanting to set $only_enabled to FALSE.)
With this patch, the "active" in that docblock is no longer quite correct.
Comment #12
effulgentsia CreditAttribution: effulgentsia commentedSubscribe.
See #812016-8: Themes cannot always participate in drupal_alter(). A very large cache_get() that includes an unserialize() can be on the order of ms: something to avoid, if happening on every page request and unnecessary.
Comment #13
catchLast time someone tried to make system_list() more generic, I won't name names but they're the author of this issue, page caching took a more than 50% performance hit. I can't find that issue but this is more or less exactly the same thing.
Comment #14
catchHere it is:
http://drupal.org/node/623992#comment-2248022
Also there is no way on earth that this is critical.
Comment #15
Dries CreditAttribution: Dries commentedThe
$name
parameter confused me. The name module_list() really suggests that we return more than one record.Do we really need it? It feels like it is cramped into the API, unless I misunderstood what
$name
does.Why wouldn't you return everything, and then use the big array to get the details for one record?
Comment #16
Dries CreditAttribution: Dries commentedSorry, cross-post. Didn't meant to put the status back to 'critical'.
Comment #17
sunNot sure why this was demoted. We still read issues before changing anything, right?
The idea is that, when doing a full bootstrap, functions like being introduced in #769226: Optimize JS/CSS aggregation for front-end performance and DX will need access to full module records anyway. So the point of not querying and caching it because we don't need it, is bogus.
What's debatable and might be removed is the caching of all modules (instead of just enabled).
Comment #18
Frando CreditAttribution: Frando commentedWouldn't it be easier to just add a simple cache_set/cache_get of the css/js declarations found in the module .info files right where this information is processed? The cache can be cleared in hook_module_enable. Then there's now penalty for cached page requests.
Comment #19
sunThe idea is that this information is useful for more than CSS/JS. I know of at least two contrib modules that have to retrieve and eventually cache this information manually/separately.
Comment #20
catchIt might be useful but it shouldn't bog down poor, defenceless Drupal 7 which is already slow as molasses. This doesn't merit review until there are accompanying benchmarks. I'm tempted to mark it won't fix due to http://drupal.org/node/623992#comment-2248022 which just appears to have been completely forgotten.
Comment #21
David_Rothstein CreditAttribution: David_Rothstein commentedI think I was guilty of using imprecise language in some of my reviews above. To clarify, this patch shouldn't have any effect whatsoever on cached (i.e., anonymous) page requests.
It may have an effect on uncached (i.e., authenticated) pages due to the size of the system_list() cache, as discussed by @effulgentsia and I above. To deal with that, we can limit this to enabled modules only (I think the use case for querying info about disabled modules is relatively small). Frankly, if we do that and then couple this with removing the silly unserialize() call from inside list_themes() we may wind up with a performance improvement: We'd effectively go from "double-unserializing" (once in the cache_get, once outside) all enabled+disabled theme data on every authenticated page view, which is just silly, to "single-unseralizing" both the (enabled) module and (enabled+disabled) theme data. Themes tend to have much bigger info files than modules anyway so who knows how it would all add up. It could be benchmarked.
Comment #22
David_Rothstein CreditAttribution: David_Rothstein commentedMaybe something like the attached reroll?
Regarding $name, we don't strictly speaking need it (since as Dries points out you can just get the whole array and pick out what you need from that), but on the other hand it should lead to simpler, clearer calling code in cases where you're doing that. I don't have a strong opinion, but left it in the patch for now.
Comment #23
manarth CreditAttribution: manarth commented[updated: these benchmarks may not be accurate - see later comment]
ab shows the patch in #22 reduces performance by 90%.
Without patch: typical 30 req/sec
With patch: typical 3 req/sec
I re-ran these tests a number of times and saw similar results each time.
Without patch
With patch
Comment #24
RobLoachClosed #338002: Allow modules to cache module information as a duplicate. I had a module_info() function going on there, but system_get_info() is more appropriate.
Comment #25
moshe weitzman CreditAttribution: moshe weitzman commented+ $list = (!empty($list) ? array_combine($list, $list) : array());
use drupal_map_assoc().
@manarth - do you believe your own benchmark? For starters, there should be no change for anonymous cached pages. Are you passing a cookie in order to be logged in? If not, is page caching enabled?
IMO this is critical since it blocks the css/js preprocess issue but not getting fussy over that.
Comment #26
David_Rothstein CreditAttribution: David_Rothstein commented@sun pointed out above that we don't want to use drupal_map_assoc() because it makes module.inc require common.inc (on the other hand, if this part of the code path only happens during a full bootstrap, maybe it doesn't matter - not sure).
Yeah, geez, I hope those benchmarks are wrong :) Is there any chance that one was accidentally done with page caching enabled and the other with page caching disabled?
Comment #27
sunBenchmarks with patch state less than half of the delivered HTML size, so you were likely benchmarking a PHP fatal error or something. Additionally comparing total size to HTML size, it looks like page caching was enabled.
Comment #28
manarth CreditAttribution: manarth commented@moshe @David_Rothstein I'm having doubts on those benchmarks too (perils of running benchmarks on a VM instead of bare metal). Both were run with ab (as anon user, hitting the homepage, fresh build, no page-caching).
Xdebug profiler doesn't show a great deal of difference with the patch, and there doesn't seem to be anything performance-sucking in the patch, so I'd discount the benchmarks.
Comment #29
jenlamptonThis issue is only major, but there is a critical issue (#328357: Allow module .info files to add CSS/JS) we're working on that allows modules to add css/js in their info files.
If we want that work do be done here instead, doesn't that make this issue critical?
Comment #30
jenlamptonguess not, I'll keep focusing on the other issue.
Comment #31
sun#22 works beautifully.
Of course, you need to flush all caches, so module and theme info is rebuilt. Without doing so, pages contained lots of PHP warnings and notices.
Unrelated to this patch: After flushing all caches, all blocks in Bartik were disabled. Most likely, because Bartik defines custom regions, and no theme info was available during... although, hm, the proper theme info should have been available again, before blocks were rehashed. Anyway, separate issue.
Comment #32
catchStill no valid benchmarks on this issue.
Comment #33
sunComment #34
sunThis issue effectively blocks #328357: Allow module .info files to add CSS/JS now.
Comment #35
alanburke CreditAttribution: alanburke commentedAttempting to Benchmark the patch at #22
Drush Devel generated content.
Without Patch
Applied Patch.
Used Drush CC all to clear caches.
Interestingly, got this warning [multiple times] on first run of drush cc all.
Ran drush cc all again, and it seems to run fine.
With Patch
Ran ab
This seems to return a performance improvement, so something somewhere might not be right [the patch, my system or my benchmarking].
Comment #36
manarth CreditAttribution: manarth commented@alanburke - the filesize looks awfully small - 353 bytes isn't enough for the doctype and HTML tag!
Looks like you're getting error responses.
Comment #37
manarth CreditAttribution: manarth commentedHere are the results:
I think it's reasonable to conclude that the patch improves performance.
250 requests, 10 concurrent users (timeout after 5secs)
Without patch
With patch
1000 requests, 15 concurrent users (timeout after 5secs)
Without patch
With patch
1000 requests, 12 concurrent users (timeout after 5secs)
I re-tested with fewer concurrent users, because ab reported unreliable tests for c=15:
Without patch
With patch
Longer running:
Without patch
With patch
NB: the Document lengths is larger for the patched version than the unpatched version by 182 bytes, because the URLs are different on my setup (foo.core vs foo.corepatched). The difference is less than 1% of the total document length and can safely be ignored.
Comment #38
catchDid some micro-benchmarking, 100 iterations of system_list(); drupal_static_reset('system_list');
HEAD: 45.91s
Patch: 47.034038066864s
Note I have no idea why it's taking ~500ms to call system_list() in both HEAD and the patch, so these may well be invalid overall, I checked a single call to system_list() locally and also saw 500ms. About to get on a flight but will try to look again at this later, possibly on another machine.
Here's the test script:
Comment #39
catchCorrect function argument would've helped... system_list('module_enabled');
100k iterations this time.
HEAD: 22.606045007706 seconds
Patch: system_list: 22.733495950699 seconds
Looks like no measurable difference, which is great. Not sure how manarth got consistently faster results though. Did the same test with 'bootstrap' instead of 'module_enabled' and no change there either.
Comment #40
bleen CreditAttribution: bleen commentedWell ... either Catch is correct and there is no measurable difference ... or manarth is correct and we get a bit of a win with this patch. Either way do we have enough to mark this back to RTBC?
Comment #41
manarth CreditAttribution: manarth commentedXHprof is happy to report improvements with the patch:
Comment #42
sunThanks for all the benchmarks! :)
It looks like this query is using the 'bootstrap' index now, too. The existing 'system_list' index only contains weight and name for the ORDER BY, but not type and status.
Powered by Dreditor.
Comment #43
David_Rothstein CreditAttribution: David_Rothstein commentedI also just did a quick micro-benchmark of list_themes() - since this patch affected that too. I used @catch's code but with:
With 1000 iterations I got this:
I expected some performance improvement due to removing the unserialize(), but not sure I expected that much, so maybe I did something wrong...
In any case, it sounds like overall the conclusion from everyone is that this patch doesn't hurt performance and may actually help it :)
Comment #44
David_Rothstein CreditAttribution: David_Rothstein commentedActually, after looking at http://www.phpdevblog.net/2009/11/serialize-vs-var-export-vs-json-encode... maybe my results do make sense.
Basically, the unserializing that took place in list_themes() during the first call to it on every authenticated page request probably took around 20ms, and with this patch, that is gone, so we gain those 20ms. (Offset somewhat by a larger cache unserialization elsewhere, but apparently not as bad.)
Is this RTBC now?
Comment #45
catchit's pretty much impossible that list_themes() was taking 20ms, I'd run it once with a timer to compare before and after the patch to see what might be happening.
I'd be really happy if I could see the xhprof report just for system_list() instead of the whole request, but this seems fine to me, thanks to people for actually doing the benchmarks.
Comment #46
sunThis leaves us with #42. What do we do with the unused 'system_list' index on {system}?
a) Drop it.
b) Drop it and rename 'bootstrap' index to 'system_list'.
c) Tweak it and keep it separately to the 'bootstrap' index.
Comment #47
sunWent ahead with option b), replacing 'bootstrap' with 'system_list' index.
Comment #48
moshe weitzman CreditAttribution: moshe weitzman commentedCode looks solid to me, and bot is happy.
We could move drupal_map_assoc() to bootstrap.inc and use it here. We've needed it early elsewhere. Lets only bother if we have to re-roll.
Comment #49
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Hopefully we can make progress with that critical now. Thanks!