module_list() is broken
earnie - February 15, 2008 - 13:31
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | chx |
| Status: | patch (code needs work) |
Description
This patch supports http://drupal.org/node/196862 at #36.
| Attachment | Size |
|---|---|
| module_list-HEAD.patch.txt | 631 bytes |

#1
Please reopen this if I'm mistaken... but as I said at the other issue, this patch doesn't seem to help.
#2
Reopened. This patch does indeed help and is more proper than #237336: Url Aliases no longer work comment #4 which does the same thing but every call and thus eliminating the cache.
#3
I've included a modification to the $bootstrap default value.
#4
I'm not convinced that drupal_lookup_path needs to be in path.inc at all. Why not just move it to path.module? (along with drupal_get_path_alias and drupal_clear_path_cache). Would need to be renamed of course.
drupal_get_normal_path needs to stay, and would need a function_exists for drupal_lookup_path (there's already a function_exists for custom_url_rewrite_inbound in there so I guess it'd be balanced if nothing else).
#5
retitling to something more eye-catching.
#6
This change make sense, looking at the code
if ($refresh || $fixed_list)seems to assume the first call tomodule_list()should bemodule_list(TRUE)which is not consistent. But$listis not an array only when it is not set, so a check for!isset($list)should suffice, and should be less expensive than!is_array($list).I'm not sure about defaulting
$bootstraptoFALSEthough. If it is correct we should search the calls tomodule_list(FALSE, FALSE)andmodule_list(TRUE, FALSE)and change them tomodule_list(FALSE)andmodule_list(TRUE).#7
The change for the default value of $bootstrap is needed for the !isset($list) case. However we need to check all the uses of module_list to find those that might have assumed it to be TRUE. Those would something like module_list(TRUE) with no other parameter given and would need to change to module_list(TRUE, TRUE).
#8
OK, I just tried applying the patch from #3, but unfortunately it doesn't fix the problem either. URL aliases are still broken.
It's definitely a caching-related problem, but I think a more serious one than described above... As I said at the other issue, I think the fundamental problem with
module_list()is this:This shouldn't be too hard to fix... we need an
isset()check as you suggest here, but we also need to statically cache the bootstrap and non-bootstrap lists separately. I think that should be about it, right? I'll see if I have time to whip up a patch later.#9
@catch, your idea in #4 sounds worthwhile in its own right (maybe a separate feature request?). But for fixing this bug, I would vote for a direct fix rather than a workaround... and there's no fundamental reason why calling
module_exists()inside path.inc should be forbidden, right?#10
OK, here is a patch. It fixes module_list()'s internal cache to actually work correctly... and on an existing Drupal 7 installation, everything seems OK. URL aliases work again, and I didn't notice any other problems (although I haven't tested it extensively).
The only teensie-weensie problem is that this patch totally breaks install.php (and probably update.php too, I haven't checked). It causes a "call to undefined function db_query()" error inside module_list()..... I have to track this down further, but I think the code in install.php was basically relying on the old, weird caching behavior of module_list(), in particular with regard to the $fixed_list parameter. I think $fixed_list doesn't really belong in this function anyway (it's kind of a weird parameter), and what the function really needs is a more general way to get a list of modules to use when it's called before there is an active database connection.
But I think this is on the right track. Frankly, I'm amazed that the caching problem never caused any bugs before this... module_list() really seems like it's completely broken internally! (Either that or I'm totally going crazy here ;)
#11
OK I took a look at splitting the path.inc stuff into a different issue but that's kinda tricky.
This from bootstrap.inc - it runs before common.inc is included or any modules loaded:
<?phpcase DRUPAL_BOOTSTRAP_PATH:
require_once './includes/path.inc';
// Initialize $_GET['q'] prior to loading modules and invoking hook_init().
drupal_init_path();
break;
?>
If we follow the trail from
drupal_init_pathit does this:
drupal_init_pathdrupal_get_normal_pathdrupal_lookup_pathelseif (module_exists('path'))<< before any modules are loadedNote I did this via api.drupal.org rather than a debugger but it appears to explain what's going on.
My inclination would be to roll back #196862 (where this issue never came up, and the final patch was committed without testing) and look at this from scratch. Judging by this I'm not at all convinced that module_list is broken, or at least not in such a way that it's causing this specific issue.
Attaching a rollback patch for #196862 and marking to needs review.
#12
@David: Your patch to me appears to return a module named boolean TRUE since it is possible to call module_list(TRUE, TRUE) or even module_list(FALSE, TRUE) and then somewhere else call module_list() but you've set $list[$bootstrap] when $bootstrap == TRUE and you don't have a correct list of modules. Why not use another static array named $bootstrap_list to take care of the bootstrap modules? We could even go wild and fill both $list and $bootstrap_list if $refresh is TRUE since we have knowledge of the bootstrap modules with the full list read anyway.
@catch: There is no reason to not use module_list as the indicator that the path module is enabled. You're just ignoring the bigger issue with module_list. David's patch @#10 is the correct method to correct the issue.
#13
The patch to review is #11. earnie, you again?
#14
path.module is not in module_list() during that stage of the bootstrap.
Easy way to check is to add this to the top of path.module
<?phpfunction path_boot() {
}
?>
and url aliases work again (after a module_rebuild_cache).
#15
Marking as duplicate of http://drupal.org/node/196862
#16
module_list is broken. The sucker needs fixed.
#17
Indeed, I'm retitling this, since module_list() is definitely broken (regardless of whether or not fixing it is the best way to deal with the URL alias problem).
@catch, your analysis makes a whole lot of sense, but I think there is one problem. module_exists() does not actually check whether a module has been LOADED... it checks whether the module is enabled (i.e., whether it WILL BE LOADED at some point during the bootstrap process). Indeed, this is very confusing, since module_exists() calls module_list(), and the code comments for module_list() claim that it returns a list of "loaded modules"... however, if you look at the code, that's not what it's doing. It simply checks whether status=1 in the database, which is not the same thing.
Therefore, @earnie's original use of
module_exists('path')should theoretically work, as it's only designed to check whether or not the path module is ENABLED -- and indeed, it does work when my patch in #10 is applied.Basically, it seems like we have two bugs here which are sort of canceling each other out... or something like that ;) Arguably, module_exists() should be rewritten to behave the way you thought it was behaving (i.e., check whether the module has actually been loaded)... as to a large extent, that would make more sense.
The end result is that there is definitely broken code here which should be fixed. @earnie, thanks for your review in #12... I will look over those comments more carefully later when I have time.
#18
<?phpelse {
if ($bootstrap) {
$result = db_query("SELECT name, filename, throttle FROM {system} WHERE type = 'module' AND status = 1 AND bootstrap = 1 ORDER BY weight ASC, filename ASC");
}
else {
$result = db_query("SELECT name, filename, throttle FROM {system} WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC");
}
?>
note bootstrap = 1, which is set in module_rebuild_cache, which sets it on the basis of whether a module uses hook_exit or hook_boot, as evidenced by my empty hook_boot example for path.module
So I think it's actually a documentation/naming issue. module_exists() should be module_is_loaded()
#19
Adding to my issue queue (catch thought I should have a look at it, so... ;)).
#20
@catch: module_exists returns TRUE if the module is both installed and enabled. That is what it is being used for throughout Drupal core. I don't understand the reasoning to rename it.
Module_list returns the list of enabled modules. Some of that list has special meaning in the bootstrap phases. The documentation for it should state a list of enabled modules rather than loaded to give it proper definition. Even bootstrap uses it to get a list of modules to load. It doesn't use it to get a list of modules loaded.
#21
Ah, okay, I think I finally have an explanation that hopefully we can all agree on:
@catch, you are right, module_exists() actually does behave more like "module_is_loaded" in this instance... but it doesn't always. It's a very fragile house of cards that allows this to occur.
For example, I can easily make module_exists() go back to its bootstrap behavior even after all modules are fully loaded. Try this:
<?php
// Full bootstrap.
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
// This works as expected.
if (module_exists('system')) {
print 'The system module is loaded... ';
}
// Retrieve a list of modules.
$junk = module_list(TRUE, TRUE);
// Uh oh.
if (!module_exists('system')) {
print 'But Drupal suddenly forgot that it is!';
}
?>
Or the opposite:
<?php
// Minimal bootstrap.
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE);
// This works as expected.
if (!module_exists('system')) {
print 'The system module is not loaded yet... ';
}
// Retrieve a list of modules.
$junk = module_list(TRUE, FALSE);
// Uh oh.
if (module_exists('system')) {
print 'But Drupal suddenly thinks it is!';
}
?>
The behavior of module_exists() depends on the past state of function calls to module_list() -- and as stated above, the behavior of module_list() itself does so as well. This is double-plus-ungood.
This whole thread is a poster child for why unit tests are a good idea, I think ;) I can guarantee you that module_list() would fail any unit tests written for it right now.
#22
@David: That is why I advocate for module_list to have two caches, one for the full list of modules and one for the bootstrap list. If $refresh is true it sets both lists from the full query. But if $bootstrap is TRUE only that one will be returned. The $fixed_list parameter is a special case for being able to call drupal_load('module', 'system') and drupal_load('module', 'filter') from places like install.php, update.php and includes/theme_maintenance.inc. However, I can't find from the drupal_load function that it calls module_list and I think we can do away with the special case of the $fixed_list parameter and simply call the drupal_load for the two modules.
#23
@earnie, with regard to your comments in #12 and #22:
I'm pretty sure that's what my patch in #10 does (?). $list[TRUE] is an array of bootstrap modules, and $list[FALSE] is an array of all modules. When you call the function you will always have $list[$bootstrap] returned, which is what you want. Your idea of using a separate $bootstrap_list variable would certainly work too (and would probably be more readable code, although at the cost of some code duplication, I think).
That certainly seems reasonable. (Although I'm not sure if it would have a practical effect, since I suspect there is never a situation where the first call to this function during the course of a page load is with $bootstrap = FALSE?)
You know, you may be right. I think module_list() does get indirectly called somewhere during the early stages of install.php -- there might be a module_hook call somewhere within one of the functions called by drupal_maintenance_theme(), I think? However, I strongly suspect that those hooks are not intended to be fired during the install process... especially since system and filter are the only modules loaded then, so hooks are probably pretty pointless. So it may indeed be possible to get rid of $fixed_list, which would be very nice.
I'll have more thoughts related to the above in a moment.
#24
OK, I have been doing some thinking and I am now pretty convinced that there is no way to fix the problems with module_list() without also dealing with the module_exists() vs. module_is_loaded() distinction. There are just too many places in Drupal that assume that module_exists() behaves one way or the other... and all these places can (and will) break when they are reached during install.php and friends.
Thus, I've been thinking of the following totally new plan:
The above might be a fair amount of work, but I think the result would be a much cleaner, clearer and simpler system that "just works" correctly and is not prone to bugs. I probably do not have time to work on this for a little while, so I wanted to just throw it out there now for comment. I have not thought through every detail of this, so if there are holes to be poked in it, please poke them ;)
#25
I also just came across #116820: module_list review, which seems relevant? I haven't look through it in depth, but I wonder if there are ideas there that can be merged with those here? @earnie, being the author, I guess you would probably know... ;)
#26
David, very nice writeup. I rely on drupal_get_filename in this patch. I believe it's one nice cleanup. Look at how special cases collapse into 1-2 function calls.
#27
Installed fresh drupal and applied CHX's patch in post #26.
Installed successfully.
Listed modules successfully.
Enabled profile and statistics module successfully
Added article and page nodes successfully
Statistics module appears to be working except for top visitors. May be un-related to this patch, but this module was specifically mentioned when asking how to test. Missing table 'access'.
user warning: Table 'drupal7.access' doesn't exist query: SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM accesslog a LEFT JOIN access ac ON ac.type = 'host' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN users u ON a.uid = u.uid GROUP BY a.hostname, a.uid, u.name, ac.aid ORDER BY hits DESC LIMIT 0, 30 in /home/mike/projects/drupal/drupal7/drupal/modules/statistics/statistics.admin.inc on line 87.
#28
I think you just discovered a bug in statistics -- access table was used to store the access rules which is now removed. As the accesslog table is there, it's not that I broke module install...
#29
I removed one unnecessary hunk (remnant from an earlier attempt using drupal_load) and remove one line of code: the drupal_get_filename from module_load_minimal . Heine pointed out that drupal_load will call drupal_get_filename and if the DB is down then it will fall back to file search which is fine. But, I needed to fix that with a check whether the db system is even loaded already :)
#30
Issue for statistics breakage here: http://drupal.org/node/248436 (sorry, my fault).
#31
drupal_get_schema will include every schema there is, even disabled modules. Stay tuned.
#32
@David: I've marked the #116820: module_list review as duplicate referring to this node. Some of the ideas have already been redone.
@Chx: Thank-you for your revamp patch. I'm waiting for the stay tuned and will be giving it my reviews. I'm glad people with intimate knowledge of Drupal core has given this a closer look.
#33
Wow, thanks @chx! This looks really great.
I too am waiting for the "stay tuned" in #31, but in the meantime I found a couple things that need work with the patch in #29. I won't try revising the patch now, since I'm waiting to hear what the changes in #31 mean for all this. But:
bootstrap = 1to always appear first in the list, regardless of their weight (since they're the first ones that get inserted in the drupal_get_filename() cache). So a standard call to module_list() will not always return the modules in the correct order.array('module_1' => FALSE, // Stored this way by module_fetch_all() to indicate an enabled module.
'module_2' => TRUE, // Stored this way by drupal_load() to indicate a loaded module.
etc...
);
I think that keeping track of this difference would allow things to be even simpler and more flexible, with even fewer special cases to worry about... any time you wanted to know if a module was (a) enabled or (b) both enabled and loaded, you could always get exactly what you want.
#34
Is this something we can write tests for now? :)
#35
I used drupal_load instead and defer loading when that's needed.
#36
Now, path tests pass with one failure regarding alias deletion which has hardly anything to do with this patch -- before the patch, they failed on every 'alias works' test. Alas, I still a cache reset possibility -- when you remove modules, that is not picked up by this approach. I have fixed the "bootstrap modules first" problem. Also, I forgot about update.php that is fixed, too. I have not implemented the distinction between the 'fetched' and the 'loaded' states as I see no use case for it. It'd be easy if it's needed, though, one just needs to keep a separate list in drupal_load.
#37
First, this patch needs more php docs. This would help me figuring out what some of the params are doing. But here’s what I can determine so far…
This patch modifies/simplifies the API for
drupal_load()andmodule_list()and addsmodule_load_minimal()andmodule_fetch_all().1. I think
module_fetch_all()needs:<?php/**
* Fetches a list of modules from the database.
*
* @param $bootstrap
* Whether to fetch the reduced set of modules loaded in "bootstrap mode"
* for cached pages. See bootstrap.inc.
* @param $force
* Force module list to be reloaded from database.
*/
function module_fetch_all($bootstrap = FALSE, $force = FALSE) {
?>
2.
drupal_load()needs its$modedocumented. Here’s my stab at it:<?php/**
* Includes a file with the provided type and name.
*
* @param $type
* The type of item to load (i.e. theme, theme_engine, module).
* @param $name
* The name of the item to load.
* @param $mode
* Whether to load the item(s) immediately, defer loading the item until
* later, or to forget that all the items of that type were loaded.
* @return
* TRUE if the item is loaded or has already been loaded.
*/
function drupal_load($type, $name = NULL, $mode = DRUPAL_LOAD_IMMEDIATELY) {
?>
3. drupal_web_test_case.php: there should be one extra space before
// We might have removed some modules.4. If
drupal_load('module', NULL, DRUPAL_LOAD_DEFERRED)is called then all the modules are included. This seems contrary to passing in DRUPAL_LOAD_DEFERRED.<?phpif (!isset($name)) {
if ($mode == DRUPAL_LOAD_DEFERRED) {
foreach ($files[$type] as $filename) {
include_once "./$filename";
}
}
?>
5.
module_load_minimal()is much better than copy and pasting code 3 times into theme.maintenance.inc, install.php, and update.php. +16. What’s with the update to
simpletest_entrypoint()? This seems unrelated to this issue.7. In
module_fetch_all(),'AND bootstrap = 0'should be'AND bootstrap = 1', no?8. When running Path and System simple tests before the patch:
Path: 76 passes, 4 fails and 0 exceptions.Alias works. at [modules/path/path.test line 38]
Changed alias works. at [modules/path/path.test line 49]
Alias works. at [modules/path/path.test line 89]
Changed alias works. at [modules/path/path.test line 98]
System: 85 passes, 0 fails and 0 exceptions.
After the patch:
Path: 79 passes, 1 fails and 0 exceptions.Alias was successfully deleted. at [modules/path/path.test line 120]
System: 82 passes, 3 fails and 0 exceptions.
Module "aggregator" is enabled. at [modules/system/system.test line 43]
Module "translation" is enabled. at [modules/system/system.test line 88]
Module "locale" is enabled. at [modules/system/system.test line 88]
#38
Thanks for the through review, the documentation and drawing my attention to the fact that system.test has a lurking
module_listreset. And that's necessary here as well because some other Drupal have changed the module list and we need to resync. In normal operations, such resets will be much less necessary than they were before. I simplified howdrupal_loadworks with deferred stuff and I have documented everything stressing that all this new stuff to drupal_load is not something people should do. Finally, I have enrolled http://drupal.org/node/211439 into this patch. system test pass and path has a single, unrelated failure.#39
subscribing. will review tomorrow.
#40
note that the aforementioned single unrelated path test failure is fixed in http://drupal.org/node/251631
#41
I found two unrelated hunks in the patch. Removed. No other change.
#42
I reviewed and tested this patch. (Sorry, I had intended to do this earlier!). Overall looks very good, and everything seems to basically work.
I'm attaching a new version which includes a few changes to the documentation (no code changes, though, so it doesn't matter if anyone else started testing the earlier version):
Now some comments/suggestions about the code:
<?phpfunction module_load_all($bootstrap = FALSE) {
module_fetch_all($bootstrap);
drupal_load();
}
?>
and bootstrap_invoke_all() as:
<?phpfunction bootstrap_invoke_all($hook) {
module_load_all(TRUE);
module_invoke_all($hook);
}
?>
Which looks a lot cleaner anyway.
<?phpif ($sort) {
// Only do the sorting if the list has changed since the last time we sorted.
if ($stored_list != $list) {
$stored_list = $list;
ksort($list);
$sorted_list = $list;
}
else {
$list = $sorted_list;
}
}
?>
and then we need
static $stored_list = array(), $sorted_list = array();at the top of the function too. On the other hand, I'm not sure this cache is even necessary?! I can only find one place in core where module_list() is actually called with $sort = TRUE (in user.admin.inc).$list = drupal_load('module', NULL, DRUPAL_LOAD_NONE);just become$list = drupal_load('module');?$files[$type][$name] = $filename;near the bottom of drupal_load() is called regardless of whether the file is being loaded immediately or deferred. If this is correct, we should probably document that better, since I found it pretty confusing when reading through the code! It also probably affects the documentation of module_load() and module_exists()... we probably should explain more carefully in those functions exactly what is being returned. I think this is somewhat important, since this seems to be the whole reason that this patch actually allows path aliases to work again.#43
It also occurs to me that since drupal_load() is kind of a crazy function with all sorts of parameter combinations intended for internal use only, it might be good to take that code and put it in a private function, with a few public wrapper functions intended for calling it in its various ways. That would make the code more readable and make it harder for modules that might be using this function to make any mistakes.
I don't think that necessarily needs to be part of the current patch, though.
#44
Excellent review, David. I had a look at the patch and I have some additional suggestions.
1. In the code comment of DRUPAL_LOAD_DEFERRED, I would explain what it means to load a module later, and when that should (or shouldn't) be used. The current code comment doesn't answer these questions and I'm left wondering why I'd want to defer loading a module ... *confused*.
2. The explanation should probably be repeated in drupal_load()'s comments.
3.
+ $list = drupal_load('module', NULL, DRUPAL_LOAD_NONE);looks somewhat "ugly" -- it wasn't self-explanatory and could use a code comment (if not a more intuitive API).4. I wonder if we could write a couple of Simpletests that use (aggressive) page caching, and that use the new API to check the number of loaded modules. Let's set the example and start writing tests.
It looks like this patch is getting close to be RTBC.
#45
Here is some possible text for Dries' first two suggestions (it's not so easy to explain, I guess... I struggled figuring out how to write this):
<?php/**
* Mark the file to be loaded later on in the bootstrap process.
*
* This is used when information about enabled modules needs to be
* retrieved from the database before the module code itself is actually
* required. This method prevents unnecessary loading of files if the
* bootstrap process is cut short.
*/
define('DRUPAL_LOAD_DEFERRED', 2);
?>
And then in the drupal_load() comments:
* @param $mode* Whether to load the file immediately (DRUPAL_LOAD_IMMEDIATELY, the
* default), mark the file to be loaded later on in the bootstrap process
* (DRUPAL_LOAD_DEFERRED) or just reset the internal cache of files
* (DRUPAL_LOAD_RESET). These are for internal use only; do not specify
* this argument. DRUPAL_LOAD_DEFERRED is used when information about
* enabled modules needs to be retrieved from the database before the
* module code itself is actually required; this prevents unnecessary
* loading of files if the bootstrap process is cut short.
#46
I think we should really postpone this past the registry. As we were wroking on that , all this deferring problem is solved: we will not load modules, no need.
#47
So, here we are, the registry is in! Stay tuned, I will post more today.
#48
I have written http://drupal.org/node/259412 which as a side effect fixes this too :)
#49
#259412: Total module system revamp is stalled and this issue is blocking dbtng now.
#50
I was just about to post a new issue for this, but here's what I was going to say:
I really think the fix is to just trash module_list() and make 2 functions. That will simplify the code here and in various other places, too.