Updates run in unpredictable order
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | update system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | active |
I'll leave it up to others to decide if this is a bug or not and if it's critical or not and if it should be fixed in D6 or not.
The contrib module updates in update.php do not run in any predictable order, it's not alphabetic and it's not by weight, and it's not by dependency, it's seems to be a pretty random order. I ran into this while working on fixes for the D6 updates for CCK. When CCK is updated, the content module is one of the last to be updated, even though it is first alphabetically and everything else is dependent on it. The fieldgroup module is one of the first to update even though it has a weight of 9 and the others are zero.
We just fixed the install process so modules install in dependency order, it seems like updates should also run in dependency order. The update will be a harder fix than the install though because updates now run on uninstalled as well as installed modules so the dependency must be observed even when the modules aren't installed.
This is not a new problem in D6, but it has new implications now that modules that are not installed will get updated.
The problem will be mitigated, a little, by http://drupal.org/node/208602, if it gets in, so developers can prevent updates from running if some important previous step hasn't happened yet, but that is not a total fix.

#1
KarenS: modules which are not *enabled* are updated, not modules, which are not *installed*. I don't remember having updates running in predictable orders anytime.
I wonder how dependencies will be resolved for disabled modules (which could depend on modules which are not even in the file tree, so we cannot track their dependencies).
#2
@gabor, right, I mis-spoke, disabled modules are updated, not modules which are not installed.
#3
OK I have no idea how to fix this issue in core, but maybe this might be a potential workaround for cck.
There was a unique index introduced after RC1 that had to be rolled back and replaced. Because we can't know what the schema is in core and had to deal with updates with two potential schemas, a variable_set() was put into the first update, then subsequent behaviour was dependent on that. This effectively makes one update dependent on another. If you know exactly which module requires the update to have been run already, the respective update function could also delete that variable. If it's one update dependency for multiple dependent modules, then I guess the worst that happens is a stale variable gets left in the table.
Patch and discussion is here upwards: http://drupal.org/node/164532#comment-679172
Also, I don't see that this is that much of a different situation to the 4.7 cck to 5.x core content type creation (whch iirc relied on users following instructions correctly), or 5.x update status to 6.x core update module - which requires a full uninstall run on the contrib before upgrading. Obviously it's better not to have to worry about users following instructions, but both came to mind reading this.
#4
Well, we never had ordering / dependency support for updates, and this is indeed sad to be missing. But we did live without it so far and being a totally new feature to implement I'd suggest you depend on some existing technology to implement it when you need:
- http://drupal.org/node/208602 is not committed, so #abort-ing updates of one module is possible
- you can check the schema version of a module in your update functions, to ensure that a module you depend on already run its updates *to the point you require them*
- you need to have your users run update.php multiple times to get through this, or we can build in something automated at the end of update.php to look again for update functions to run, but that would possibly go into an infinite loop with #abort-ed updates being tried to run again.
Looks like you can do this with the current tools, and no dependency ordering would help you require specific schema versions, which you might not get with #abort taking effect even with some kind of consistent update ordering.
Or to put it differently, if module B updates run after module A updates, but module A aborted some of its updates, you are screwed again. You need to look for the schema version anyway, which requires some specific code on your part, but leads to the expected result instead of more failures in this case.
#5
gabor - you correctly say that #abort can cause problems but i think this will be very rarely used if we guarantee the order in which updates are run. modules with pending updates should be considered unreliable until their updates are run. we are reducing the meaning of dependency to mean "except when running update.php".
my inclination is to fix this for 6, even at this late hour.
#6
Moshe: so you would not run updates for modules which depend on modules for which updates were not run / are not runnable? I think would be just a small step in the right direction, since often we make assumptions about a specific schema version of a module being already there, but I am looking forward to non-disruptive ways of solving dependencies in the update as well. There is no such feature to mark this as a bug of, so at best, this can be made a task.
#7
"There is no such feature to mark this as a bug of, so at best, this can be made a task." - one could say it is a bug in the dependency feature.
I'll ask a naive question - how come we can guarantee module run order during install and not during update. i would think they are the same problem. karen says "the dependency must be observed even when the modules aren't installed". whats hard about that? dependencies are declared in .info file which is always safe to load.
#8
The difference IMHO is that in install we expect a clean slate, while in update, we always go from one version to another, so the beginning of the road is also "unknown" which results in more error conditions. Lamenting on whether this is a bug or not, does not move this forward anyway. Dependencies were never a feature in updates, so I would not call them a bug, but I am not interested in arguing about it, so let it be.
#9
Moshe redirected me to this issue from #220945: patch: drush mm enable/disable/dot/uninstall where I try to enable/disable modules. It was a too quick patch for drush, but atm i'm reverse enginering D5 to make it work ;-)
Afaik this dependencies stuff is about directed graph.That is walk through the graph start to end or the other way around (depending on dependency or reverse dependency) according to a 'Topological Sorted List' and halt on failure.
Whether installing modules or enabling or updates (backwards) disabling/uninstalling (forward) them. For a graph example see the aforementioned issue or http://build2be.com/files/build2be.com/ryan.png (ubercart).
I want to contribute to this issue but don't know how to start. So some feedback is welcome.
Regards,
Clemens Tolboom
#10
Talked @ FOSDEM to Gabor and Gerhard. Their advise was to first patch D7
... so i will try to deliver a patch for D7
... guess i have to provide this for install.php, update.php and module.inc
... that will take some time :-/
#11
Hm, there is a graph implementation already in the install code as chx pointed out (he wrote the code). Look into what happens when you enable a module which has dependencies, all dependencies are nicely installed as well on the web interface in Drupal 6. So drush could use that in Drupal 6 or even backport that code for Drupal 5 for use. The same is not run on running updates though, and this is what this issue is about. I am not sure how the drush operations relate to update ordering.
#12
Update.php function update_batch() has a simple for each in preparing the batch. This should not be the case!
$operations[] should contain the batches in a topological sorted list (TSL) order. This way a dependant module gets the 'right' updated dependency when updating itself.
In order to do this some changes should be made to modules.inc to provide a TSL of all modules choosen to update.
Ie function module_get_tsl( $modules = array()) which will according to 'dependencies' builds the right order out of the given list.
I've done this in my drush patch for mm for enabling/disabling/uninstalling and it works great.
I'm in preparation to document this idea.
#13
I documented my thoughts http://build2be.com/content/module-management-drupal without getting much response.
Just talk to chx a little about DFS, TSL and yes I could try to provide a few patches. But what I want is a little more: rewrite install.php, update.php, module.inc and system.inc to streamline all module related code into module.inc. And I guess that's to far fetched to do.
I made my drush patch for D5 a real module drush_mm (not only) to make it easier to test this issue against.
@KarenS What is the recipe to reproduce this issue?
#14
We ended up using the #abort solution for cck field modules - all contrib fields should do the same :-( See #304813: CCK module developers read this!!
#15
In #310898: Making module_enable and module_disable API functions module_enable and module_disable do not run in a unpredictable order. I implemented a TSL based on the modules dependency graph. Maybe this can be used here too?
#16
If you need a workaround to ensure that a module like CCK has its updates run first (without requiring other dependent modules to add any code of their own), you could just do that via hook_form_alter. For example:
<?phpfunction content_form_alter(&$form, $form_state, $form_id) {
if ($form_id == 'update_script_selection_form') {
// Preserve the Drupal core behavior where system.module's updates run first.
$form['start']['system']['#weight'] = -100;
// Make sure CCK's updates run next, before any others.
$form['start']['content']['#weight'] = -99;
// (You could clean up the above code to make this more robust, but it's probably good enough as is, since other modules will default to #weight = 0.)
}
}
?>
It's really ugly, but as far as I can tell from testing it, it works fine. Seems like it might be a much better approach than #304813: CCK module developers read this!! (is it too late to matter)? You could also probably use
unset()statements in the above code to prevent dependent modules from running updates if they are not enabled (the other issue that CCK is facing).(I've been banging my head against this all day for Acquia, where we are considering in general how we might have to deal with complicated update dependencies in the Acquia Drupal distribution, and so far this is the solution I have.)
#17
Presumably a real solution would go in Drupal 7 now.
#18
In thinking about a way to fix this for D6 that involves minimal changes, one possibility is that Drupal could continue to run updates exactly as it does now, but just before running them, it could simply invoke something like drupal_alter() on the list of module updates. This would allow CCK (or any other module) to implement a function in its install file that reorders the updates, enforces dependency checking, etc. For an example, see the patch I posted at #304813: CCK module developers read this!!... which more or less does this already for CCK, but since there is no hook it needs to use a very sketchy method to "intercept" the list of module updates, and that is not really good.
Any thoughts on whether something like this kind of drupal_alter() could make it into Drupal 6 as a bugfix for the issue here? It would be completely nonintrusive (would not change Drupal core's behavior at all), but it would be adding a new hook to a stable release... not sure if that's a no-no or not. (Note by the way that it couldn't use "drupal_alter" exactly, since the hook would need to be called for disabled modules also. But that's not really a big deal.)
#19
From the complex upgrade experiences we have it looks like it would indeed be a good idea to try and fix this in a backwards compatible way. We should however sketch up a few scenarios. How would multiple modules "interact" on the chain of alter calls to form a list which is fine for all? To me it sounds like the modules know rules that "I should come before these other modules" (and the list of those other modules is dynamic depending on the site, eg. all field modules); or "I should come after this module" (eg. after a base module of a module suite). If we only provide a simple alter, modules altering the rules might do changes which go against requirements set by previous modules possibly, right?
#20
Could we perhaps use the dependency system here - so core runs first, then modules with no dependencies, then modules which depend on those modules, then modules which depend on them etc. etc. - Are there any situations where that pattern would be inappropriate?
#21
Yes, if modules run updates on Drupal 5 database tables for example on a Drupal 6 site. Take image module. It does not have any 5.x-2.x table release, but it does include db updates for 5.x-2.x in the Drupal 6 version. So you are likely used 1.x stable and now update to 6.x, but all your images will be lost, since the image module tries to run updates which should be run BEFORE system module's updates (system module updates the core tables). Although system module does not depend on image module in any way :)
#22
Well, to some extent, this is a potential issue with many Drupal hooks, right? Any module that implements something like hook_nodeapi() or hook_form_alter() can do a whole lot of damage to other modules if it chooses to :) So maybe the solution is just to say that any module implementing this hook should be careful to only affect modules it actually cares about, and if it doesn't, it should be filed as a bug against the module implementing the hook...
For example, the code I proposed in #304813: CCK module developers read this!! is definitely too "greedy" in that it forces system.module to always run first and content.module to always run second. It should instead be rewritten so that it checks module dependencies and only sorts CCK to be in front of modules that depend on it, rather than in front of any module at all.
Perhaps to facilitate this process, the results of module_rebuild_cache() could be passed along with the hook, so that modules have an easy reference to figure out module dependencies they might need to know, etc.
#23
Well, the consequences of wrong update function order are a bit farther reaching then the consequences of nodeapi hooks not runnning well. Anyway, let's see a dependency based update function order.
#25
Marked #168018: Module dependencies not enforced on update as a duplicate, subscribe, and IMHO this is critical.
#26
Not sure this is "we cannot ship without this fixed", but it is really close.
#27
For me the solution looks quite trivial :0 Let me illustrate this by an example (guess that's not complete so please reply)
Say we have just _installed_ a new filefield and enabled it and _afterwards_ installed a new drupal version _and_ updated views. Futhermore the *.install have a new
<?phpfunction hook_install_dependencies
?>
<?phpfunction filefield_install_dependencies() {
return array(
// ie patch for new views schema change
"6003" => array(
"views" => array(
"after" => "6005"
"before" => "6006"
),
),
);
}
?>
<?phpfunction views_install_dependencies() {
return array(
// ie patch for new node teaser short field
"6005" => array(
"node" => array(
"before" => "6025"
),
),
);
}
?>
By laying out the dependencies named in
hook_install_dependenciesas 'module' - 'schema-version' together with the current and latest version and adding all updateable modules we get a graph like below (http://drupal.org/files/issues/update_0_0.png)Having a graph we can run update.php on it. This example shows a problem though. FileField was _installed and enabled_ before the updated views version was installed. To solve this extra complication the
function hook_install_dependencies()must be used with installing new modules too. That way the filefield module wasn't even enabled before the new views arrived.For module dependencies _without_ a
hook_install_dependencieswe could use the module dependencies graph which is illustrated with the taxonomy and forum module.#28
A better hook name would be
hook_schema_dependenciesbecause it's about schema (version) dependencies.#29
Because we already use the metadata from the docblock during the update process, why not putting @requires tags in that:
<?php/**
* Upgrade FileField to Drupal 7.
*
* @requires content:7001
*/
function filefield_update_7001() {
// Do your stuff
}
?>
#30
In #27 the image is broken so here is a new version.
@29 Good point but we need more then require right? Hope the image helps
#31
I like the docblock suggestion. But do we need #393068: Make the registry class hierarchy, interface and docblock aware for that?
Anyway: a given module must tell _both_ a require and a _reversed_ require. Better words are maybe: depends and rdepends. It is ie not up to views to tell filefield what schema(s) to use. See #27
Anyway ... I'll try to device a test for this first. graph.inc does not have a Topological Sorted List TSL) but a weight assigned. That's kinda random TSL :p