Problem/Motivation

there seem to be several cases where the module_implements() cache is polluted with incomplete info:

Proposed resolution

clear the static cache at the end of the full bootstrap and prevent the persistent cache from being written to prior to full bootstrap completion

Remaining tasks

N/A

User interface changes

N/A

API changes

N/A

Data model changes

N/A

And there are different opinions on how to solve it:

  1. Just document "you can't do this"
  2. #578044: move module_invoke_all() out of the reach of hook_boot() (though I'm not sure if this fixes the 2nd point)
  3. Throwing an exception when you do this (#10 / Damien Tournoud)
  4. previous + find a way then to be able to add hooks to the list of bootstrap hooks (#11 / Jose Reyero)

In the meantime, the patch uploaded in #14 fixes just the caching of incomplete info.

Original issue report

I can't understand the logic of module_implements() function.

Look on definition:

/**
 * @param $hook
 *   The name of the hook (e.g. "help" or "menu"). */
...
function module_implements($hook, $sort = FALSE, $refresh = FALSE) {
...
  if (!isset($implementations[$hook])) {
    $implementations[$hook] = array();
    $list = module_list(FALSE, TRUE, $sort); // second param is $bootstrap = TRUE
    foreach ($list as $module) {
      if (module_hook($module, $hook)) {
        $implementations[$hook][] = $module;
      }
    }
  }
/**
...
 * @param $bootstrap
 *   Whether to return the reduced set of modules loaded in "bootstrap mode"
 *   for cached pages. See bootstrap.inc. */
...
function module_list($refresh = FALSE, $bootstrap = TRUE, $sort = FALSE, $fixed_list = NULL) {
...
      if ($bootstrap) {
        $result = db_query("SELECT name, filename, throttle FROM {system} WHERE type = 'module' AND status = 1 AND bootstrap = 1 ORDER BY weight ASC, filename ASC");
      }
...
/**
 * Define the critical hooks that force modules to always be loaded.
 */
function bootstrap_hooks() {
  return array('boot', 'exit');
}

PROBLEM:
1. I'm calling i.e. module_implements('help') expecting list of all modules that implement that hook
2. module_implements calling module_list with $bootstrap param = TRUE?
$list = module_list(FALSE, TRUE, $sort); // second param is $bootstrap = TRUE
3. So instead of receiving all modules with hook_help, I'm getting modules with this hook only which are registered as $bootstrap only

It's also related to: #222109: module_list() should be rewritten as 2 functions: module_list() and module_list_bootstrap()

CommentFileSizeAuthor
#124 496170-124.patch1.24 KBmarco
#120 496170-120.patch9.12 KBgreenSkin
#104 496170-70.patch1.77 KBstefan.r
#94 496170-94.patch10.65 KBDavid_Rothstein
#79 496170-79.patch7.27 KBstefan.r
#79 interdiff-70-79.txt2.09 KBstefan.r
#70 496170-70.patch5.09 KBstefan.r
#70 interdiff-65-70.txt716 bytesstefan.r
#67 Screen Shot 2015-07-16 at 12.09.10 PM.png91.73 KBrwohleb
#65 496170-65.patch5.02 KBstefan.r
#63 496170-63.patch4.9 KBstefan.r
#63 496170-63-testonly.patch3.65 KBstefan.r
#60 496170-59-testonly.patch2.92 KBstefan.r
#60 496170-59.patch4.17 KBstefan.r
#58 496170-57.patch1.25 KBstefan.r
#56 496170-56.patch1.24 KBstefan.r
#53 496170-50.patch1.22 KBstefan.r
#51 496170-51.patch1.4 KBstefan.r
#50 496170-50.patch1.22 KBstefan.r
#48 496170-48.patch1.15 KBstefan.r
#46 496170-46.patch1.12 KBstefan.r
#44 496170-44.patch1.22 KBstefan.r
#27 interdiff.txt637 byteschx
#27 496170_27.patch2.04 KBchx
#26 496170_26.patch4.3 MBchx
#24 module-implements-boot-496170-24.patch2.01 KBjbrauer
#23 core-module-implements-boot-496170-23.patch4.37 KBjbrauer
#22 module-implements-boot-496170-22.diff2 KBFabianx
#19 module-implements-boot.diff1.81 KBFabianx
#14 496170-14.patch3.39 KBroderik
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

kenorb’s picture

It causing in same cases WSODs and wrong rendered pages.
Read more: #496198: module_list() called with wrong arguments causing WSOD and breaking theme_registry

Heine’s picture

When are you calling module_implements? If a full bootstrap finishes, module_list will have been called with refresh TRUE, bootstrap FALSE, and module_list will serve items from the static cache (with all modules).

Damien Tournoud’s picture

Version: 6.x-dev » 7.x-dev

The logic is strange and convoluted (as it relies on module_list() ignoring the bootstrap parameter as soon as Drupal is fully bootstrapped), but it does work. Let's fix that in D7 first.

kenorb’s picture

#2
In case when modules not touch module_list() yes, it's working.
Function is in API, so it can be used:
http://api.drupal.org/api/function/module_list
(used by CCK, admin_menu, etc.)
Documentation don't mention anything that you will get invalid list (bootstrap list) when you request for non-bootstrap list.
And it have different behaviour during boot and init mode.

Even if you calling:
module_list(FALSE, TRUE, $sort);
you could get different lists, one time bootstrap, another time non-bootstrap.
It could be called and rebuild during shutdown (when static variables are cleared), when some module want to invoke some hook (module_invoke_all) or even from error handler (hook_watchdog).
When it will be rebuild, it will have bootstrap list, instead of full list of modules.

kenorb’s picture

#2
module_implements() is called each time when watchdog is called, and watchdog could be called anytime from error handler.

function watchdog($type, $message, $variables = array(), $severity = WATCHDOG_NOTICE, $link = NULL) {
...
  foreach (module_implements('watchdog', TRUE) as $module) {
kenorb’s picture

In 6.x the fix should be simple:

$list = module_list(FALSE, FALSE, $sort);

instead of:

$list = module_list(FALSE, TRUE, $sort);

Because module_implements() should checking all hooks, not only bootstrap_hooks and this function could called any time (#5).

Heine’s picture

As to #5, calling a hook in a module that _has not been loaded_ will not work. theme() use in watchdog listeners is wrong. A possible solution is to load modules implementing hook_watchdog during the bootstrap.

Damien Tournoud’s picture

Status: Active » Closed (won't fix)

module_implements() rely on the fact that module_list(FALSE, TRUE, $sort) will return:

  • the bootstrap list while Drupal is not fully bootstrapped
  • the full list as soon as Drupal entered DRUPAL_BOOTSTRAP_FULL

Obviously this is not very elegant, but I don't see any bug or bad logic here. This function has been fully refactored in Drupal 7, so I'll simply won't fix this.

Jose Reyero’s picture

Status: Closed (won't fix) » Active

I'm sorry but I think this needs to be reopened.

The problem is that when any drupal hook is invoked from any module's hook_boot(), the results for module_implements('module_implements_alter') are cached into the database. At this time, a module that implements the hook and is not yet loaded will never see it being called.

Since this 'module_implements' cache can be rebuilt on random pages this is extremely difficult to trace as sometimes the module may be already loaded, sometimes not.

Step by step:
0. Module invokes any hook from hook_boot().
1. module_implements() is called.
2. If the list of implementations for that hook is not yet cached, then it is rebuilt, which ends up in calling drupal_alter('module_implements', $implementations[$hook], $hook);
3. This causes a call again to module_implements, to find out which modules implement hook_module_implements_alter()
4. This list of implementations is saved to the database at the end of the request.

This could be fixed either by adding 'module_implements_alter' to the list of bootstrap hooks or by not caching $implementations['module_implements_alter'], since this is rarely invoked.

As an example, see #1267966: Pathauto patterns are not showing up

There are also other issues with this function, like the list of implementations for a hook depending on whether a it has been called from bootstrap first or not, which ends up on weird issues with random modules that are very difficult to track for module maintainers, since they are always caused by other modules calling hooks when they shouldn't.

Also these issues happen randomly, depending sometimes on whether the cache has been rebuilt on a page with full bootstrap and module init or not.

See also #1415278: Using entity_get_info in a hook_hook_info results in an incomplete module_implements cache.

Damien Tournoud’s picture

The problem is that when any drupal hook is invoked from any module's hook_boot()

This is not supported, period (and in addition it's not going to work). The rest describes how to protect against something that is not supported.

We should just throw an exception when you try to call a non-boot hook when Drupal is not fully bootstrapped.

Jose Reyero’s picture

I basically agree, though we should find a way then to be able to add hooks to the list of bootstrap hooks.

The problem with this 'not supported' behavior is that it works sometimes. And then if a module does the wrong thing, it ends up in someone's else issue tracker causing major maintenance headaches. So I agree 100% on either supporting it somehow or throwing an exception. But saying it's not supported and allowing it to work is really bad for module maintenance.

Also there's no mention of that not being supported on the documentation and since all module.inc functions are available on hook_boot()...:

http://api.drupal.org/api/drupal/modules!system!system.api.php/function/...

(Same for module_invoke_all(), module_implements(), etc.. )

Dave Reid’s picture

Seems like we should make bootstrap_hooks() accept a variable instead?

Anonymous’s picture

I think I am experiencing this bug when devel module and jquery_update are enabled on a site with core page caching enabled.

Both devel and jquery_update modules implement hook_library_alter, but only devel has a bootstrap hook (hook_boot). Sporadically, the site will revert back to core jQuery, which breaks scripts on the site that depend on the later version of jQuery.

I discovered this by waiting until the site exhibited this problem and displaying the output of module_implements('library_alter') in devel's PHP execution page. Only devel module was listed, even though jquery_update is enabled.

The workaround for it is to implement hook_boot in jquery_update OR to implement hook_boot in another module and load the jquery_update module on every page request.

roderik’s picture

FileSize
3.39 KB

So there seem to be two cases where the module_implements() cache is polluted with incomplete results:

Related to the 2nd point: #968264: hook_hook_info() can lead to an infinite loop

And there are different opinions on how to solve it:

  1. Just document "you can't do this"
  2. #578044: move module_invoke_all() out of the reach of hook_boot() (though I'm not sure if this fixes the 2nd point)
  3. Throwing an exception when you do this (#10 / Damien Tournoud)
  4. previous + find a way then to be able to add hooks to the list of bootstrap hooks (#11 / Jose Reyero)

I completely agree with Jose Reyero (#11) that something should be done about it -- more than just documenting it. Because even if it's not officially allowed, contrib modules are doing this - causing very hard to trace problems in unrelated modules.

But I don't really understand how adding a hook to bootstrap_hooks() would solve things. (I don't see that function being called from anywhere useful, to prevent cache pollution.)

In the meantime, I don't want to have to rewrite other people's hook_boot() implementations until I know where this issue is going... and besides: even though throwing an exception might be the correct thing to do, in practice it will break unsuspecting live D7 sites with contrib modules that have 'officially wrong' behavior that happens to not cause any problems.

I just want my cache to not become polluted. The attached patch fixes this.

I'm aware that this is not a 100% theoretically correct solution - because in e.g. calls to module_implements() from a hook_boot(), incomplete info is returned. But at least this incomplete info does not get cached. My specific problem gets fixed by it. (Namely: a PHP notice that is thrown every time a certain page is rendered, because module_implements() returns wrong results forever.)

Please consider this.

roderik’s picture

Status: Active » Needs review

.

Jose Reyero’s picture

@roderik,

But I don't really understand how adding a hook to bootstrap_hooks() would solve things. (I don't see that function being called from anywhere useful, to prevent cache pollution.)

If the hook is in bootstrap_hooks(), then modules implementing that hook are loaded for bootstrap, then problem solved, well, kind of...

I don't think not caching the results of module_implements() really fixes the problem either. Because then invoking a hook may have very different results depending on the bootstrap phase on which it is invoked, which creates another issue if the resulting values of that hook are cached before some module implementing them is loaded (An example is hook_entity_info()..)

So I think the good solution should involve a few steps:
1. Make bootstrap_hooks() accept a variable, as @Dave Reid suggests in #12
2. Modules providing bootstrap hooks updating that variable when installed / uninstalled.
3. Throwing an exception when invoking a non bootstrap hook from bootstrap.

Jose Reyero’s picture

Issue summary: View changes

Added summary for current state.

BrockBoland’s picture

Has anyone identified any other situations that cause this bug? I reviewed all our code and didn't find any cases
of the hook_boot() or entity_get_info() problems mentioned in the issue summary.

Part of the problem—and the reason that I haven't tried the patch from #14 yet—is that we haven't been able to replicate this bug consistently, so it would be hard to say whether the patch actually fixed anything. I'm reluctant to patch code so deep in core without really understanding what is happening and why.

roderik’s picture

FWIW: Another situation: running code in .module files (outside of functions), that call any hook. More detail in #2066205-7: "Error: Unknown condition" in Rule for custom inline condition

@Jose Reyero: Thank you for your very quick response. (I couldn't parse the info immediately. I want to set aside time to get back to this, I do!)

Fabianx’s picture

Priority: Normal » Major
FileSize
1.81 KB

Here is a different patch to solve this problem for the hook_boot() part:

- It uses two different static caches for the hook_boot() problem, so that an incomplete cache can never leak into the system
- That way caches are never written from the wrong implementation
- Both statics are populated with the cache_bootstrap cached implementation
(that is why this bug is seldomly reproducable, it only happens when cache_bootstrap is empty during bootstrap, e.g. after variable_set.

Performance for non-boot users

:

- one if - clause
- one function call to drupal_get_bootstrap_phase().

should be acceptable.

Lets see if this passes tests, also this is major if not critical as watchdog(), url(), drupal_goto() all pollute module_implements() cache and you can't know this from your innocent hook_boot() implementation.

Fabianx’s picture

Title: Bad logic in module_implements() ? » module_implements_cache() can be polluted by hook_boot() implementations calling module_invoke_all() directly or indirectly
Fabianx’s picture

Issue summary: View changes
Fabianx’s picture

That patch as innocent as it looks actually breaks the module_implements cache due to unset() on a static variable putting a new variable in scope, which was then used for $implementations, but only in the first call. The others used the hook_boot() one, leading to a mostly empty $module_implements cache entry ...

Here is a fixed patch that also makes the logic more easy to understand.

EDIT: Interdiff would have been something like:

-      unset($drupal_static_fast);
+      // This MUST not use unset() as unset only resets
+     // the variable for the rest of the function, but
+      // restores it again for the next request.
+     $drupal_static_fast = NULL;

jbrauer’s picture

Two very minor code style nits updated in this version of the patch no functional changes.

jbrauer’s picture

Ugh and of course the wrong patch file... Once again this time with the right patch file this is just a couple minor code style items.

The last submitted patch, 23: core-module-implements-boot-496170-23.patch, failed testing.

chx’s picture

FileSize
4.3 MB

We fixed #667098: drupal_get_bootstrap_phase() is broken so we can use drupal_get_bootstrap_phase, I changed that. I also do not understand the hunk with "future static" , why is that necessary / useful?

chx’s picture

FileSize
2.04 KB
637 bytes

Let's try this again.

The last submitted patch, 26: 496170_26.patch, failed testing.

Fabianx’s picture

#26:

Yes, once #667098: drupal_get_bootstrap_phase() is broken is not only RTBC, but actually fixed we can start using drupal_get_bootstrap_phase(), but can we do it in a follow-up to replace all of core's drupal_bootstrap(NULL, FALSE) with the right code? There is a lot!

- The future static is for the following scenario:

Given: Cache is warm and 'cache_bootstrap:module_implements' is fully populated

If we reach this function during bootstrap, we will do:

+       $implementations = $implementations->data;

BUT:

This implementations is pointing to the normal static used for bootstrap
and is not the final &drupal_static('module_implements')

+      // In case we don't have full bootstrap, yet, store the data also
+      // in the future static as that cache_get is always the full cache.

Here we then check if we don't have full bootstrap.

+      if (!$full_bootstrap) {
+        $module_implements_cache = &drupal_static('module_implements');
+        $module_implements_cache['implementations'] = $implementations;
+      }

and populate the final $implementations - what implementations will be when we reach the final bootstrap.

This is an optimization to avoid two cache_get()'s - if the cache is warm. A cache_get() is cheap, so I am okay to also remove this, if there is a concern.

In case it is cold, two cache_get()s are done, which is also good as another request might have populated the cache in the mean time.

Status: Needs review » Needs work

The last submitted patch, 27: 496170_27.patch, failed testing.

chx’s picture

Status: Needs work » Reviewed & tested by the community

#24 is good to go then.

joelpittet’s picture

I've rolled back this patch on a production server because I was having some issues with module_implements cache bin item balooning out of control and taking down the site and/or making memcached/redis fail.

I'm not setting this back to needs work yet, but may if that rollback turns out to solve this issue. For now though it has...

Fabianx’s picture

#32: Appreciate the feedback, but this does not change anything for the actual stored module implements cache, but just ensures it works properly. From hook_init() on the behavior is completely like without the patch.

I assume your problem is that the cache item grows too large, which might happen with too many modules. If rolling this back "fixes" it, then in fact your module implements cache is broken.

joelpittet’s picture

@Fabianx, I came to the same conclusion and was going to write to that this patch isn't necessarily the problem as well.

I think what this patch does is expose another issue that is deeper. The patch itself looks harmless, but so far rolling it back has produced 0 module_implement cache bin corruptions. I'll likely need to leave it for a couple of days to be sure and I will definitely report back if that error surfaces again. So far 12 hours of 0 issues. The big combo seems to be cron job + a small burst of traffic from what I can tell.

The cache item shouldn't grow out of control for 'module_implements'. I'd expect once those implementations are found and accounted for it would stay at that size after that even if re-built. I've cleared, truncated and even switched bins for cache_bootstrap between redis/memcache/database. All had the same corrupt/baloon problem.

I do have a lot of modules, it's a commerce site with features: ~350 active modules.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review

I think the patch looks good and really would like to get it in, but given the last few comments I think I should wait for more info to come back before committing this one. (It could be that the patch is correct, but exposes some other more serious problem. If so, it would be good to understand exactly why.) Sorry, this one just has enough potential to break things, given the code that it touches, that I think we need to be extra careful with it...

In the meantime:

+      // And clear any existing drupal_alter() cache.
+      drupal_static_reset('drupal_alter');

Later in the function the code which resets static caches does a lot more, such as drupal_static_reset('module_hook_info') also. Maybe this should explain why drupal_alter is the only one that matters here (I think the others aren't relevant because they either aren't used before bootstrap or are used but run no risk of being corrupted?).

+    // In case we are pre-bootstrap, ensure to use a different static
+    // implementation.
+    if (!$full_bootstrap) {
+      if (!isset($drupal_static_fast)) {
+        // This does not need a drupal_static().
+        $drupal_static_fast['implementations'] = array();
+      }
+    }

Isn't a key point here not just that it doesn't need a drupal_static(), but that it must not use the same one that the function normally uses because that's the one module_implements_write_cache() looks at when it's writing, but we never want the pre-bootstrap data to be written?

+      // In case we don't have full bootstrap, yet, store the data also
+      // in the future static as that cache_get is always the full cache.
+      if (!$full_bootstrap) {
+        $module_implements_cache = &drupal_static('module_implements');
+        $module_implements_cache['implementations'] = $implementations;
+      }

I had trouble understanding this comment also - something about "future static" and "full cache" is confusing. I think what you're trying to say is that because only data from the full boostrap is ever written to the cache, it is always safe to use data that comes from that cache throughout the remainder of the page request, so we can put it in the static variable that is used for the remainder of the (post-full-bootstrap) page request?

Fabianx’s picture

Thank you very much for the very careful review.

I agree with all the points.

stockliasteroid’s picture

We applied this patch on a site after experiencing the issues mentioned in #1934192: hooks disappear from module_implements cache, hooks disappearing from module_implements. After running this patch on that same site for about 24 hours the cache ballooned out of control and the machine ran out of memory. As mentioned in #33, this patch may not actually be causing that issue, but is instead causing an underlying problem manifested via #1934192 to cause the cache to grow out of control.

This site has far fewer modules, around 73 contrib. Without the patch each FPM process runs consistently with about 80MB of memory, with the patch each process eventually ends up pegging at over 200MB.

Just a bit of anecdotal evidence, we've still had no luck diagnosing the missing module_implements hook issue, but in the meantime we've reverted this patch.

Fabianx’s picture

#37: Interesting, what do you mean with: ballooned out of control?

Any numbers / dumps of what the cache included? Which cache item ran out of control? Can you reproduce the OOM on a dev machine or only on prod?

We run this in production on several sites, so it is a high priority for me to find any issues.

There is a trick you can use to find the problematic module_implements hook:

function mymodule_module_implements_alter() {
  if (drupal_get_bootstrap_phase() != DRUPAL_BOOTSTRAP_FULL) {
    trigger_error('module_implements called before full bootstrap.');
  }
}

If you have trigger error setup correctly on prod, this is even safe for production usage and will throw a fatal on dev environments.

stefan.r’s picture

@Fabianx just out of interest, which patch do you run, #24? And this is just to protect against faulty contrib modules?

Web-Beest’s picture

I've stumbled upon this issue while coming across this bug whilst implementing subfolders_domain (Issue https://www.drupal.org/node/2489108).

Patch #24 resolves the issue in my case.

stefan.r’s picture

Priority: Major » Critical

As mentioned by @Fabianx in #19 this is possibly critical, when only a limited set of hooks run this can lead to strange behavior including data loss.

stefan.r’s picture

FileSize
1.22 KB

If the problem is that the static cache for drupal_alter() and module_implements() can be different both before and after a full bootstrap (and shouldn't be called before a full bootstrap in the first place) maybe something like this could work?

- A trigger_error() if module_implements() is somehow called anyway
- Clearing the static cache for those functions before moving on to full bootstrap, just in case their statically cached values were based on the (reduced) list of "vital" modules only.

Status: Needs review » Needs work

The last submitted patch, 44: 496170-44.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.12 KB

Let's see what happens if we put the trigger_error() in module_invoke_all() instead

Status: Needs review » Needs work

The last submitted patch, 46: 496170-46.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.15 KB

...with an exception for the installer

Status: Needs review » Needs work

The last submitted patch, 48: 496170-48.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.22 KB

and update.php

stefan.r’s picture

Back to module_implements() to cover all possible cases.

Status: Needs review » Needs work

The last submitted patch, 51: 496170-51.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.22 KB

Back to #50 for now as apparently core itself isn't too strict either about using module_implements() before full bootstrap.

Fabianx’s picture

However its either #51 or #24 as especially the module_implements pollution is the problem.

And even clearing all static caches can break, because drupal_exit() could be called before full bootstrap.

However if you do:

- Remove the trigger_error - that won't fly for BC reasons
- Clear the static cache when we reach full bootstrap
- Ensure that the module_implements cache is never persisted to disk before full bootstrap

Then I think that could fly and is simpler than #24.

stefan.r’s picture

The persisting only happens in drupal_page_footer() (which is only called during full bootstrap), not in hook_exit(), so maybe the extra check isn't needed?

stefan.r’s picture

FileSize
1.24 KB

But just in case it is needed, here's a revised patch :)

Status: Needs review » Needs work

The last submitted patch, 56: 496170-56.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.25 KB
Fabianx’s picture

Issue tags: +Needs tests

Can we get a test for that, please?

I am also okay without a test, but if we could it would be great.

Patch looks nice to me :).

stefan.r’s picture

Issue tags: -Needs tests
FileSize
4.17 KB
2.92 KB

The last submitted patch, 60: 496170-59.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 60: 496170-59-testonly.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
3.65 KB
4.9 KB

Previous patches were missing .info files

The last submitted patch, 63: 496170-63-testonly.patch, failed testing.

stefan.r’s picture

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC, there are more issues related to this:

- When the result of an incomplete module_implements() is cached in e.g. entity_get_info() or the theme registry, but those should be follow-ups.

This patch is functionally equivalent to #24, but way way simpler. Great work!

Lets get this in.

rwohleb’s picture

I believe I found an additional edge case that needs a follow-up. If you have code that generates a PHP warning (such as "Call-time pass-by-reference has been deprecated") when the module file is first loaded, it can trigger a watchdog entry, which uses module_implements. The way the bootstrap system works, it has already marked it as a full bootstrap before it actually finishes doing the full bootstrap. As such, when the watchdog entry triggers module_implements early into the run of module_load_all, it thinks all modules are available even though some/many haven't been loaded yet. Tests against function_exists() fail and the corresponding hook implementations are left out.

The upside is that many hooks will remain unaffected since they won't get called until later, but hooks like "module_implements_alter" end up getting cached way too soon. You can end up with things like the token module no longer providing field tokens due to it's reliance on "module_implements_alter", and the hair-pulling experience of trying to track down why. To make it more frustrating, PHP warnings that are generated are dependent on how you have your error reporting level set, so replicating it can be frustrating.

I'm attaching a picture of the callstack for reference.

stefan.r’s picture

@rwohleb wouldnt that issue be solved by the patch in this issue?

rwohleb’s picture

Unfortunately the patch does not fix this since it resets the cache right before the module_load_all().

+  // Reset drupal_alter() and module_implements() static caches as these
+  // include implementations for vital modules only when called early on
+  // in the bootstrap.
+  drupal_static_reset('drupal_alter');
+  drupal_static_reset('module_implements');
   // Load all enabled modules
   module_load_all();

Maybe if the patch was modified to do the reset after module_load_all(), but I haven't looked at the consequences of that.

stefan.r’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
716 bytes
5.09 KB

As far as I can see, clearing the cache after module_load_all() ought to be fine as well -- only in the mentioned edge case should module_implements() be loaded anyway during the call to module_load_all().

rwohleb’s picture

I believe the patch in #70 to be sufficient, but looking at this again I'm wondering if just moving the static resets will be enough. If the error generated during module_load_all() is a fatal error, it will display the error immediately through the maintenance theme and exit. I don't believe that in this situation drupal_page_footer() ends up getting called, but if it does, then module_implements_write_cache() will be called. At this point it will write the bad cache since it believes there is a full bootstrap. Correct me if I'm wrong.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

#71: Yes, that is a generic problem with our state machine, that a phase is considered to be active when its entered, not when its successful ...

However: A fatal error would indeed exit early, so we should be okay there. (see _drupal_log_error)

-- That race is tricky for sure, but as it is an edge case including an error condition, I still think doing it afterwards is okay.

Should probably open a follow-up to harden module_load_all() though ...

stefan.r’s picture

#71: drupal_page_footer() is not called in such case, so it should be fine

pounard’s picture

Ignore this, the most recent patch actually should solve most problems here.

stefan.r’s picture

Title: module_implements() cache can be polluted by hook_boot() implementations calling module_invoke_all() directly or indirectly » module_implements() cache can be polluted by module_invoke_all() being called (in)directly prior to full bootstrap completion

Updating issue title to reflect #67 as I just ran into the same problem on a site with an early watchdog call during the module_load_all() in drupal_bootstrap_full() causing this exact problem.

Hoping we can get this into the next bugfix release...

stefan.r’s picture

Issue summary: View changes
pounard’s picture

+1 for seeing this in the next bugfix release, I experience this bug from time to time and it's always very hard to debug.

catch’s picture

Status: Reviewed & tested by the community » Needs work

When module_implements() is called early, and there's a cache hit on the module implements cache, then it looks to me like always clearing the drupal_static is going to result in a second cache_get(). Might need to store the results of that cache get separately, so that it's not fetched twice.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
2.09 KB
7.27 KB
bember’s picture

I have a related issue with Sub-pathauto module, which sometimes has its hook_url_inbound_alter() implementation ignored.

#79 fixes the problem.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Reviewed the changes and this looks great to me!

DrColossos’s picture

Any updates on this? Will we see it in D7 soon due to its RTBC status?

stefan.r’s picture

Hopefully, just waiting for a new bugfix release now

pounard’s picture

Status: Reviewed & tested by the community » Needs work

I'm afraid the latest version adds another layer of useless caching drupal_static() in the newly created function and will break some modules directly emptying this static cache entry of module_implements by calling the drupal_static_reset('module_implements') manually. I know this is not supposed to be public API but it is probably used this way (I have to admit that sometime, for performance reasons, I do it myself).

If you keep the module_implements static cache then it's useless to keep another one in the _module_implements_get_implementations() function, it's the same job twice, you probably should remove one of the statics and call the remaining one module_implements to keep full compatibility.

In most cases:

// Because we reset the "module_implements" static at the end of the
// bootstrap, we store the cache result into another static variable
// so as to prevent having a second cache_get() in case
// module_implements() is called early in the bootstrap.
// @see _drupal_bootstrap_full()

This attempt to solve problems for buggy modules at the cost of complexity, I think that the buggy modules should be fixed instead.

stefan.r’s picture

Let's see if @Fabianx can have a look at this and see what we should do here. I do wonder if it's only buggy modules causing early module_implements calls -- I believe core does this as well in come cases.

Fabianx’s picture

Assigned: Unassigned » catch
Status: Needs work » Reviewed & tested by the community

#84: I am afraid you will have to live with the static cache being cleared as bootstrap is a completely different phase.

My more complex solution also would have implicitly thrown away the static cache used during bootstrap.

As for the static cache, that is something catch asked for and I am indifferent, so back to RTBC for now, but assigned to catch.

catch’s picture

Assigned: catch » Unassigned

I think we should avoid the double cache_get() - prefer the extra static caching to that.

However if people prefer to have that, don't mind that much - just wanted to point out that the new patch was functionally different (and a regression in that respect) to the original one on this issue.

@pounard yes I also think it's a bug that 7.x modules invoke non-bootstrap hooks from bootstrap hooks, but unfortunately there are dozens of modules doing that. In 8.x this is effectively impossible because we removed hook_boot()/exit() so a workaround for the actually existing situation in 7.x seems OK. This is extremely hard to debug so there could be a lot of sites running into it without realising the cause.

Since we're detecting the error condition here, we could trigger_error()/watchdog() something to notify people this is happening, but that'd be up to David really.

stefan.r’s picture

So David_Rothstein will have 2 patches to pick from, #70 or #79. If it's the one with the new static, let's mention it in the release notes just in case.

The trigger_error() was already proposed earlier in #44, I think it was decided against.

pounard’s picture

I'm just thinking about something, but hook_watchdog() could be defined as a bootstrap hook, it would solve, at least, the problem of watchdog() calls during bootstrap. Even thought it'd not fix other buggy cases.

stefan.r’s picture

Ah yes that makes sense, care to roll a patch?

pounard’s picture

Maybe this worth a new issue ?

Fabianx’s picture

#91: Yes, please.

rwohleb’s picture

I vote for #70 as it's the simplest implementation without changing expectations of how the cacheing works. A double cache_get won't kill anyone. Further improvements and warning messages could be implemented later, but with D8 on the horizon I'm not convinced it's worth while. The patch from #70 gets the job done as minimally as possible.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs subsystem maintainer review
FileSize
10.65 KB

Not sure I'm qualified to decide, but the criticisms of both approaches (#70 and #79) actually seem valid to me :)

There's also a fair amount of "action at a distance going" on here, which makes the patches a bit scarier. A solution that is fully self-contained (within the problematic functions) would feel better.

And there's no guarantee that clearing the static caches at this point in the bootstrap fully solves the problem, right? (If contrib modules are doing crazy enough things in hook_boot() to trigger this bug, then it wouldn't surprise me at all if some of them are loading non-bootstrap modules early also.)

I'm also concerned that when the earlier patch in this issue was tested on production sites, multiple people reported that it was causing major problems (e.g. #32 and #37), and we still don't seem to fully understand why that happened... I am not sure if the newer patches have had that level of testing or not.

I looked at this for a while and began to wonder if we couldn't fix this in a conceptually simpler way - just by refusing to write to either the static or persistent cache until after all modules have been loaded? It seems like the right thing to do; if all modules aren't loaded yet there's no reason to think the current list of hook implementations will ever be valid later on, so we should not pollute any cache with it. And I don't think we need to micro-optimize performance for sites that are running tons of code in hook_boot() anyway.

That approach looks like the attached patch and seems to work fine when I tried it. It changes more lines of code than the previous patches, but most of the changes are just renaming variables to avoid writing to the static cache except when explicitly desired. I left the tests from #79 in place - the only functional changes are now to includes/module.inc.

Curious what people think of this one.

pounard’s picture

I think this is also valid, just like the two others, I have no strong opinion about the right way to do it, but I still prefer the simpler patch in #70 just for the sake of simplicity. I'm all in for removing code not adding it.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

+1 to #94 - that makes a ton of sense.

Back to RTBC for now - to have some people test the new patch.

Note: A similar strategy will work for schema, theme registry and entity definition caches - which all are also prone to that error (though those are different issues of course).

stefan.r’s picture

#94 seems fine to me too

catch’s picture

Most of those caches are using DrupalCacheArray and already check bootstrap phase before writing, i.e. https://api.drupal.org/api/drupal/includes%21theme.inc/function/ThemeReg...

So this is consistent with that, it's just that the hook implementations cache is inconsistent because it doesn't use DrupalCacheArray.

David_Rothstein’s picture

Cool, is anyone able to test #94 with a real site?

Would be particularly great if @joelpittet or @stockliasteroid could test it (if that's still a realistic possibility) since they had the sites where the earlier patch seemed to lead to problems.

joelpittet’s picture

@David_Rothstein I've been using the patch from #70 since Aug 18th without any problems in production.

I'll swap it out for #94 and see what happens.

joelpittet’s picture

Hmm, #94 conflicts with #2263365: Second loop in module_implements() being repeated for no reason. which I've also been using and I've performance tested as well. So I'm a bit hesitant to change before the conflict is resolved because it's also RTBC.

stefan.r’s picture

@joelpittet care to do a reroll of a combined patch and try that on your production setup? Hoping we can still get either of these patches (#70 or #94) into the next bugfix release (which might be due this week).

We've also had no problems with #70 on some of our sites. Not sure about the concerns in #94 but I can understand the approach in the latest patch may feel a bit less disruptive (also given it's consistent with what we do in DrupalCacheArray)

David_Rothstein’s picture

I went ahead and committed #2263365: Second loop in module_implements() being repeated for no reason. now since it looked like it was ready to go.

So I actually didn't realize how much real-world testing #70 has already had. Maybe that plus the fact that it doesn't conflict means we should commit that now, then do #94 later. I think the two concrete reasons #94 is a better long-term fix than #70 are:
1. There are some situations (mostly edge cases) where #70 won't fix the bug but #94 would.
2. The double cache-get issue, which should not be a problem with #94.

I think we can live with both of those. The second is the only one that might be considered a regression, but it should be relatively minor (and should only affect sites that are actually invoking hooks early in the bootstrap). Nobody was completely opposed to committing #70 above, so maybe that plan will work for everyone.

Leaving this RTBC for now to see if anyone objects.

stefan.r’s picture

Sounds great! Re-uploading #70

David_Rothstein’s picture

#104 is missing the tests, but I'll just trigger a retest of #70 instead.

David_Rothstein queued 70: 496170-70.patch for re-testing.

Fabianx’s picture

I would still like to see #94 getting some real-world testing as it is conceptually more clean and more consistent with the rest of core as catch pointed out, so could we re-roll #94 on current D7 HEAD?

stefan.r’s picture

@Fabianx Ithink the proposal was to do both (committing #70 for the bugfix release due soon as it can't make things worse than they already are, and #94 at a later point as it got more testing) - are you OK with that?

In any case a re-roll of #94 would be good so it can be tested.

Fabianx’s picture

FWIW, I am fine with that approach.

stefan.r’s picture

Cool, let's see if we're still on time for this to be committed... Will be good to finally have this (mosty) fixed!

David_Rothstein’s picture

Priority: Critical » Major
Status: Reviewed & tested by the community » Needs work
Issue tags: +7.40 release notes

OK, committed #70 to 7.x. Thanks!

So I guess we should leave this issue open (but at lower priority) for rerolling #94 and getting that in, which should hopefully fix any remaining edge cases here.

  • David_Rothstein committed e105186 on 7.x
    Issue #496170 by stefan.r, chx, Fabianx, jbrauer, David_Rothstein,...
jenlampton’s picture

edit: Oh, #70 nevermind. :)

catch’s picture

https://www.drupal.org/node/496170#comment-10410241 has the test coverage. The idea here seemed to be to get the whole patch fro #94 in, but would be worth getting the test coverage in by itself anyway.

davidwbarratt’s picture

szantog’s picture

Hmm, something is not good.
I used #24 before core upgrade on 3 sites, it worked well.

After upgrade 2 of them are ok, but the last 1 (it's a real monster, commerce, flag, etc)..

It starts to produce rules and flag issues on my localhost but not on our dev environment, for example: #1925922: Unsupported operand types in factory_by_row() in flag_flag.inc I check our custom codes, do some changes, then it seems, it is fixed. But no, the error comes back once, only drush rr fixes it. Then comes back again 2 days later total randomly. Then I see the differences: On my localhost I switched back to memcache! On db cache the problem came back immediately, on memcache we just saw it rarely, so I found out, it might be our old module_implements trouble.

Here are my experiences:
This whole issue might be some kind of sizing thing. Something overgrow immediately when I use db cache, something later on memcache.

I tried to play with _drupal_bootstrap_full(), this is my current state which is obviously not a fix, I just hope, it will give some idea.

@@ -5227,6 +5227,9 @@ function _drupal_bootstrap_full() {
   // in the bootstrap.
   drupal_static_reset('drupal_alter');
   drupal_static_reset('module_implements');
+  drupal_static_reset('module_hook_info');
+  cache_clear_all('hook_info', 'cache_bootstrap');
+  cache_clear_all('module_implements', 'cache_bootstrap');
   // Make sure all stream wrappers are registered.
   file_get_stream_wrappers();

The interesting part is: drupal_static_reset('module_hook_info'); Without this only drush rr can reanimate the site after getting error! drush cc all produces the same error, code changes doesn't matter anymore.
I have no idea, but it seems, the caches still remain corrupted somehow. With the patch above, the site works perfectly, but I though, clear caches at this point not a performance friendly solution

David_Rothstein’s picture

https://www.drupal.org/node/496170#comment-10410241 has the test coverage. The idea here seemed to be to get the whole patch fro #94 in, but would be worth getting the test coverage in by itself anyway.

I believe all the test coverage was already committed as part of #112. So we'd just need an updated version of the refactoring itself to continue here.

Haven't thought much about this myself but #2351731: hook_element_info_alter() does not always get invoked in themes reminded me that for drupal_alter(), we need to think a bit about how either of these solutions affect themes, which can implement alter hooks too....

David_Rothstein’s picture

@szantog, hm, module_hook_info() already ignores the cache if it's called early in the bootstrap - see https://api.drupal.org/api/drupal/includes!module.inc/function/module_ho... - so it's interesting that that cache clear is needed needed.

It's possible that an approach like #94 will help with that, but I'm not sure; part of the reason for proposing that was that there are some extreme edge cases where things can theoretically not be fixed even with the patch that was committed here.

Mixologic’s picture

Just ran into this again on drupal.org - this time the backtrace in the core dumps looks like so:

(gdb) zbacktrace
[0x7fb9fcdbda78] module_invoke_all() /var/www/drupal.org/htdocs/includes/module.inc:951
[0x7fb9fcdbd8d8] module_invoke_all("stream_wrappers") /var/www/drupal.org/htdocs/includes/file.inc:127
[0x7fb9fcdbd7b0] file_get_stream_wrappers() /var/www/drupal.org/htdocs/includes/common.inc:5232
[0x7fb9fcdbd638] _drupal_bootstrap_full() /var/www/drupal.org/htdocs/includes/bootstrap.inc:2266
[0x7fb9fcdbd558] drupal_bootstrap(7) /var/www/drupal.org/htdocs/index.php:20

And the actual bt:

#0  x86_64_fallback_frame_state (context=0x7fffb7093cf0, fs=0x7fffb7093b70) at ../../../gcc/config/i386/linux-unwind.h:47
#1  uw_frame_state_for (context=0x7fffb7093cf0, fs=0x7fffb7093b70) at ../../../gcc/unwind-dw2.c:1210
#2  0x00007fb9ed24b119 in _Unwind_Backtrace (trace=0x7fb9fa7547a0 <backtrace_helper>, trace_argument=0x7fffb7093e30) at ../../../gcc/unwind.inc:290
#3  0x00007fb9fa754936 in __backtrace (array=<value optimized out>, size=100) at ../sysdeps/ia64/backtrace.c:110
#4  0x00007fb9ecdb74fe in nr_signal_tracer_common () from /usr/lib64/php/modules/newrelic.so
#5  0x00007fb9ecdb7528 in agent_fatal_signal_handler () from /usr/lib64/php/modules/newrelic.so
#6  <signal handler called>
#7  0x0000000001c9a638 in ?? ()
#8  0x0000000000667818 in execute (op_array=0xb94e00) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:410
#9  0x00007fb9ecd87e80 in nr_php_execute () from /usr/lib64/php/modules/newrelic.so
#10 0x00007fb9f3723c33 in suhosin_execute_ex (op_array=0xb94e00) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:621
#11 suhosin_execute (op_array=0xb94e00) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:635
#12 0x000000000067a468 in zend_do_fcall_common_helper_SPEC (execute_data=<value optimized out>) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:673
#13 0x0000000000667818 in execute (op_array=0x11cc4a0) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:410
#14 0x00007fb9ecd87e80 in nr_php_execute () from /usr/lib64/php/modules/newrelic.so
#15 0x00007fb9f3723c33 in suhosin_execute_ex (op_array=0x11cc4a0) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:621
#16 suhosin_execute (op_array=0x11cc4a0) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:635
#17 0x000000000067a468 in zend_do_fcall_common_helper_SPEC (execute_data=<value optimized out>) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:673
#18 0x0000000000667818 in execute (op_array=0xfc9090) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:410
#19 0x00007fb9ecd87e80 in nr_php_execute () from /usr/lib64/php/modules/newrelic.so
#20 0x00007fb9f3723c33 in suhosin_execute_ex (op_array=0xfc9090) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:621
#21 suhosin_execute (op_array=0xfc9090) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:635
#22 0x000000000067a468 in zend_do_fcall_common_helper_SPEC (execute_data=<value optimized out>) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:673
#23 0x0000000000667818 in execute (op_array=0xf8ea60) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:410
#24 0x00007fb9ecd87e80 in nr_php_execute () from /usr/lib64/php/modules/newrelic.so
#25 0x00007fb9f3723c33 in suhosin_execute_ex (op_array=0xf8ea60) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:621
#26 suhosin_execute (op_array=0xf8ea60) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:635
#27 0x000000000067a468 in zend_do_fcall_common_helper_SPEC (execute_data=<value optimized out>) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:673
#28 0x0000000000667818 in execute (op_array=0x7fb9fcdf71b0) at /usr/src/debug/php-5.4.45/Zend/zend_vm_execute.h:410
#29 0x00007fb9ecd87e80 in nr_php_execute () from /usr/lib64/php/modules/newrelic.so
#30 0x00007fb9f3723c33 in suhosin_execute_ex (op_array=0x7fb9fcdf71b0) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:621
#31 suhosin_execute (op_array=0x7fb9fcdf71b0) at /usr/src/debug/php-suhosin-0.9.38/NTS/execute.c:635
#32 0x00000000005fce40 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /usr/src/debug/php-5.4.45/Zend/zend.c:1329
#33 0x000000000059f958 in php_execute_script (primary_file=0x7fffb7097790) at /usr/src/debug/php-5.4.45/main/main.c:2502
#34 0x00000000006b1b8b in main (argc=<value optimized out>, argv=<value optimized out>) at /usr/src/debug/php-5.4.45/sapi/fpm/fpm/fpm_main.c:1949
greenSkin’s picture

xjm’s picture

Fabianx’s picture

While we never got #94 in, it also has other problems.

That said I think we can fix the remaining edge cases, by doing this:

diff --git a/includes/module.inc b/includes/module.inc
index 494924f..24e5dd7 100644
--- a/includes/module.inc
+++ b/includes/module.inc
@@ -833,6 +833,11 @@ function module_hook_info() {
  * @see module_implements()
  */
 function module_implements_write_cache() {
+  // The list of implementations includes vital modules only before full
+  // bootstrap, so do not write cache if we are not fully bootstrapped yet.
+  if (drupal_static("drupal_bootstrap_finished") != TRUE {
+    return;
+  }

and then set the drupal_static at the end of the bootstrap full function.

That way - that at phase is set when it's entered, not when it's completed - would no longer bite us here.

joseph.olstad’s picture

#122 makes sense to me. I have not yet tested this however.
I believe most of my projects have been for years using a version of one of the above patches.

marco’s picture

The attached patch implements #122, with an addendum from @peximo to make sure that calling a full bootstrap more than once does not break the logic.

apaderno’s picture

Issue tags: -7.40 release notes
Darren Oh’s picture

Status: Needs review » Reviewed & tested by the community