Problem
- A stale 'controller class' definition is not updated.
Details
- Entity info is statically cached. Once defined, it sticks.
- drupal_flush_all_caches()
- does not reset any statics.
- triggers all kind of core rebuilds that depend on entity info.
- triggers flushing of core cache tables.
- triggers flushing of module cache tables, which also rebuilds data, depending on entity info.
- is invoked after full bootstrap, but entity info might have been prepopulated during bootstrap already (hook_menu_custom_theme(), hook_admin_paths(), hook_init(), etc).
- Stale entity info can easily lead to WSODs elsewhere.
- The cache flushing operation is effectively never reached, because Drupal dies before already. Hence, the entity info cache is never cleared.
_____________________
In a contributed module, I added custom 'controller class' to its hook_entity_info() implementation.
Upon trying to load any page, including admin_menu's flush all caches menu callback as well as drush cc all
,
WSOD:
Fatal error: Class name must be an object or a string [...]
Further analysis showed that
- {registry} contained the new controller class.
- {system}.info contained the new files[] definition and the filename that contains the new controller class.
- Directly invoking registry_rebuild() after a full bootstrap worked.
- Directly invoking drupal_flush_all_caches() after a full bootstrap did not work.
- Directly invoking entity_get_info() after a full bootstrap did not show the module's entity type at all (even though I changed the controller class property only; the entity type itself was registered before already).
Clearing the entity info cache in drupal_flush_all_caches(), as in attached patch, fixed the problem.
Not sure whether this can be tested in any way.
Comment | File | Size | Author |
---|---|---|---|
#94 | drupal.entity-info-cache-clear.94.patch | 3.18 KB | sun |
#89 | drupal_entity_info_cache.patch | 3.11 KB | xjm |
#85 | drupal_entity_info_cache-D7.patch | 3.11 KB | fago |
#82 | drupal8.entity-cache-flush.81.patch | 3.97 KB | sun |
#77 | drupal_entity_info_cache_8.patch | 4.12 KB | fago |
Comments
Comment #1
sunSomehow, I can't believe that no one else ran into this issue yet.
The problem is that without this patch, the stale
'controller class'
definition cannot be updated in any way -- flushing caches leads to the WSOD, and manually rebuilding modules and the registry also leads to the WSOD.The entity info cache is effectively never cleared.
Comment #2
sunUpdating to latest code of Entity module requires this patch, too, due to major directory and file changes in the module.
Upon testing the patch once more in that situation, it became clear that we should rebuild the registry prior to flushing the entity info cache.
Comment #3
Anonymous (not verified) CreditAttribution: Anonymous commentedholy crap.
Comment #4
fagohm, what was the error message? It's usually only problematic if the *old* controller class cannot be found. So perhaps the problem was that the cache got cleared before the registry was rebuilt?
For the entity.module upgrade, this was may problem for the 'entity class'. This is why the old file is left there for the upgrade to work. At least for me it did work?
Anyway, drupal_flush_caches() clears the cache tables, thus it should clear the entity cache too. It just doesn't clear the static cache, but drupal_flush_caches() doesn't do that for others either.
Comment #5
fagofor completeness, here the description of my entity API upgrade troubles:
The problem is that update.php bootstraps drupal, what includes all enabled modules. Therefore if a module X has an entity class, the parent class is required. Consequently it tries to load it from the old path using the registry - but that happens already *before* the first cache-clear.
Comment #6
sunEntity module's major update was just the second occasion that made me run into this bug. Related to that: Contrary to what you're saying, I was successfully able to run update.php -- until the final 'finished' op/page.
Yes, basically, drupal_flush_all_caches() already flushes the cache tables. However, it does not clear the static entity info cache, even though registry_build() might have written new class and interface information to the registry already.
Only the subsequently invoked menu_rebuild() in drupal_flush_all_caches() actually leads to the WSOD, because even though we have new registry and module information, the entity system still tries to access/load the old information.
As explained in the OP, detailed inspection and analysis showed that all of the new information was properly recorded already.
Repetitive invocation of drupal_flush_all_caches() does not help, because menu_rebuild() is invoked before the cache tables are flushed, so the entity system still reloads the stale/wrong entity info data from the cache.
Lastly, even if we would change the order of menu_rebuild() and cache_clear_all(), there would still be no guarantee that entity_get_info() was not invoked before drupal_flush_all_caches() in the request, so subsequent invocations of entity_get_info() via menu_rebuild() may still act on stale statically cached entity info data.
Comment #7
fagoYes, I see the problem with the static cache. However I think drupal_flush_all_caches() is not designed to clear any static caches, as it just deals with "permanent" caches and doesn't clear any other static caches.
However, having menu_rebuild() in there is problematic as it immediately triggers the menu rebuild. In particular it is odd that it is invoked before hook_flush_caches() gets called. Thus this problem is very similar to #986024: Inconsistent static entity cache in hook_entity_[insert|update|delete] - in both cases the static caches are inaccurate so the immediate menu_rebuild() may cause troubles.
So if we include the static cache reset for entities, why not others too? What about calling drupal_static_reset() there then? Also I think we have to move menu_rebuild() down, as it issues the menu cache rebuild and so rebuilding of other caches.
I do think menu_rebuild() shouldn't issue the immediate rebuild either, but I guess for d7 that's set in stone.
Comment #8
sunShort of: it's a mess.
Comment #10
sunI have no idea why the Drupal installation fails with that patch.
Comment #11
pillarsdotnet CreditAttribution: pillarsdotnet commented#8: drupal.entity-cache-flush.8.patch queued for re-testing.
Comment #13
BerdirUpdating entity.module certainly is troubling every time :)
I tried installing manually with the patch. Everyone works fine except that the success message is on a otherwise empty page, looks like the theme can't be found. Once the "go to your site" link is clicked, everything looks ok.
Comment #14
pillarsdotnet CreditAttribution: pillarsdotnet commentedRe-roll using
git format-patch
.Comment #16
pillarsdotnet CreditAttribution: pillarsdotnet commentedWondering if it would be acceptable to call
entity_info_cache_clear()
instead ofdrupal_static_reset()
.Comment #18
pillarsdotnet CreditAttribution: pillarsdotnet commented#16: clear_entity_cache-996236-16.patch queued for re-testing.
Comment #20
pillarsdotnet CreditAttribution: pillarsdotnet commentedOkay, so something in the comments module is b0rken when clearing caches also calls
entity_info_cache_clear()
But just to be sure, let's try with that particular call commented out.
Comment #21
pillarsdotnet CreditAttribution: pillarsdotnet commentedComment #23
pillarsdotnet CreditAttribution: pillarsdotnet commented#20: clear_entity_cache-996236-20.patch queued for re-testing.
Comment #25
pillarsdotnet CreditAttribution: pillarsdotnet commentedMaybe if I fiddle with the order of things...
Comment #27
pillarsdotnet CreditAttribution: pillarsdotnet commentedFixed typo:
Comment #29
sun@pillarsdotnet: I think you can stop trying here, please, your follow-ups along the submitted patches sound too much like a guessing game, which is not particularly helpful and rather noise for this issue.
We have to go back to the discussion that ended in #8, and precisely evaluate how this issue may be fixed. Meanwhile, I'm 99% confident that this issue cannot be resolved by shuffling around some lines of code. There are too many recursive interdependencies.
Comment #30
catchI'm a bit confused by this issue. If we had #534594: [meta] system info cache writing and registry rebuilding is broken without a full bootstrap in and this actually ran on update.php, would that help or not?
Comment #31
carlos8f CreditAttribution: carlos8f commentedsubscribing
Comment #32
sun@catch: AFAICS, that only affects a small subset of the problem; only providing a workaround for recovering from a WSOD due to a stale registered class.
#1 tried to clarify, but let me try again:
Problem
Details
_________________________
OK, a proper issue summary really helps... ;)
Attached patch might resolve the issue and explains the effin' mess of interdependencies in inline comments.
Comment #34
sunsigh. This patch reveals yet another interdependency between theme registry and caches/rebuilds.
Comment #35
sunResorted to specifically clearing the entity info cache only, clarified in a comment.
Comment #37
sunBack to drupal_static_reset(), but in the proper location (first).
Only remaining problem is that the invocation of drupal_flush_all_caches() in install_finished() breaks the maintenance theme for an unknown reason. I've spent the past three hours with stepping through the subsystems, but wasn't able to figure out what exactly goes wrong. theme.inc doesn't really use drupal_static(), and neither does theme.maintenance.inc, nor install.core.inc. But nevertheless, the drupal_static_reset() in drupal_flush_all_caches() makes the install_finished() page render without a theme (plain page output, as in Stark, but without Stark).
Contains debugging code, no need for testing.
Comment #38
andypostTesting last patch
Comment #40
rfay#37: drupal8.entity-cache-flush.37.patch queued for re-testing.
Comment #42
fago#1216454: Custom controller classes must not be deleted is probably a duplicate of this issue.
Comment #43
fagoAnother effect of this issue: #1325126: uninstallation troubles due to outdated entity info
Comment #44
fagook, let's try it with a patch keeping the changes to the bear minimum.
Comment #45
fagoWhere are my patches?
Comment #47
fagoStrange, the failing test worked for me with the D7 version of the patch.
Anyway, here is a try to properly clean up the mess for D8. Let's see what the bot thinks.
There are for sure some race conditions left, e.g. consider a page request coming in after the registry has been rebuilt but cache bins have not been flushed yet. First we've to get the basics right though.
Comment #49
fagook, ran into the installation problem as sun in #37. Figured out the problem: the added css is statically cached, and thus gone after flushing caches.
I've documented this behavior + also unset $theme so the whole theme is marked as un-initialized after clearing caches. That way, one can still output pages with a fresh initialized theme. Usually, one wants a redirect afterwards anyway though, so clearing added css/js should be acceptable. In turn, we get properly reset static caches everywhere, thus less bugs during the following cache rebuilds.
Let's see what the test bot thinks now.
Comment #51
fagoStrange, installation works for me. :/
Comment #52
fagoComment #53
fagoThe issue seems to occur more often recently, marked
#1314766: Fatal error when uninstalling module and #1314780: When uninstalling: Fatal error: Class 'Profile2TypeUIController' not found as duplicates.
Comment #54
sunRe-roll of #37 (without debugging code), as I wasn't sure what was changed since then. Hadn't time for in-depth comparison, so chose to simply re-roll it...
Comment #56
Pun-1 CreditAttribution: Pun-1 commenteddrupal.flush-entity-cache.0.patch queued for re-testing.
Comment #57
fagosry, here some explanations:
I compared it to #54 and incorporated improvements from it:
Why does node_menu() require entity info? On the contrary entity-info requires node-type-info (see node_entity_info()).
I'm not sure about that. Is those rebuilding really a valid operation? If it's duplicated data, rebuilt for performance or so, it's a cache. That means, don't do a rebuild operation but clear the cache using hook_flush_caches() and rebuild the cache on demand.
-> By always going with that approach caches automatically rebuild in the right order and we don't end up with accidentally wrong rebuild orders possibly resulting in wrong caches/rebuild info. Also, it's important to be able to initiate that rebuild without it to happen automatically.
E.g. I really think menu_rebuild should work using the "cache-clear approach". Now, a cache rebuild may trigger lots of menu rebuilds if lots of items trigger them. Added a new page manager page? Rebuild! A new profile type? Rebuild! A new node type? Rebuild! ...
So maybe instead we should document clearing-caches (bot not issuing rebuilds) from hook_flush_caches() is intended behavior?
Merged patch attached, please review. Still, I've no idea why the test-bot fails. Tests basically work for me, as well as installing manually.
Comment #58
fagoComment #60
fagoWell, looks like cleaning this up isn't an easy task, but we need to fix the entity-info cache problem for D7 soon. So let's better do a quick fix first and properly clean up the mess for d8 afterwards. Patch attached.
Comment #61
sunPoint being, the simple fix didn't solve the problem in 2 out of 3 occasions or so. (actually I'm no longer sure whether it ever worked or whether the situation resolved itself through repetitive and/or concurrent brute-force cache clearing)
Contrary to that, #54 even worked for the extreme case of the /core directory change; i.e., completely bogus registry and controller/class autoloading info.
Comment #62
sunYes it is a valid operation. The rebuilding accounts for the fact that various things are merely known to the system via info hooks. The rebuild operations are not caches, they actually re-retrieve, collect, prepare, and create/update known, semi-dynamic information registered via module hooks, and also delete previously known info.
Rebuild operations cannot be compared to caches. The module or subsystem (re)building data almost always has a cache layer in front of the built data, and that cache is also cleared as part of the rebuilding, but that's not the point of rebuild operations.
Changing and revamping rebuild operations is a different issue and requires lots of discussion first. Hence the @todo only. Let's keep this discussion out of this issue, please.
In light of that, the node type rebuild changes in the latest patch need to be reverted, because they break current expectations.
You're totally right on that. No longer sure what made me believe that.
Comment #63
fagoI get regularly bug reports in the entity api queue by people stumbling over this. Thus, let's fix this asap. Ideally before 7.10 ships.
Still, #54 has issues. Doing a whole drupal_static_reset() sounds good, but it's no simple change and needs work. I think best, we clean up the whole mess in D8 first and then look at what makes sense to backport. But that's going to take time and people run into this issue now, so let's do the 1-line stop-gap fix now.
I've attached the d7 version of #60. I'm setting the issue temporarily to d7 to let the test-bot run for d7 too.
Thinking about it, I must agree. I've even already implemented one: rebuilding exported entities upon cache clear. A separate hook would be important here, to ensure it doesn't conflict with cache-clears. But yes, let's handle that in another issue.
Comment #64
fagodouble post
Comment #65
fagoagain d7 version for the bot.
Comment #66
fagotriple post...
Comment #67
fagotriple post...
Comment #68
rjulio CreditAttribution: rjulio commented#60: drupal_entity_cache_quick.patch queued for re-testing.
Comment #69
rjulio CreditAttribution: rjulio commented#45: drupal_entity_cache.patch queued for re-testing.
Comment #70
nerdacus CreditAttribution: nerdacus commentedI was having a problem when disabling my custom modules, it would error with basically "unable to find the entity's UIController".
I manually applied the patch proposed in #67 and now everything works as it should.
Thanks,
-Marcus
Comment #71
xjmThe stop-gap fix looks good to me. But maybe we can add a test with it so that we have that to test a "proper" solution?
Comment #72
catchMoving back to 8.x.
Comment #73
fagoI agree that this needs tests -> I took a stab on that and implemented tests that prove changes to entity info appear in the cache after calling drupal_flush_all_caches() + after disabling a module.
It turned out the latter is already solved for d8, as the entity.module patch added an entity-info-cache-clear upon module_disable() already. Thus, I've improved the D7 version to do so too, but directly in module_disable() due to the lack of an entity module.
Changes to entity info do not appear after calling drupal_flush_all_caches() in both D7 and D8 though.
Updated patches for d7 and d8 attached, whereas I've added test-only patches to prove they fail too.
Comment #75
xjmComment #77
fagoyep, the test without the fix fails.
re-uploading the complete patch. equals the patch from #73.
Comment #78
fagoPeople still ran into this, e.g. see #1352422: Uninstall breaks site.
Any remarks to the patches in ##73?
Comment #79
catchThis looks good to me for fixing the bug but it is really a sad state that we have this mixture of 'registry' rebuilds, clears of individual static and persistent caches and then full bin clushes in drupal_flush_all_caches(). sun was mentioning trying to split this into distinct registry vs. cache steps - is there an issue for that?
Comment #80
sun#77: drupal_entity_info_cache_8.patch queued for re-testing.
Comment #82
sunPatch will likely fail due to #375397: Make Node module optional
While this patch seems to fix a bug and is proven by tests, it slightly contradicts what I was after in #54; i.e., flushing/clearing everything first, and trigger any rebuilds only afterwards (from a clean slate).
In other words, I do not understand why you explicitly want to rebuild node types before clearing the entity info caches.
What is the rationale behind that?
Second, is it possible that the test also passes if you reverse the order? (i.e., is it possible that the test only passes due to the mere entity_info_cache_clear() being added, regardless of its position?)
Let me double-check that in attached patch.
Comment #83
fagoEntity-info contains bundle info, what in case of nodes are the types. Thus, the node rebuild needs to run before entity info is built, as e.g. issued during menu_rebuild(). That said you are right that the entity-info cache clear could happen before node-type rebuilding as long as node-type rebuilding does not trigger rebuilding entity info itself.
Afaik it doesn't, but still that might happen on another page request in the meantime.... Thus, clearing entity-info afterwards ensures we are on the safe side.
Yes, the tests generally make sure entity info gets updated, but not in any way specific to node-types.
I've opened #1404198: Separate database cache clearing from static cache clearing and data structure rebuilding so we can concentrate on the stop-gap fix for entity-info here.
Comment #84
sunI'd prefer #82 then. It merely clears the entity info cache early, and does not involve the previously added comment about a interdependency specific to node type rebuilding.
Comment #85
fagook, let's go with #82 then.
Attached the d7 version of #82.
Comment #86
sunAlright then.
Comment #87
catchThanks! Committed/pushed to 8.x, moving to 7.x.
Comment #88
kenorb CreditAttribution: kenorb commented+1
Comment #89
xjmNo commit credit; all I did was change the name of the file.
Comment #90
fagoAnything left from setting the port RTBC? It's a straight port from what d8 does.
Comment #91
sunWhy is there an additional clear here? The D8 patch only has the one in dfac().
Comment #92
fagoYep, as explained previously:
Thus, it has been already fixed in d8, but is so far missing for d7.
Comment #93
sunSorry for blatantly missing that clarification, my fault. :-/
I've double-checked the D8 code and since we're invoking entity_info_cache_clear() as part of hook_modules_disabled(), it is invoked before the registry_update(). Also, entity.module is likely invoked as one of the first hooks.
However, the D7 patch clears the entity info cache after hook_modules_disabled() and after registry_rebuild(). Given how extremely fragile this entire system is, I'd prefer to move the entity_info_cache_clear() right before hook_modules_disabled().
Comment #94
sunAdjusted for #93. No other changes. RTBC if it comes back green.
Comment #95
sunComment #96
webchickThis looks like it still needs review. This also means it won't make it to the February 1 release, unless there is absolutely no chance it could possibly break anything. But it can be part of the release after that!
Comment #97
fagoRight, sry for missing this order. :/ -> Your change looks good to me. I also tested your patch manually, it works fine. -> back to RTBC.
@webchick:
I doubt the added cache-clear will break anything. Instead, the breakage is already happening.. E.g. here are some issues I'm aware of:
#1316834: How to Clear Entity info cache ?
#1325126: uninstallation troubles due to outdated entity info
#1357640: SOS unchecked some modules related to support ticketing leads to critcal failure ...
#1294200: Complete uninstall still leaves phantom entries in module list
#1274692: Error on Disabling Module
#1412606: Fatal error: Class 'Profile2MetadataController' not found in /home/unitedor/public_html/sites/all/modules/entity/entity.info.inc
Thus, to avoid more headaches I'd recommend shipping this fix asap.
Update: I forgot to mention that the patch really helps, e.g. this broke a site I was working on yesterday. After applying the fix, the site was recovered. Without the fix, the only way to recover the site is manually clearing cache tables.
Comment #98
webchick*wince* Ok...
Committed and pushed to 7.x. Thanks!
Comment #99
sunThanks! Yes, this patch is known to only improve the situation.
The issue merely got lengthy as we were discussing what the actual root cause and "proper" approach/solution would be. This patch here is a stop-gap fix. The major API change to fix it properly happens over in #1404198: Separate database cache clearing from static cache clearing and data structure rebuilding
Comment #101
fagoThanks! :)
Comment #101.0
fagoUpdated issue summary.
Comment #102
cchoe1 CreditAttribution: cchoe1 commentedDoes anyone know if this issue persists to 8.x? I've tried using this function call but it doesn't seem to match the functionality from drush/UI. I've also seen like 30+ different implementations of this (bootstrapping drupal beforehand, using it in combination with registry clears, etc.) so I may not even be using it correctly.