Problem/Motivation
The issue involved the menu_mask variable being empty, resulting in WSOD. This sometimes could result in the inability to install D7/D6.
Proposed resolution
The proposed solution in #33 explains the problem occurring when "2 requests come in at exactly the same time and menu_masks variable is not defined", resulting in the lack of a request lock and the variable remaining undefined. Proposed solution in #53 is RTBC.
Remaining tasks
Comments by mikeytown in #57 in response to webchick in #55 to clarify the solution in the patch comment need review.
User interface changes
none
API changes
none
Original report by [username]
I haev seen too many requests to this and now it came to me that there is a way to rebuild without running a query to see whether router is empty.
Comments
Comment #1
chx CreditAttribution: chx commentedComment #2
moshe weitzman CreditAttribution: moshe weitzman commentedlooks good to me, and auto-recovers when you get into a nasty situation of your site going permanent WSOD.
Comment #3
chx CreditAttribution: chx commentedFor D6 too.
Comment #4
Gábor HojtsyWhy should this happen in the middle of an API function?
Comment #5
chx CreditAttribution: chx commented'Cos if the menu build does not happen (or fail) then your masks will be empty -- and this is the only case when your masks can be empty -- so it's a good time and time place to try to recover.
Comment #6
pwolanin CreditAttribution: pwolanin commentedAs an alternative, can we set the variable that says a rebuild is needed and maybe hard code a few masks in for this just this page request?
Comment #7
pwolanin CreditAttribution: pwolanin commentedhow about this? A quick test shows it works if the masks are deleted/emptied.
Comment #8
alexanderpas CreditAttribution: alexanderpas commented+1 to #1 nice patch, also fixes #307937: MYSQL Syntax error on installation with pre-filled tables
-1 to #7, that just looks ugly...
Comment #9
chx CreditAttribution: chx commentedso how do we proceed?
Comment #10
alexanderpas CreditAttribution: alexanderpas commentedhow about RTBC on the #1 patch?
Comment #11
robertDouglass CreditAttribution: robertDouglass commentedI've been bitten by this bug. Subscribe. Will review.
Comment #12
webchickI can't currently install D7 because of this, which is seriously impeding my ability to commit patches. ;) It'd be nice of chx/pwolanin could have a discussion about the best way to proceed.
Comment #13
alexanderpas CreditAttribution: alexanderpas commentedwebchick, did applying #1 fixed it for you?
Comment #14
dropcube CreditAttribution: dropcube commentedIt's happening for me every time I try to install D7 after the DBTNG patch. Would be good to find what is causing the issue, for what reason the menu rebuild does not happen during a fresh installation. Subscribing.
Comment #15
catchI've had this on occasion but not been able to consistently reproduce.
Comment #16
chx CreditAttribution: chx commentedGive me file level access to some place where I can reproduce something and I will figure out. I can't help a bug I can't see.
Comment #17
pwolanin CreditAttribution: pwolanin commentedPeople have seen this on D6 and D7, so it's some kind of race condition I think. But I'm at a loss as to what.
Actually, the D7 code has a flaw that's partly fixable... let me go find it.
Comment #18
pwolanin CreditAttribution: pwolanin commentedsome of the D7 problems may be fixed by this: http://drupal.org/node/238760#comment-1023094
Comment #19
alexanderpas CreditAttribution: alexanderpas commentedchx: see #307937: MYSQL Syntax error on installation with pre-filled tables for a way to forcefully reproduce this bug...
Comment #20
dropcube CreditAttribution: dropcube commented@alexanderpas: I can not reproduce the bug with the instrucctions there, the first step is "instal site normally", how is that if I can not even install Drupal. The error appeared during the installation process.
Comment #21
pwolanin CreditAttribution: pwolanin commentedI think this is a simpler way to achieve what chx's patch does in #1. With this patch in place, I can variable_del the menu masks via a PHP node and life goes on normally. Without the patch, Drupal dies until I hit update.php or otherwise manually force a menu rebuild.
Patch is against 6.x, though applies to 7.x also with offset.
Comment #22
chx CreditAttribution: chx commentedMasterstroke.
Comment #23
pwolanin CreditAttribution: pwolanin commentedwith added code comment
Comment #24
alexanderpas CreditAttribution: alexanderpas commentedeven better!
Comment #25
alexanderpas CreditAttribution: alexanderpas commentedmarked #280015: cannot install drupal - menu_router error as duplicate.
Comment #26
webchickOk, I committed this. I've run into this bug before, and it's good to have a fix in. However, I'd LOVE to figure out WHY this happens so we can fix the underlying issue. Talking to Peter and chx though it doesn't sound like it's so easy to determine since it's very erratic.
Moving down to 6.x for consideration.
Comment #27
Gábor HojtsySeconded webchick's comment on the fact that it would be better to know the underlying cause. Committed to 6.x now as a stop-gap solution.
Comment #28
Anonymous (not verified) CreditAttribution: Anonymous commentedAutomatically closed -- issue fixed for two weeks with no activity.
Comment #29
Yura Filimonov CreditAttribution: Yura Filimonov commentedSince you have marked #307937 as a duplicate of this issue, I'll let you know that:
- I have tried installing Drupal 6.6, downloaded recently on WAMP 2.0c
- during the installation, I get the MySQL syntax error described in #307937
- if the installation is successful (happened once), after the site times out on anything, the menu isn't completely built
My cookies are enabled and PHP memory limit is set to 128Mb.
Anyone confirmed that the problem is completely fixed? Judging by posts elsewhere, it is not.
Thanks.
Comment #30
beginner CreditAttribution: beginner commentedSee:
#2946: Login fails and no warning is issued if cookies are not enabled
#234539: Critical SQL error in installer with {menu_router}
Comment #31
bklineThis bug is still present in the most recent release version of Drupal 6. It doesn't happen every time, but I've had it occur twice: once with 6.4 and once with 6.19, both on Ubuntu 10.10. Cookies are definitely enabled. The first time it happened I was installing from the O'Reilly Using Drupal pre-packaged archive. I found a posting from someone who reported having solved the problem by replacing that archive with the latest production release of Drupal 6. So I pulled down 6.19 and tried again. Same problem. I found another posting that said installation had failed using Chrome as the browser but succeeded with Firefox. So I cleared out the settings.php file and the database and tried again using Firefox. This time it succeeded. However, I was skeptical that the browser could actually be the problem so I did some additional testing using fresh downloads of 6.19 and was able to get successful installations using either browser. So it's much more likely to be a timing problem. Yes, I know that makes the bug harder to reproduce, and therefore harder to fix. But it's still a bug which hasn't been fixed. If you do some googling, you'll see plenty of postings from frustrated users struggling with the bug well after this issue was marked as "closed (fixed)" two years ago. I've changed the "Version" field from "6.x-dev" to "6.19"; there's no indication in this interface whether the "Version" field is intended to refer to the version in which I'm reporting the problem, or a target version for the fix. Apologies if I guessed wrong. If the setting to "6.x-dev" was intended to mean "we fixed this in the development branch two years ago but we have no intention of ever deploying the fix in the released version of the system" then the bug is in the process, not the code. It's hard to believe the Drupal development community would really expect all of its users to dig through the bug database and apply all of the approved patches. And I'd also like to think that comment #26 isn't just being ignored.
Comment #32
mgriego CreditAttribution: mgriego commentedI ran into this one just now while I was attempting an install of Open Atrium. I figured it was an issue with Atrium's install profile, until I tried to do a "standard" installation and failed there too. When I searched d.o, I saw #280015: cannot install drupal - menu_router error. Comment #40 mentioned the system date. I was installing onto a VMware guest that had been suspended for a long time, and ntpd had died at some point. My system date was about 40 days in the past. Once I resynced my system time, the install went off with no problems.
Comment #33
mikeytown2 CreditAttribution: mikeytown2 commentedThe Error
How to trigger error:
2 requests come in at exactly the same time and
menu_masks
variable is not defined. menu_execute_active_handler() runs for both of them. In both cases themenu_masks
variable doesn't exist thus menu_rebuild() will run on both. The first request to get a lock will complete fine. The request that doesn't get a lock will wait and not run the _menu_router_build() function; this function sets the menu_masks variable, but in the request that didn't acquire the lock, the variable is still not defined. Thus later on when menu_get_ancestors() is ran, the foreach loop (in this function) is not ran and we get 2 empty arrays returned causing the SQL query to blow up on us inside of menu_get_item().Possible Solutions:
$write_database = FALSE
like we do in module_rebuild_cache() and system_theme_data() and duplicate work.menu_masks
variable and set it in the $conf global (thus allowing variable_get to work correctly in this request).menu_masks
variable is empty provide an array that goes from 255 down to 1.My Opinion:
I think we should read from the database and set the
menu_masks
variable after lock_wait runs inside of menu_rebuild. If for some reason that variable doesn't exist we will still be ok because inside of menu_get_ancestors we will have an alternate foreach loop that doesn't require this variable. Patch attached doesn't contain the alt menu_get_ancestors logic, only a place holder for it.Thoughts?
Comment #34
mikeytown2 CreditAttribution: mikeytown2 commentedComment #36
mikeytown2 CreditAttribution: mikeytown2 commentedRemoved the menu_get_ancestors placeholder logic as that could be causing issues.
Comment #37
chx CreditAttribution: chx commentedWhy is this set to D6 (and not D8 and backported down)?
Congratulations on the detective work! But #36 makes me want to cry like a baby. That's sooo fugly. Can't we just throw an 503 from menu_get_ancestors? If you run into this when called from _menu_find_router_path then you are in so, so deep trouble.
Comment #38
noslokire CreditAttribution: noslokire commentedMan I am happy we are have now surrounded this bug after 8 months of digging. Nice work Mike.
Good news, I think you in the right place.
Bad news, applied patch didn't correct our issue.
Good news, we see this a couple times per day randomly across our network of sites so there are plenty of chances to test the approach in #37 or nail down a better approach to #36.
Thanks for the detective work on this!
Comment #39
mikeytown2 CreditAttribution: mikeytown2 commentedHere's an alt approach that is similar to #7. We will test and report back if this fixes the issue for us. I don't feel like pulling apart the bitwise math in the foreach loop so I'm trying the alt option here. Yeah it slightly slower but this happens about once every 500k requests for us. Patch attached.
Comment #40
chx CreditAttribution: chx commentedThanks for your persistence despite my slightly scathing review. That approach is indeed looking better, but I think
range(1, 256)
would be better. Perhaps with a comment that MENU_MAX_DEPTH is 9 so we need 9 bits at most?Comment #41
mikeytown2 CreditAttribution: mikeytown2 commented#39 fixes this bug on our sites.
Code wise the array needs to start from 255 and count backwards. 255 comes from the fact that the path column in the menu_router table is varchar(255). Something like this would work instead of the hard coded array if we want to go that route.
Comment #42
chx CreditAttribution: chx commentedReading http://php.net/range
So
range(511,1)
Why 511? What this masks things does, unless I completely forgot what I coded some years ago :) is that we map node/%/foo/%/bar to 10101 and the max depth is 9. So there can not be more than 9 bits. Which means we need 511 not 256. Correct me if I am wrong. (The leftmost bit is always 1 because we do not support %/foo so the binary mapping always work.)
Comment #43
mikeytown2 CreditAttribution: mikeytown2 commentedPatch against D8, D7 & D6 using chx's advice from #42
Comment #45
mikeytown2 CreditAttribution: mikeytown2 commentedMan I wish the test bot was a little smarter. Now with only the D8 patch attached. D7 & D6 patches attached in #43
Comment #46
chx CreditAttribution: chx commentedComment #47
mikeytown2 CreditAttribution: mikeytown2 commentedThis test might fail on your patch.
MenuRouterTestCase::testMenuGetItemNoAncestors
Comment #48
mikeytown2 CreditAttribution: mikeytown2 commented@chx
Is the default parameter getting removed in D8 for variable_get? If not, what is the advantage of your way? In either case it's doing what it should so I will yield to CHX on this point. Calling this RTBC.
Comment #49
catchThe difference with #46 is that we avoid calling range() on every request - instead that will only happen if nothing is returned from variable_get() - chx pointed this out to me in irc when I asked the same thing, it's easy to forget the variable_get() default actually gets called either way.
That's a good reason to add a comment here though explaining why it's doing that.
Comment #50
chx CreditAttribution: chx commentedComment #51
catchThanks :)
Comment #52
catchCommitted/pushed to 8.x, moving to 7.x for backport. Backport should be straightforward so marking novice for that.
Comment #53
mikeytown2 CreditAttribution: mikeytown2 commentedPatches for D7 & D6 based off of #50. The D6 patch also takes care of some white space issues.
Comment #54
aspilicious CreditAttribution: aspilicious commented7.X patch is exactly the same :)
Comment #55
webchickDamn. Nice sleithing! Committed and pushed to 7.x for parity with 8.x. That should also take care of the critical.
However, this comment doesn't explain the code below it. Or at least not to me.
How does range(511, 1) "use brute force to get the correct $ancestors and $placeholders returned"? Can we clarify this comment more? We need something along the lines of #40-#42. range() normally goes "little to big," and it's not clear what significance the number 511 has.
Comment #56
abautu CreditAttribution: abautu commentedI'm using this on D6, but I changed
$masks = range(511, 1);
to
$masks = range($end, 1);
since values larger than $end will be ignored anyway in the foreach loop.
Comment #57
mikeytown2 CreditAttribution: mikeytown2 commentedI think I've explained this better and I've also used the tip from abautu in #56
The 511 comes from (2^9) - 1. 9 comes from MENU_MAX_PARTS.
Patches for D8, D7, and D6. The D6 patch also takes care of some white space issues.
Comment #58
mikeytown2 CreditAttribution: mikeytown2 commentedComment #59
BrockBoland CreditAttribution: BrockBoland commentedNeeds issue summary
Comment #60
h4rrydog CreditAttribution: h4rrydog commentedUpdated issue summary.
Comment #60.0
h4rrydog CreditAttribution: h4rrydog commentedUpdated issue summary.
Comment #63
alansaviolobo CreditAttribution: alansaviolobo commentedComment #64
mikeytown2 CreditAttribution: mikeytown2 commentedNo longer needed in D8; RouteProvider::getCandidateOutlines() seems like this is where the code landed and it uses $end instead of 511 so this change has been incorporated in D8.
Comment #65
mikeytown2 CreditAttribution: mikeytown2 commentedPatch from #57 applies cleanly to latest D7 so this is essentially the same patch. Will post the D6 one once this goes green.
Comment #66
catch#356399: Optimize the route rebuilding process to rebuild on write is related and specifically fixes on race condition where the variable gets empty after a lock_wait() on menu_get_item(), this might be a duplicate.
Comment #67
joelpittetShould this be a duplicate or should we re-open #356399: Optimize the route rebuilding process to rebuild on write for D7?
Comment #72
vijaycs85#65 still applies to current 7.x.