I'm posting this here because the original issue #238760: menu_router table truncated and site goes down was moved to D7 branch (then back to D6, then to D7 and over and over again...).
I think that is incorrect because the D6 issue was fixed and it doesn't appear as such in any place.
Although, I don't think the issue was fixed for Drupal 6.3 at all.
On the other hand there is this other issue #246653: Duplicate entry in menu_router: Different symptoms, same issue which is in fact a duplicate of #238760 (but how can it be if the second one is targeting D7?)
So, I think this would be a nice place to reference that the same issue is reflected in two ways and being treated for D6 and D7.
So please, refer to the discussion at ... where?
I think the discussion can be followed in one centralized place, but the issues should be kept separate, each one referencing wherever that discussion takes place.
As examples of the confusion this has caused, here are some questions and answers from users:
from #238760
So if I understand the status of this issue: a patch has been committed but it does not completely resolve the problem, is that right?
I have 6.3 version of drupal and I still have this problem of duplicate entry in menu.inc on line 2371.
there is minor cleanup around the patch committed, which is why this issue is open. There seems to be a different bug, so someone should open a new issue and link to it
from #246653
During the discussion in #238760 two things were committed: first, a stop-loss solution in case a module performed badly and the rebuild operation failed (hence the title), second a mitigation for the race condition you are describing.
The fix that was released in 6.3 is only a mitigation. The real issue cannot go away until Drupal has a locking framework, i.e. a way to prevent two batch operations (like menu_rebuild) from happening at the same time.
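The locking idea in the quote above can be sketched roughly as follows. This is an illustration in Python, not Drupal code: `rebuild_with_lock` and `_rebuild_lock` are hypothetical names, and an in-process lock only stands in for what would have to be a lock shared between web requests (for example one stored in the database).

```python
import threading

# Hypothetical stand-in for a lock shared by all requests.
_rebuild_lock = threading.Lock()

def rebuild_with_lock(rebuild):
    """Run rebuild() only if no other rebuild is in progress.

    Returns True if this caller performed the rebuild,
    False if another rebuild held the lock and this one was skipped.
    """
    if not _rebuild_lock.acquire(blocking=False):
        return False
    try:
        rebuild()
        return True
    finally:
        # Always release, even if the rebuild raised (e.g. a timeout).
        _rebuild_lock.release()
```

A second caller that arrives while a rebuild is running gets `False` back and can simply use the existing table instead of starting a competing rebuild.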
Please treat the D6 and D7 issues separately, to allow each of them to reach its own independent state.
Keep the discussion in one central place; I think #238760: menu_router table truncated and site goes down has more information and discussion of alternative patches.
Comments
Comment #1
arhak commented
Nevertheless, anyone looking for a D6 solution will find it hard to follow #238760: menu_router table truncated and site goes down, which in addition is now targeting D7.
So I will comment here, replying to several quotes from that discussion.
In Drupal 6.3 I set the timeout to 200, but the problem can always be reproduced by trying to enable a bunch of modules at once.
Current workarounds:
I don't agree. If this issue should be posted to the D7 branch, then it should stand as an independent issue, probably referencing this one. But this IS a D6 issue. And when it gets fixed in D7 it will still be pending a backport to D6.
It would be totally wrong for someone to review the D6 branch, see no critical bug in it, and declare it clear of issues.
Shortening the window is a start (maybe a good one) but definitely NOT the solution, because the site may still go down.
I agree. If there are tons of docs stating the requirements then it can be used. Nevertheless, transactions are the answer.
But if it can't be done with a transaction, then trying the lock will do for many sites. The lock might fail on some hosting setups, so the code should be wrapped in a try/catch that falls back to the "minimal window" approach (with its race condition) on servers that lack lock permissions; that would then be a requirement violation, not a Drupal bug.
Bad news if it's "almost", but again, once the problem is solved in D7 this will still be an active D6 issue.
It's not a bad idea. Come on, progress is knocking at the door. And BTW, transactions are an old concept and a long-standing requirement for DBs.
Don't blame the module. Certainly it was just trying to create a bunch of tables and timed out.
How many modules did you try to enable at the same time?
Isn't this catchable at all?
Catching it would allow a fallback to the non-locking approach that minimizes the critical window.
To which version of D6?
This is why it shouldn't be changed to D7 branch.
This issue isn't fixed for Drupal 6.3
But it is still happening on Drupal 6.3 when trying to enable several modules at the same time.
You should stop moving this between D6 and D7. It is an issue in both, with different solutions, and even if the solution were the same it must be tracked as two different issues. If the intention is to discuss the same issue for both cases, then post in the D7 issue that the discussion will take place in D6, or vice versa. But one of the branches might reach a solution while the other is still open, or gets reopened.
Obviously reducing the window is a starting point for optimization, but not a solution at all.
I think a transaction is required even if it would have to be provided by some new patch code.
Is it asking too much to say "transaction" when talking about DBs?
The same issue it is indeed!
Yes: the menu_router table is emptied and then filled in a loop, so a timeout breaks the loop and leaves menu_router half filled; when the menu is rebuilt a second time, it finds a lot of duplicate ids.
Once again: Transaction is required.
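To illustrate what a transaction would buy here, a minimal sketch using Python's built-in `sqlite3` module rather than Drupal's database layer. The one-column `menu_router` table, `rebuild_in_transaction`, and the `fail_after` parameter (which simulates the timeout) are all assumptions made for the example.

```python
import sqlite3

def rebuild_in_transaction(conn, new_rows, fail_after=None):
    """Replace the contents of menu_router atomically.

    fail_after simulates a timeout after that many inserts.
    Returns True on success, False if the rebuild was interrupted.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("DELETE FROM menu_router")
            for i, path in enumerate(new_rows):
                if fail_after is not None and i == fail_after:
                    raise TimeoutError("simulated timeout")
                conn.execute("INSERT INTO menu_router (path) VALUES (?)", (path,))
        return True
    except TimeoutError:
        return False

def paths(conn):
    """Return all router paths, sorted."""
    return [r[0] for r in conn.execute("SELECT path FROM menu_router ORDER BY path")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu_router (path TEXT PRIMARY KEY)")
with conn:
    conn.executemany("INSERT INTO menu_router (path) VALUES (?)",
                     [("admin",), ("node",)])
```

If the rebuild is interrupted, the DELETE is rolled back together with the partial inserts, so the previous table contents survive and the site keeps working with the old menu.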
Disagree. It is the same issue. The timeout just happened in the middle of the loop, and after that a rebuild was attempted. That's why there are duplicate ids, and that's why it gets fixed anyway.
Sorry for the long post, but I don't have time to split it into pieces.
I think this can be a quick reference to that discussion which turned to D7 (#238760: menu_router table truncated and site goes down)
Comment #2
arhak commented
Summarizing a list of related issues (several are duplicates; all of them seem to be redirected to #238760, but that one is now only for D7; at least I hope they stop jumping between D6 and D7 back and forth):
#238760: menu_router table truncated and site goes down
#246653: Duplicate entry in menu_router: Different symptoms, same issue
#248739: Duplicate entry errors - lots of them
Nevertheless, I think the D7 issue should be concentrated at #251792: Implement a locking framework for long operations, while leaving the other issues for D6.
Other related issues:
#249185: Concurrency problem with variable caching leading to cache inconsistency
Comment #3
arhak commented
In reply to http://drupal.org/node/246653#comment-947182
(please don't move issues between 6.x and 7.x back and forth over and over, because they become untraceable, and this is an issue tracker)
Yes it is the same issue.
Let's see:
if you rebuild the menu with the devel module and the problem goes away (once you hit this issue it will almost surely keep reappearing), then we are talking about the same issue.
Different symptoms, same issue
Once the menu fails to be rebuilt, different symptoms might come up.
1- site goes down / administrative tasks are unavailable / some URLs are unavailable
2- site keeps complaining about duplicated keys in menu_router table
The cause is a timeout, or anything else that might interrupt a non-transactional manipulation of the data (when I run manual robustness tests I suddenly shut down the MySQL server). This is (IMO) a critical flaw of Drupal 6 (I don't know about 5, but I think it has some sort of table locking).
1- If the menu doesn't get rebuilt, then the menu_router table might be almost empty, making the site unavailable.
2- If the menu gets partially rebuilt but isn't flagged as completely rebuilt, the site might be usable, and on a later request it will attempt to rebuild the menu; but since the table is already partially built, it will complain about duplicate keys.
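Both symptoms can be reproduced in a toy model. The sketch below uses Python's `sqlite3` in autocommit mode to mimic a table without transactions; the table layout, `insert_routes`, and `fail_after` are assumptions for illustration, and the second step assumes a rebuild that inserts without deleting first, which is the scenario described above rather than confirmed Drupal behavior.

```python
import sqlite3

ROUTES = ["admin", "admin/build/modules", "node", "user"]

def make_router(paths):
    # isolation_level=None means autocommit: every statement sticks immediately.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE menu_router (path TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO menu_router (path) VALUES (?)",
                     [(p,) for p in paths])
    return conn

def insert_routes(conn, paths, fail_after=None):
    """Insert routes one by one; fail_after simulates a timeout mid-loop."""
    for i, p in enumerate(paths):
        if fail_after is not None and i == fail_after:
            raise TimeoutError("simulated timeout")
        conn.execute("INSERT INTO menu_router (path) VALUES (?)", (p,))

conn = make_router(ROUTES)

# Symptom 1: the DELETE succeeds, the insert loop is interrupted.
conn.execute("DELETE FROM menu_router")
try:
    insert_routes(conn, ROUTES, fail_after=1)
except TimeoutError:
    pass
rows_after_timeout = conn.execute("SELECT COUNT(*) FROM menu_router").fetchone()[0]

# Symptom 2: a later rebuild inserts into the half-built table.
try:
    insert_routes(conn, ROUTES)
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    duplicate_error = exc
```

After the simulated timeout only one of the four routes is left in the table, and the second insert pass fails on the primary key, which corresponds to the "duplicate entry" complaint.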
Comment #4
arhak commented
Please, this issue was originally posted for 6.x.
If this isn't a duplicate, then fine, mark it as such, but don't move it to the next generation of Drupal. The issue is present in 6.x and must remain there until it is marked as fixed, won't fix, or whatever. Once it is marked, maybe as "won't fix", incoming issues might be targeted at 7.x, but for now this issue is present in 6.x, it's a major issue, and as such it deserves an answer (in the worst case, "won't fix").
I won't mark it as a duplicate again because you don't agree, but I'm returning it to 6.x, where it belongs.
See what happened on #238760: menu_router table truncated and site goes down, jumping between 6.x and 7.x over and over: it's not traceable.
Please read #289618: menu_router table truncated and site goes down: Transaction or Lock required for D6 again and try to see why I think this issue must be kept on both 6.x and 7.x, but as different issues (for tracking purposes), maybe both referencing a single discussion until they split up (sooner or later one of them will be marked as fixed or won't fix while the other will not).
Comment #5
arhak commented
- being afraid to use D6 in production: #254616: SIte corrupted with no administrative blocks any more after going to admin/build/modules
- another one bites the dust: #248286: Module Development and Potential Vulnerability -- {menu_router} table can be erased.
Comment #6
arhak commented
In reply to http://drupal.org/node/246653#comment-950307
Well, I disagree with pointing every issue to D7, because there are proposed and tested patches for D6 (follow the references in my previous comment), and then the issue is moved back and forth: every time someone thinks they have the solution for D6 it moves there, and later someone moves it back to D7 because the patch is an improvement rather than a solution.
The deletion always succeeds. The insertion is what fails because of the timeout. Now, if the menu is half built and some change (enabling/disabling a module, accessing the modules page I think causes it too, and other situations) triggers the menu rebuild task, then there will be duplicate entries, of course. I don't remember exactly when I looked into the code, but for some reason rebuilding the menu with devel succeeds most of the time, deleting the whole menu_router table and recreating it, while in other situations a rebuild is attempted without deleting the table first.
There are proposed patches that improve this awful situation in several ways without providing a definitive solution:
- the first one reorders the code so the computation is done first and the delete sits right next to the insertion (originally the deletion comes first, then the computation, and then an insertion loop, so the time window in which the task can fail is huge)
- the second one removes the deletion attempt and uses ALTER instead of insert (valid only when the menu is augmented, wrong otherwise)
Please follow the issues referenced in my previous comment.
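The effect of the first patch's reordering can be sketched as below. This is a toy model in Python with `sqlite3`, not the actual Drupal code; `rebuild_original`, `rebuild_reordered`, and the `compute` callback (standing in for the expensive route computation) are hypothetical names. The callback records how many rows are in the table while the slow part runs.

```python
import sqlite3

def new_router(paths):
    # Autocommit connection, mimicking a table without transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE menu_router (path TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO menu_router (path) VALUES (?)",
                     [(p,) for p in paths])
    return conn

def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM menu_router").fetchone()[0]

def insert_all(conn, routes):
    conn.executemany("INSERT INTO menu_router (path) VALUES (?)",
                     [(p,) for p in routes])

def rebuild_original(conn, compute):
    conn.execute("DELETE FROM menu_router")  # table is empty from here on...
    routes = compute(conn)                   # ...through the slow computation
    insert_all(conn, routes)

def rebuild_reordered(conn, compute):
    routes = compute(conn)                   # slow part runs with old rows intact
    conn.execute("DELETE FROM menu_router")  # window is only delete + insert
    insert_all(conn, routes)

seen = []  # row counts observed while the computation is running

def compute(conn):
    seen.append(row_count(conn))
    return ["admin", "node", "user"]

c1 = new_router(["admin", "node"])
c2 = new_router(["admin", "node"])
rebuild_original(c1, compute)
rebuild_reordered(c2, compute)
# seen == [0, 2]
```

In the original order the computation runs against an empty table, while in the reordered version the old rows are still present during the slow step. The window between DELETE and the last INSERT is shorter but still exists.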
Comment #7
arhak commented
The problem keeps arising (http://drupal.org/node/291056 "Admin Module page won't load and produces confusing error message")
So, this is the quick workaround for those locked out of their site:
- Create a repair.php file with the following code
- upload the file to the server
- access the page and wait
- if there is no timeout, that should do the trick.
- if there is a timeout, go to php.ini and increase the value of max_execution_time (up to 600 if needed, or even more on a slow machine; later you can tune it to your needs)
Comment #8
damien tournoud commented
Thanks for muddying this issue a little more, arhak. Let me clarify a few things.
(1) Let me tell you again that during the discussions around #238760, a patch went into the D6 branch and was later released in 6.3. That patch moved the DELETE *after* the most intensive part of the calculation, in order to reduce the window during which the race condition could appear. This should mitigate the issue on D6.
With that patch, duplicate entry errors on menu_router can *only* happen when two requests try to concurrently rebuild the menu table and fall inside the (shorter) window of vulnerability. Those errors *cannot* happen if a previous rebuild was incomplete. What you are describing in this particular issue looks like something different (a timeout that can lead to menu not being completely rebuilt), and should *not* lead to duplicate entry errors.
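The concurrent case described here can be made concrete with a deterministic interleaving. The sketch below is a Python toy model, not Drupal code: two generators stand in for two requests rebuilding at once, sharing one autocommit `sqlite3` connection to mimic two clients of a table without transactions. `rebuild_steps` and the table layout are assumptions for illustration.

```python
import sqlite3

ROUTES = ["admin", "node", "user"]

def rebuild_steps(conn, routes):
    """Rebuild the table, pausing after each statement so that a
    caller can interleave two rebuilds by hand."""
    conn.execute("DELETE FROM menu_router")
    yield "deleted"
    for p in routes:
        conn.execute("INSERT INTO menu_router (path) VALUES (?)", (p,))
        yield "inserted " + p

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
conn.execute("CREATE TABLE menu_router (path TEXT PRIMARY KEY)")

request_a = rebuild_steps(conn, ROUTES)
request_b = rebuild_steps(conn, ROUTES)

next(request_a)  # A: DELETE
next(request_b)  # B: DELETE (both requests are now inside the window)
next(request_a)  # A: INSERT admin
try:
    next(request_b)  # B: INSERT admin collides with A's row
    error = None
except sqlite3.IntegrityError as exc:
    error = exc
```

Request B fails on the primary key because A's first row is already there. Shortening the time between DELETE and the last INSERT makes this interleaving less likely but does not rule it out.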
(2) The process here, for Drupal core, is to fix issues affecting several branches in the *same* issue ticket. Having several separate tickets for the same issue is the best way to forget to fix an issue in one of the branches. Also, our process involves fixing the issue in the development branch (today Drupal-7), before backporting in order to always have an up-to-date development branch.
The process is not something you can or should discuss, at least not in the middle of an issue ticket.
Because this ticket deals with the same issue as #238760: menu_router table truncated and site goes down (it has the same exact title), I'm marking this one as a duplicate. Please don't reopen. If your problem is the same as #238760, it should be discussed there, regardless of the current version of that ticket. If you believe you are facing a different problem, please open a new ticket, this one is already muddy enough.
Comment #9
arhak commented
That was what I thought at first, but they always come in pairs... why?
Comment #10
arhak commented
For those trying to find out the real state of this issue, I repeat: its discussion is at #238760
I replied there because most people won't read an issue marked as "duplicate", which is how this issue seems set to remain.
But to those reading above that this issue is already patched in 6.3, I say that IMO it isn't.
So I repeat here what I said in #238760
a shorter window of vulnerability is still a window and thus a vulnerability
How many administrators are allowed by Drupal?
If the answer is "only 1" then we can rely on a shortened window and workaround documentation for exceptional cases.
But two or more administrators might be doing different things at once and thus cause a race condition.
That is fine for the discussion of the issue, as I stated in that issue of mine, but I also think another open ticket should exist, to avoid forgetting, for instance, to backport the fix, or whatever the final answer for the other branch turns out to be.
That's why I don't agree with
but if it's that the way you go...
How do you keep track of what needs to be backported or fixed in older versions if all the issues in that branch are marked as duplicates?
OTOH, imagine I'm a newbie to Drupal (which is not far from the truth) and I run into such a problem with Drupal: I go to the issue tracker and search the 6.x branch to see if somebody else saw it too.
What do I find then?
Only duplicates.
Why?
Because patches, fixes, and everything else are in the D7 branch, which I won't look at because I'm not a HEAD developer.
Two things:
- I strongly believe that, being the same issue, it should be discussed in one place, but having two possibly different final solutions, it must be tracked as different issues. Once the fix is done for 7.x there will be a D6 issue pending a backport, or whatever its own final resolution turns out to be (even a "won't fix").
- The issue wasn't solved by either of those patches. I've been working with Drupal 6.3.
Do you work with two browsers? I guess you do; so do I. Now you know how I run into this race condition from time to time. Also, when working on any system that uses a database, one test I run before starting work is support for a sudden database server shutdown. Drupal 6.3 doesn't pass this test because of the lack of transactions and locking.
I'm really sorry. I wasn't aware who had the truth. I just said what I thought was common sense. Look above this thread and tell me if this tracking system tells you for sure whether this is an issue present in 7.x, 6.x or both.
I always saw the issue tracking system as a tool with that brief summary on top, letting me know quickly, without reading the whole discussion, the status of the branches, and also allowing me to query the health of a particular branch. If I find a clean branch there is nothing more to say about it: it's bug free, tested and approved by the community. Is that right?
My issue starts by stating that it is the same issue for a different branch; I also explain how D6 has patches that improve it but do not solve it. I also state that the discussion should be followed here. I made every issue/forum thread I found with the same or a similar issue aware of this discussion.
But I tell you again, there is an issue in D6 which doesn't appear as open (I'm talking about 6.3).
Whoever searches the tracking system will find duplicates, and the "original" ones are for D7. Thus the issue will be opened over and over for D6, and other people will redirect the reporters here and explain to them why it's targeting D7.
OK, this issue might be. But it is the wrong policy for handling an issue queue. The same issues will keep showing up until people can search and find them reported in the correct branch and with the correct status. Nobody will be hurt if the issues have separate life cycles and all of them point to where the discussion must take place.
Comment #11
kenorb commented
Damien #8:
If it's the same, then why are people in this issue saying that it's not:
http://drupal.org/node/238760#comment-1379898
And if it's fixed, there is still a problem with duplicates on 6.x.
Proper link to duplicate:
#333428: Duplicate entries in menu_router table (6.x branch)