When my site is very busy, I get duplicate entries for database keys. Let's deal with the 'node' table first as this is by far the most important:
Duplicate entry '0' for key 2
query: INSERT INTO node (vid, type, language, title, uid, status, created, changed, comment, promote, moderate, sticky, tnid, translate) VALUES (0, ...
... note that vid = 0 and I presume that 'key 2' is for vid.
Node.module always(?) does it's INSERT into the node table with vid = 0 (vid defaults to 0 in the node schema) through the function drupal_write_record(). But straight after, it does _node_save_revision which alters this vid.
HOWEVER... if between the drupal_write_record and the _node_save_revision another user is also inserting a node, its possible that mysql tries to insert this second node before the first node has had its vid changed away from 0. Hence the duplicate entry.
I've had this happen to me twice so far on the node table, and has caused my node_access for nid = 0 to get wiped, causing all kinds of errors until I add the default line back to node_access.
I also get duplicate entries from time to time (more often than for node) on the following tables:
- history
- cache_menu
- sessions
... I haven't looked into these in so much depth yet though.
| Comment | File | Size | Author |
|---|---|---|---|
| #56 | 230029-node-save-race-condition-rev2.patch | 3.22 KB | brianV |
| #53 | 230029-node-save-race-condition.patch | 3.32 KB | brianV |
| #40 | node-duplicate-entry-230029-39.patch | 1.44 KB | cburschka |
| #35 | 230029-duplicate-entry-vid-35.patch | 1.45 KB | salvis |
| #35 | 230029-duplicate-entry-vid-35.D6.patch | 826 bytes | salvis |
Comments
Comment #1
jakeg commentedapparently this is known as a 'race condition' and is similar to this:
http://drupal.org/node/213699
... which applies specifically to the sessions table (which also applies to me).
Comment #2
cburschkaSounds like you need to call db_lock_table there...
Comment #3
killes@www.drop.org commentedHere's a patch that tries to get around this. Not pretty, though. I really wonder how that code got in there.
Comment #4
damien_vancouver commentedI tested the patch from #3 and it works properly and appears that it will sidestep this issue.
node_revisions does not have a unique key on nid so it won't care if there is more than one row with nid=0.
I tested creating a node, making a 2nd revision of it and then making a new node again - in each case nid and vid were correctly saved.
The only issue is the patch applying to modules/node and not /.
Also I tested with MySQL 5, haven't tried another DB (though assuming the DB abstraction layer isn't broken it will work the same).
Someone should try it in postgres just to be sure and then a re-rolled one from / is RTBC I think.
Comment #5
jakeg commentedPatch seems to work for me too. Applied to my live 6.* site as otherwise this bug was basically making my site unusable for times until I re-added a line to node_access. Will let you know if it breaks or if I get the same errors back again.
Comment #6
jakeg commentedwith the patch applied to the 6.* site, the horrendous bug doesn't seem to have reared its ugly head again (yet). Neither are there other bugs which it seems to create, or any other new problems or changes with the site. Hence giving this a reviewed status.
Comment #7
damien tournoud commentedHum. This is ugly.
Why not changing the
{node}schema to make{node}.vidallow NULL values?Comment #8
damien_vancouver commentedIt's still just as ugly with the schema adjusted to allow nulls... If you're going to change the schema, then why don't we make it work properly for Drupal 7?
-It -should- be possible to insert a new node revision with just two inserts and no updates (and no transaction).
-The correct place to be determining which record is the newest revision is in the DB using SQL. Databases are really eficient at that sort of task. Instead we are doing it procedurally and making the database do a bunch of extra work by re-saving an indexed field. Plus it opens up the potential for race conditions and weird undefined behavior like this issue. Even if the DB doesn't explode from it, there shouldn't ever be bad data rows in there with vid=0 or nid=0!
PHP / webservers can and do die in the middle of this insert process leaving bad data rows in the database!
Those types of bad rows can really be a pain when updating your Drupal version, and cause strangeness later on (or several versions later on).
The way it is done now should only be happening inside an atomic DB transaction. If we aren't going to use a transaction, the inserts should be atomic...
I propose that we look at fixing it properly for HEAD:
-{node} does not need (and should not have) a vid field. It isn't necessary. The database can easily look up the current vid for us doing just as little work (and we save the database work on the insert, as explained below, and there is no more possibility of bad data).
-{node_revisions} can keep its vid field for use as a primary key, but each node_revision should also have a "published_timestamp" field which is the timestamp of the last time this revision was saved as published. This field should be indexed.
-Then the DB can easily and efficiently just look up the most recently published vid for that nid - i.e. the current revision.
Inserting one row with a NULL/0, then the second, then updating the first on an indexed field and forcing the DB to do a bunch of unnecessary busywork is what is ugly! :)
Also with the published timestamp approach you can insert new node revisions without extra saving. It resolves the race condition of two users publishing a node to see who gets to save its vid. Instead of the database having to update those indexes on {node} (the main entity table), it is just saving to node_revisions once per revision, and the newest one wins. (the same thing happens now but the database has to do a bunch of needless work updating indexes on {node} which really should not get affected like that if you're just saving a new revision in another table!)
meanwhile, a node_load we could still set the vid in the $node object, so it shouldn't even break anything as far as modules etc. are concerned. Users would see no difference (except less load on their databases).
I think an approach like I've suggested should be taken for Drupal 7. I can create that patch for review if people agree on the approach. I'm not so sure what would be involved for the schema update but with some guidance I could do that too.
For Drupal 6 I think we should just fix it using killes' from #3, because it doesn't require a schema modification for the next 6.x release.
Comment #9
jakeg commentedAgree - we need a fix for 6.* and we need it *now*. We cannot leave a critical bug hanging whilst we contemplate the best way to do things in the next release while live sites such as mine die because of it.
Killes patch is working fine my my site - busiest day yet for my site yet yesterday, and it didn't fall over again like it did before I applied his patch.
Get the critical 6.* patch in, then start talking about improvements for 7.*, or even another better patch against 6.* which may include db updates.
Comment #10
damien tournoud commented@damien_vancouver#9: I agree with you when you suggest that the solution to that sort of mess is implementing transaction support in Drupal. That is the next big step, and that is the route to follow in Drupal 7.
I also agree that this is one of the critical bug for Drupal 6.x, at that it should be solved as quickly as possible (one of the others is #223820: Book module tests for postgres error, which simply breaks book for PostgreSQL users).
However, I don't like the solution you suggested in #9 for Drupal 7. There is a one-to-one relationship between the node and its currently published revision, and that's why there is a
vidfield in{node}. That's all structurally sound. Putting the information about the "currently published revision" inside{node_revisions}would break that relationship, and thus break or decrease performance of most queries that try to filter / order or get information from the currently published revision of several nodes.Comment #11
bdragon commentedI actually ended up with a crashed site (luckily only a staging site) due to this...
A row with vid0 was left in my node table, and that prevented further insertion of nodes.
By the time I figured out what was going on, there were 46 nodes in {node_revisions} with nid 0 and no corresponding entry in {node}.
Not cool.
Comment #12
bdragon commentedFrankly, I'd solve the problem by making {node}.vid an INDEX instead of a UNIQUE KEY.
Comment #13
killes@www.drop.org commentedMoving to 7 so it can get applied there.
I don't think changing the table fedinition is a good idea. Drupal expectss the vid to be unique per nid.
Comment #14
damien_vancouver commented@damz,
OK, that makes sense about the 1:1 relationship. Also agreed that the correct way to fix it keeping that relationship is transactional support in D7.
back to this issue itself, I still vote for the fix from killes in #3 because it doesn't require a schema update.... seems to me that would be the more reliable way of fixing it for more people (and there is a lot less to go wrong - for us testing the schema update statements and site users managing to correctly update with update.php).
I feel there is some urgency to fix this before the next release, but maybe not... What happens if there is stuff like this outstanding and being tested/discussing and there is a new release for security reasons? Is there an effort made to review critical outstanding bugs and get them RTBC/committed first, or could a new security release just come out tomorrow with this bug still at large? If so I think we should hurry up and make up our minds here...
update: I see killes just bumped it to D7, that is encouraging!
Comment #15
jakeg commenteddamien: agree. without this patch applied to my server (it is at the moment) my whole site can pretty much go down at any time. killes patch works well, and doesn't involve changes to the scheme, and I can confirm that it works on a live busy site
Comment #16
damien tournoud commentedkilles' patch in #3 solves our immediate problem, and we are still in "reviewed & tested". This should get in as soon as possible.
Comment #17
jakeg commented6.2 just came out without this patch applied. I'm putting this back to 6.* so it gets added asap.
This is a massively critical patch. I cannot upgrade my site to 6.2 without manually patching this again. Other uses with busy websites will experience the same problems. If Drupal.org was on 6.* it too would have this problem.
I strongly recommend a 6.3 release with this patch applied as soon as possible.
Comment #18
bdragon commentedI still liked my idea better, but either way, we need this in. Seriously.
Comment #19
damien tournoud commentedSorry to bump that, but this bug would break any site with a moderate traffic...
Comment #20
damien tournoud commentedHere is a correct (patch level 0) reroll of Killes patch in #3.
Comment #21
gábor hojtsyLooks like this just moves stuff closer so that the occurance of the race condition would be less likely(?) It would be great to document this with the UPDATEs.
Comment #22
killes@www.drop.org commentedThe patch IMO exposes a problem with the schema API.
In Drupal 5 we did for a new node:
$node->nid = db_next_id('{node}_nid');
$node->vid = db_next_id('{node_revisions}_vid');
ie we reserved the IDs before doing anything else. I am not sure this is possible to do with schema API.
Drupal 6 does the saving (and the generating of the IDs) rather at the end of the function.
Comment #23
killes@www.drop.org commentedActually, I think that Gabor is wrong. The patch doesn't just merely make the race condition less likely, it avoids it alltogether because it changes the order of the writes to the node and node_revision table:
- drupal_write_record('node', $node);
_node_save_revision($node, $user->uid);
+ drupal_write_record('node', $node);
+ db_query('UPDATE {node_revisions} SET nid = %d WHERE vid = %d', $node->nid, $node->vid);
After the patch, the first write is to the node_revisions table where a unique vid is acquired (but nid is left as 0 (which doesn't matter for that table)). Afterwards we write to the node table and get our unique nid. As a ladt step we update the node_revisions table with that nid. The whole process is now not a race condition anymore because it doesn't matter if a nid gets a higher or lower vid than it's the nid before it.
I've changed the status again to RTBC. This patch should not miss the next Drupal 6 release. For Drupal 7 we might find a better solution.
Comment #24
david straussI think the proposed patch is semantically wrong. The insert sequence should be:
(1) In {node}: nid = $new_nid, vid = NULL
(2) In {node_revisions}: nid = $new_nid, vid = $new_vid
(3) Update {node}: vid = $new_vid
UNIQUE keys are allowed to have multiple NULL values. (NULL does not equal NULL in SQL.)
Comment #25
killes@www.drop.org commentedProblem is that our DB API doesn't allow for NULL to be inserted.
Comment #26
david straussOh, I should explain why the semantics of the proposed patch are wrong.
There are two implicit foreign keys at play here:
(1) In {node}, vid refers to {node_revisions}.vid
(2) In {node_revisions}, nid refers to {node}.nid
Foreign key (1) is actually just a denormalization. You can always look up its value as the maximum vid in {node_revisions} for the given nid. However, foreign key (2) is genuine: there is no way to recover its value.
Therefore, given that we have two tables which reference each other, I think the real foreign key should be the one respected by our implementation.
Comment #27
david straussIn an effort to not just be a pedantic annoyance, I'll go ahead and agree that this patch is a step forward and fixes the immediate issue with the race condition.
Comment #28
pwolanin commenteddoes the same patch apply for 7.x? I'm not sure we'll see it in 6.x if it doesn't go in HEAD first.
Comment #29
cburschkaUnless this solution is so good that we can use it as a long-term fix too, I'd put in the bugfix now and figure it out for D7 afterward...
Comment #30
gábor hojtsyRe killes in #25: I just committed http://drupal.org/node/227677 which lifted the limitation for drupal_write_record() to save NULLs (was a requirement for clean CCK integration), that solution might also work, and as David points out, it would be semantically more correct.
Anyway, in the interest of fixing this sooner then later, I committed the current patch to Drupal 6. Now moving to Drupal 7 for inclusion, and you decide whether you proceed with a nicer solution and backport that to Drupal 6 as well (depending on whether the NULL patch lands in Drupal 7, which might be delayed by PDO work).
Comment #31
mfbI cannot create a node in Drupal 6 after cvs updating and getting this patch. I haven't yet tested on a core-only sandbox, so it could be an interaction with contrib module like CCK.
I get an entry in the node table with a vid of 0, and no entry in the node_revisions table. A notice is printed, 'The post could not be saved.'
Comment #32
catchComment #33
gábor hojtsySee http://drupal.org/node/227677#comment-895644 which I submitted earlier. It might be that this patch needs to be rerolled, but I rolled that back, since the issue was with the NULL columns, not the ordering of the node saves, so that issue was the more relevant (and somewhat less important to have in Drupal core ASAP).
Comment #34
pwolanin commentedhere is a straight re-roll for 7.x
Comment #35
salvisRemove the cruft, too.
Comment #36
damien tournoud commentedGood catch, salvis.
Comment #37
arhak commentedsubscribing
Comment #38
dries commentedIt is not clear whether this patch is the best solution now we can insert NULLs with drupal_write_record(). Please advice.
Comment #39
Anonymous (not verified) commentedThe last submitted patch failed testing.
Comment #40
cburschkaLooks like fuzz only. Rerolled.
Comment #42
lilou commentedSee: #335122: Test clean HEAD after every commit and http://pastebin.ca/1258476
Comment #43
catchThis needs to be updated to use the new database layer.
Comment #44
panchoTo finally finish the D6 part of this issue, please don't forget to commit Salvis' cleanup patch for D6 which was already RTBC in July. That variable is indeed not needed anymore.
After this has been committed please switch back to CNW.
(I didn't change to D6 and minor, to make sure the critical D7 issue doesn't get lost.)
Comment #45
david straussWhy does #44 set this to "reviewed and tested"?
Comment #46
catchhttp://drupal.org/files/issues/230029-duplicate-entry-vid-35.D6.patch is the RTBC patch. I think we might need to split the D6 cleanup and the D7 issues into separate ones though.
Comment #48
david strauss@catch If the patches to D6 and D7 are not ports of each other, please do separate the issues.
Comment #49
pancho@DavidStrauss, catch:
Sorry, I thought we'd get that one quickly committed and then turn back to the D7 solution. After all there have been no posts for some weeks. Now I separated out the D6 cleanup to #344052: Cleanup node.module after #230029, so this is of no concern here anymore.
Comment #50
gábor hojtsy#344052: Cleanup node.module after #230029 was committed to Drupal 6, so this issue can be focused on Drupal 7.
Comment #51
brianV commentedThis is no longer relevant in D7 / DBTNG.
The entire node_save() process is done within a transaction, so a race condition like this can no longer occur. Yay!
Comment #52
david straussTransaction support only runs on MySQL if you explicitly enable it. So transaction support won't be broadly deployed enough to ignore problems occurring in non-transactional environments.
Comment #53
brianV commentedGood point, David.
In that case, patch attached, rerolled and uses DBTNG.
Edit: looks like my editor also auto-fixed some whitespace issues. Sorry about the noise in the patch, but I guess they are good to be committed...
Comment #55
pwolanin commentedLooks like a lot of unrelated whitespace changes in the patch.
Comment #56
brianV commentedThat's why we can't have nice things...
Comment #58
sun.core commentedLooking at http://api.drupal.org/api/function/node_save/7, it seems like this has been fixed elsewhere already.
Comment #59
pwolanin commentedIs there another issue?
I think this needs to be backported to 6.x
Comment #60
pwolanin commentedI'm very confused by the comment thread and whether there was a parallel issue.
Looking at node_save() in 6, seems like we do save a new node revision first- so maybe this is fixed?
Comment #61
slashrsm commentedThis could be related #1576716: Make node_save calls atomic