Download & Extend

Avoid deadlock issues on the {node} table

Project:Drupal core
Version:7.x-dev
Component:node system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:closed (fixed)
Issue tags:MySQL, needs backport to D7

Issue Summary

Problem/Motivation

The way we currently save node forces MySQL to lock the {node}.vid index longer then necessary, increasing the possibility of deadlock errors happening under load.

Test script:

<?php
for ($i = 0; $i < 5000; $i++) {
 
$node = (object) array(
   
'type' => 'article',
   
'status' => 1,
   
'uid' => 1,
   
'title' => 'Dummy node',
   
'body' => array(
     
LANGUAGE_NONE => array(
       
0 => array(
         
'value' => 'Dummy body',
         
'format' => 'filtered_html',
        ),
      ),
    ),
  );
 
node_save($node);
}
?>

Execute with concurrency (for example with seq 1 100 | xargs -L 1 -P 15 drush php-script stress-node.script), and you should see errors like:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {node} (type, title, uid, status, created, changed) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5);

Proposed resolution

The problem lies in the fact that we insert 0 (the default value) in the {node}.vid for a short period of time during save. As a consequence, MySQL has to lock the begining of index attached to the {node}.vid unique key until the transaction ends. This increases the window during which a deadlock is possible.

Proposed resolution is that we insert NULL instead. NULL not being comparable to anything, it doesn't participate in uniqueness constraints and MySQL doesn't have to lock the index.

API changes

None.

Comments

#1

Status:active» needs review

Patch for D8. No upgrade path is necessary for D8 as I expect this to be applied to D7 too.

AttachmentSizeStatusTest resultOperations
1369332-deadlock-node.patch570 bytesIdlePASSED: [[SimpleTest]]: [MySQL] 34,177 pass(es).View details

#2

Version:8.x-dev» 7.x-dev

Patch for D7 with the upgrade path.

AttachmentSizeStatusTest resultOperations
1369332-deadlock-node7.patch1016 bytesIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 1369332-deadlock-node7.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.View details

#3

Version:7.x-dev» 8.x-dev

#9

The test bot wants to apply the D7 patch to D8. Not sure why.

#10

This looks like an easy way to prevent deadlocks in heavily used node tables. I have seen this issue cropping up with a number of different D7 sites recently.

Are there any other places where the NULL work around could reduce locking?

#11

Version:8.x-dev» 7.x-dev

Trying 7.x test again

AttachmentSizeStatusTest resultOperations
1369332-deadlock-node7.patch1016 bytesIdleFAILED: [[SimpleTest]]: [MySQL] 37,180 pass(es), 68 fail(s), and 84 exception(es).View details

#12

I think that's the only place we play with a unique key like this.

#13

Status:needs review» needs work

The last submitted patch, 1369332-deadlock-node7.patch, failed testing.

#14

Status:needs work» needs review

I broke the upgrade path.

AttachmentSizeStatusTest resultOperations
1369332-deadlock-node7.patch1023 bytesIdlePASSED: [[SimpleTest]]: [MySQL] 37,321 pass(es).View details

#15

I believe this is an important fix for high traffic D7 users. I'm ready to RTBC, but will wait for some more feedback.

#16

There definitely needs to be a comment in node_schema()

#17

Add a comment referencing this issue with details on the use of NULL vs 0.

AttachmentSizeStatusTest resultOperations
node_deadlock-1369332-17.patch1.27 KBIdlePASSED: [[SimpleTest]]: [MySQL] 37,341 pass(es).View details

#18

This require testing for pgsql and sqlite too

#19

Fixed the comment syntax and formatting.

AttachmentSizeStatusTest resultOperations
drupal.node-vid-deadlock.19.patch1.24 KBIdlePASSED: [[SimpleTest]]: [MySQL] 37,372 pass(es).View details

#20

This works on a local sqlite db afaics (tests passed). Is anyone available to easily test pgsql? I can spin up an environment but it will take some time.

#21

Version:7.x-dev» 8.x-dev
Status:needs review» reviewed & tested by the community

Tested on PostgreSQL: works with or without the patch.

Here is a straight reroll for D8 without the upgrade function.

Seems ready to roll for me. RTBC-ed.

AttachmentSizeStatusTest resultOperations
1369332-drupal.node-vid-deadlock.patch1018 bytesIdlePASSED: [[SimpleTest]]: [MySQL] 34,141 pass(es).View details

#22

Excellent, looks good to me. Thanks Damien.

#23

This needs to be committed to D7 (#19) and D8 (#21). Hoping to get it in before the next release.

#24

Status:reviewed & tested by the community» needs work

This looks good but I think we can drop the @see, git blame is good enough for that. Nice find.

#25

Status:needs work» reviewed & tested by the community

Removed the @see, back to RTBC.

AttachmentSizeStatusTest resultOperations
drupal8.node-vid-deadlock.25.patch770 bytesIdlePASSED: [[SimpleTest]]: [MySQL] 34,584 pass(es).View details

#26

I'm seeing this error during installs of Drupal7 sites. Is there a patch for drupal-7.12? Thx.

#27

AttachmentSizeStatusTest resultOperations
1369332-27-d7.patch750 bytesIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 1369332-27-d7.patch. Unable to apply patch. See the log in the details link for more information.View details

#28

@dude4linux -- The issue is tagged for backport to D7, so there is a fair chance it will be included in the next D7 release if nothing else comes up. A D7 version is in #27 for convenience.

#29

@xjm -- Thanks, that patch worked for me. At least it allowed me to get past the errors and continue until the next problem caused a crash on out-of-memory. :)

#30

Don't we need an update hook for this in D7?

#31

Status:reviewed & tested by the community» needs work

The last submitted patch, 1369332-27-d7.patch, failed testing.

#32

Assigned to:Anonymous» webchick
Status:needs work» needs review

re: Update hook? Yes, totally, but given the current state of #1333898: Support unstable2unstable upgrades? [policy, no patch], it's completely and totally unclear to me which patches should have a D8 update and which not.

#33

Assigned to:webchick» Anonymous
Status:needs review» reviewed & tested by the community

#25 is the patch for D8, which doesn't need an upgrade path because it will be in the D7 to D8 upgrade path.

(until the policy is changed)

#34

Right, agreed. We need an update hook in the backport for D7, but not in the D8 patch.

#35

Version:8.x-dev» 7.x-dev
Status:reviewed & tested by the community» patch (to be ported)

Sorry, that's what I meant, sorry for the brevity. #27 isn't complete for D7 because it requires an update function. Since the current policy is still "no update hooks for head2head until beta", the D8 patch is good to go.

Committed and pushed #25 to 8.x, but the 7.x version still needs to be ported.

#36

D7 patch (from #19) without the @see

AttachmentSizeStatusTest resultOperations
drupal7.node-vid-deadlock.36.patch1.19 KBIdlePASSED: [[SimpleTest]]: [MySQL] 38,107 pass(es).View details

#37

Status:patch (to be ported)» needs review

#38

Status:needs review» reviewed & tested by the community

Looks good to me.

#39

Status:reviewed & tested by the community» fixed

Committed and pushed to 7.x. Thanks!

#40

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

#41

Has this already been committed to the downloadable d7 dev release last updated on March 31st?

#42

@Gastonia: dev releases are build twice a day, so yes.

#43

Has this been rolled into a recommended release for core? I just upgraded to 7.15 yesterday and I'm still getting this error. It's locking us out of critical parts of our live site, so I'm trying to figure out how to fix it asap.

#44

Edit: Looks like this was committed in 7.14. I've also experienced this error since the patch (in 7.14), suggesting we're encountering a seperate but similar issue with similar symptoms.

Edit 2: Just occurred to me that this patch is specific to nodes, I think my crash was a different entity type. Is 0 or null the default default (sic) for other entities? If 0 is the default default, might it make sense to apply this logic to all entity types?

#45

I am still getting the error as of 7.15

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {location_search_work} (lid) VALUES (:db_insert_placeholder_0); Array
(
[:db_insert_placeholder_0] => 16636
)
in location_search_locationapi() (line 446 of /Applications/MAMP/htdocs/bab/web/sites/all/modules/location/contrib/location_search/location_search.module).

can anyone confirm that this is no longer an issue after applying the patch?

#46

Gastonia - it looks like, like me you're having an issue with a different type of entity to Nodes. Sounds like you're having the problem with Location entities, something to do with a Location Search module, which I'm guessing is part of or an extension of the Location module? I think my problem was with Relation entities.

So it would be great if someone could answer my question: is the 0 / NULL that causes this problem in Nodes the default default for entity types that don't have their own default? If it is, could there be serious consequences to re-working this patch, either:

  1. ...for the default entity objects, or...
  2. ...on a case-by-case basis in response to errors (e.g. for relations, for locations, etc)?

#47

#48

Am I interpreting correctly what alanomaly is saying above is that each module and maintainer is responsible for putting in the appropriate logic to prevent this from happening?

Should I put a copy of my error in the Locations project issue queue as well as others as this comes up for {there table}?

Our team has written serveral modules that use tables. Should we be rewriting our db code to avoid this? If so, what should we be looking for or incorporating? I am still constantly getting errors like this:

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {location_instance} (nid, vid, uid, genid, lid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4); Array ( [:db_insert_placeholder_0] => 1059 [:db_insert_placeholder_1] => 1059 [:db_insert_placeholder_2] => 0 [:db_insert_placeholder_3] => cck:field_item_location:1059 [:db_insert_placeholder_4] => 1000 ) in location_save_locations() (line 974 of /var/www/buyagainbaby.com/sites/all/modules/location/location.module).

We have an extremely high traffic site for input via a team of people using the node forms and coders batch uploading content as data comes in.

What can be done to address this?

Thanks

#49

@Gastonia - My knowledge of the Drupal entity system is pretty thin. But if you can prove that it is Location entities that cause the problem, then alerting the Location guys to the existence of the issue certainly can't hurt.

It's just a theory of mine that this issue exists at the generic entity layer and has been fixed only at the node layer - but it's a theory that fits all the available evidence so far. I'd suggest flagging it up with the Location guys who can hopefully cast an expert eye on the issue, and, if it's causing you a serious problem, maybe create an issue on the core queue opening discussion on getting a generic fix for all entities, not just nodes. Hopefully you can get input from the Location guys soon enough to be useful, and maybe you might see some useful progress on the Core issue in 2013.

Actually, scratch that last point - if you can demonstrate that this is indeed a generic issue with all entities, while the fix only applies to nodes, then I'd argue that the best action (in addition to flagging it up to the Location guys short-term) is probably re-opening this issue re-titled for Entity focus with 'Component' switched to 'Entity system', since relevant people are already following this thread and since relevant content and code is already in this thread. Potentially might get some useful action on it at core level this year that way (or, confirmation that my theory is wrong, which would also be useful).