Drupal currently doesn't provide a generic locking mechanism. As Drupal-powered websites grow more popular, the lack of locking results in increasingly frequent race conditions. These can result in error messages being cached for an indefinite amount of time.

Without locking, it is also possible for content to be edited by two people at the same time, making it possible to lose important changes.

Attached is a patch that introduces 'lock.inc' and a new 'locks' database table allowing for generic locking functions. It also updates node administration to utilize this new locking functionality and present a powerful example of how this new include file can be used.

The attached lock.patch:

  - installs includes/lock.inc
  - includes lock.inc from includes/bootstrap.inc
  - adds lock disable/configure functionality to system.module
  - wraps administration of nodes in access lock, permitting only one user
    at a time to edit a node
     o lock is grabbed on first edit
     o lock is renewed with each preview
     o lock is freed with submit or delete
     o lock will timeout (auto-free) after configurable time period
  - should work with all node types (tested on story, poll and forum).
  - only locks administrative editing of nodes.
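
The bullet points above suggest a small API surface. As a sketch only, assuming hypothetical function names (`lock_grab`, `lock_renew`, `lock_free`) and an in-memory array in place of the 'locks' database table, the lifecycle could look like this:

```php
<?php
// Hypothetical sketch of a lock.inc-style API. The function names and the
// in-memory $locks array are assumptions for illustration; the real patch
// stores locks in the 'locks' database table.
$locks = array();

// Try to grab an exclusive lock; returns TRUE on success, FALSE while a
// previous holder's lock has not yet timed out.
function lock_grab(&$locks, $name, $timeout = 300) {
  $now = time();
  if (isset($locks[$name]) && $locks[$name] > $now) {
    return FALSE;
  }
  $locks[$name] = $now + $timeout;  // lock auto-frees after $timeout seconds
  return TRUE;
}

// Renew a held lock, e.g. on each preview.
function lock_renew(&$locks, $name, $timeout = 300) {
  $locks[$name] = time() + $timeout;
}

// Free the lock, e.g. on submit or delete.
function lock_free(&$locks, $name) {
  unset($locks[$name]);
}
```

A second editor calling lock_grab() on the same node would get FALSE until the first editor submits, deletes, or the timeout elapses.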

To test, use two web browsers to generate two separate sessions.

More information available here.


Comments

moshe weitzman’s picture

unless absolutely necessary to serve a cached page, the lock.inc should be included from common.inc, not bootstrap.inc.

Jeremy’s picture

FileSize
17.88 KB

Updated attachment to include 'lock.inc' from 'common.inc' instead of 'bootstrap.inc'. I can see locking being used to create a cached page, but not when actually serving the cached page...

Jeremy’s picture

FileSize
17.8 KB

I attached the wrong patch the second time -- it was unchanged from the first patch.

This attached patch includes 'lock.inc' from 'common.inc' instead of 'bootstrap.inc'.

Jeremy’s picture

FileSize
21.99 KB

This updated version of the lock patch is re-synchronized with the latest Drupal HEAD code, and includes the necessary updates to update.inc and the database files.

lock.patch modifies the following files:

  • database/database.mssql
  • database/database.mysql
  • database/database.pgsql
  • database/updates.inc
  • includes/common.inc
  • includes/lock.inc
  • modules/node.module
Anonymous’s picture

FileSize
21.4 KB

Resynced the lock patch against HEAD. Review earlier posts for details.

Changes:
- removed patch to no-longer-existent database.mssql file
- moved back to end of updates.inc

moshe weitzman’s picture

Dries: it boils down to, how badly do we need it?

Sounds like we need to post some use cases here. I'll start:

The forum block, the taxonomy_dhtml blocks, and more rely on the drupal cache API to store their data structures. They do so because these structures are expensive to generate, in terms of server resources.

When a user posts a new node, these cached structures are deleted. No problem so far. The problem happens when the next several users request a page. They might *simultaneously race* to become the user who generates the blocks and inserts into the cache table. This can spike the server in a bad way.

This patch enables forum and taxonomy_dhtml to add a lock such that only 1 user at a time tries to generate the block.
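
The pattern Moshe describes could be sketched as below. The plain arrays stand in for Drupal's cache and locks tables, and all function names here are illustrative assumptions, not the patch's actual API:

```php
<?php
// Sketch of the cache-stampede pattern described above: only the first
// caller rebuilds an expired cache entry while others back off. The plain
// arrays stand in for Drupal's cache and locks tables.
$cache = array();
$locks = array();
$builds = 0;    // counts how many times the expensive builder actually ran

function cached_or_rebuild(&$cache, &$locks, $key, $builder) {
  if (isset($cache[$key])) {
    return $cache[$key];        // fast path: serve the cached copy
  }
  if (isset($locks[$key])) {
    return NULL;                // someone else is rebuilding; don't race
  }
  $locks[$key] = TRUE;          // first caller wins the rebuild
  $cache[$key] = call_user_func($builder);
  unset($locks[$key]);
  return $cache[$key];
}

$builder = function () use (&$builds) {
  $builds++;
  return '<expensive forum tree>';
};
```

With this shape, clearing the cache after a new node post means exactly one of the next requests pays the rebuild cost instead of all of them at once.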

As a bonus, this patch comes with nice PHPDoc comments in lock.inc.

Gábor Hojtsy’s picture

The locale cache (in core) and filtercache (contrib module) are also involved. We use these two caches on our site, and have seen ugly duplicate cache storage requests from the caching code...

Jeremy’s picture

Assigned: Unassigned » Jeremy

Also note that lock.inc alone could be added to core, without the rest of this patch. The functionality this patch adds to node.module is simply one useful example of lock.inc. (However, I am quite happily running with this patch on KernelTrap.org.)

If merged, my next step is to extend the lock api to also support file-based locks via file.inc.

Jeremy’s picture

FileSize
15.02 KB

This patch introduces lock.inc, and provides a generic locking mechanism. It wraps cache creation in a lock to prevent errors that have been reported before when two users simultaneously attempt creation of cache content.

It also provides the necessary database schema, updating update.inc, etc.

Changes to patch:
- resynchronized with CVS head.
- removed node lock functionality
- added lock around cache creation

To use:
1) check out the latest version of Drupal from CVS
2) apply the patch from the top level "patch -p0 < lock.patch"
3) run update.php

This patch has been applied to KernelTrap.org (against 4.4.1) without any problems.

killes@www.drop.org’s picture

I think this patch should be included. I occasionally see MySQL errors due to Drupal attempting to insert a cache entry for an already-present cache ID.

Gábor Hojtsy’s picture

I often see those SQL errors, using a lot of stuff cached: archive, filtercache, locale, etc.

killes@www.drop.org’s picture

In the last couple of days I've had 38 such errors: 30 of them related to the cache and 8 to the history table. I am now testing this lock.inc patch and will see how many errors I get.

Jeremy’s picture

FileSize
966 bytes

Wonderful! Please let me know if you run into any problems, or if you have any suggestions after using lock.inc.

BTW: Here's a patch for Drupal 4.4.1 to add lock.inc to node.module, preventing history collisions. It even removes an extra db query that wasn't necessary.

Jeremy’s picture

FileSize
966 bytes

And here's the same node.module patch for CVS.

killes@www.drop.org’s picture

The patch for CVS was not necessary; the 4.4 version applied just fine.

I observe the following:

- I get fewer (only one) cache insert errors.

- I still get some history insert errors (| 7752 | user error: Duplicate entry '281-1116' for key 1)

- and: I get lock table insert errors:
| 8194 |user error: Duplicate entry '715778' for key 1 query: INSERT INTO locks VALUES('715778', '0.51026200 1086946323', 15, 1, 1)

Jeremy’s picture

FileSize
1.25 KB
> The patch for CVS was not necessary; the 4.4 version applied just fine.

Yes, I didn't notice until I'd uploaded them both that they were identical... ;)

> I get fewer (only one) cache insert errors.

Wow, I really didn't think this would be possible. It implies you've hit a race where two lock attempts had the same microtime(). In all my testing, I've been unable to trigger this. In the attached patch I add a sleep that should prevent this.

> I still get some history insert errors
> (| 7752 | user error: Duplicate entry '281-1116' for key 1)

I'm not going to worry about this for now. The history patch is new, and something I know very little about. I may have not put the lock in the correct place.

> and: I get lock table insert errors:

Do you get a lot of these? Basically, for every time you used to get a cache insert error, do you now get a lock insert error? The use of '@' in the db_query() that inserts the lock should prevent this error. I assume something in your php.ini must be causing the '@' operator to be ignored, though I've been unable to find any such setting. The attached patch manually emulates the same functionality, temporarily turning off logging.

Can you try the attached patch? It makes small changes to lock.inc.

Jeremy’s picture

Killes, is there any chance of you testing this with the updated patch?

I do not get any race conditions on my installation using lock.inc, and having you try this patch and then provide feedback will help me to understand what's different about your configuration and to create a fully working solution.

BTW: What OS/webserver are you using? What version of PHP?

Bart Jansens’s picture

The reason the @ operator doesn't work is that Drupal's error handler simply ignores it. The error handler uses the value of error_reporting() only to decide whether or not the error should be printed. There have been patches for this before, but for some reason they were never applied.
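
A minimal sketch of the behaviour Bart describes, with a simple logging handler standing in for Drupal's real one: PHP's '@' operator lowers error_reporting() for the duration of the expression, but a handler installed with set_error_handler() is still invoked, so it must check the mask itself to honour '@' (which, as noted above, Drupal's handler of the time did not):

```php
<?php
// Sketch only: this logging handler stands in for Drupal's error handler.
// Honouring '@' requires the mask check below.
error_reporting(E_ALL);
$logged = array();

function logging_error_handler($errno, $message) {
  global $logged;
  // Under '@', the current error_reporting() mask excludes $errno.
  if (!(error_reporting() & $errno)) {
    return TRUE;   // suppressed by '@': stay silent
  }
  $logged[] = $message;
  return TRUE;
}
set_error_handler('logging_error_handler');

@trigger_error('suppressed');   // handler still runs, but logs nothing
trigger_error('visible');       // logged
```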

Assuming this will be fixed someday, do we really need locking in cache_set? Basically the only change is that the errors now occur in lock.inc instead and that a few more queries are executed. Why not just use @db_query in cache_set?

OTOH, lock.inc is useful in many other cases. I've been using it for a few months now to prevent race conditions caused by simultaneous cron runs (poormanscron) and it works great; I haven't seen a single race condition since.

Bart Jansens’s picture

Ignore that attachment, I didn't even attach it. Must be a project.module bug.

killes@www.drop.org’s picture

Jeremy, sorry for the delay. I have now applied your patch against lock.inc.
I actually got only very few errors (4) in the last weeks, and those might be related to some other error not due to locking at all.

Jeremy’s picture

Hi Killes,
A few quick questions:
1) are you seeing errors displayed when generating your page, or just in the watchdog logs?
2) did these errors go away after applying the patch to lock.inc, or before?
3) what's an example of an error that you're not sure if is lock related?

A comment:
- lock.inc is in obvious need of a garbage collection function. If you do "select count(*) from locks;" you'll probably get a big number as a result.

killes@www.drop.org’s picture

1) I only saw those errors in the watchdog.
2) I didn't see any errors after applying this last patch but this is only a few hours ago.
3) Example:

user error: Duplicate entry '352-1059' for key 1 query: INSERT INTO history (uid, nid, timestamp) VALUES (352, 1059, 1089978066) in /home/vdst.net/www/htdocs/includes/database.mysql.inc on line 97.

I only found three entries in the lock table. The site in question isn't very lively.

Jeremy’s picture

Killes, I misunderstood your earlier bug report. The lock patch will not fix the collisions showing up in the database. I just submitted another patch that will. The other patch would also benefit the lock patch, as then lock collisions would not have to be reported if the administrator isn't interested in superfluous errors.

Jerk Face’s picture

-1

I think this is the wrong direction to take Drupal. This adds a lot of complexity and doesn't solve the problem. Locks don't work well for interactive multiuser apps because a poorly behaved client can hold a lock for too long, while allowing locks to time out completely erases their usefulness.

For example:
- Alice opens a node for editing (taking out a lock on it).
- 5 minutes pass; Alice is still writing.
- The lock times out.
- Bob opens the node for editing.
- Bob saves his changes.
- Alice saves her changes.
- Bob's changes are gone.

We should deal with race conditions "lazily," as they happen, rather than explicitly locking resources. We should let the database handle locking. That's what it's good at. MediaWiki (the software used by Wikipedia) does things this way.

What should happen is something more along the lines of:
db_begin_transaction();
// Check if conditions are okay for this update
if ($conditions_ok) {
  // run queries
}
else {
  // Deal with race condition
}
db_commit_transaction();

Dealing with race conditions for editing a node could be as simple as throwing the user back into the edit screen and complaining that the node has changed since they last edited it. My revisions table patch has implementations of db_begin_transaction() and db_commit_transaction() for both MySQL and PostgreSQL.

Jeremy’s picture

FileSize
1.27 KB

First, it is important to understand what lock.inc accomplishes: it provides a database independent mechanism for temporarily obtaining exclusive rights to a resource. This is the sole objective, and I believe that this works perfectly.

Second, you are wrong in your assumption about Bob and Alice. Review the patch and you will see that Alice will actually be unable to save her changes if Bob sneaks in after the lock expires -- instead, she will get an error telling her that someone else has modified the content.

Third, your example does not actually prevent racing. It might minimize it, but two separate people could both check whether "conditions are okay" at the same time, and then both make their changes. One will win, and one will lose. That is where lock.inc comes in.

That said, the attached patch implements something like your "lazy method". It uses the existing 'changed' field for a node to detect if another user has modified the same node you are currently editing, and if so prevents you from saving your changes. In this case, you are returned to the preview page and see the error "This content has been modified by another user, unable to save changes."

Please consider this patch for inclusion into 4.5.

A small race condition remains which cannot be solved without a lock. I will submit another patch to demonstrate the lock if this first patch is accepted. Even without the lock, this is a big improvement that minimizes one user accidentally walking over another user's changes.

Anonymous’s picture

There are two separate issues here. One is having locks within the program. The other is locking nodes.

We obviously need locks within the program to prevent races. As you point out, in my example we need to make sure that there is no race condition when people check that "conditions are ok." Database transactions/locking allow us to do that in a much simpler way that will still work across different databases, without inventing our own locking system. There is no point in re-inventing the wheel.

Locking nodes, on the other hand, is high-level user interface behavior, and it hurts usability. Editing conflicts are complicated to deal with, so we should try to minimize the number of times users have to deal with them. "Locking" nodes doesn't do this. It makes a node uneditable for a period of time, so if I accidentally click edit and leave the room, no one else will be able to touch the node until my lock expires. Telling people that they need to re-submit their post because their lock timed out is also very bad behavior. We should be as lazy as possible when dealing with editing conflicts in order to minimize the chance that the end user has to deal with them.

Jerk Face’s picture

That last comment was me. I forgot to log in.

Sorry.

drumm’s picture

I agree with a few of the others; let's keep this simple. No unnecessary UI or API should be added. That will make everyone's lives easier.

For a specific (not entirely thought out) idea maybe "branching" or "conflicting" versions could be built into the revisions system somehow with appropriate messages and UI flow.

Can you post a comprehensive and updated patch for a more detailed review? The code seems rather scattered now and I can't figure out what exactly to concentrate on reading to make a better review.

Besides node editing, what other use cases would be possible for lock.inc in core Drupal and contributed modules?

Holmes Wilson’s picture

Title: Locking Support » Locking Support vs. true wiki functionality

From a users' perspective, I can't see how it makes sense to lock nodes for an arbitrary time period (here, 5 minutes) after somebody clicks, "edit". If two users both want to contribute to the content on a given node, we want them both to be able to contribute. There's no reason to force users to race the clock in submitting content (5 minutes is not that much time for an intelligent contribution). And it's ENTIRELY UNACCEPTABLE to have users lose the content they submit. If people are spending time making contributions to our site's content, and their changes are lost as soon as they submit them, that's going to demoralize the most helpful participants; if you're running a community-supported site, you can't just ask people for help and then make their contributions disappear into the ether. This patch violates both of those principles.

Consider the following situations:

Alice takes 15 minutes to finish an edit to a collaboratively editable part of the site. At 6 minutes, when the file is no longer locked, Bob comes along and makes an edit that takes him 5 minutes to finish. The system lets Bob begin his revision, and at 11 minutes Bob posts his revision. Four minutes later Alice unknowingly makes Bob's revision disappear into space. Not cool.

We could say "you have 5 minutes to submit your revision, or you might zap someone else's revision," but most people will probably just ignore that (after all, *their* work isn't at stake). We could require them to finish their revisions within 5 minutes, but that won't do much for the quality of our content. And say we extend this 5 minutes to another arbitrary--but longer--15 minutes, to give people time to write. Now say Alice starts a revision and gets an important phone call, and Bob wants to edit the file too, because it contains an embarrassing error. Unfortunately for Bob, he's unnecessarily locked out for a good 15 minutes.

If your rules are "no potential editor gets turned away" and "nobody's changes get unconsciously destroyed", there's a rather simple solution to this problem: the Wikipedia model. If somebody edits the same page between when you start and finish editing, you get a screen that lets you compare your changes and merge them. Do a little experiment for yourself here: wikipedia.org. The merging screen is not nearly as pretty as it could be, but one could definitely make something better. But the main point is that it doesn't violate the two "don't reject / demoralize your users" principles.

tangent’s picture

Bugzilla supports collision detection, without (I presume) locking. While I haven't looked at the code, I am guessing that the time is captured when the form is generated, and when the form is submitted a check is made to see if the node was updated between the form start time and the form submit time. I have experienced collisions in Bugzilla before, and the system simply notifies you that another user updated the node first and gives you an opportunity to resubmit.

Jeremy’s picture

> Database transactions/locking allow us to do that in a much simpler 
> way that will still work across different databases, without inventing 
> our own locking system. There is no point in re-inventing the wheel.

I disagree. MySQL 3.x supports neither row-level locking nor transactions, and at this time Drupal supports MySQL 3.x.

> We should be as lazy as possible when dealing with editing conflicts
> in order to minimize the chance that the end user has to deal with it.

Indeed. Which is precisely what the patch attached to my previous comment implements.

> Can you post a comprehensive and updated patch for a more detailed review?

Yes, please refer to the patch attached to my previous comment. (The linked patch does not use any locking -- I have other plans for lock.inc, when time permits.)

> I am guessing that the time is captured when the form is generated and when
> the form is submitted a check is made to see if the node was updated between the
> form start time and the form submit time.

This is what I've done in the patch attached to my previous comment.

Jeremy’s picture

Title: Locking Support vs. true wiki functionality » two people editing a node at the same time can result in loss of first changes
Category: feature » bug
> Besides node editing, what other use cases would be possible
> for lock.inc in core Drupal and contributed modules?

I'm renaming this issue to focus on node editing. I'll start a new issue for locking when I've had a chance to implement recent ideas. I'll address this question at that time.

Jerk Face’s picture

So, why not do table level locking for MySQL 3.x and transactions for databases that support it?

It seems to make more sense than writing our own locking system.

Jeremy’s picture

> So, why not do table level locking for MySQL 3.x and transactions for
> databases that support it?

Please, the patch in question does not have any locking involved at all. It is short, only a few lines, and prevents two people from accidentally modifying the same node at the same time.

The patch was attached earlier in this thread. You can find it here.

moshe weitzman’s picture

Despite all the chatter in this issue, the final patch is small and useful IMO.

jhriggs’s picture

+1 This is a good solution, if only temporary while we may debate more thorough/extensive ways to handle it.

Jerk Face’s picture

I agree. The concept of this latest patch is good, but as Jeremy said, it still has race conditions.

Specifically, the following scenario could happen in node_submit():
* Alice starts editing the node
* Bob starts editing the node
* Alice starts saving the node.
* Bob starts saving the node.
* Alice's process gets control of the CPU first and passes the line "$node = node_validate($node);" in node_submit()
* Bob's process gets the CPU and completely validates and saves the node
* Alice's process has already passed the validation line, so it continues saving the node, overwriting Bob's changes.

A lock or transaction starting above the $node = node_validate($node); line and ending below node_save() will make sure that the two processes can't enter that region at once and prevent the race condition.
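
As a sketch only, the critical section described above might look like the following. The in-process flag stands in for a locks-table row or a database transaction, and the body of the critical section is a stub, not real node.module code:

```php
<?php
// Sketch only: a per-node guard around validate-then-save. In Drupal this
// would be a row in the locks table or a transaction, and the assignment
// below is a placeholder for node_validate()/node_save().
$saving = array();     // nids currently inside the critical section
$stored = array();     // stand-in for the node table

function node_submit_guarded(&$saving, &$stored, $nid, $body) {
  if (!empty($saving[$nid])) {
    return FALSE;                 // another save is in flight; caller retries
  }
  $saving[$nid] = TRUE;
  // --- critical section: validate and save must not interleave ---
  $stored[$nid] = $body;
  // --- end critical section ---
  unset($saving[$nid]);
  return TRUE;
}
```

With the guard in place, Bob's save cannot slip between Alice's validation and her save; one of the two requests is turned away and can be retried or reported.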

We should fix this before we commit it to CVS. If we don't fix the race conditions issue, then sites with heavy traffic will still lose data.

Jeremy’s picture

The previously attached patch closes the window from minutes, hours, even days, to a second or less. While not 100% perfect (because of the lack of a lock), it closes the gap significantly. I say fix this one step at a time, and that this is the logical first step.

Putting back to patch, as I'd like to see this merged. Then, if there's interest from the core team members, we can worry about adding a lock.

Jerk Face’s picture

I'm not sure about that. From an end user point of view, making the gap smaller is worse because it will make Drupal seem buggier for people who need to do multiple edits. As is, someone's post is lost every time there's an editing conflict. It's very noticeable, and people can intuitively understand what's going on because that's how files work with programs that don't do locking. People who have to edit simultaneously can work around it. If we apply this patch without dealing with the race condition, most of the time it will work, but sometimes you'll lose a post. So, Drupal will seem flakier, even though it will be losing posts due to editing conflicts less often. A predictable bug is better than an unpredictable one.

I think it's a good policy, in general, not to commit known buggy code, even if it is arguably less buggy than the current code.

Jeremy’s picture

Adding locks to Drupal is a larger task, as there are many other places in Drupal that should be wrapped in a lock. Additionally, you will not be able to use the database's own locking functionality because this is not consistent from database to database (or even version to version). I do have a solution for this, but there has been little interest.

This patch simply focuses on cleaning up a logic bug. The bug is this: Alice opens a web page to edit it. She takes x minutes to make her changes. During that time, Bob opens the same web page to edit it. He takes y minutes to make his changes. Alice finally finishes and saves her changes. Perhaps hours later, Bob finally finishes and saves his changes. Alice's changes have now been lost.

This logic error is fixed by the aforementioned patch without introducing locks, simply by utilizing the existing "changed" timestamp. The patch is only a few lines long.

You are talking about another issue: when Alice and Bob both click "save" at the same time. However, if you have not first addressed the problem above, it becomes much more difficult to address two people clicking save at the same time. I believe we should fix the logic error before we fix the race.

Do you have any technical problems with the aforementioned patch? Have you looked at it or tried it?

Jerk Face’s picture

I still say we shouldn't commit known buggy code to CVS, even if it fixes an existing bug.

That's not acceptable.

Jeremy’s picture

Component: base system » node.module
FileSize
1.29 KB

Resync.

What it does:
User A opens node N and starts making changes. User A steps out of the room to grab a quick bite to eat, not yet finished editing node N. User B opens node N and starts making changes. User B saves changes to node N. User A returns from eating, and tries to save changes to node N. With this tiny patch, the node module will detect that node N has changed since user A opened it, and thus will display an error. Prior to this tiny patch, when user A clicked save, all of user B's changes would be lost.

How it works:
The patch utilizes the existing $node->changed field. Every time the node is previewed or saved, it first checks whether $node->changed has been modified in the database. If so, someone has modified the node and an error is displayed.
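
The check described above could be sketched as follows. The comparison is the whole idea: the 'changed' value captured when the edit form was built is compared against the value currently stored for the node. Function and variable names here are illustrative, not the patch's actual code:

```php
<?php
// Sketch of the staleness check described above.
function node_edit_is_stale($db_changed, $form_changed) {
  // TRUE when someone else saved the node after this form was generated.
  return $db_changed > $form_changed;
}

$form_changed = 1089978000;  // 'changed' when user A opened the edit form
$db_changed   = 1089978066;  // 'changed' after user B saved meanwhile

if (node_edit_is_stale($db_changed, $form_changed)) {
  // The node module would return user A to the preview with an error
  // instead of silently overwriting user B's changes.
  echo "This content has been modified by another user, unable to save changes.\n";
}
```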

Why it's important:
If multiple people are trying to manage a website, it can be very frustrating when one person accidentally overwrites another person's changes.

Dries’s picture

Committed to HEAD.

telex4’s picture

Version: » 4.6.5

Sorry for dredging this closed issue up again, but there is one other small change that would massively reduce frustration caused by this patch: display a warning message when the second person starts to edit the page, telling them that another person is currently working on it.

If you're working on a page where you're making non-trivial changes then you might waste an awful lot of time making the changes all over again when you find another person has slightly altered the content you've started to work on.

It doesn't help the first person, of course, if the second person saves their changes first. But it would be a nice improvement.