memcached implementation of D7 locking [#596908]

Comment	File	Size	Author
#46	memcache-lock.inc__4.txt	3.25 KB	catch
#45	memcache-lock.inc_.txt	3.25 KB	jeremy
#44	memcache_lock.patch	3.33 KB	catch
#42	memcache_lock.patch	3.46 KB	catch
#32	yh.patch	513 bytes	yhager
#30	memcache-lock.inc_.txt	3.09 KB	yhager
#28	memcache-lock.inc_.txt	3.1 KB	yhager
#27	memcache-lock.inc_.txt	3.1 KB	yhager
#10	memcache-lock.inc_.txt	3.73 KB	yhager
	memcache-lock.inc_.txt	3.69 KB	yhager

Comment #1

chx commented 3 December 2009 at 15:04

You have a race window between get and set. You must use add to begin with. That's atomic.

Log in or register to post comments

Comment #2

yhager commented 3 December 2009 at 21:51

but I only set if I have the lock, and I assume those two lines of code would take less than a full second.

Log in or register to post comments

Comment #3

chx commented 6 December 2009 at 19:10

A full second! Some get thirty page loads a second...

Log in or register to post comments

Comment #5

chx commented 6 December 2009 at 20:27

set is not atomic. add is. Edit: sorry I mean, set is not what you want. Two processes can run set and succeed. But only one add can succeed.

Log in or register to post comments

Comment #6

yhager commented 7 December 2009 at 17:03

I only get to this phase after I know I am the one with the lock, and no other process can get here for at least one full second. Also 'add' here should fail, since the key should exist at the server. I just want to extend its expiration.

Log in or register to post comments

Comment #7

chx commented 7 December 2009 at 19:05

Then try to add a lock_$key something that you remove when appropriate.

Log in or register to post comments

Comment #8

yhager commented 7 December 2009 at 20:45

I don't understand what scenario this is meant to protect.
I am first using 'add', which is atomic. Then I have the lock for 30 seconds. If I need to renew the lock, I check the key and expiration time. If I still have the lock for at least another second, then I can safely *set* the new expiration time.

adding another key around the 'set' just to remove it after would have no effect I guess.

Do you see something I am missing?

Log in or register to post comments

Comment #9

chx commented 14 December 2009 at 05:55

Oh OK, tricky. So on first run you run an add and add a lock uniq to this process and later on try to get your uniq lock. One last small thing however and then it's good to go:

Expiration time of the item. If it's equal to zero, the item will never expire. You can also use Unix timestamp or a number of seconds starting from current time, but in the latter case the number of seconds may not exceed 2592000 (30 days).

And it's an int. It's not ms perfect so when you set it to 0.001 it wont ever expire.

Log in or register to post comments

Comment #10

yhager commented 16 December 2009 at 11:20

Status	File	Size
new	memcache-lock.inc_.txt	3.73 KB

Wow... your coding review skills rock!

Here's a revised version with the change from #4, and changing expiration time to be an int (minimum of 1 sec).

Log in or register to post comments

Comment #11

yhager commented 16 December 2009 at 11:21

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #12

david strauss

he/him

commented 17 December 2009 at 00:48

Status:

Needs review

» Needs work

Unless there's a need to have locks persist from one request to another, I don't see the need for generating and using unique lock IDs. A script should be able to know internally whether its lock has expired, which is the sole use I see in this file for the lock IDs.

Log in or register to post comments

Comment #13

david strauss

he/him

commented 17 December 2009 at 00:50

Version:

6.x-1.x-dev

» 7.x-1.x-dev

Oh, and this is clearly a 7.x issue.

Log in or register to post comments

Comment #14

slantview commented 17 December 2009 at 01:07

TBH, this is not clearly a 7.x issue. #251792: Implement a locking framework for long operations has both a 6.x and 7.x patch. It looks like this lock.inc framework will be back ported into 6.x. Once the back port happens, this will become a 6.x issue as well.

Do you think that this should be added to 6.x before or after the backport? Clearly people are already using the 6.x lock.inc patch since that is how that issue was derived in the first place.

-s

Log in or register to post comments

Comment #15

slantview commented 17 December 2009 at 01:11

Also, in http://drupal.org/node/251792#comment-1943286 peter has 4 open tickets to which the locking framework is used. It is obviously necessary to have unique ids for locks since we don't know when the timing of those operations will occur. If you have a cron run and menu rebuild happen simultaneously, one may not be able to complete without releasing the lock first which will obviously be a significant performance hit.

Log in or register to post comments

Comment #16

david strauss

he/him

commented 17 December 2009 at 01:18

@slantview The standard is to fix it for HEAD first (with the issue targeted at HEAD) and then, if appropriate, switch to targeting the latest stable version (and so on).

one may not be able to complete without releasing the lock first which will obviously be a significant performance hit

What does this mean? Given the memcached architecture, I do not understand why the unique lock ID is necessary. Given the name of the key in memcached, we can already handle expiration and atomic capture. And as long as the current script knows when it acquired a lock (assuming it did), it can know whether said lock is eligible for renewal.

Log in or register to post comments

Comment #17

slantview commented 17 December 2009 at 01:36

What does this mean? Given the memcached architecture, I do not understand why the unique lock ID is necessary. Given the name of the key in memcached, we can already handle expiration and atomic capture. And as long as the current script knows when it acquired a lock (assuming it did), it can know whether said lock is eligible for renewal.

EDIT:

I get what you are saying. It doesn't matter because you key the memcache entry off of the lock name, not the lock id. Gotcha. I agree.

Log in or register to post comments

Comment #18

yhager commented 17 December 2009 at 06:16

Given the name of the key in memcached, we can already handle expiration and atomic capture. And as long as the current script knows when it acquired a lock (assuming it did), it can know whether said lock is eligible for renewal.

Consider the following scenario: Process A locks for a long time, A's lock expires. process B locks using the same key, then process A wakes up and asks for a renew. Without a unique lock_id, If B's lock have yet to expire, A will renew B's lock, thinking it's his own.

On a side note, I find the "lock renewing" capability adding more complexity than gain. I can live happily with a locking API that does not allow renew - but I guess this is not to be discussed in this issue.

Log in or register to post comments

Comment #19

david strauss

he/him

commented 17 December 2009 at 17:42

@yhager No. Process A is quite capable of tracking the age of its own lock without any central help. Just store a timestamp into a static or global and check that the lock has not expired before trying to renew.

Log in or register to post comments

Comment #20

yhager commented 17 December 2009 at 19:59

@David Strauss: Ok, I agree, I will work on it. However, I took include/lock.inc as an example code - and it does contain the lock_id code - why is it needed in the DB case, and not in the memcache case?

Log in or register to post comments

Comment #21

david strauss

he/him

commented 17 December 2009 at 20:41

@yhager lock_id is possibly unnecessary for the existing code, too. I just haven't checked.

Log in or register to post comments

Comment #22

david strauss

he/him

commented 17 December 2009 at 20:44

Title:

memcache imlementation of D7 locking

» memcached implementation of D7 locking

Log in or register to post comments

Comment #23

yhager commented 18 December 2009 at 08:58

@David Strauss: Thinking more about the pathological cases re lock_id removal, Let me know if you think these scenarios are benign/can be safely ignored/easily fixed/dead wrong:

* Process A locks, memcached flushes the key (manually/misconfiguration/memcached restart), Process B locks, Process A, thinking it has the lock, extends it.
* Process A thinks it has enough time to extend the lock, but the machine locks for a couple of seconds (spinlock/bad hw/other). If process B from another machine grabbed the lock, process A has no way to know that when he wakes up and extends the lock.
* Process A locks, then NTP modifies system clock a few seconds back. When process A wants to extend the lock, the lock might already have been taken by another process - Process A will extend the lock happily, leading again to a double-lock situation.

These might not be very common, but if the critical section protected by locking is indeed critical, it might lead to unexpected results.

Log in or register to post comments

Comment #24

yhager commented 18 December 2009 at 09:06

arguments against the scenarios outlined in #23 (yes, contradicting myself is fun :)
Since memcahed can evict lock entries when it wants to, we can never rely on it for critical sections, just for stampede protection. If this an assumption we can live with, it is not worth dealing with these scenarios.

Log in or register to post comments

Comment #25

david strauss

he/him

commented 18 December 2009 at 09:23

@yhager

Process A locks, memcached flushes the key (manually/misconfiguration/memcached restart), Process B locks, Process A, thinking it has the lock, extends it.

If memcached restarts, we'd have bad behavior, anyway, because a running script may think it has a lock it acquired while another script could freshly acquire the lock. This is outside of any renewal behavior. But, since we already get() the lock before renewing, our check would fail on memcached restart regardless of whether we use a lock_id -- unless another process has already inappropriately swiped the lock, but that means we've already failed.

Process A thinks it has enough time to extend the lock, but the machine locks for a couple of seconds (spinlock/bad hw/other). If process B from another machine grabbed the lock, process A has no way to know that when he wakes up and extends the lock.

The only difference with my proposal is that the expiration time would be tracked locally instead of in memcached. The system locking up for a few seconds wouldn't affect that. However, the verification of the expiration is against the system clock. If that's wrong, that check will fail regardless of having a lock_id.

Process A locks, then NTP modifies system clock a few seconds back. When process A wants to extend the lock, the lock might already have been taken by another process - Process A will extend the lock happily, leading again to a double-lock situation.

Same as the last item, this problem exists independently of whether we use a lock ID or not. The only change is the source of the expiration timestamp (local versus memcached).

Log in or register to post comments

Comment #26

yhager commented 18 December 2009 at 10:39

But, since we already get() the lock before renewing, our check would fail on memcached restart regardless of whether we use a lock_id

I thought that was redundant too. I'll put it back in my code and post a revised version soon.

The only difference with my proposal is that the expiration time would be tracked locally instead of in memcached

lock_ids provide another level of insurance, protecting against these and other edge cases. As I mentioned in #24, these are probably not too important to handle.

Log in or register to post comments

Comment #27

yhager commented 18 December 2009 at 11:07

Status:

Needs work

» Needs review

Status	File	Size
new	memcache-lock.inc_.txt	3.1 KB

See revised version.

Modifications:
* removed lock_id
* some minor fixes
* simplified renew code
* use $key and not $name upon renew
* lock_may_be_available() was wrong
* lock_release removes the lock from the static table regardless of memcached response

Log in or register to post comments

Comment #28

yhager commented 18 December 2009 at 11:14

Status	File	Size
new	memcache-lock.inc_.txt	3.1 KB

* removed unused $success variable

Log in or register to post comments

Comment #29

chx commented 18 December 2009 at 11:20

return !(bool)dmemcache_get($name, 'semaphore'); <= no need to cast (bool) when ! will do it anyways.

Log in or register to post comments

Comment #30

yhager commented 18 December 2009 at 12:22

Status	File	Size
new	memcache-lock.inc_.txt	3.09 KB

@chx: wasn't sure about that.

Changes:
* removed unnecessary bool casting.

Log in or register to post comments

Comment #31

david strauss

he/him

commented 18 December 2009 at 15:58

If you want to add a little more confidence to the checks, we could also store the expiration timestamp in memcached (instead of the dummy content item "semaphore"). The timestamp in memcached on the get() for renewal should match the locally tracked expiration time. It would at least protect against cases with lock acquisition separated by more than a second.

Log in or register to post comments

Comment #32

yhager commented 19 December 2009 at 00:13

Status	File	Size
new	yh.patch	513 bytes

Instead of looking at this as a fool-proof locking system, I am looking at this now as a protection against cache stampede only. Therefore, it is not so critical if a handful processes get the lock in extreme cases. As long as the stampede protection is kept - this is fine.

The attached patch will add this check to the code in #30. I am not rerolling because I am not sure which way you want it. As far as I am concerned, #30 is good enough in this regard.

Log in or register to post comments

Comment #33

crea commented 22 December 2009 at 05:27

yhager, isn't the cache stampede protection already a feature of Memcache API project ? I remember stampede protection mentioned in the code.

Log in or register to post comments

Comment #34

yhager commented 22 December 2009 at 18:29

@crea: memcache API contains *some* protection from stampede - when items already present in the cache. The locking API supplements that by protecting situations of cold cache as well.

Log in or register to post comments

Comment #35

crea commented 29 December 2009 at 18:19

My 2 cents.

Problem of missing lock because of evictions

It's possible to run separate Memcached instance for locks with an option to disable evictions ("-M"). For lock data, which is small, this makes sense: it's possible to keep memory usage small and under control. Though it is not really a solution, because it hides the consequence of problem instead of solving the problem itself (no memory).
Separate memcached instance for locks should be good protection against evictions on it's own . Site administrators also can monitor dedicated "lock" bin both in terms of memory usage and evictions (they really should).

Problem of missing lock because of flush or Memcached restart

It can be dealt with by using "hot cache" flag, which is add-ed on first "cold" Memcache request. Every lock operation would check if cache is hot or cold. If cache is cold, we can assume the worst, i.e. lock was acquired but lost due to flush/restart, so we assume it's locked for full lock timeout. This gives safe window for previous operations to finish. Don't know if there can be problems due to operations not being able to acquire the lock during this window. I guess this is safe, since this framework was implemented for long operations that can wait some time.

Hope this helps.

Log in or register to post comments

Comment #36

david strauss

he/him

commented 29 December 2009 at 20:44

@crea The "hot cache" flag is a clever idea.

Log in or register to post comments

Comment #37

crea commented 29 December 2009 at 21:43

It's rather cold -> heating -> hot. "Heating" would become that safe window. Implementing this in atomic way is tricky, though.

Log in or register to post comments

Comment #38

crea commented 29 December 2009 at 22:51

If we use dedicated Memcached instance for locks (which is required anyway to avoid evictions), we can assume that a lock can only disappear because of Memcached restart. So we can simply use Memcached uptime information. http://ru.php.net/manual/en/function.memcache-getextendedstats.php

If ($uptime >= $lock_timeout) {
  // Locking is safe.
}
else {
  // Assume lock was acquired but lost. Wait until timeout.
}

Log in or register to post comments

Comment #39

david strauss

he/him

commented 15 January 2010 at 16:58

@crea Extended statistics from memcached are not guaranteed to have any consistency between memcached releases.

Log in or register to post comments

Comment #40

drewish commented 5 March 2010 at 22:44

subscribing... excited to see that lock.inc made it into 6.x

Log in or register to post comments

Comment #41

catch

he/him

English

commented 20 May 2010 at 07:32

subscribing.

Log in or register to post comments

Comment #42

catch

he/him

English

commented 23 June 2010 at 07:58

Status	File	Size
new	memcache_lock.patch	3.46 KB

Here's #32 as a patch. Will be testing shortly.

Log in or register to post comments

Comment #43

carlos8f commented 24 June 2010 at 19:21

subscribing

Log in or register to post comments

Comment #44

catch

he/him

English

commented 25 June 2010 at 05:34

Status	File	Size
new	memcache_lock.patch	3.33 KB

Update patch which uses dmemcache_add() instead of accessing the object directly - was getting wrong number of arguments errors.

Log in or register to post comments

Comment #45

jeremy commented 25 June 2010 at 21:20

Status	File	Size
new	memcache-lock.inc_.txt	3.25 KB

I've made some changes:

the "lock object" wasn't being used for anything so I replaced it with a simple define
the call to dmemcache_key was redundant and incorrect, I removed it (the other dmemcache_ functions call it for you)
the call to dmemcache_set() was passing in parameters in the wrong order so the expire wasn't being set correctly, fixed
the call to dmemcache_add() was also passing in wrong parameters, again causing the expire to not get set correctly, fixed
added additional comments and made them comply with Drupal coding standards

Additional testing would be appreciated, then we can get this committed.

Log in or register to post comments

Comment #46

catch

he/him

English

commented 26 June 2010 at 02:07

Status	File	Size
new	memcache-lock.inc__4.txt	3.25 KB

One line fix for the define, was missing single quotes.

Log in or register to post comments

Comment #47

jeremy commented 28 June 2010 at 20:03

Version:	7.x-1.x-dev	» 6.x-1.x-dev
Status:	Needs review	» Patch (to be ported)

Committed in the the 7.x-1.x-dev branch:
http://drupal.org/cvs?commit=385410

Moving to 6.x, waiting to be ported...

Log in or register to post comments

Comment #48

iamer commented 7 August 2010 at 14:51

There is a port at http://drupal.org/node/874932 , can you please review it

Log in or register to post comments

Comment #49

jeremy commented 25 November 2010 at 17:50

Status:

Patch (to be ported)

» Fixed

A D6 port was committed here:
#874932: Memcache lock implementation for drupal 6

Log in or register to post comments

Comment #50

9 December 2010 at 18:00

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Log in or register to post comments

memcached implementation of D7 locking

Comments

Problem of missing lock because of evictions

Problem of missing lock because of flush or Memcached restart