Project:Memcache API and Integration
Version:6.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:Unassigned
Status:closed (fixed)

Issue Summary

I've written this for version 5 (don't ask...) - but the code is version agnostic anyway.

#239798: Provide an interface for add would have been useful here...

AttachmentSize
memcache-lock.inc_.txt3.69 KB

Comments

#1

You have a race window between get and set. You must use add to begin with. That's atomic.

#2

but I only set if I have the lock, and I assume those two lines of code would take less than a full second.

#3

A full second! Some get thirty page loads a second...

#4

Status:needs review» needs work

You mean I got that condition backwards. Good catch!

How about if I change it to:

<?php
     
// renew if we have at least 1 second more. This is required to ensure atomic renew.
     
$success = ($result->lock_expire > $now + 1) && (bool)$mc->set($name, $lock, FALSE, $timeout);
?>

#5

set is not atomic. add is. Edit: sorry I mean, set is not what you want. Two processes can run set and succeed. But only one add can succeed.

#6

I only get to this phase after I know I am the one with the lock, and no other process can get here for at least one full second. Also 'add' here should fail, since the key should exist at the server. I just want to extend its expiration.

#7

Then try to add a lock_$key something that you remove when appropriate.

#8

I don't understand what scenario this is meant to protect.
I am first using 'add', which is atomic. Then I have the lock for 30 seconds. If I need to renew the lock, I check the key and expiration time. If I still have the lock for at least another second, then I can safely *set* the new expiration time.

adding another key around the 'set' just to remove it after would have no effect I guess.

Do you see something I am missing?

#9

Oh OK, tricky. So on first run you run an add and add a lock uniq to this process and later on try to get your uniq lock. One last small thing however and then it's good to go:

Expiration time of the item. If it's equal to zero, the item will never expire. You can also use Unix timestamp or a number of seconds starting from current time, but in the latter case the number of seconds may not exceed 2592000 (30 days).

And it's an int. It's not ms perfect so when you set it to 0.001 it wont ever expire.

#10

Wow... your coding review skills rock!

Here's a revised version with the change from #4, and changing expiration time to be an int (minimum of 1 sec).

AttachmentSize
memcache-lock.inc_.txt 3.73 KB

#11

Status:needs work» needs review

#12

Status:needs review» needs work

Unless there's a need to have locks persist from one request to another, I don't see the need for generating and using unique lock IDs. A script should be able to know internally whether its lock has expired, which is the sole use I see in this file for the lock IDs.

#13

Version:6.x-1.x-dev» 7.x-1.x-dev

Oh, and this is clearly a 7.x issue.

#14

TBH, this is not clearly a 7.x issue. #251792: Implement a locking framework for long operations has both a 6.x and 7.x patch. It looks like this lock.inc framework will be back ported into 6.x. Once the back port happens, this will become a 6.x issue as well.

Do you think that this should be added to 6.x before or after the backport? Clearly people are already using the 6.x lock.inc patch since that is how that issue was derived in the first place.

-s

#15

Also, in http://drupal.org/node/251792#comment-1943286 peter has 4 open tickets to which the locking framework is used. It is obviously necessary to have unique ids for locks since we don't know when the timing of those operations will occur. If you have a cron run and menu rebuild happen simultaneously, one may not be able to complete without releasing the lock first which will obviously be a significant performance hit.

#16

@slantview The standard is to fix it for HEAD first (with the issue targeted at HEAD) and then, if appropriate, switch to targeting the latest stable version (and so on).

one may not be able to complete without releasing the lock first which will obviously be a significant performance hit

What does this mean? Given the memcached architecture, I do not understand why the unique lock ID is necessary. Given the name of the key in memcached, we can already handle expiration and atomic capture. And as long as the current script knows when it acquired a lock (assuming it did), it can know whether said lock is eligible for renewal.

#17

What does this mean? Given the memcached architecture, I do not understand why the unique lock ID is necessary. Given the name of the key in memcached, we can already handle expiration and atomic capture. And as long as the current script knows when it acquired a lock (assuming it did), it can know whether said lock is eligible for renewal.

EDIT:

I get what you are saying. It doesn't matter because you key the memcache entry off of the lock name, not the lock id. Gotcha. I agree.

#18

Given the name of the key in memcached, we can already handle expiration and atomic capture. And as long as the current script knows when it acquired a lock (assuming it did), it can know whether said lock is eligible for renewal.

Consider the following scenario: Process A locks for a long time, A's lock expires. process B locks using the same key, then process A wakes up and asks for a renew. Without a unique lock_id, If B's lock have yet to expire, A will renew B's lock, thinking it's his own.

On a side note, I find the "lock renewing" capability adding more complexity than gain. I can live happily with a locking API that does not allow renew - but I guess this is not to be discussed in this issue.

#19

@yhager No. Process A is quite capable of tracking the age of its own lock without any central help. Just store a timestamp into a static or global and check that the lock has not expired before trying to renew.

#20

@David Strauss: Ok, I agree, I will work on it. However, I took include/lock.inc as an example code - and it does contain the lock_id code - why is it needed in the DB case, and not in the memcache case?

#21

@yhager lock_id is possibly unnecessary for the existing code, too. I just haven't checked.

#22

Title:memcache imlementation of D7 locking» memcached implementation of D7 locking

#23

@David Strauss: Thinking more about the pathological cases re lock_id removal, Let me know if you think these scenarios are benign/can be safely ignored/easily fixed/dead wrong:

* Process A locks, memcached flushes the key (manually/misconfiguration/memcached restart), Process B locks, Process A, thinking it has the lock, extends it.
* Process A thinks it has enough time to extend the lock, but the machine locks for a couple of seconds (spinlock/bad hw/other). If process B from another machine grabbed the lock, process A has no way to know that when he wakes up and extends the lock.
* Process A locks, then NTP modifies system clock a few seconds back. When process A wants to extend the lock, the lock might already have been taken by another process - Process A will extend the lock happily, leading again to a double-lock situation.

These might not be very common, but if the critical section protected by locking is indeed critical, it might lead to unexpected results.

#24

arguments against the scenarios outlined in #23 (yes, contradicting myself is fun :)
Since memcahed can evict lock entries when it wants to, we can never rely on it for critical sections, just for stampede protection. If this an assumption we can live with, it is not worth dealing with these scenarios.

#25

@yhager

Process A locks, memcached flushes the key (manually/misconfiguration/memcached restart), Process B locks, Process A, thinking it has the lock, extends it.

If memcached restarts, we'd have bad behavior, anyway, because a running script may think it has a lock it acquired while another script could freshly acquire the lock. This is outside of any renewal behavior. But, since we already get() the lock before renewing, our check would fail on memcached restart regardless of whether we use a lock_id -- unless another process has already inappropriately swiped the lock, but that means we've already failed.

Process A thinks it has enough time to extend the lock, but the machine locks for a couple of seconds (spinlock/bad hw/other). If process B from another machine grabbed the lock, process A has no way to know that when he wakes up and extends the lock.

The only difference with my proposal is that the expiration time would be tracked locally instead of in memcached. The system locking up for a few seconds wouldn't affect that. However, the verification of the expiration is against the system clock. If that's wrong, that check will fail regardless of having a lock_id.

Process A locks, then NTP modifies system clock a few seconds back. When process A wants to extend the lock, the lock might already have been taken by another process - Process A will extend the lock happily, leading again to a double-lock situation.

Same as the last item, this problem exists independently of whether we use a lock ID or not. The only change is the source of the expiration timestamp (local versus memcached).

#26

But, since we already get() the lock before renewing, our check would fail on memcached restart regardless of whether we use a lock_id

I thought that was redundant too. I'll put it back in my code and post a revised version soon.

The only difference with my proposal is that the expiration time would be tracked locally instead of in memcached

lock_ids provide another level of insurance, protecting against these and other edge cases. As I mentioned in #24, these are probably not too important to handle.

#27

Status:needs work» needs review

See revised version.

Modifications:
* removed lock_id
* some minor fixes
* simplified renew code
* use $key and not $name upon renew
* lock_may_be_available() was wrong
* lock_release removes the lock from the static table regardless of memcached response

AttachmentSize
memcache-lock.inc_.txt 3.1 KB

#28

* removed unused $success variable

AttachmentSize
memcache-lock.inc_.txt 3.1 KB

#29

return !(bool)dmemcache_get($name, 'semaphore'); <= no need to cast (bool) when ! will do it anyways.

#30

@chx: wasn't sure about that.

Changes:
* removed unnecessary bool casting.

AttachmentSize
memcache-lock.inc_.txt 3.09 KB

#31

If you want to add a little more confidence to the checks, we could also store the expiration timestamp in memcached (instead of the dummy content item "semaphore"). The timestamp in memcached on the get() for renewal should match the locally tracked expiration time. It would at least protect against cases with lock acquisition separated by more than a second.

#32

Instead of looking at this as a fool-proof locking system, I am looking at this now as a protection against cache stampede only. Therefore, it is not so critical if a handful processes get the lock in extreme cases. As long as the stampede protection is kept - this is fine.

The attached patch will add this check to the code in #30. I am not rerolling because I am not sure which way you want it. As far as I am concerned, #30 is good enough in this regard.

AttachmentSize
yh.patch 513 bytes

#33

yhager, isn't the cache stampede protection already a feature of Memcache API project ? I remember stampede protection mentioned in the code.

#34

@crea: memcache API contains *some* protection from stampede - when items already present in the cache. The locking API supplements that by protecting situations of cold cache as well.

#35

My 2 cents.

Problem of missing lock because of evictions

It's possible to run separate Memcached instance for locks with an option to disable evictions ("-M"). For lock data, which is small, this makes sense: it's possible to keep memory usage small and under control. Though it is not really a solution, because it hides the consequence of problem instead of solving the problem itself (no memory).
Separate memcached instance for locks should be good protection against evictions on it's own . Site administrators also can monitor dedicated "lock" bin both in terms of memory usage and evictions (they really should).

Problem of missing lock because of flush or Memcached restart

It can be dealt with by using "hot cache" flag, which is add-ed on first "cold" Memcache request. Every lock operation would check if cache is hot or cold. If cache is cold, we can assume the worst, i.e. lock was acquired but lost due to flush/restart, so we assume it's locked for full lock timeout. This gives safe window for previous operations to finish. Don't know if there can be problems due to operations not being able to acquire the lock during this window. I guess this is safe, since this framework was implemented for long operations that can wait some time.

Hope this helps.

#36

@crea The "hot cache" flag is a clever idea.

#37

It's rather cold -> heating -> hot. "Heating" would become that safe window. Implementing this in atomic way is tricky, though.

#38

If we use dedicated Memcached instance for locks (which is required anyway to avoid evictions), we can assume that a lock can only disappear because of Memcached restart. So we can simply use Memcached uptime information. http://ru.php.net/manual/en/function.memcache-getextendedstats.php

<?php
If ($uptime >= $lock_timeout) {
 
// Locking is safe.
}
else {
 
// Assume lock was acquired but lost. Wait until timeout.
}
?>

#39

@crea Extended statistics from memcached are not guaranteed to have any consistency between memcached releases.

#40

subscribing... excited to see that lock.inc made it into 6.x

#41

subscribing.

#42

Here's #32 as a patch. Will be testing shortly.

AttachmentSize
memcache_lock.patch 3.46 KB

#43

subscribing

#44

Update patch which uses dmemcache_add() instead of accessing the object directly - was getting wrong number of arguments errors.

AttachmentSize
memcache_lock.patch 3.33 KB

#45

I've made some changes:

  • the "lock object" wasn't being used for anything so I replaced it with a simple define
  • the call to dmemcache_key was redundant and incorrect, I removed it (the other dmemcache_ functions call it for you)
  • the call to dmemcache_set() was passing in parameters in the wrong order so the expire wasn't being set correctly, fixed
  • the call to dmemcache_add() was also passing in wrong parameters, again causing the expire to not get set correctly, fixed
  • added additional comments and made them comply with Drupal coding standards

Additional testing would be appreciated, then we can get this committed.

AttachmentSize
memcache-lock.inc_.txt 3.25 KB

#46

One line fix for the define, was missing single quotes.

AttachmentSize
memcache-lock.inc__4.txt 3.25 KB

#47

Version:7.x-1.x-dev» 6.x-1.x-dev
Status:needs review» patch (to be ported)

Committed in the the 7.x-1.x-dev branch:
http://drupal.org/cvs?commit=385410

Moving to 6.x, waiting to be ported...

#48

There is a port at http://drupal.org/node/874932 , can you please review it

#49

Status:patch (to be ported)» fixed

A D6 port was committed here:
#874932: Memcache lock implementation for drupal 6

#50

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

nobody click here