I am facing some troubles while using memcache engine in shared mode.
After flushing Drupal cache, some data remain in the cache. It seems to be cache keys that were not referenced in the lookup entries.

I checked the code of cacherouter/engines/memcache.php and I think there is a bug in the delete() function.
The algorithm to delete all entries of a cache table in shared mode is:

  • Get lookup entry of the cache table to get all the keys of this table
  • Delete each key of the table
  • Lock the table
  • Flush lookup entry
  • Unlock the table

If a cache_set coming from another process occurs during the loop which deletes the keys, at the end of the algorithm the new entry will be in the cache but not referenced in the lookup entry. Such unreferenced cache entry will never be flushed during the next calls to delete().
I am wondering why the lock is not taken at the beginning of the algorithm like in the flush() function. Is there any reason for that or is it a bug ?

Comments

andypost’s picture

Version: 6.x-1.0-rc1 » 6.x-1.x-dev

For this purpose we use lock() method of cache class. There's some issues with locking but most of them are fixed in -dev.

Please examine other issues with locking in the queue

sdelbosc’s picture

I checked engines/memcache.php in -dev and it seems that delete() function has not changed. The algorithm which clears the cache is still the one explained in the description of the issue.

I think cache should be cleared like this:

  • Lock the table
  • Get lookup entry of the cache table to get all the keys of this table
  • Delete each key of the table
  • Flush lookup entry
  • Unlock the table

Am I wrong?

andypost’s picture

You are right! But this way leads to expensive locking while having a fast locking with probably failed get is optimal. You could try to implement "right" logic and test by yourself.

Suppose, that some cache miss is not a big deal...

jgalletta’s picture

We have tested the new algorithm with a large lookup entry (35,000 keys in the value). The lock has been kept during 1.5 seconds. This is half less than the 3 seconds limit in the lock() function.

Two remarks regarding our tests:

  • We have limited the number of keys in the lookup to 35,000 to be sure that the size of the value is less than 1MB (more information about that here)
  • The tests have been performed on a server with 16GB and 8*2GHz

I would say this is ok to go on with this algorithm.

andypost’s picture

Priority: Critical » Major

I think measuring locking in *seconds* is not acceptable.
/me still dreaming about removing lookups and change to memcache module strategy

sdelbosc’s picture

For sure, locking a table for more than one second is bad but:

  • it is better to lock longer and do not have bugs due to simultaneous calls
  • it happens only when the lookup entry is full which should not occur so much

As far as your last comment is concerned, are you talking about a possible integration between memcache module and cacherouter or do you mean that cacherouter should be redesigned to use the same strategy as memcache module?

atouchard’s picture

@Sylvain, Andrey dreams to changed the lookup philosophy :)
What is your opinion about this patch ?
The lock strategy submited by Sylvain is the good way and you are right on the time to lock table.

andypost’s picture

That's right - we need to clean in locking but this locking could never ends because of php execution time limit (lock will stay in memcache and never clean).

Current implementation works but probably could lead to inconsistency of lookup but not a deadlock so was chosen...