Cache race conditions (manifesting particularly on Drupal.org with the forum block).

Typical cache-using code does this:
1. Looks to see if there's something usable in the cache.
2. If not, generates a whole load of HTML, which is usually expensive (or why bother caching it?).
3. Sticks the generated HTML in the cache using cache_set().
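The three steps above can be sketched as follows. This is a hypothetical stand-alone illustration, with a Python dict standing in for Drupal's cache table; all names are illustrative:

```python
# Illustrative sketch of the three-step pattern above, with a plain dict
# standing in for Drupal's cache table. Names are hypothetical.
cache = {}

def expensive_render():
    # Stands in for the "whole load of HTML" that is costly to generate.
    return "<ul><li>forum topics</li></ul>"

def cache_get(cid):
    return cache.get(cid)          # 1. look for something usable

def cache_set(cid, data):
    cache[cid] = data              # 3. stick the result in the cache

def render_block(cid):
    html = cache_get(cid)
    if html is None:
        html = expensive_render()  # 2. cache miss: do the expensive work
        cache_set(cid, html)
    return html
```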

It's step 3 that's the problem. cache_set() looks like this:

if (db_fetch_object(db_query("SELECT cid FROM cache WHERE cid = '%s'", $cid))) {
  db_query("UPDATE cache SET data = '%s', created = %d, expire = %d WHERE cid = '%s'", $data, time(), $expire, $cid);
}
else {
  db_query("INSERT INTO cache (cid, data, created, expire) VALUES ('%s', '%s', %d, %d)", $cid, $data, time(), $expire);
}

Now if two threads call it at about the same time, one thread can do the SELECT, get nothing back, and conclude there's nothing there. Meanwhile, the other thread did exactly the same thing a millisecond earlier and is now writing the value into the cache. By the time the first thread gets around to INSERTing the value, there's already a row there, and you get a duplicate-entry SQL error.
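One way to see a fix is that the two statements (check, then write) can collapse into one atomic statement. A minimal sketch, using Python's sqlite3 purely as a stand-in for Drupal's database layer (the schema is illustrative, and INSERT OR REPLACE is SQLite syntax; MySQL spells the same idea REPLACE INTO):

```python
import sqlite3
import time

# Sketch only: SQLite stands in for Drupal's database layer. INSERT OR
# REPLACE is a single atomic statement, so two concurrent writers can no
# longer collide in the SELECT-then-INSERT window.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (cid TEXT PRIMARY KEY, data TEXT, created INTEGER, expire INTEGER)"
)

def cache_set(cid, data, expire=0):
    # Either inserts the row or overwrites the existing one, atomically.
    conn.execute(
        "INSERT OR REPLACE INTO cache (cid, data, created, expire) VALUES (?, ?, ?, ?)",
        (cid, data, int(time.time()), expire),
    )
    conn.commit()

# Two writers racing on the same cid: the second simply wins, no SQL error.
cache_set("forum:block", "<ul>stale</ul>")
cache_set("forum:block", "<ul>fresh</ul>")
row = conn.execute("SELECT data FROM cache WHERE cid = ?", ("forum:block",)).fetchone()
```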

There are various solutions:
- Allow duplicate entries in the cache table for the same key. Pick the first available one, or the one with the largest timestamp.
- LOCK the row between the SELECT and the INSERT.
- Never delete cached data that's expired. Just keep it in the table but don't use it. That way you can always do an UPDATE and this problem goes away. The downside is that you'd need some way of registering stuff you're going to cache, so the row can be created in the first place, which is deeply ugly.
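The first option (tolerate duplicates, read the newest) could look roughly like this. A sketch with sqlite3 and a hypothetical schema that has no primary key on cid, so a plain INSERT can never raise a duplicate-entry error:

```python
import sqlite3

# Sketch of the "allow duplicates" option: no PRIMARY KEY on cid, so a
# plain INSERT never conflicts; readers pick the row with the largest
# timestamp. Schema and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (cid TEXT, data TEXT, created INTEGER)")

def cache_set(cid, data, created):
    conn.execute("INSERT INTO cache VALUES (?, ?, ?)", (cid, data, created))

def cache_get(cid):
    row = conn.execute(
        "SELECT data FROM cache WHERE cid = ? ORDER BY created DESC LIMIT 1",
        (cid,),
    ).fetchone()
    return row[0] if row else None

cache_set("forum", "<ul>old</ul>", 100)
cache_set("forum", "<ul>new</ul>", 200)  # a racing writer: both rows survive
```

A periodic cleanup (e.g. on cron) would still be needed to prune the superseded rows.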

We should really LOCK the row. There are other places in Drupal where this would be a Very Good Idea, too. Is this hard to do in a cross-db-vendor way? I haven't had that much experience with row locking on multiple platforms.

Marking this NORMAL priority, as it's a bug I'm seeing increasingly often on drupal.org as we get more visitors. It'd be nice to fix this before we get hit by the Freshmeat crowd when we do the 4.2 release and they all come and look. It's embarrassing to have SQL errors on your home page.

Comments

marco’s picture

Title: forum » race conditions

PostgreSQL supports row-level locking ("SELECT FOR UPDATE"); by the way, SELECT FOR UPDATE seems to be valid SQL92, e.g. this query:
SELECT 1 AS ignore FROM mytable WHERE 1=1 FOR UPDATE
validates as ANSI SQL92 against an SQL validator.

MySQL should support SELECT FOR UPDATE since 3.23.36 (Mar 2001), but I don't think it works with MyISAM tables, which are still very common.

Both MySQL and PostgreSQL can lock tables (with different syntax), but I don't know about other DBMSs. Unfortunately, neither Pear::DB nor ADOdb supports this feature, but we could drop Pear::DB and write a database.pgsql.inc (some work has already been done) which, I'm sure, won't be slower than Pear::DB, and add a db_lock_table() function.

We could also pursue a completely different path and use file locking. This is what I use now and it works, but it needs correct file permissions and wouldn't work on a multithreaded server (it's implemented at the process level). I wrote two general functions which emulate semaphore handling; one advantage is that they lock just a part of the code instead of a whole table. Couldn't we do something like this using LOCK TABLE?
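The semaphore-style pair described above might look roughly like this (function names are hypothetical; POSIX-only advisory locks via flock, and as noted it needs writable lock files and serializes processes, not threads):

```python
import fcntl
import os
import tempfile

# Rough sketch of file-lock-based semaphores (hypothetical names). The
# advantage noted above: this guards just a critical section of code,
# not a whole database table. POSIX-only; lock files must be writable.
def lock_acquire(name, lockdir=tempfile.gettempdir()):
    fh = open(os.path.join(lockdir, name + ".lock"), "w")
    fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other process holds it
    return fh

def lock_release(fh):
    fcntl.flock(fh, fcntl.LOCK_UN)
    fh.close()

# Guarding the SELECT-then-INSERT window in cache_set():
fh = lock_acquire("cache_set")
# ... check the cache and UPDATE or INSERT here ...
lock_release(fh)
```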

moshe weitzman’s picture

Title: race conditions » improved cache api - race conditions
Component: Base system » base system
Category: bug » feature

Is Jeremy's caching patch an adequate improvement for this problem? Any other suggestions out there? Moving this to a feature request, since there is no obvious bug to fix.

bslade’s picture

Hmmm, it looks like this issue might be the same as http://drupal.org/node/10796 ("Cache bug in new drupal.org - with proposed solution"), which was fixed as of Oct 2004 (probably in Drupal 4.6.x).

Just for history purposes, there are several forum postings complaining about sporadic "duplicate key" errors when inserting into the cache:

http://drupal.org/node/27405
http://drupal.org/node/17029
http://drupal.org/node/3454
http://drupal.org/node/14108

Also see the related issue http://drupal.org/node/9620 ("Errors when writing to cache [option to not log them]")

Thanks
Ben in DC
PublicMailbox dot benslade dot com

luebbe’s picture

I can confirm that this also happens with Drupal 4.6.2. We run a not-too-busy site (tortoisesvn.sourceforge.net), and since enabling the cache I find one or two of these messages in the error log per day.

Turning the cache off gets rid of these messages as well.

The only (code) customization I have is an additional cache_clear_all() in cron.php, because otherwise some pages that contain PHP won't get updated for anonymous users once they are stored in the cache (see also http://drupal.org/node/23797).

I think the 'cache' flag in the format filters isn't handled correctly. PHP pages always get stored in the cache with the 'expired' date flag set. IMHO they shouldn't be stored at all if 'cache' is switched off, but that's of course debatable.

Is there any place where Drupal removes expired entries from the cache? I haven't found one; that's why I added cache_clear_all() to cron.php.

bwynants’s picture

I confirm that it still happens in 4.6.3, on a not-too-heavy site, and not happening often.

This time it happened for a logged-in user; I was under the assumption the cache was not used then...

Duplicate entry 'browscap:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)' for key 1 query: INSERT INTO cache (cid, data, created, expire, headers) VALUES ('browscap:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)', 'O:8:\"stdClass\":28:{s:9:\"useragent\";s:53:\"Mozilla/_.0 (compatible; MSIE 6.0%;%Windows NT 5.0%)%\";s:6:\"parent\";s:6:\"IE 6.0\";s:7:\"browser\";s:2:\"IE\";s:7:\"version\";s:3:\"6.0\";s:12:\"majorversion\";s:1:\"6\";s:12:\"minorversion\";s:1:\"0\";s:8:\"platform\";s:0:\"\";s:18:\"authenticodeupdate\";s:0:\"\";s:10:\"cssversion\";s:1:\"2\";s:6:\"frames\";s:1:\"1\";s:7:\"iframes\";s:1:\"1\";s:10:\"htmltables\";s:1:\"1\";s:7:\"cookies\";s:1:\"1\";s:16:\"backgroundsounds\";s:1:\"1\";s:8:\"vbscript\&qu in /opt/www/verhoevenw/web/www.goju-andy.be/includes/database.mysql.inc op lijn 66.

Duplicate entry 'menu:5:nl' for key 1 query: INSERT INTO cache (cid, data, created, expire, headers) VALUES ('menu:5:nl', 'a:3:{s:10:\"path index\";a:235:{s:16:\"admin/aggregator\";s:2:\"64\";s:26:\"admin/aggregator/edit/feed\";i:-2;s:30:\"admin/aggregator/edit/category\";i:-3;s:23:\"admin/aggregator/remove\";i:-4;s:23:\"admin/aggregator/update\";i:-5;s:21:\"admin/aggregator/list\";i:-6;s:25:\"admin/aggregator/add/feed\";i:-7;s:29:\"admin/aggregator/add/category\";i:-8;s:10:\"aggregator\";s:2:\"65\";s:18:\"aggregator/sources\";s:2:\"66\";s:21:\"aggregator/categories\";s:2:\"67\";s:20:\"aggregator/sources/4\";s:2:\"74\";s:25:\"aggregator/sources/4/view\";i:-13;s:31:\"aggregator/sources/4/categorize\";i:-14;s:30:\"aggregator/sources/4/configure\";i:-15;s:20:\"aggregator/sources/3\";s:2:\"7 in /opt/www/verhoevenw/web/www.goju-andy.be/includes/database.mysql.inc op lijn 66.

bslade’s picture

As mentioned by Al above, the "select, update if found, insert if not found" scenario is still a race condition.

Row locking won't do any good, because you would need to lock a row which doesn't exist yet. Table locking is needed (a read lock which prevents other threads from writing).

It might be possible to implement some sort of SQL table locking scheme using only generic SQL, or using some filesystem file semaphore, but you have to worry about the situation where a process holding the lock aborts in an uncontrolled fashion (e.g. runs out of memory and can't even reach an error handler routine to clean up the lock). That would hang up Drupal.
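A table-level write lock held around the check-then-write can be sketched like this, with the property wanted above: a database lock dies with the connection, so an aborted process can't wedge the site the way a stale semaphore file could. SQLite's BEGIN IMMEDIATE stands in here for MySQL's LOCK TABLES; everything is illustrative, not Drupal code:

```python
import sqlite3
import time

# Sketch: take a database-level write lock before the SELECT, so the
# check-then-write sequence is serialized. Unlike a filesystem semaphore,
# the lock is released automatically if the holding process aborts.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute(
    "CREATE TABLE cache (cid TEXT PRIMARY KEY, data TEXT, created INTEGER, expire INTEGER)"
)

def cache_set(cid, data, expire=0):
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
    try:
        if conn.execute("SELECT cid FROM cache WHERE cid = ?", (cid,)).fetchone():
            conn.execute(
                "UPDATE cache SET data = ?, created = ?, expire = ? WHERE cid = ?",
                (data, int(time.time()), expire, cid),
            )
        else:
            conn.execute(
                "INSERT INTO cache (cid, data, created, expire) VALUES (?, ?, ?, ?)",
                (cid, data, int(time.time()), expire),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```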

It seems to me there need to be different versions of this code snippet for different types of databases. I.e. use the above code snippet for generic SQL databases, but for MySQL have an "if" statement call a version of the insert statement where INSERT is replaced with REPLACE (i.e. if the row already exists, just overwrite it).

Database-brand-specific code is needed here, since this is beyond the capabilities of cross-compatible SQL.

Ben in DC

dries’s picture

We use a table lock now. When different machines access the same database, a file-based locking scheme doesn't work.

bdragon’s picture

Version: x.y.z » 5.x-dev
Status: Active » Closed (won't fix)

Cache split, whole different beast now, yadda yadda.