I believe I watched this happen this morning, although don't have a reproducible test case yet.

process 1 - does a variable_set(), clears the cache
process 2 - does another variable_set() with a different variable
process 3 - gets a cache miss from the variables cache, the SELECT * is blocked on the SELECT ... FOR UPDATE  from process 2
process 4 - does another variable_set() - continues to block process 2.

This is excerbated by the fact that process 3 can also acquire a lock before hitting the SELECT * - which means any further requests which get a variable cache miss are waiting on that one process before they can continue executing. If you have something else going on which is causing UPDATE queries to take a long time, and a few are issued at once, this situation could last for some time.

I don't see any reason why we have to use SELECT ... FOR UPDATE here instead of SELECT ... LOCK IN SHARE MODE, the latter issues exactly the same write lock on the row, without blocking reads, so in this case process 2's SELECT * would issue immediately.

I can only see one reference to IN SHARE MODE on the original issue that introduced this - http://drupal.org/node/715108#comment-3093918 and that wasn't discussed further. Attaching a patch which does that for MySQL here.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Crell’s picture

You know, I'm really starting to hate merge queries...

Is this an issue on anything other than MySQL? Does SHARED MODE exist on other DBs?

Also, the first docblock has a whitespace error on the first line. :-)

catch’s picture

FileSize
4.11 KB

postgresql has LOCK TABLE foo IN SHARE MODE - which is a write-only lock, but full table level. I couldn't find a SELECT .. FOR UPDATE in postgres but didn't look too hard.

A full table lock would actually be better than the current situation though - since a full table lock blocks both writes and reads, eventually letting a SELECT query through. In this case, you can have potentially infinite locking write queries (if issued against different or alternating variables), while reads are permanently blocked - since SELECT * FROM {variables}; is as blocked by a row level lock as it is by a table level one, but SELECT 1 FROM {variables} WHERE name = 'some_var'; won't be blocked unless it's the exact same variable each time.

Also, requests doing a lock_wait() will eventually get to PHP timeout, however PHP requests waiting on a SELECT .. FOR UPDATE are in mysql time, so they likely won't hit the PHP memory limit, might hit apache timeout though. The only thing that would stop this going on indefinitely is that to get to the point of issuing a variable_set(), the request has to get past variable_initialize(), so eventually the threads doing a variable_set() will finish, releasing the lock, and let a SELECT get through.

Variables may be the only table in core that does this, and also the vast majority of variable_set() will be updates (think the cache flush variables from cache.inc or css aggregation maps). So another option may be be to special case variable_set() to do UPDATE then INSERT and not use db_merge().

Cleaned up the whitespace.

Damien Tournoud’s picture

We decided to remove the lock completely in another issue.

Also, locks should be released as soon as the transaction is committed. So I guess you have been bitten by our poor-man transaction nesting implementation.

catch’s picture

OK postgres details here http://developer.postgresql.org/pgdocs/postgres/explicit-locking.html

SELECT FOR UPDATE appears to not block reads in postgresql. It also has SELECT FOR SHARE but I don't think that's relevant here. So this might be exclusive to MySQL.

Crell’s picture

Damien: Are you saying that #2 won't work? I'm not clear on the intricacies here...

catch’s picture

Priority: Normal » Major

Where's the issue that removes the lock?

Also I'm not sure if my original example is clear, but with the current implementation multiple locks can be claimed on the variables table at any one time via overlapping variable_set()

SELECT 1 FROM variables WHERE name = 'foo' FOR UPDATE;

does not block

SELECT 1 FROM {variables} WHERE name = 'bar' FOR UPDATE;

but both can block

SELECT * FROM variables;

That means you could have a potentially infinite combination of the first two, without the SELECT * ever getting to run. I'm bumping this to major, because although it's a rare edge condition, it can lead to an outage (or in the case I saw this, I think led to an outage being worse than it was otherwise while MySQL was having i/o issues). If there's an issue removing the lock entirely, and that patch is more viable than the one here and is likely to go in, then I'm fine with this being marked duplicate, but I'd like to read the other issue before we do that and copy some of the stuff from here over there.

Crell’s picture

Let's revisit this after #844186: Clarify merge queries lands, as it's a total rewrite of merge queries. :-)

catch’s picture

Status: Needs review » Needs work

That issue is in, and is still using SELECT .. FOR UPDATE. However the patch obviously won't apply now.

Crell’s picture

Priority: Major » Normal

Talking with webchick, this is enough of an edge case that it's not a release blocker. There are potential fixes that are not API breaks, either, so it could be fixed post-7.0. Therefore marking down to normal.

catch’s picture

Priority: Normal » Major

Major issues by definition can be fixed post 7.0. But things which take your site down are still major IMO.

Anonymous’s picture

maybe i'm just doing it wrong, but on a stock mysql 5.1 on ubuntu 10.10, i can't reproduce the read-lock.

if a cache miss in variable_initialize() was really blocked by the merge query implementation, then shouldn't i be able to take a site down by doing this, and just leaving it uncommitted:

mysql> TRUNCATE TABLE cache_bootstrap;
mysql> BEGIN;
mysql> SELECT 1 FROM variable WHERE name = 'site_name' FOR UPDATE;

but this doesn't hold up the next request - the select in variable_initialize() returns straight away. catch, what tx_isolation value do you have?

i have NFI how this translates to postgres.

catch’s picture

Priority: Major » Normal
Status: Needs work » Postponed (maintainer needs more info)

Hmm, interesting. I need to get back to this, but for now I don't think this is major, and I have other variable patches around that would make this irrelevant.

catch’s picture

Version: 7.x-dev » 8.x-dev
Priority: Normal » Major
Status: Postponed (maintainer needs more info) » Active

http://www.engineyard.com/blog/2011/3-common-rails-mysql-mistakes/

"2. Using SELECT FOR UPDATE".

Putting this back in the active queue until we have conclusive data either way.

catch’s picture

At least on my localhost the following is true:

Ran these from the command line with two separate sessions:

Session 1:
MariaDB [d7]> BEGIN;
Query OK, 0 rows affected (0.00 sec)

MariaDB [d7]> SELECT * FROM variable WHERE name = 'site_name' FOR UPDATE;
+-----------+----------------------+
| name      | value                |
+-----------+----------------------+
| site_name | s:12:"Site-Install"; |
+-----------+----------------------+
1 row in set (0.00 sec)


Session 2:
MariaDB [d7]> SELECT value FROM variable WHERE name = 'site_name';
+----------------------+
| value                |
+----------------------+
| s:12:"Site-Install"; |
+----------------------+
1 row in set (0.00 sec)

(returns immediately)

However:

Session 1:
MariaDB [d7]> BEGIN;
Query OK, 0 rows affected (0.00 sec)

MariaDB [d7]> SELECT * FROM variable WHERE name = 'site_name' FOR UPDATE;
+-----------+----------------------+
| name      | value                |
+-----------+----------------------+
| site_name | s:12:"Site-Install"; |
+-----------+----------------------+
1 row in set (0.00 sec)

Session 2:

MariaDB [d7]> SELECT value FROM variable WHERE name = 'site_name' FOR UPDATE;

Session 1:
COMMIT;

Session 2:
+----------------------+
| value                |
+----------------------+
| s:12:"Site-Install"; |
+----------------------+
1 row in set (12.08 sec)

The SELECT .. FOR UPDATE is locked until transaction commit.

Looking a bit further, this appears to depend on the transaction isolation type - http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html#isolevel_rea...

I don't have time to test all combinations of this today, but it looks like while this may block a straight SELECT with one of these configurations, you'd have to set that on purpose to trigger this.

Note a similar issue came up with the history table in the memcache queue #1080882: Pages served by MemCacheDrupal hold transactions open longer than the same pages served by DrupalDatabaseCache.

catch’s picture

Priority: Major » Normal
Status: Active » Postponed (maintainer needs more info)

Moving to normal/needs more info, not sure there's anything much to do here now at least until there are better steps to reproduce.

Damien Tournoud’s picture

Status: Postponed (maintainer needs more info) » Closed (cannot reproduce)

In the standard isolation mode, SELECT FOR UPDATE clearly doesn't block SELECT. So let's close this (feel free to reopen if you bump into this again).