When an error occurs in a query inside a transaction (e.g. one started by db_lock_table()), the query and all following queries (including the queries that try to log that error) are ignored.


Comments

Cvbge’s picture

Status: Active » Needs review
FileSize
2.73 KB

This isn't a good patch, but I have no idea how to make it better.

Example of the error:

Warning: pg_query(): Query failed: ERROR: current transaction is aborted, commands ignored until end of transaction block in /var/www/dt/d/includes/database.pgsql.inc on line 78

Fatal error: ERROR: current transaction is aborted, commands ignored until end of transaction block query: INSERT INTO dt.watchdog (uid, type, message, severity, link, location, referer, hostname, timestamp) VALUES (1, 'php', 'pg_query(): Query failed: ERROR: syntax error at or near "blah" at character 1 in /var/www/dt/d/includes/database.pgsql.inc on line 78.', 2, '', '/dt/d/node/2', 'http://localhost/dt/d/node/2/edit', '127.0.0.1', 1133382989) in /var/www/dt/d/includes/database.pgsql.inc on line 95

Some more insight on what's happening.

Normally you would use db_lock_table() like this, for example:

db_lock_table('foo');
db_query('some code');
db_query('some other code');
db_unlock_tables();

db_lock_table() starts a transaction, which is necessary for table locking. Then there are some db_query() calls, and finally db_unlock_tables() does a COMMIT.
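For reference, db_unlock_tables() on the postgresql side is (roughly, from memory) nothing more than the COMMIT:

function db_unlock_tables() {
  db_query('COMMIT');
}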

If db_query('foo')[1] fails, it does trigger_error(), which calls error_handler(), which calls watchdog(), which calls db_query('bar')[2] ;)
Because db_query()[1] failed, db_query()[2] and all following db_query() calls until db_unlock_tables() also fail (and produce an ugly error for the user). This means the error won't be logged and 'some other code' won't be executed.


This patch fixes this by checking whether the query failed; if so, the transaction is rolled back. This means that all queries between db_lock_table() and the failing query are reverted.
As an alternative, we could COMMIT in case of errors... but I think that's a worse solution.
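To give an idea of the approach (this is only a sketch, not the attached patch), the check could live in a simplified _db_query() in database.pgsql.inc, with a hypothetical $db_in_transaction flag set by db_lock_table():

// Sketch only; $db_in_transaction is a made-up flag, not something that exists today.
function _db_query($query, $debug = 0) {
  global $active_db, $db_in_transaction;

  $result = @pg_query($active_db, $query);

  if ($result === FALSE && !empty($db_in_transaction)) {
    // PostgreSQL has already aborted the transaction; ROLLBACK now so that the
    // following queries (including watchdog()'s INSERT) are not ignored.
    pg_query($active_db, 'ROLLBACK');
    $db_in_transaction = FALSE;
  }

  if ($result === FALSE) {
    trigger_error(pg_last_error($active_db) . "\nquery: " . $query, E_USER_WARNING);
  }
  return $result;
}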

Another thing: should we re-enter the transaction and re-lock the tables? Again, I think no.

Currently we are locking tables in two places:
in cache_set():

  db_lock_table('cache');
  db_query("UPDATE {cache} SET dat_a = %b, created = %d, expire = %d, headers = '%s' WHERE cid = '%s'", $data, time(), $expire, $headers, $cid);
  if (!db_affected_rows()) {
    db_query("INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', %b, %d, %d, '%s')", $cid, $data, time(), $expire, $headers);
  }
  db_unlock_tables();

and in variable_set():

  db_lock_table('variable');
  db_query("DELETE FROM {variable} WHERE name = '%s'", $name);
  db_query("INSERT INTO {variable} (name, value) VALUES ('%s', '%s')", $name, serialize($value));
  db_unlock_tables();
ralfm’s picture

Status: Needs review » Active

I am wondering why we need the 'LOCK TABLE {%s} IN EXCLUSIVE MODE' in this function when we start the transaction. I cannot see an example of a situation where we need it.
-------------------------------------------
function db_lock_table($table) {
db_query('BEGIN; LOCK TABLE {%s} IN EXCLUSIVE MODE', $table);
}
-------------------------------------------

From the postgresql documentation:
"...... most PostgreSQL commands automatically acquire locks of appropriate modes to ensure that referenced tables are not dropped or modified in incompatible ways while the command executes. ......"

ralfm’s picture

I think the function db_unlock_tables() should be modified. A suggestion would be something like this:

function db_unlock_tables($code) {
  if ($code == '0') {
    db_query('COMMIT');
  }
  elseif ($code == '1') {
    db_query('ROLLBACK');
  }
}

The code you are talking about should also be modified, for example like this:


  db_lock_table('cache');

  if (!db_query("UPDATE {cache} SET dat_a = %b, created = %d, expire = %d, headers = '%s' WHERE cid = '%s'", $data, time(), $expire, $headers, $cid)){
     db_unlock_tables('1');
  }
  else{   
     if (!db_query("INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', %b, %d, %d, '%s')", $cid, $data, time(), $expire, $headers)){
      db_unlock_tables('1');
     else{
     db_unlock_tables('0');
     }
  }

I have not checked this code. This is a suggestion, but I think you get the idea. The same should be done in variable_set().

ralfm’s picture

More, more, ;-)

I think 'db_lock_table' and 'db_unlock_tables' are not good names in the postgresql version. We do not lock the tables like in mysql; we start a transaction, and that is quite different.

Shouldn't they be called, for example, 'db_start_transaction' and 'db_finish_transaction' or something like that?

Cvbge’s picture

Status: Active » Needs review

Hi,
glad to see a comment ;)

The functions are called lock/unlock() because their main purpose is to lock tables. For mysql they only LOCK. For postgresql, you need to have a transaction. So the transaction is only a byproduct of table locking.

I believe that LOCK is needed to prevent two transactions trying to INSERT the same data. MVCC won't help with this.

About the proposed changes to cache_set():
db_query() returns FALSE for a failed (bad) query, and a "query result" otherwise. It doesn't say whether it modified (UPDATEd) anything.
And it still does not solve the problem of failed queries inside a transaction block.

BTW: I've written before that "we could COMMIT in case of errors". That would not change anything: if a query is bad (produces errors), the whole transaction is automatically rolled back.
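For example (a minimal standalone demonstration; the table names are made up):

$conn = pg_connect("dbname=test");
pg_query($conn, 'BEGIN');
pg_query($conn, "INSERT INTO foo VALUES (1)");
@pg_query($conn, "SELECT * FROM no_such_table"); // fails, the transaction is now aborted
pg_query($conn, 'COMMIT');                       // the server answers with ROLLBACK, nothing is kept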

ralfm’s picture

The functions are called lock/unlock() because their main purpose is to lock tables. For mysql they only LOCK. For postgresql, you need to have a transaction. So the transaction is only a byproduct of table locking.

I think it is the other way around: locks are a byproduct of transactions. Transactions are there because postgresql is fully ACID compliant and changes in the database are Atomic, Consistent, Isolated and Durable, not because they are needed by a lock. Transactions in postgresql use different types of locks to do their work. Anyway, this was just a suggestion, because I think the function name fits the mysql world better than the postgresql one.

I believe that if you use InnoDB tables instead of MyISAM with mysql, you do not need to lock the table either, because InnoDB also uses transactions; it uses row-level locking instead of MVCC.

I believe that LOCK is needed to prevent two transactions trying to INSERT the same data. MVCC won't help with this.

MVCC will work out what to do with two transactions that want to modify the same data, and taking an EXCLUSIVE lock does not help in this situation. An EXCLUSIVE lock on a table will make things work ****much slower with many users**** because only one process can update the table at a time. This is one of the big problems of mysql/myisam tables that postgresql does not have thanks to MVCC. If we use an EXCLUSIVE lock on a table with postgresql, it is like using mysql with myisam tables; many of the benefits of MVCC disappear.

Any UPDATE, DELETE, or INSERT command will automatically acquire a ROW EXCLUSIVE lock on the rows that are going to be changed, to avoid problems with transactions trying to modify the same data, but other transactions can still modify other data in the table. So reading never blocks writing, and writing never blocks reading or writing, as long as they do not modify the same row (not table).

Maybe (I do not know without reading all the code) we need to use SELECT FOR UPDATE in some places to ensure the current validity of a row and protect it against concurrent updates but I am sure that we do not need EXCLUSIVE locks in our case.
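To illustrate what I mean (a hypothetical example, not code from core), something along these lines; note that FOR UPDATE can only lock rows that already exist (see the discussion below):

// Hypothetical illustration only, not a proposed patch.
db_query('BEGIN');
if (db_result(db_query("SELECT cid FROM {cache} WHERE cid = '%s' FOR UPDATE", $cid))) {
  // The existing row is now locked until COMMIT, so concurrent updates wait.
  db_query("UPDATE {cache} SET data = %b, created = %d, expire = %d, headers = '%s' WHERE cid = '%s'", $data, time(), $expire, $headers, $cid);
}
else {
  // No row to lock yet, so we fall through to the insert.
  db_query("INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', %b, %d, %d, '%s')", $cid, $data, time(), $expire, $headers);
}
db_query('COMMIT');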

About the proposed changes to cache_set():
db_query() returns FALSE for a failed (bad) query, and a "query result" otherwise. It doesn't say whether it modified (UPDATEd) anything.
And it still does not solve the problem of failed queries inside a transaction block.

Ok, I misinterpreted this: we run the insert only if the update did not modify anything. I thought the insert would only be done if the update did not fail.

BTW: I've written before that "we could COMMIT in case of errors". That would not change anything: if a query is bad (produces errors), the whole transaction is automatically rolled back.

Ok, it sounds good, mea culpa. I can see that it is not necessary to run a rollback, because the commit will run an automatic rollback if there are any errors in the transaction.

BTW: Another thing we should start thinking about is running every INSERT/UPDATE/DELETE in the code inside a transaction to guarantee ACID compliance.

regards
Ralfm

Cvbge’s picture

Any UPDATE, DELETE, or INSERT command will automatically acquire a ROW EXCLUSIVE lock on the rows that are going to be changed, to avoid problems with transactions trying to modify the same data, but other transactions can still modify other data in the table. So reading never blocks writing, and writing never blocks reading or writing, as long as they do not modify the same row (not table).

Right. But remember we have an UPDATE/INSERT pair. Let's assume that the cache table is empty and 2 concurrent sessions try to cache_set('foo'), i.e. on the same key. It's possible that they do:
UPDATE1
UPDATE2
INSERT1
INSERT2 <- error! INSERT1 already inserted 'foo'

UPDATE1/2 won't lock the table/rows because the table is empty.
INSERT2 will wait for INSERT1 to finish (or not, it doesn't matter) and then try to insert a duplicate key. MVCC won't help here.

Maybe (I do not know without reading all the code) we need to use SELECT FOR UPDATE in some places to ensure the current validity of a row and protect it against concurrent updates but I am sure that we do not need EXCLUSIVE locks in our case.

For the same reason (empty cache table, or the key does not exist) FOR UPDATE won't change anything.

Cvbge’s picture

Damn, it's eaten my explanation.

So, briefly: you'll get an error on INSERT2 because INSERT1 already inserted 'foo'. INSERT1 will lock the table (or not, it does not matter), INSERT2 will wait or not, but then it'll try to insert the same value again.

For the same reason FOR UPDATE won't help you - it does not lock anything if there is no row yet.

ralfm’s picture

.... you'll get an error on INSERT2 because INSERT1 already inserted 'foo'. INSERT1 will lock the table (or not, it does not matter), INSERT2 will wait or not, but then it'll try to insert the same value again.

This is how it is supposed to work when using transactions: if two transactions try to insert the same data, the second one gets a rollback because it fails, but the table is updated with the right data by the first transaction.

I do not see the problem with this: the data is still consistent, and many users can update the table at the same time (a huge difference in speed on busy systems).

cheers
Ralfm

Cvbge’s picture

There would be errors in the logs (I'm quite sure, I have to check this). You would have to somehow differentiate between errors caused by duplicate values and other errors. I think it's possible, not sure how. Would all duplicate-value errors be the result of concurrency?

How big would the difference in speed be? Do you have any measurements?

Would it be possible to have it work with MySQL without problems?

If the speed difference is big, it's certainly worth looking at.

ralfm’s picture

Hello again :)

I had time to put together a test case. The code I used is below and the results are here: http://folk.uio.no/rafael/drupal_pg_locks.html
The $num = rand(1, 2000); and $sql = sprintf("BEGIN; "); lines are what is changed between the different tests.

<?php
// Each client picks a random id, tries to UPDATE it, and INSERTs it if no row was updated.
$conn = pg_pconnect("dbname=test user=postgres host=localhost");

if ($conn) {
  $num = rand(1, 2000);

  // Start the transaction (this line is one of the two varied between tests).
  $sql = sprintf("BEGIN; ");
  $result = pg_query($conn, $sql);

  // Try to update an existing row first.
  $sql = sprintf("UPDATE test01 SET id2 = '%s' WHERE id = '%s';", $num+1000, $num);
  $result = pg_query($conn, $sql);

  // If nothing was updated, insert a new row (this is where duplicate-key errors can occur).
  if (!pg_affected_rows($result)) {
    $sql = sprintf("INSERT INTO test01 VALUES('%s','%s','%s');", $num, $num+100, $num+200);
    $result = pg_query($conn, $sql);
  }

  $sql = sprintf("COMMIT;");
  $result = pg_query($conn, $sql);
}
?>

test=# \d test01 
         Table "public.test01"
 Column |  Type   |     Modifiers      
--------+---------+--------------------
 id     | integer | not null
 id2    | integer | not null
 id3    | integer | not null
Indexes:
    "test01_pkey" PRIMARY KEY, btree (id)

I tested with tables of 50 and 2000 rows and with 10 and 120 concurrent clients.

There would be errors in the logs (I'm quite sure, I have to check this). You would have to somehow differentiate between errors caused by duplicate values and other errors. I think it's possible, not sure how. Would all duplicate-value errors be the result of concurrency?

On the results page you can see how many errors I got in every test. And yes, I think all duplicate-value errors will be the result of concurrency.

How big would the difference in speed be? Do you have any measurements?

Now we have measurements. We get the best results without locking when both the concurrency and the number of rows are high.

Would it be possible to have it work with MySQL without problems?

I do not understand what you mean by this.

If the speed difference is big, it's certainly worth looking at.

What do you think of the results? ;-)

Cvbge’s picture

Here are some thoughts:

1. It'd be good to get rid of LOCK TABLE because of mysql: some providers don't grant LOCK rights.

2. Standard deviation is missing

3. What have you used to generate the results? I.e. any standard script/program?

4. The test checks only LOCK/BEGIN + UPDATE,INSERT + UNLOCK/COMMIT performance. I wonder how this relates to overall Drupal speed, i.e. maybe a more meaningful test would be to compare Drupal with LOCK vs. with BEGIN only? We don't know how the rest of Drupal would influence the results. Furthermore:

5. Drupal uses LOCKs in 2 places (for postgresql; more for mysql):
a) in variable_set(), which simply deletes and then inserts small data. There are a lot of variable_set() calls.
b) in cache_set(), which does an update/insert of much larger data chunks. It's called less frequently than variable_set().
SELECT max(length(data)), min(length(data)), avg(length(data)) FROM {cache} on a recently set up site with maybe 10 nodes shows:

  max  | min |        avg
-------+-----+--------------------
 89321 | 229 | 10521.666667

Another result, this time from a big drupal site (running on mysql): 45000 rows in {cache}, and 111096 max, 7 min, 1552 avg for length(data).
Maybe run some tests inserting/updating bigger chunks of data too.

ralfm’s picture

I will try to put together a test case this week with your suggestions.

Cvbge’s picture

I've done a simple test: LOCK vs. no LOCK with 10 concurrent connections (I don't have enough memory to test higher concurrency).
I've tested a couple of different pages, but the speed was the same. Maybe I didn't test it correctly, because I didn't get any errors (which should happen when 2 transactions try to insert a duplicate PK).

Cvbge’s picture

Haha, this is funny. The errors (if there were any) were not logged because of the very bug this issue is about :)
I'll do correct tests later.

Cvbge’s picture

So I did, correctly this time I think. But still no errors. Maybe the number of connections (100 in total, 10 concurrent) is too low?

Cvbge’s picture

For now, I'll focus on fixing the error.
The first patch is still "valid" (it might need to be updated); I still don't see a really better solution.

Note to myself: I need to review all db_lock_table() calls and check whether they need to verify that the queries succeeded or not.

Cvbge’s picture

Killes wondered if it is possible to always do a ROLLBACK in error_handler().

I think we could, but only for errors from the database.

That is, if an error is triggered by PHP code, we can't do ROLLBACK. The reason is that PHP might not die automatically after an error.
Let's look at the following code:

db_lock_table('foo');
db_query('bar');
BLARGH; // some PHP that triggers error
db_query('baz');
db_unlock_tables();

The 'bar' query succeeds, BLARGH triggers error_handler(), we ROLLBACK a transaction which should not be rolled back, 'baz' is executed and succeeds. We end up with the database in an inconsistent state.

Can we differentiate between errors triggered by the database and PHP errors?

It seems we could check for E_USER_WARNING. In core, all database errors are triggered using trigger_error(). But in contrib, one module (fscache) uses E_USER_WARNING for a non-database-related error.

I think we should not make such an assumption or rule, that with postgresql, when you have locked a table, you may use E_USER_WARNING only for database errors...

So the conclusion is: if we can't differentiate between database and PHP errors, we can't always ROLLBACK => we have to use global variables.
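Something like this is the bookkeeping I have in mind ($db_locked and $db_aborted are made-up names, this is not a patch):

// Rough sketch with invented globals; _db_query() would set $db_aborted = TRUE
// (and could ROLLBACK immediately, as in the first patch) whenever a query fails
// while $db_locked is TRUE.
function db_lock_table($table) {
  global $db_locked, $db_aborted;
  $db_locked = TRUE;
  $db_aborted = FALSE;
  db_query('BEGIN; LOCK TABLE {%s} IN EXCLUSIVE MODE', $table);
}

function db_unlock_tables() {
  global $db_locked, $db_aborted;
  // End the transaction: ROLLBACK if a query failed since db_lock_table(), COMMIT otherwise.
  db_query($db_aborted ? 'ROLLBACK' : 'COMMIT');
  $db_locked = FALSE;
  $db_aborted = FALSE;
}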

Cvbge’s picture

A problem which needs a similar solution: http://drupal.org/node/53420#comment-80277

pwolanin’s picture

Component: postgresql database » database system

This issue of when/how to lock tables seems like it should be a priority for those worried about performance.

What about making the cache and variable tables InnoDB for MySQL so the LOCK won't be needed?

Perhaps this would go along with dropping support for MySQL 3.23, as I saw suggested recently?

paintbot’s picture

For Postgres, we should use transactions as much as possible and roll back on failure. For me, data integrity and ACID-compliant transactions are more important than speed. What good is speed if my database is messed up? It just gets messed up at warp speed. With financial data (e-commerce), this is important.

So for Postgres, we should not use db_lock_table, but do as ralfm suggested: db_start_transaction & db_end_transaction.

havran’s picture

I have a (maybe) related problem. I logged in to my Drupal 5 beta 1 installation as admin. I tried almost all functions (heavy testing :)) without problems or errors, but when I logged out I got many DB warnings which seem related to cache table locking.

I had the same problem with Drupal 5 CVS a few days ago, and even with a completely new installation from scratch (I erased the database and created a new one) the problem continues.

Here is site:

http://www.fem.uniag.sk/drupal5/

havran’s picture

Through the devel module I discovered the query which causes the problems -> http://drupal.org/node/92503

coreb’s picture

Version: x.y.z » 5.x-dev

Moving from x.y.z to the 5.x-dev version queue. Looking at the latest comments, that appears to be the latest version where this is still relevant.

drumm’s picture

Status: Needs review » Needs work

This puts postgres-specific code in common.inc:

+      db_query('ROLLBACK'); 
AlexisWilke’s picture

Note that we have the same problem in D6.

The best solution would be to accumulate all the errors in the watchdog() call, then on db_unlock_tables() save all of those errors. This would require callbacks that do not exist yet.
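A rough sketch of that idea, with invented names (watchdog_deferred(), _watchdog_flush(), $deferred_watchdog, and a hypothetical $db_locked flag meaning a locked transaction is open); nothing like this exists in core:

// Invented helpers for illustration only.
function watchdog_deferred($type, $message) {
  global $db_locked, $deferred_watchdog;
  if (!empty($db_locked)) {
    // Writing to {watchdog} right now could fail inside an aborted transaction,
    // so queue the entry until the transaction is closed.
    $deferred_watchdog[] = array($type, $message);
    return;
  }
  watchdog($type, $message);
}

function _watchdog_flush() {
  global $deferred_watchdog;
  // To be called from db_unlock_tables() after its COMMIT/ROLLBACK.
  foreach ((array) $deferred_watchdog as $entry) {
    watchdog($entry[0], $entry[1]);
  }
  $deferred_watchdog = array();
}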

Now, for those questioning the need for the lock: it is necessary in case two transactions (two people accessing the web site at the same time) occur simultaneously; only one of them may execute that code at a time. There are many places in the code that do not use the lock when they should. My question would be the opposite: why isn't it done everywhere?!

I hope that this will be fixed in D7 though.

Thank you.
Alexis

Damien Tournoud’s picture

Status: Needs work » Closed (won't fix)

I think it is safe to mark this as won't fix. PostgreSQL is not properly supported by Drupal before Drupal 7, and we will not change a frozen API at this time for such a small user base.