Closed (duplicate)
Project:
Drupal core
Version:
7.x-dev
Component:
postgresql database
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
28 Aug 2008 at 11:16 UTC
Updated:
16 Jun 2010 at 08:34 UTC
Currently Postgres does not have a Postgres-specific implementation of MERGE queries. Instead, we're falling back to the slow, 2-query mechanism in the base driver that has a small race condition in it. We therefore need someone to write a proper Postgres-specific implementation.
After talking to some Postgres gurus, we've determined that the correct way to do that is to write a custom Postgres function that we can then call. However, I know absolutely nothing about writing Postgres functions. If you do, please speak up and send in a patch! :-)
Comments
Comment #1
linuxpoet commentedCan you define the parameters for the patch? What are you going to pass the function? What do you expect it to return?
Comment #2
Crell commentedThe API to the merge query is already defined. key(), fields(), etc. I don't know enough about Postgres to say how such a Postgres function should work, but it needs to mimic the behavior of the method calls to the object.
Comment #3
Crell commentedMerge queries are now documented here, too: http://drupal.org/node/310085
Comment #4
markus_petrux commentedAt the bottom of the following page there is an example that might be a start for this.
http://www.postgresql.org/docs/current/static/plpgsql-control-structures...
And here's how it has been solved in the MySQL driver:
http://drupal.org/node/301501
HTH+ :)
Comment #5
markus_petrux commentedA few more references, taken from: http://wiki.postgresql.org/wiki/Todo
Another possible reference from the PostgreSQL manual:
http://www.postgresql.org/docs/7.4/interactive/rules-update.html
Comment #6
mgiffordIs there someone with the expertise to write and/or review a patch here? Seems like this could provide significant performance improvements.
Comment #7
mgiffordadding performance tag
Comment #8
mgiffordand also PostgresSQL sorry.
Comment #9
damien tournoud commentedI twitted about that this morning, and @jwpye suggested to use http://mbk.projects.postgresql.org/. It is not GPL but apparently covers our use case.
Comment #10
damien tournoud commentedHum, scratch that, it does only work with simple conditions.
Comment #11
Crell commentedRefiling...
Comment #12
josh waihi commentedwill a simple subquery do?
Comment #13
Crell commentedI have no idea, will it? :-) Right now, Postgres uses the generic merge query which is non-atomic, because it's a SELECT then if-else for INSERT or UPDATE. I don't know enough Postgres to say how best to make it atomic. If it works and doesn't suck performance-wise, I'm cool with it. :-)
If it can't be done, well, OK, it can't be done. But I'd hate to leave that situation in there if we don't have to.
Comment #14
josh waihi commented:S err.... looks like not. So a merge query for PostgreSQL (and SQLite?) will do:
4 queries to the database. But I don't see a better solution in 8.3. The fact that there is a transaction makes it atomic though. So I'm happy with the current implementation.
Comment #15
Crell commentedHm, good point. So it is atomic, just not the fastest. I'm OK with that.
I'm bumping this to Drupal 8, in case someone wants to look into it again there, but for now it's not a blocker for D7.
Comment #16
andypostsubscribe
Comment #17
andypostRight now db_merge() does not work for postgres at all
Why not just build a query like
Comment #18
Crell commentedIf db_merge() doesn't work, how is Postgres even functional at this point? :-) I dispute that claim.
I have no idea if that procedure would work or not. Josh?
Comment #19
andypostPostgres is broken now, details #636632: Decide on multiple database testing #13-14
Comment #20
josh waihi commentedIt doesn't seem to me that PostgreSQL is broken - well, at least not in this area. @andypost is this still an issue for you?
Comment #21
andypost@Josh I've tested today and everything works fine - setup (minimal) and testing (takes a lot of time). There's some broken tests for fields.
Comment #22
josh waihi commentedah yeah, there are a few exceptions and failures here and there. I need to get onto them. But if merger query isn't an issue, can we close this?
Comment #23
Crell commentedJosh: Your call whether we try to optimize the query in D7 or bump this issue to D8. As long as it works (the transaction *should* make it atomic, I think) I'm comfortable with what it is now.
Comment #24
andypostwhy not put this in stored procedure? this is a "thin" place which maybe caused a poor performance of pgsql
Comment #25
ivansb@drupal.orgThe simple begin update insert commit; is not going to work if you've several concurrent one running.
The only safe way is as pointed out in
http://drupal.org/node/301036#comment-1047050 (pg docs)
So a solution could be inspired to:
http://drupal.org/node/301036#comment-2050722
+ what's suggested by the pg manual
or what's suggested in pg manual + some php [1]
or what's suggested in the manual + execute (dynamic statement
built in plpgsql)
The loop is necessary to prevent 2 merges jumping one on the other.
Is MergeQuery used somewhere?
I'd say you need merging when you don't know if things are in or not (this is bad) and this mostly happen when you're loading external data on which you don't have too much control.
Merging is very expensive if done properly. I'm not aware about how MySQL does it but a) it does it in the wrong way or b) it does it slowly under concurrent execution.
I'd say that if MySQL does it properly the only things that it offer is syntactical sugar. So I wouldn't be that afraid of the fact that pg solution is "ugly/slow".
I'd promote too much the use of merge in core/contrib. I'd say it's only suited to importing external data.
Forgiving if you put stuff in your DB... is not good housekeeping in the first place.
Currently I'm trying to write a C extension to postgres for text similarity search, so I'm a bit busy.
But I think that once we chose the way it shouldn't be that hard to build up a solution.
eg.
You dynamically write the insert and the update statements in php and then modify the pg doc solution as:
[1]
Unfortunately sql functions don't have control structure/exception handling AND plpgsql has to be installed and can be installed just by superuser.
I wouldn't see it as a stopper since pg users are used to other pains when dealing with Drupal.
Surely the need to install plpgsql will have to be advertised in the INSTALL file for pg.
dynamic statement could be built in plpgsql in spite of being build in php... but php should run faster for text processing... and actually plpgsql is "rather slow" for things you wouldn't expect from a programming language (eg. assignment are quite slow).
Comment #26
Crell commentedMerge queries are used all over core in a number of key places where we by nature cannot know in advance if a record already exists: cache_set() and variable_set() for instance. The only way to emulate them generically we're already doing in the base implementation. The issue here is whether or not there is a faster/more atomic/better approach for PostgreSQL that is PostgreSQL-specific.
Comment #27
josh waihi commentedWell we could always lock the table while we so our select/insert/update in db_merge. That will ensure atomicity. Because we cannot assume that we have access to plpgsql or require it for that matter. @ivan, can that function be written without plpgsql? Also is it possible to do that with prepared statements?
Locking the table seems like the only thing we can do at the moment.
Comment #28
ivansb@drupal.orgI'm sorry I didn't check it properly where db_merge/upsert is used... but as Crell pointed out it is used in cache... and locking the table will cause a lot of contention.
We could see if the places where db_merge is used are subject to the same problem that actually require the loop in the pgplsql code.
Still plain sql functions don't have control logic.
So even the simpler
begin;
update ...
insert ...
commit;
is going to fail
Writing triggers is going to be a pain.
Considering what we were doing with {blocks} in previous D versions... atomicity has never been a priority.
But while most places can still miss atomicity we didn't have in the past locking cache is going to cause a lot of pain.
Well to be honest I'm actually surprised of the proliferation of db_merge once I checked.
Comment #29
Crell commentedPretty much everywhere we're now using a merge we used to be doing one of the following:
- Update, oh nothing was actually updated so insert. (Doesn't work in PDO for various subtle reasons, and is not atomic.)
- Is there already a value? Set it to 1. Else set it to value + 1. Also not atomic.
So what we're replacing wasn't atomic to begin with. It's now atomic in MySQL, but not in Postgres or SQLite. If we can make it atomic, great. How to do that, I defer to Josh as I don't actually use postgres. :-)
Comment #30
josh waihi commented@Crell, this creates a problem, If we can't archive atomicity for PostgreSQL and SQLite, then should we be using this function at all? 2/3 of our database support essentially won't support a feature. I'm interested in how MySQL is atomic, the syntax maybe one command but behind it must be doing more.
Comment #31
Crell commentedMySQL's implementation is atomic: http://lists.mysql.com/internals/33850 (I'm assuming that a MySQL AB engineer is going to be correct on that front.)
As I said above, the equivalent logic in Drupal 6 was not atomic on any database. In Drupal 7, the base implementation is non-atomic but the MySQL override is. If we can give PostgreSQL and SQLite their own custom implementations to make them atomic, great, but if not then they're at least no worse off than we were in Drupal 6.
It's a logical operation that we *must* have, for various systems to work. The API is there and is stable. We just need to make the implementation as good as we can make it.
Comment #32
josh waihi commentedbased on my research from #710926: Make simpletest concurrency work for PostgreSQL, the default implementation of db_merge (which PostgreSQL uses) isn't atomic.
I was thinking it would be easier to just try the insert, then upon the duplicate key violation, update the record instead. However, as we know this will destroy any transaction we're in and because Drupal doesn't properly layer transactions, we couldn't rollback just that failed insert.
However if we commited #669794: Use savepoints for nested transactions then we could do exactly that and make the default method faster and atomic. Two birds with one stone and a pretty neat implementation for transactions too!
Comment #33
andypostAs #669794: Use savepoints for nested transactions commited maybe it's time to merge this issue with #715108: Make Merge queries more consistent and robust because current mysql implementation is not atomic #7 [#715108]
Comment #34
catchMarking as duplicate per andypost.
Comment #35
chx commentedOn which planet postgresql and mysql issues are duplicate?
Comment #36
catchWhen they're fixed by the same issue?
Marking postponed instead, if for someone reason it's not fixed in the other issue, we can put this back to CNW.
Comment #37
Crell commentedThe other issue has evolved to be an overall cleanup of Merge queries so that the default implementation is no longer a lame placeholder, which means Postgres doesn't need an alternate implementation in the first place.
Yay, one less critical in the queue!