Currently Postgres does not have a Postgres-specific implementation of MERGE queries. Instead, we're falling back to the slow, 2-query mechanism in the base driver that has a small race condition in it. We therefore need someone to write a proper Postgres-specific implementation.

After talking to some Postgres gurus, we've determined that the correct way to do that is to write a custom Postgres function that we can then call. However, I know absolutely nothing about writing Postgres functions. If you do, please speak up and send in a patch! :-)

Comments

linuxpoet’s picture

Can you define the parameters for the patch? What are you going to pass the function? What do you expect it to return?

Crell’s picture

The API to the merge query is already defined. key(), fields(), etc. I don't know enough about Postgres to say how such a Postgres function should work, but it needs to mimic the behavior of the method calls to the object.

Crell’s picture

Merge queries are now documented here, too: http://drupal.org/node/310085

markus_petrux’s picture

At the bottom of the following page there is an example that might be a start for this.

http://www.postgresql.org/docs/current/static/plpgsql-control-structures...

And here's how it has been solved in the MySQL driver:

http://drupal.org/node/301501

HTH+ :)

markus_petrux’s picture

A few more references, taken from: http://wiki.postgresql.org/wiki/Todo

MERGE is typically used to merge two tables. REPLACE or UPSERT command does UPDATE, or on failure, INSERT. This is similar to UPDATE, then for unmatched rows, INSERT. Whether concurrent access allows modifications which could cause row loss is implementation independent. To implement this cleanly requires that the table have a unique index so duplicate checking can be easily performed. It is possible to do it without a unique index if we require the user to LOCK the table before the MERGE.

Another possible reference from the PostgreSQL manual:

http://www.postgresql.org/docs/7.4/interactive/rules-update.html

mgifford’s picture

Is there someone with the expertise to write and/or review a patch here? Seems like this could provide significant performance improvements.

mgifford’s picture

Issue tags: +Performance

adding performance tag

mgifford’s picture

Issue tags: +PostgreSQL

and also PostgresSQL sorry.

damien tournoud’s picture

I twitted about that this morning, and @jwpye suggested to use http://mbk.projects.postgresql.org/. It is not GPL but apparently covers our use case.

damien tournoud’s picture

Hum, scratch that, it does only work with simple conditions.

Crell’s picture

Component: database system » postgresql database
Issue tags: -PostgreSQL

Refiling...

josh waihi’s picture

will a simple subquery do?

Crell’s picture

I have no idea, will it? :-) Right now, Postgres uses the generic merge query which is non-atomic, because it's a SELECT then if-else for INSERT or UPDATE. I don't know enough Postgres to say how best to make it atomic. If it works and doesn't suck performance-wise, I'm cool with it. :-)

If it can't be done, well, OK, it can't be done. But I'd hate to leave that situation in there if we don't have to.

josh waihi’s picture

:S err.... looks like not. So a merge query for PostgreSQL (and SQLite?) will do:

BEGIN;
SELECT .....
INSERT ....
-- OR
UPDATE ....
COMMIT;

4 queries to the database. But I don't see a better solution in 8.3. The fact that there is a transaction makes it atomic though. So I'm happy with the current implementation.

Crell’s picture

Version: 7.x-dev » 8.x-dev

Hm, good point. So it is atomic, just not the fastest. I'm OK with that.

I'm bumping this to Drupal 8, in case someone wants to look into it again there, but for now it's not a blocker for D7.

andypost’s picture

subscribe

andypost’s picture

Version: 8.x-dev » 7.x-dev

Right now db_merge() does not work for postgres at all
Why not just build a query like

$BODY$
DECLARE sql text;
DECLARE sql2 text;
BEGIN
  sql := 'UPDATE '|| quote_ident($1) ||' SET field1 = ' || $3 || ', field2 = ' || quote_literal($4) || ' WHERE PrimaryKey = ' || quote_literal($2) || ';';
  EXECUTE sql;
  BEGIN
    sql2 := 'INSERT INTO '|| quote_ident($1) ||' (PrimaryKey, field1, field2) VALUES ('''|| $2 ||''' , '|| $3 ||' , '''|| $4 ||''');';
    EXECUTE sql2;
    EXCEPTION WHEN unique_violation THEN
  END;
END;
$BODY$
Crell’s picture

If db_merge() doesn't work, how is Postgres even functional at this point? :-) I dispute that claim.

I have no idea if that procedure would work or not. Josh?

andypost’s picture

josh waihi’s picture

It doesn't seem to me that PostgreSQL is broken - well, at least not in this area. @andypost is this still an issue for you?

andypost’s picture

@Josh I've tested today and everything works fine - setup (minimal) and testing (takes a lot of time). There's some broken tests for fields.

josh waihi’s picture

ah yeah, there are a few exceptions and failures here and there. I need to get onto them. But if merger query isn't an issue, can we close this?

Crell’s picture

Josh: Your call whether we try to optimize the query in D7 or bump this issue to D8. As long as it works (the transaction *should* make it atomic, I think) I'm comfortable with what it is now.

andypost’s picture

why not put this in stored procedure? this is a "thin" place which maybe caused a poor performance of pgsql

ivansb@drupal.org’s picture

The simple begin update insert commit; is not going to work if you've several concurrent one running.
The only safe way is as pointed out in
http://drupal.org/node/301036#comment-1047050 (pg docs)
So a solution could be inspired to:
http://drupal.org/node/301036#comment-2050722
+ what's suggested by the pg manual
or what's suggested in pg manual + some php [1]
or what's suggested in the manual + execute (dynamic statement
built in plpgsql)

The loop is necessary to prevent 2 merges jumping one on the other.

Is MergeQuery used somewhere?
I'd say you need merging when you don't know if things are in or not (this is bad) and this mostly happen when you're loading external data on which you don't have too much control.
Merging is very expensive if done properly. I'm not aware about how MySQL does it but a) it does it in the wrong way or b) it does it slowly under concurrent execution.

I'd say that if MySQL does it properly the only things that it offer is syntactical sugar. So I wouldn't be that afraid of the fact that pg solution is "ugly/slow".

I'd promote too much the use of merge in core/contrib. I'd say it's only suited to importing external data.
Forgiving if you put stuff in your DB... is not good housekeeping in the first place.

Currently I'm trying to write a C extension to postgres for text similarity search, so I'm a bit busy.

But I think that once we chose the way it shouldn't be that hard to build up a solution.

eg.

You dynamically write the insert and the update statements in php and then modify the pg doc solution as:
[1]

CREATE FUNCTION merge_db(insert TEXT, update TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
	EXECUTE update;
        IF found THEN
            RETURN;
        END IF;

        BEGIN
            EXECUTE insert;
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

Unfortunately sql functions don't have control structure/exception handling AND plpgsql has to be installed and can be installed just by superuser.
I wouldn't see it as a stopper since pg users are used to other pains when dealing with Drupal.
Surely the need to install plpgsql will have to be advertised in the INSTALL file for pg.

dynamic statement could be built in plpgsql in spite of being build in php... but php should run faster for text processing... and actually plpgsql is "rather slow" for things you wouldn't expect from a programming language (eg. assignment are quite slow).

Crell’s picture

Merge queries are used all over core in a number of key places where we by nature cannot know in advance if a record already exists: cache_set() and variable_set() for instance. The only way to emulate them generically we're already doing in the base implementation. The issue here is whether or not there is a faster/more atomic/better approach for PostgreSQL that is PostgreSQL-specific.

josh waihi’s picture

Well we could always lock the table while we so our select/insert/update in db_merge. That will ensure atomicity. Because we cannot assume that we have access to plpgsql or require it for that matter. @ivan, can that function be written without plpgsql? Also is it possible to do that with prepared statements?

Locking the table seems like the only thing we can do at the moment.

ivansb@drupal.org’s picture

I'm sorry I didn't check it properly where db_merge/upsert is used... but as Crell pointed out it is used in cache... and locking the table will cause a lot of contention.

We could see if the places where db_merge is used are subject to the same problem that actually require the loop in the pgplsql code.

Still plain sql functions don't have control logic.
So even the simpler
begin;
update ...
insert ...
commit;
is going to fail

Writing triggers is going to be a pain.

Considering what we were doing with {blocks} in previous D versions... atomicity has never been a priority.
But while most places can still miss atomicity we didn't have in the past locking cache is going to cause a lot of pain.
Well to be honest I'm actually surprised of the proliferation of db_merge once I checked.

Crell’s picture

Pretty much everywhere we're now using a merge we used to be doing one of the following:

- Update, oh nothing was actually updated so insert. (Doesn't work in PDO for various subtle reasons, and is not atomic.)

- Is there already a value? Set it to 1. Else set it to value + 1. Also not atomic.

So what we're replacing wasn't atomic to begin with. It's now atomic in MySQL, but not in Postgres or SQLite. If we can make it atomic, great. How to do that, I defer to Josh as I don't actually use postgres. :-)

josh waihi’s picture

@Crell, this creates a problem, If we can't archive atomicity for PostgreSQL and SQLite, then should we be using this function at all? 2/3 of our database support essentially won't support a feature. I'm interested in how MySQL is atomic, the syntax maybe one command but behind it must be doing more.

Crell’s picture

MySQL's implementation is atomic: http://lists.mysql.com/internals/33850 (I'm assuming that a MySQL AB engineer is going to be correct on that front.)

As I said above, the equivalent logic in Drupal 6 was not atomic on any database. In Drupal 7, the base implementation is non-atomic but the MySQL override is. If we can give PostgreSQL and SQLite their own custom implementations to make them atomic, great, but if not then they're at least no worse off than we were in Drupal 6.

It's a logical operation that we *must* have, for various systems to work. The API is there and is stable. We just need to make the implementation as good as we can make it.

josh waihi’s picture

Category: task » bug
Priority: Normal » Critical
Status: Active » Needs work

based on my research from #710926: Make simpletest concurrency work for PostgreSQL, the default implementation of db_merge (which PostgreSQL uses) isn't atomic.

I was thinking it would be easier to just try the insert, then upon the duplicate key violation, update the record instead. However, as we know this will destroy any transaction we're in and because Drupal doesn't properly layer transactions, we couldn't rollback just that failed insert.

However if we commited #669794: Use savepoints for nested transactions then we could do exactly that and make the default method faster and atomic. Two birds with one stone and a pretty neat implementation for transactions too!

andypost’s picture

As #669794: Use savepoints for nested transactions commited maybe it's time to merge this issue with #715108: Make Merge queries more consistent and robust because current mysql implementation is not atomic #7 [#715108]

catch’s picture

Status: Needs work » Closed (duplicate)

Marking as duplicate per andypost.

chx’s picture

Status: Closed (duplicate) » Needs work

On which planet postgresql and mysql issues are duplicate?

catch’s picture

Status: Needs work » Postponed

When they're fixed by the same issue?

Marking postponed instead, if for someone reason it's not fixed in the other issue, we can put this back to CNW.

Crell’s picture

Status: Postponed » Closed (duplicate)
Issue tags: -Performance

The other issue has evolved to be an overall cleanup of Merge queries so that the default implementation is no longer a lame placeholder, which means Postgres doesn't need an alternate implementation in the first place.

Yay, one less critical in the queue!