Optimize Merge queries in Postgres driver [#301036]

Comment #1

linuxpoet commented 20 September 2008 at 18:24

Can you define the parameters for the patch? What are you going to pass the function? What do you expect it to return?

Log in or register to post comments

Comment #2

Crell commented 20 September 2008 at 18:43

The API to the merge query is already defined. key(), fields(), etc. I don't know enough about Postgres to say how such a Postgres function should work, but it needs to mimic the behavior of the method calls to the object.

Log in or register to post comments

Comment #3

Crell commented 7 October 2008 at 06:14

Merge queries are now documented here, too: http://drupal.org/node/310085

Log in or register to post comments

Comment #4

markus_petrux commented 7 October 2008 at 06:38

At the bottom of the following page there is an example that might be a start for this.

http://www.postgresql.org/docs/current/static/plpgsql-control-structures...

And here's how it has been solved in the MySQL driver:

http://drupal.org/node/301501

HTH+ :)

Log in or register to post comments

Comment #5

markus_petrux commented 7 October 2008 at 06:53

A few more references, taken from: http://wiki.postgresql.org/wiki/Todo

MERGE is typically used to merge two tables. REPLACE or UPSERT command does UPDATE, or on failure, INSERT. This is similar to UPDATE, then for unmatched rows, INSERT. Whether concurrent access allows modifications which could cause row loss is implementation independent. To implement this cleanly requires that the table have a unique index so duplicate checking can be easily performed. It is possible to do it without a unique index if we require the user to LOCK the table before the MERGE.

someone working to add merge?

MERGE vs REPLACE

MERGE SQL Statement

MERGE Specification

Internal design of MERGE, with Rules

Another possible reference from the PostgreSQL manual:

http://www.postgresql.org/docs/7.4/interactive/rules-update.html

Log in or register to post comments

Comment #6

mgifford

he/him

English

commented 16 September 2009 at 11:12

Is there someone with the expertise to write and/or review a patch here? Seems like this could provide significant performance improvements.

Log in or register to post comments

Comment #7

mgifford

he/him

English

commented 16 September 2009 at 11:13

Issue tags:

+Performance

adding performance tag

Log in or register to post comments

Comment #8

mgifford

he/him

English

commented 16 September 2009 at 11:14

Issue tags:

+PostgreSQL

and also PostgresSQL sorry.

Log in or register to post comments

Comment #9

damien tournoud commented 17 September 2009 at 00:25

I twitted about that this morning, and @jwpye suggested to use http://mbk.projects.postgresql.org/. It is not GPL but apparently covers our use case.

Log in or register to post comments

Comment #10

damien tournoud commented 17 September 2009 at 00:31

Hum, scratch that, it does only work with simple conditions.

Log in or register to post comments

Comment #11

Crell commented 30 October 2009 at 03:24

Component:	database system	» postgresql database
Issue tags:	-PostgreSQL

Refiling...

Log in or register to post comments

Comment #12

josh waihi commented 30 October 2009 at 04:14

will a simple subquery do?

Log in or register to post comments

Comment #13

Crell commented 30 October 2009 at 06:27

I have no idea, will it? :-) Right now, Postgres uses the generic merge query which is non-atomic, because it's a SELECT then if-else for INSERT or UPDATE. I don't know enough Postgres to say how best to make it atomic. If it works and doesn't suck performance-wise, I'm cool with it. :-)

If it can't be done, well, OK, it can't be done. But I'd hate to leave that situation in there if we don't have to.

Log in or register to post comments

Comment #14

josh waihi commented 16 November 2009 at 05:26

:S err.... looks like not. So a merge query for PostgreSQL (and SQLite?) will do:

BEGIN;
SELECT .....
INSERT ....
-- OR
UPDATE ....
COMMIT;

4 queries to the database. But I don't see a better solution in 8.3. The fact that there is a transaction makes it atomic though. So I'm happy with the current implementation.

Log in or register to post comments

Comment #15

Crell commented 16 November 2009 at 16:26

Version:

7.x-dev

» 8.x-dev

Hm, good point. So it is atomic, just not the fastest. I'm OK with that.

I'm bumping this to Drupal 8, in case someone wants to look into it again there, but for now it's not a blocker for D7.

Log in or register to post comments

Comment #16

andypost

he/him

Russian

commented 21 November 2009 at 15:00

subscribe

Log in or register to post comments

Comment #17

andypost

he/him

Russian

commented 23 November 2009 at 17:43

Version:

8.x-dev

» 7.x-dev

Right now db_merge() does not work for postgres at all
Why not just build a query like

$BODY$
DECLARE sql text;
DECLARE sql2 text;
BEGIN
  sql := 'UPDATE '|| quote_ident($1) ||' SET field1 = ' || $3 || ', field2 = ' || quote_literal($4) || ' WHERE PrimaryKey = ' || quote_literal($2) || ';';
  EXECUTE sql;
  BEGIN
    sql2 := 'INSERT INTO '|| quote_ident($1) ||' (PrimaryKey, field1, field2) VALUES ('''|| $2 ||''' , '|| $3 ||' , '''|| $4 ||''');';
    EXECUTE sql2;
    EXCEPTION WHEN unique_violation THEN
  END;
END;
$BODY$

Log in or register to post comments

Comment #18

Crell commented 24 November 2009 at 00:15

If db_merge() doesn't work, how is Postgres even functional at this point? :-) I dispute that claim.

I have no idea if that procedure would work or not. Josh?

Log in or register to post comments

Comment #19

andypost

he/him

Russian

commented 24 November 2009 at 08:25

Postgres is broken now, details #636632: Decide on multiple database testing #13-14

Log in or register to post comments

Comment #20

josh waihi commented 18 January 2010 at 21:16

It doesn't seem to me that PostgreSQL is broken - well, at least not in this area. @andypost is this still an issue for you?

Log in or register to post comments

Comment #21

andypost

he/him

Russian

commented 19 January 2010 at 20:59

@Josh I've tested today and everything works fine - setup (minimal) and testing (takes a lot of time). There's some broken tests for fields.

Log in or register to post comments

Comment #22

josh waihi commented 21 January 2010 at 20:31

ah yeah, there are a few exceptions and failures here and there. I need to get onto them. But if merger query isn't an issue, can we close this?

Log in or register to post comments

Comment #23

Crell commented 21 January 2010 at 22:23

Josh: Your call whether we try to optimize the query in D7 or bump this issue to D8. As long as it works (the transaction *should* make it atomic, I think) I'm comfortable with what it is now.

Log in or register to post comments

Comment #24

andypost

he/him

Russian

commented 22 January 2010 at 13:41

why not put this in stored procedure? this is a "thin" place which maybe caused a poor performance of pgsql

Log in or register to post comments

Comment #25

ivansb@drupal.org

Italian

Milano

commented 25 January 2010 at 22:42

The simple begin update insert commit; is not going to work if you've several concurrent one running.
The only safe way is as pointed out in
http://drupal.org/node/301036#comment-1047050 (pg docs)
So a solution could be inspired to:
http://drupal.org/node/301036#comment-2050722
+ what's suggested by the pg manual
or what's suggested in pg manual + some php [1]
or what's suggested in the manual + execute (dynamic statement
built in plpgsql)

The loop is necessary to prevent 2 merges jumping one on the other.

Is MergeQuery used somewhere?
I'd say you need merging when you don't know if things are in or not (this is bad) and this mostly happen when you're loading external data on which you don't have too much control.
Merging is very expensive if done properly. I'm not aware about how MySQL does it but a) it does it in the wrong way or b) it does it slowly under concurrent execution.

I'd say that if MySQL does it properly the only things that it offer is syntactical sugar. So I wouldn't be that afraid of the fact that pg solution is "ugly/slow".

I'd promote too much the use of merge in core/contrib. I'd say it's only suited to importing external data.
Forgiving if you put stuff in your DB... is not good housekeeping in the first place.

Currently I'm trying to write a C extension to postgres for text similarity search, so I'm a bit busy.

But I think that once we chose the way it shouldn't be that hard to build up a solution.

eg.

You dynamically write the insert and the update statements in php and then modify the pg doc solution as:
[1]

CREATE FUNCTION merge_db(insert TEXT, update TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
	EXECUTE update;
        IF found THEN
            RETURN;
        END IF;

        BEGIN
            EXECUTE insert;
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

Unfortunately sql functions don't have control structure/exception handling AND plpgsql has to be installed and can be installed just by superuser.
I wouldn't see it as a stopper since pg users are used to other pains when dealing with Drupal.
Surely the need to install plpgsql will have to be advertised in the INSTALL file for pg.

dynamic statement could be built in plpgsql in spite of being build in php... but php should run faster for text processing... and actually plpgsql is "rather slow" for things you wouldn't expect from a programming language (eg. assignment are quite slow).

Log in or register to post comments

Comment #26

Crell commented 25 January 2010 at 22:56

Merge queries are used all over core in a number of key places where we by nature cannot know in advance if a record already exists: cache_set() and variable_set() for instance. The only way to emulate them generically we're already doing in the base implementation. The issue here is whether or not there is a faster/more atomic/better approach for PostgreSQL that is PostgreSQL-specific.

Log in or register to post comments

Comment #27

josh waihi commented 26 January 2010 at 01:31

Well we could always lock the table while we so our select/insert/update in db_merge. That will ensure atomicity. Because we cannot assume that we have access to plpgsql or require it for that matter. @ivan, can that function be written without plpgsql? Also is it possible to do that with prepared statements?

Locking the table seems like the only thing we can do at the moment.

Log in or register to post comments

Comment #28

ivansb@drupal.org

Italian

Milano

commented 26 January 2010 at 03:37

I'm sorry I didn't check it properly where db_merge/upsert is used... but as Crell pointed out it is used in cache... and locking the table will cause a lot of contention.

We could see if the places where db_merge is used are subject to the same problem that actually require the loop in the pgplsql code.

Still plain sql functions don't have control logic.
So even the simpler
begin;
update ...
insert ...
commit;
is going to fail

Writing triggers is going to be a pain.

Considering what we were doing with {blocks} in previous D versions... atomicity has never been a priority.
But while most places can still miss atomicity we didn't have in the past locking cache is going to cause a lot of pain.
Well to be honest I'm actually surprised of the proliferation of db_merge once I checked.

Log in or register to post comments

Comment #29

Crell commented 26 January 2010 at 04:29

Pretty much everywhere we're now using a merge we used to be doing one of the following:

- Update, oh nothing was actually updated so insert. (Doesn't work in PDO for various subtle reasons, and is not atomic.)

- Is there already a value? Set it to 1. Else set it to value + 1. Also not atomic.

So what we're replacing wasn't atomic to begin with. It's now atomic in MySQL, but not in Postgres or SQLite. If we can make it atomic, great. How to do that, I defer to Josh as I don't actually use postgres. :-)

Log in or register to post comments

Comment #30

josh waihi commented 26 January 2010 at 10:08

@Crell, this creates a problem, If we can't archive atomicity for PostgreSQL and SQLite, then should we be using this function at all? 2/3 of our database support essentially won't support a feature. I'm interested in how MySQL is atomic, the syntax maybe one command but behind it must be doing more.

Log in or register to post comments

Comment #31

Crell commented 26 January 2010 at 16:25

MySQL's implementation is atomic: http://lists.mysql.com/internals/33850 (I'm assuming that a MySQL AB engineer is going to be correct on that front.)

As I said above, the equivalent logic in Drupal 6 was not atomic on any database. In Drupal 7, the base implementation is non-atomic but the MySQL override is. If we can give PostgreSQL and SQLite their own custom implementations to make them atomic, great, but if not then they're at least no worse off than we were in Drupal 6.

It's a logical operation that we *must* have, for various systems to work. The API is there and is stable. We just need to make the implementation as good as we can make it.

Log in or register to post comments

Comment #32

josh waihi commented 11 February 2010 at 03:54

Category:	task	» bug
Priority:	Normal	» Critical
Status:	Active	» Needs work

based on my research from #710926: Make simpletest concurrency work for PostgreSQL, the default implementation of db_merge (which PostgreSQL uses) isn't atomic.

I was thinking it would be easier to just try the insert, then upon the duplicate key violation, update the record instead. However, as we know this will destroy any transaction we're in and because Drupal doesn't properly layer transactions, we couldn't rollback just that failed insert.

However if we commited #669794: Use savepoints for nested transactions then we could do exactly that and make the default method faster and atomic. Two birds with one stone and a pretty neat implementation for transactions too!

Log in or register to post comments

Comment #33

andypost

he/him

Russian

commented 26 March 2010 at 18:44

As #669794: Use savepoints for nested transactions commited maybe it's time to merge this issue with #715108: Make Merge queries more consistent and robust because current mysql implementation is not atomic #7 [#715108]

Log in or register to post comments

Comment #34

catch

he/him

English

commented 22 April 2010 at 06:54

Status:

Needs work

» Closed (duplicate)

Marking as duplicate per andypost.

Log in or register to post comments

Comment #35

chx commented 1 May 2010 at 03:29

Status:

Closed (duplicate)

» Needs work

On which planet postgresql and mysql issues are duplicate?

Log in or register to post comments

Comment #36

catch

he/him

English

commented 14 May 2010 at 15:13

Status:

Needs work

» Postponed

When they're fixed by the same issue?

Marking postponed instead, if for someone reason it's not fixed in the other issue, we can put this back to CNW.

Log in or register to post comments

Comment #37

Crell commented 16 June 2010 at 08:34

Status:	Postponed	» Closed (duplicate)
Issue tags:	-Performance

The other issue has evolved to be an overall cleanup of Merge queries so that the default implementation is no longer a lame placeholder, which means Postgres doesn't need an alternate implementation in the first place.

Yay, one less critical in the queue!

Log in or register to post comments

Optimize Merge queries in Postgres driver

Comments