It is possible that I am missing the point, as I can't find any discussions on the following issue:

In both field_sql_storage_field_storage_write() and hook_field_storage_write() we have the following comment:

// Delete and insert, rather than update, in case a value was added.

The code (for update operations) then deletes the field and field revision rows and re-inserts them.

To my possibly ignorant eyes this looks incredibly inefficient. I would expect to see the following logic (a rough sketch follows after the lists below):
1) Query the existing value for the field
2) If it doesn't exist: insert the field and field revision rows.
3) If it exists and the value is different: update the field and field revision rows.

This will have two improvements:
1) Write to the database only if the value changed
2) SELECT + UPDATE is much faster than DELETE + INSERT

Some of the database performance benefits I can see are:

  • Fewer database writes (insert/update)
  • Less fragmentation by updating instead of delete+insert
  • Less UNDO and REDO information the SQL engine needs to handle
  • Faster transactions and less chance of deadlocks
  • Possibly fewer database index updates
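
As a purely illustrative sketch of that logic for a single-value, single-delta field (D7-style API, hypothetical table/column names, revision table left out for brevity):

// 1) Query the existing value for this entity's field row.
$existing = db_query("SELECT field_example_value FROM {field_data_field_example}
  WHERE entity_type = :type AND entity_id = :id AND deleted = 0 AND delta = 0",
  array(':type' => $entity_type, ':id' => $entity_id))->fetchField();

if ($existing === FALSE) {
  // 2) No row yet: insert the field data (the revision table would get the same treatment).
  db_insert('field_data_field_example')
    ->fields(array(
      'entity_type' => $entity_type,
      'bundle' => $bundle,
      'deleted' => 0,
      'entity_id' => $entity_id,
      'revision_id' => $revision_id,
      'language' => LANGUAGE_NONE,
      'delta' => 0,
      'field_example_value' => $new_value,
    ))
    ->execute();
}
elseif ($existing != $new_value) {
  // 3) Row exists with a different value: update it in place.
  db_update('field_data_field_example')
    ->fields(array('field_example_value' => $new_value))
    ->condition('entity_type', $entity_type)
    ->condition('entity_id', $entity_id)
    ->condition('delta', 0)
    ->execute();
}
// If the stored value already matches, no write is issued at all.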

I am no expert on the Entity/Field API, so my apologies if I have missed something significant; however, if this is a valid point I would be happy to look into it further and possibly offer a patch.

Comments

I totally agree with you here (and it is an issue/feature of Drupal 7.x too), but there may be some history behind this? On the face of it, it does seem very inefficient and very pessimistic; a comparison and update-or-insert as you suggest seems preferable and far more efficient.
We have experienced deadlocks from this process on a dev site under heavy test load where products on a commerce site were being bulk updated.
It would be good to get this reviewed for 8.x, and any fix/change should be portable back to 7 as the code is largely the same.

Issue tags:+Performance

adding Performance tag to issue

This looks strange indeed.

Since the code does not state why it is done that way, it is safe to assume that there is no super-fundamental reason for that, and it's just that one developer thought it would make more sense to do it this way at some point. I agree that deleting and re-inserting data like this is definitely not a good idea for SQL backends.

AFAICS, the new code would have to query existing values first, compare them against new values, and lastly, perform the necessary insert/update + delete queries.

That said, one could possibly even attempt to perform the comparison based on the original field values in $entity->original, although that might involve some risk of race conditions; not sure.
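
As a rough D7-style illustration of that idea (assuming the hook signature where $fields is a list of field IDs, and that core has populated $entity->original during the save):

foreach ($fields as $field_id) {
  $field = field_info_field_by_id($field_id);
  $field_name = $field['field_name'];
  if (isset($entity->original) && $entity->original->{$field_name} == $entity->{$field_name}) {
    // Values are identical to what is already stored; skip this field entirely.
    continue;
  }
  // Otherwise fall through to the insert/update/delete handling for this field.
}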

Anyone up for writing a patch?

Would be happy to give it a go.

Until we decide to go to READ COMMITTED instead of REPEATABLE READ as our default transaction isolation level, trying to do incremental modification is just going to increase our concurrency issues and race conditions. I would not recommend trying to tackle this now.

In the MySQL driver we can use REPLACE instead of UPDATE or INSERT.
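
For illustration only (hypothetical table/column names), a literal REPLACE through the database layer would look something like this:

// REPLACE is keyed on the table's PRIMARY/UNIQUE key: if a matching row exists
// it is deleted and re-inserted, otherwise this behaves like a plain INSERT.
db_query("REPLACE INTO {field_data_field_example}
  (entity_type, bundle, deleted, entity_id, revision_id, language, delta, field_example_value)
  VALUES (:type, :bundle, 0, :id, :vid, :language, 0, :value)",
  array(
    ':type' => 'node',
    ':bundle' => 'article',
    ':id' => 1,
    ':vid' => 1,
    ':language' => LANGUAGE_NONE,
    ':value' => 'example',
  ));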

Thanks @Damien Tournoud, that makes sense, as REPEATABLE READ gives no way of checking whether data changed outside the transaction. Looks to me like the cost of REPEATABLE READ is incredibly high compared to the gains.

@andyceo thanks for the tip. I'm not sure REPLACE solves the issue, as it is essentially the same as a DELETE + INSERT and, as far as I understand, may be a bit more expensive: it will try an INSERT before the DELETE, and that insert will fail on every entity update.

REPLACE solves the issue, as it is essentially the same as a DELETE + INSERT and, as far as I understand, may be a bit more expensive: it will try an INSERT before the DELETE

Database implementers wouldn't have written it if there wasn't a benefit. This article suggests REPLACE over UPDATE in the case where there is no existing record.

#1650930: Use READ COMMITTED for MySQL transactions suggests a lower isolation level which I agree could have benefits in some cases.

drupal_write_record() seems to be the current implementation for D8, so let's get it right.

So let's mock up a test case for performance, try out some options like REPLACE and transaction isolation levels, and find any bulk deletes with large replacements (like #2229013-6: PDOException: SQLSTATE[HY000]: General error: 1030 Got error -1 from storage engine: TRUNCATE {cache_field} ; Array ( ) in cache).

Issue tags:+database cache

I suggest we specialize the MergeQuery implementation for both PostgreSQL (using MERGE queries) and MySQL (using REPLACE INTO) and leave the default as-is for SQLite.

This will gain in both performance and reliability.

I'm writing the MySQL implementation now. It falls back to the generic MergeQuery implementation if a REPLACE isn't possible.

I can't see where PostgreSQL MERGE is implemented in the main docs; this is the closest I've found to a syntax.

Issue tags:+Needs backport to 7.x

This should be backported to Drupal 7 as well.

Let's wrap the PostgreSQL implementation in REPEATABLE READ isolation if we can't find a MERGE implementation (#2236395: extended db_transaction / Database::Connection::startTransaction to include isolation level). SQLite is already SERIALIZABLE by default, but it wouldn't hurt to do the same there in case someone lowers it to READ UNCOMMITTED.
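
Until #2236395 provides a proper API for this, a hedged sketch of forcing the isolation level per connection could look like the following (this is not the API proposed in that issue, and the statement shown is PostgreSQL syntax; MySQL would use SET SESSION TRANSACTION ISOLATION LEVEL ...):

// Hypothetical workaround: raise the default isolation level for this session,
// then start the transaction as usual.
db_query("SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ");
$transaction = db_transaction();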

Assigned:Unassigned» danblack

From my own tests, using DELETE+INSERT is a huge mistake, inducing a lot of deadlocks.
Take a MySQL 5.5 server, open two transactions, issue a DELETE on rowid=1 in the first one, then a DELETE on rowid=2 in the second one, and then try to INSERT rowid 1 in the first and rowid 2 in the second: you get an insert deadlock.
MySQL adds some table-level locks after the delete queries; without these deletes there are no insert deadlocks. Tested on READ COMMITTED and REPEATABLE READ. Of course REPEATABLE READ could add some gap-locking behaviour which could cause some deadlocks, but here even in READ COMMITTED mode you have a deadlock if two transactions are trying to save a field (argh!). READ COMMITTED mode is immune to that bug after MySQL 5.1, but I made the test on a 5.0 and had a crash.
This happens a lot with Drupal Commerce and other Drupal usage where write operations are concurrent and numerous.

So, now that the merge query deadlocks are fixed, why not use merge queries to update or insert fields?

Test:

CREATE TABLE t1 (id varchar(128) NOT NULL DEFAULT '', id2 varchar(128) NOT NULL DEFAULT '', PRIMARY KEY (id, id2));

--> session 1
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
DELETE FROM t1 WHERE id = '1';
INSERT INTO t1 (id, id2) VALUES ('1', '1');

In parallel, session 2:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
DELETE FROM t1 WHERE id = '2';
INSERT INTO t1 (id, id2) VALUES ('2', '2');

One of the transactions will get a deadlock; without the DELETE everything is fine.

Version:8.x-dev» 7.x-dev
Issue tags:-database cache, -Needs backport to 7.x

field_sql_storage_field_storage_write() & hook_field_storage_write() are only in D7.

Yes, reimplementing with merge sounds like a good idea.

In comment #12 my commitment was to use REPLACE in db_merge for D8, which is now #2239071: use REPLACE as part of mysql db_merge.

I'll leave it for someone to port these two functions to use db_merge.
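
For whoever picks this up, a minimal sketch of what a single-row write could look like with db_merge() (hypothetical table/column names; the real patch would have to loop over fields, languages and deltas, delete removed deltas, and handle the revision table too):

// Upsert one delta of one field with db_merge() instead of DELETE + INSERT.
db_merge('field_data_field_example')
  ->key(array(
    'entity_type' => $entity_type,
    'entity_id' => $entity_id,
    'deleted' => 0,
    'delta' => $delta,
    'language' => $langcode,
  ))
  ->fields(array(
    'bundle' => $bundle,
    'revision_id' => $revision_id,
    'field_example_value' => $item['value'],
  ))
  ->execute();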

Version:7.x-dev» 8.x-dev
Category:Feature request» Task
Issue tags:+Entity Field API, +Entity system

ContentEntityDatabaseStorage::doSaveFieldItems() still DELETEs + INSERTs upon an entity update in D8.

Agreed that this needs to be fixed in D8 also, but please make this happen for Drupal 7 first, since there are a lot of real sites in real production environments suffering from this. This is kind of production-critical, and that should be enough to bypass the "D8 first" rule.