Updated: Comment #44
Problem/Motivation
When updating a database record in D7, according to the documentation UpdateQuery->execute() returned "the number of rows affected by the update", relying on PDO's return value.
It turned out that PDO's number of affected rows was highly dependent on the database engine and therefore not reliable:
In line with the ANSI SQL standard, both PostgreSQL and SQLite define "affected rows" as all rows matched by the query.
For MySQL, the definition however depends: While MyISAM always returned the number of rows that really had to be changed, on InnoDB it used to be the same, but starting with MySQL 5.1 this changed to returning all rows matched by the query.
This has been reverted with MySQL #29157, so InnoDB again matches MyISAM starting with MySQL versions 5.1.24 and 5.5.
With MySQL being our primary database platform, in D7 we implemented a workaround for SQLite but missed PostgreSQL and InnoDB 5.1.0 - 5.1.24. Finally the affected rows are only checked in our MySQL-based tests, but not otherwise used in core.
Proposed resolution
For Drupal 8, we decided to use the PDO::MYSQL_CLIENT_FOUND_ROWS flag which was introduced in 5.3 with PHP #44135, but is only consistently available only from PHP 5.3.5 with PHP #53425, which we're anyway requiring in D8.
This flag ensures consistency of MySQL's behaviour with PostgreSQL and SQLite, without relying on query rewrites or other workarounds.
Remaining tasks
Decide on a backport to Drupal 7 ensuring consistency between all database systems.
User interface changes
None.
API changes
In Drupal 8, update queries always return the number of rows matched by the query, which includes rows that didn't have to be updated because they're values wouldn't have changed.
Related Issues
#1266572: Workaround in UpdateQuery_sqlite for affected rows count causes certain updates to be suppressed
#1542186: PHP 5.4 "Illegal string offset" warning when install
Original report by simg
I've been having problems with the comment_notify module due to the fact that db_affect_rows function returns 0 even if there is an existing row in the comment_notify table for a given uid. This is because mysql returns 0 records on UPDATE if the new values are the same as the existing values.
ie The update statement "update comment_notify_user_settings set node_notify = 0 where uid = 1;"
returns
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0
If node_notify is already 0 (even though a uid = 1 exists !!)
The mysqli_info() function returns the 0 rows affected and not the rows matched = 1.
This problem is described here.
http://www.php.net/manual/en/function.mysql-info.php
I have fixed this by modifying the db_affect_rows() function in includes/database.mysqli.inc with the following code
(includes/database.mysql.inc requires a similar fix but needs to call mysql_info)
function db_affected_rows() {
global $active_db; // mysqli connection resource
$info_str = mysqli_info($active_db);
if (ereg("Records: ([0-9]*)", $info_str, $count) == false) {
ereg("Rows matched: ([0-9]*)", $info_str, $count);
}
return $count;
}
/* old code */
/* this function doesn't work due bugs in mysql (bugs #41283 and #41285) which where mysql doesn't count row updates if the values haven't changed
function db_affected_rows() {
global $active_db; // mysqli connection resource
return mysqli_affected_rows($active_db);
}*/
Comment | File | Size | Author |
---|---|---|---|
#52 | drupal-affected-rows-behavior-805858-51.patch | 5.7 KB | Darren Oh |
#52 | interdiff-805858-45-51.txt | 1.14 KB | Darren Oh |
Comments
Comment #1
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis is by design. Deal with it instead of hacking around it.
Comment #2
simg CreditAttribution: simg commentedEr, how can it be by design ??
The underlying bug in mysql is acknowledged.
And I have dealt with it. Yes, the fix is a little hacky but what is the alternative ?
Attached are patches that to fix database.mysqli.inc and database.mysql.inc
The following pattern which seems to be in use in a number of places throws a warning message - seems to be the cause of a fair few issues in the queue.
db_query('UPDATE {comment_notify_user_settings} SET node_notify = %d, comment_notify = %d WHERE uid = %d', $edit['node_notify_mailalert'], $edit['comment_notify_mailalert'], $user->uid);
if (!db_affected_rows()) {
db_query('INSERT INTO {comment_notify_user_settings} (uid, node_notify, comment_notify) VALUES (%d, %d, %d)', $user->uid, $edit['node_notify_mailalert'], $edit['comment_notify_mailalert']);
}
Comment #3
Damien Tournoud CreditAttribution: Damien Tournoud commentedThere is no bug, the name of the function is
db_affected_rows()
, notdb_matched_rows()
.The correct pattern is shown in
variable_set()
:Notice the
@
before the INSERT query.Obviously this is sub-optimal, that's why we have implemented proper MERGE queries in Drupal 7.
Comment #4
simg CreditAttribution: simg commentedOK, thanks for the info.
Strangely, I can't find any reference to what the @ actually does?
Not that it really matters but shouldn't db_affected_rows() be database independent ? The difference between "affected rows" and "matched rows" seems like a mysql specific concept and is surely a bug in mysql ?
Comment #5
Damien Tournoud CreditAttribution: Damien Tournoud commentedIt's not at all a bug. Most databases return the number of affected rows as a result of UPDATE and DELETE queries.
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commentedAnd yes, our db_affected_rows() has the same behavior of all the databases we support. That behavior is correctly documented as:
Comment #7
simg CreditAttribution: simg commentedLooks like this issue was fixed quite nicely in 2005 but I guess the error has some how crept back in ...
http://drupal.org/node/22786
Comment #8
simg CreditAttribution: simg commentedYes, sure. But generally databases don't differentiate between rows changed and rows matched.
Both MS SQL Server and PostgreSQL return the rows matched value whether or not they internally perform a physical write.
Rows matched is the value that should be used by the mysql implementation of db_affect_rows() especially given how it's being used ...
Comment #9
Eric_A CreditAttribution: Eric_A commentedFrom #638702: insertion errors
I too am getting this since my lucid upgrade, but only with mysqli. mysql seems to work as before. Perhaps the flags passed to mysqli_real_connect() are not respected anymore?
From #22786: db_affected_rows code cleanup
Now that db_affected_rows() returns the number of rows matched instead of only changed we can get rid of the hacks that worked around this.
The referenced commit shows the flags mentioned by cdale: http://drupalcode.org/viewvc/drupal/drupal/includes/database.mysql.inc?r...
It would be nice to have a bigger list of what happened since the 2005 commit by Dries...
Comment #10
Eric_A CreditAttribution: Eric_A commentedWait a minute... The 2005 code passes integer 2 when connecting, and then immediately does a mysql_db_connect() without passing the connection and thus undoing it all?
EDIT: Sigh, never mind, this is not an issue here. If the link identifier is not specified, the last link opened by mysql_connect() is assumed.
Comment #11
AlexisWilke CreditAttribution: AlexisWilke commentedDamien,
There is a bug in MySQL. And since most users use that silly database, providing a fix makes sense.
The documentation is WRONG. The db_affected_rows() function is expected to return the right number: the number of rows that the UPDATE matched. All the functions that do an UPDATE + INSERT are now broken in all of Drupal. Feel free to search and you will see that all the modules everywhere have this issue popping up.
MySQL sucks and I tell my users to switch to PostgreSQL, a true database, but that's not going to happen.
Somehow, the flag (2 or MYSQLI_CLIENT_FOUND_ROWS) was used to make sure the right result was being returned. That seems to not work any more with certain versions of MySQL. The problem is that most people cannot just choose which version to run. Their webhost has chosen for them.
Therefore, a fix in the function that does it wrong is the right solution, even if you do not like it.
Asking 1,000 people to change their code because of a bug in one database system, seems a bit much to me.
The use of the @ for the variable is because you are very likely to have multiple users hit that line of code and Core avoids any locking of the database. The result is pretty much the same, except that locking prevents anyone from reading anything from that table.
For all those tables that pretty much on the Admin will be able to change, the @ should never be necessary. Now if you think that an UPDATE + db_affected_rows() + INSERT is wrong, let me know how you'd do that procedure otherwise?!
Just like it was addressed a while back (circa 2005), it needs to be addressed again (circa 2011?)
Thank you.
Alexis Wilke
Comment #12
Damien Tournoud CreditAttribution: Damien Tournoud commentedBug of MySQL or not, we will not change that in Drupal 6, which is long frozen. For Drupal 7, PDO only supports the "modified" semantic in MySQL (at least on PHP 5.2). We picked this one as the baseline (see
UpdateQuery_sqlite
for the impact of this).Back to by design.
Comment #13
teliseo CreditAttribution: teliseo commentedDespite Damien saying that the “rows changed” behavior is by design, it is clear that the preferred behavior is for db_affected_rows() to return the number of rows matched, as evidenced by the MYSQLI_CLIENT_FOUND_ROWS flag being passed to mysqli_real_connect() by Drupal 6. This flag should cause the “matched” behavior, but it does not in many cases due to an apparent bug in the MySQL PHP extension library (mysqli.so). The relevant code, in php-5.3.3/ext/mysqli/mysqli_nonapi.c:
Note that if the build option is defined to use the MySQL Native Driver, mysqlnd_connect() is properly called with the passed in flags (including MYSQLI_CLIENT_FOUND_ROWS from Drupal). In this case db_affected_rows() after an UPDATE will return the number of matched rows, not the number of changed rows. On platforms where MySQL is built with this option, things work as intended.
However, on many platforms, including Fedora/RHEL/CentOS up through the current Fedora 14, MySQL is built without the MYSQLI_USE_MYSQLND option set, causing mysql_real_connect() to instead be called, ignoring the passed in flags, and therefore causing db_affected_rows() after an UPDATE to return the number of changed rows.
The obvious solution is to fix this code:
I don’t think that the original code was intentional—it seems nothing but an oversight in the code revision, shown around line 215 of the relevant code revision. I have submitted this patch to php.net.
The better performing solution would be for PHP to call the MySQL Native Driver, but this is potentially disruptive due to an incompatibility with legacy password fields, which Fedora/RH uses by default.
I know that fixing the MySQL PHP library on your webhost is not an option for many, but I thought it important to document the root cause, and the apparent reason why different people experience different behavior.
Comment #14
Damien Tournoud CreditAttribution: Damien Tournoud commentedLet's quickly discuss what to do here before RC.
The way I understand this:
Because MYSQL_ATTR_FOUND_ROWS is not supported by PHP 5.3 *and* because the number of affected rows cannot be reliably returned by MySQL, I fear that we are going to have to:
UPDATE table SET a = "val1" WHERE b = "val2"
intoUPDATE table SET a = "val1" WHERE b = "val2" AND a <> "val1"
). The code to do that already exists in the SQLite driver.Comment #15
carlos8f CreditAttribution: carlos8f commentedAlthough the matched row count might be preferred over the changed count in some cases, it doesn't look technically feasible to ensure such behavior across all db/php combinations.
I also don't see how rewriting the queries improves the situation much. It just reduces the matched row count and the workaround in #3 would still be necessary.
So always returning affected rows rather than attempting to get matched rows seems like the best option.
Comment #16
chx CreditAttribution: chx commentedUse a merge query. There is nothing else we can do. The return value of DML queries is no way in hell cross db compatible and thanks to PDO it's broken on MySQL too.
Comment #17
Crell CreditAttribution: Crell commentedThis is why Merge queries were added in the first place. Using Update in that fashion doesn't work.
Comment #18
Crell CreditAttribution: Crell commentedTempted to close again, but it's definitely not critical.
Comment #19
Steven Jones CreditAttribution: Steven Jones commentedThere is a test for this in core, which is failing on PostgreSQL currently,
basically it updates a field to be field * field (i.e. squares the value) on a table that has one row with the field == 1, which causes the number of rows affected to differ on MySQL and PostgreSQL, causing the test to fail.
Looks like we might need to take the sqlite specific code and apply it to the postgreSQL implementation too.
This is possibly a different issue, but I wanted to flag it up here.
Comment #20
Steven Jones CreditAttribution: Steven Jones commentedCorrect tag
Comment #21
Steven Jones CreditAttribution: Steven Jones commentedHijacking this issue based on comment #14 for discussion.
From that comment it seems we need to do:
Does that mean that we need to pop the code from the SQLite driver into the parent driver, and then MySQL is a special case that doesn't need it? Or do we put the code in a utility method in the parent driver, and then call that method in all the drivers except MySQL?
Comment #22
Damien Tournoud CreditAttribution: Damien Tournoud commentedAll the drivers need that. Even on MySQL the count is not guaranteed to be exactly the matched rows.
Comment #23
marcingy CreditAttribution: marcingy commentedPushing back down to normal based on #18.
Comment #24
Steven Jones CreditAttribution: Steven Jones commented@marcingy This was marked as 'major' because tests are failing in a secondary environment because of this issue. Has that been addressed in some other issue?
Comment #25
catchLet's have someone run the tests and report which one fails rather than leaving this as major 'just in case'.
Comment #26
posulliv CreditAttribution: posulliv commentedI ran the test suite with PostgreSQL 9.1 and the only test that failed was testExpressionUpdate in UpdateTest.php
Like was mentioned in comment #19, this test fails because there is no condition specified for the UPDATE. So PostgreSQL will say all rows in the table matched whereas MySQL will say all rows matched but only 3 were affected.
Not sure about implementing the workaround the SQLite driver uses since the PostgreSQL driver uses prepared statements. I'm not too familiar with how to add a condition to an UPDATE statement when the condition is an expression either?
Comment #27
Steven Jones CreditAttribution: Steven Jones commentedHere's a patch that does what #22 asks in D8. Mostly to see what the testbot makes of it.
Comment #28
chemical CreditAttribution: chemical commentedTestbot likes all kind of changed to Postgres and SQLite specific files because it only runs MySQL. Delete the files and most likely all test are still passed.
We need an environment to test both Postgres and SQLite specific changes!
Comment #29
PanchoPatch doesn't apply anymore, but let's first decide on our route to go:
In D8, we're now requiring PHP 5.3.10, so we should be able to set the flag MYSQL_ATTR_FOUND_ROWS.
This should mean that we are now able to consolidate db_affected_rows() behaviour for all db platforms on either the number of matched rows or the number of updated rows, while currently:
- MySQL returns just the changed rows
- Postgres returns all matched rows
- SQLite returns just the changed rows, but only thanks to a workaround (
UPDATE table SET a = "val1" WHERE b = "val2" into UPDATE table SET a = "val1" WHERE b = "val2" AND a <> "val1").
)Our options would be as follows:
A) Return just the changed rows, meaning mean we had to rewrite the UPDATE queries for all engines the way we currently only do for SQLite, because according to #14 and #22, MySQLs changed rows are not reliable anyway.
B) Return all matched rows, meaning we would appropriately set the MYSQL_ATTR_FOUND_ROWS flag on MySQL, remove the SQLite workaround, and be done, with the additional benefit of being in line with most other database platforms around.
The remaining issue with Fedora/RHEL/CentOS (#13) should be fixed with PHP 5.3.5 as well, see bug, changelog.
Is this correct? And if yes, do you agree that we should proceed with option B for D8?
[edit:] And for D7, do you agree we should give option A a try and see how it performs for large updates?
(Adding SQLite tag as well because it might be just as well affected depending on our decision.)
Comment #30
Damien Tournoud CreditAttribution: Damien Tournoud commentedYes, let's proceed with option (b) for Drupal 8.
Comment #31
PanchoAccording to #26, testExpressionUpdate fails on Postgres because of this inconsistency. Can't currently reproduce because D8 doesn't install on Postgres, but this should be still the case, therefore major priority (test failure in a secondary environment).
Comment #32
Pancho@Damien:
Nice, then I'll be preparing a patch.
[edit:]
We're currently having a flag
$queryOptions['sqlite_return_matched_rows']
which will now go away in D8.Now if in D7 we would be rewriting all update queries to consistently return changed rows only, how about implementing something like
$queryOptions['return_changed_rows']
in D8, rewriting the update queries in D8 as well thereby mimicking the (then consistently fixed) pre-D8 behavior?Comment #33
PanchoNot at all surprising - just for our documentation:
The SQLite documentation seems to be slightly misleading:
However, according to SQLite lead Richard Hipp the behavior is indeed by design, calling matched but not physically changed rows "no-op changes":
A bit less misleading is PostgreSQL documentation:
The documentation of the actual method makes it clear though:
So if there was any doubt, we can now rest assured that this is expected behaviour and not some widespread inaccuracy.
Comment #34
PanchoSo here is a patch for D8.
Setting PDO::MYSQL_ATTR_FOUND_ROWS when opening the connection should be fine, even though not relevant to non-update queries.
Also, removing the workaround for SQLite and clarifying documentation.
Finally, I refactored the relevant testExpressionUpdate() leaving in just the actual test of algebraic expressions, and actually doing a better job there. The number of affected rows is moved to a separate testUpdateAffectedRows() where we can focus more on that, including a detailed explanation.
Tested the patch against UpdateTest() on both MySQL and SQLite, and everything worked as it should. While performance didn't seem to improve on SQLite, it didn't degrade on MySQL either.
Comment #35
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis looks really good. Please consider it RTBC if it passes.
Comment #36
PanchoPassed the MySQL tests as well.
Untested on postgreSQL because of #2001350: [meta] Drupal cannot be installed on PostgreSQL. It shouldn't be affected at all, though, because postgreSQL has always been correct, rendering #19 a non-issue, at least for the route we chose.
Don't want to set it RTBC myself though, but maybe someone else wants to do a second review?
Comment #37
Damien Tournoud CreditAttribution: Damien Tournoud commented#34 proves that we are not relying on the old behavior anywhere in core anymore (I highly suspected that, but that's a good proof). The new behavior is well tested. RTBC.
Comment #38
alexpottCommitted d790fb7 and pushed to 8.x. Thanks!
Comment #39
PanchoCool, I'll take care of the change notice.
Now what about D7? We're still inconsistent there, too and can't count on MYSQL_ATTR_FOUND_ROWS, so we can only make it consistent on counting really changed rows.
Rewriting all queries like in patch #27 should mostly cut it. I only think that we can implement the D8 behavior as an option only for PHP 5.3.5. So if contrib wants D8 behaviour be backported to D7, contrib can require PHP 5.3.5.
Comment #40
YesCT CreditAttribution: YesCT commentedwhile waiting on the change record,
https://drupal.org/core-gates#documentation
Comment #41
PanchoUpdated issue summary and added a change notification.
Now moving to D7.
Given that MySQL is our primary db platform, given that it is quite easy to rewrite the query, so it matches the behaviour of MyISAM and most versions of InnoDB, given that this already was done as workaround for SQLite, and given that the tests would pass then without modification, I guess we should use the query rewrite workaround for all db drivers.
On MyISAM we actually don't need the query rewrite, and If it should turn out it adds a performance hit we might investigate differentiating between the two engines. But that should be neither necessary nor worth the pain.
Comment #42
alexpottComment #43
PanchoRefactored patch in #27 for D7.
Removed 'return_matched_rows' flag as it wouldn't be consistent either. If we want it to be consistent, we'd need yet another workaround for PostgreSQL and SQLite, and I don't think that's worth it.
We probably want to test it in all db scenarios and maybe also do some profiling on MySQL. But first let's see what the MySQL bot says.
Comment #43.0
PanchoUpdate issue summary using the template
Comment #44
PanchoNice. To be sure this doesn't hurt performance, we could use some profiling on MySQL.
Adjusting tags. Also added related issue to the issue summary.
Comment #45
PanchoMove workaround to a protected function avoiding code duplication between generic implementation and pgsql driver.
Comment #46
PanchoOops.
Comment #46.0
Panchoadded related issue #1266572: Workaround in UpdateQuery_sqlite for affected rows count causes certain updates to be suppressed
Comment #46.1
Panchoadded another related issue
Comment #47
stefan.r CreditAttribution: stefan.r commented45: d7_return_changed_rows_only_on_update-805858-45.patch queued for re-testing.
Comment #48
stefan.r CreditAttribution: stefan.r commented@Pancho what do we need to move this forward for D7? #45 seems like the way to go, is there anything specific I can profile?
Comment #49
Darren OhAdding the updated fields to the WHERE condition improves MySQL performance by 400% in my tests. This should not be delayed any further.
Comment #51
Darren OhFails on PostgreSQL.
Comment #52
Darren OhComment #53
Darren OhComment #54
Darren OhWorks now.
Comment #57
David_Rothstein CreditAttribution: David_Rothstein commentedFrom the Drupal 8 change notice:
However if I'm reading https://www.drupal.org/requirements/database right we never supported MySQL 5.1.0 - 5.1.24 anyway; if so, there is no actual bug affecting sites that use MySQL here. And then the lowest-risk fix by far would be to just fix this in PostgreSQL specifically... this is pretty low-level stuff to be messing with in a stable release more than we have to.
That however is pretty intriguing :) What specific tests show this kind of performance improvement?
Comment #58
Darren OhThere’s a problem with API module parsing. Something to do with bytea fields. Need to fix that.
Comment #63
poker10 CreditAttribution: poker10 at ActivIT s.r.o. commentedAdding a parent issue.
Btw: Do we really need to "fix" this instead of changing the test for this specific case? Because this is not a bug in the true sence, it is a regular PostgreSQL behavior. Introducing new conditions to all update queries just to workaround this seem like an overkill to me. It can easily introduce additional complexity of queries and reduce their speed.. And that is not what we need. Even if we only apply it to PostgreSQL.
Great example is D8/D9 - here is the behavior to report all matched rows (for all database drivers). It does not make sense to make D7 postgreSQL driver to report differently.
Comment #64
mcdruidI think we can close this issue and carry on any D7 work in #3264471: DatabaseUpdateTestCase::testExpressionUpdate assertion on number of affected rows fails in PostgreSQL. I'll try to ensure credit is carried across as we're definitely using the D7 patches that started off here.
Not sure if that means it'd be best to move this back to D8 and mark as Fixed based on the commit to D8 in #38?
I'll do that for now, but if something else makes more sense that's fine.