Reviewing the DatabaseBackend, it's already full of try-catch for the db not being there. Let's finish this job and nuke installbackend.

CommentFileSizeAuthor
#83 install-backend-exceptions-1792536-83.patch31.54 KBBerdir
#83 install-backend-exceptions-1792536-83-interdiff.txt864 bytesBerdir
#79 install-backend-exceptions-1792536-79.patch31.59 KBDavid_Rothstein
#79 interdiff-78-79.txt653 bytesDavid_Rothstein
#78 install-backend-exceptions-1792536-78-interdiff.txt1.51 KBBerdir
#78 install-backend-exceptions-1792536-78.patch31.4 KBBerdir
#78 install-backend-exceptions-1792536-78.patch31.4 KBBerdir
#78 install-backend-exceptions-1792536-78.patch31.4 KBBerdir
#75 drupal8.cache-install.75.patch31.53 KBsun
#72 install-backend-exceptions-1792536-72.patch30.7 KBBerdir
#72 install-backend-exceptions-1792536-72-interdiff.txt2.5 KBBerdir
#66 install-backend-exceptions-1792536-65.patch32.65 KBBerdir
#66 install-backend-exceptions-1792536-65-interdiff.txt478 bytesBerdir
#61 install-backend-exceptions-1792536-61.patch32.75 KBBerdir
#61 install-backend-exceptions-1792536-61-interdiff.txt6.02 KBBerdir
#55 1792536_55.patch31.66 KBchx
#55 interdiff.txt714 byteschx
#52 1792536_52.patch31.37 KBchx
#50 1792536_50.patch31.07 KBchx
#46 1792536_46.patch31.58 KBchx
#42 1792536_42.patch31.02 KBchx
#38 1792536_38.patch29.55 KBchx
#37 1792536_37.patch29.49 KBchx
#31 1792536_31.patch28.05 KBchx
#29 1792536_29.patch30.24 KBchx
#27 1792536_27.patch25.75 KBchx
#25 1792536_25.patch26.15 KBchx
#23 1792536_23.patch26.24 KBchx
#13 1792536_11.patch20.21 KBchx
#11 1792536_9.patch20.2 KBchx
#7 1792536_7.patch19.11 KBchx
#6 drupal_classes.txt828 byteschx
#3 remove_install_backend.patch12.91 KBchx
remove_install_backend.patch7.01 KBchx
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

chx’s picture

Note that explicit delete commands are still not try-catch'd. The reason is that those really should throw an exception if the DB is not there to give ample warning of possible stale caches -- unless of course you are so early in installing that the DB doesn't even exist yet. But while you are installing I can't imagine doing explicit deletes.

Status: Needs review » Needs work

The last submitted patch, remove_install_backend.patch, failed testing.

chx’s picture

FileSize
12.91 KB

Well, duh.

chx’s picture

Status: Needs work » Needs review
andypost’s picture

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.phpundefined
@@ -188,10 +188,15 @@ class DatabaseBackend implements CacheBackendInterface {
+    try {
+      Database::getConnection()->delete($this->bin)
+        ->condition('expire', CacheBackendInterface::CACHE_PERMANENT, '<>')
+        ->condition('expire', REQUEST_TIME, '<')
+        ->execute();
+    }
+    catch (Exception $e) {
+      // The database may not be available, so we'll ignore expire requests.
+    }

Probably approach without exceptions should be better:
if (class_exists(\Database, false) like I pointed in #1161486-71: Move IP blocking into its own module

chx’s picture

FileSize
828 bytes

While I agree with #5 and have implemented it, when the first connection is made during install and no tables have been installed yet you will get an exception throw at the first config get, attached is the list of the classes that exist. If you look into the InstallBackend, it wraps the parent::foo calls in a try-fail despite they are only made when the Database class already exists, and this is why. I am not sure whether we want to pursue this further. Please advise.

Edit: also note that checking for install not being defined is not desirable either as it is clearly described in the InstallBackend -- it's not a null-backend after all.

chx’s picture

FileSize
19.11 KB

We can try this. It's not without precedent: drupal_get_profile() also peeks at install_state.

chx’s picture

Also? I love the Drupal community (andypost in this case) with its relentless seeking of better solutions. Because this one, unlike the exceptions one, can handle the deletes as well. Yay!

chx’s picture

Title: Remove the install backend » Remove the install backend and the allow exceptions in the default cache backend

Also note that we removed the blanket try-catch from the default backend which can only be good. If there is a real problem with the database, that exception now shows up.

Status: Needs review » Needs work

The last submitted patch, 1792536_7.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
20.2 KB

Bleh, Drupal and its bastard testing framework.

Status: Needs review » Needs work

The last submitted patch, 1792536_9.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
20.21 KB

Status: Needs review » Needs work

The last submitted patch, 1792536_11.patch, failed testing.

pounard’s picture

Ok, so it's time to step into this issue. This is not the right way to go, adding a isDatabaseAvailable(). As soon as you instanciated a database backend database should be there, it's a prerequisite. The right solution would be to test during install if the database is there, then instanciate a database backend instead, or a null backend if it's not. Then we will have to check that the ugly cache stall problems are fixed doing this after install.

EDIT: Of course, there's numerous other details to think about. First, using a null backend will speed up the whole installation (or a memory backend even!) considering this fact only the last install task should use a real database backend in order to empty cleanly the caches.

Another solution would be to ensure that the very first site hit with would flush all caches. That would solve numerous cache stall problems.

fubhy’s picture

The intention of this patch is great, however (in its current state) it does not cure the problem, it simply covers the symptoms. And it does so only for the DatabaseBackend (which is CURRENTLY! fine for core install profiles). In my opinion, what should optimally happen is that the CacheFactory should in any case return a NullBackend instance in case we are not done with installation yet. Otherwise we only fix this for the DatabaseBackend.

Once we install using an installation profile that stores data in a bin that uses a custom backend we would have to fix it there too and basically everywhere we use a non-DatabaseBackend implementations.

fubhy’s picture

Re #15: If we use a NullBackend until the site is fully installed we would not even have to do that, it would simply work because once the site is installed it will return to simply using the cache backends that it is supposed to use. Consequently, we won't have any invalid cache entries that we would have to clear.

Also, the logic for checking whether the site is installed could live in cache() instead of the CacheFactor.

chx’s picture

Assigned: chx » Unassigned
pounard’s picture

I'm not fond of moving those checks into the CacheFactory, but if its solves unconditionally the problems, why not. But there is still another problem: when the site is being bootstrapped by an AJAX request during install, I'm not sure inside this request we can detect if we are in install or maintenance mode, both cases that cache backend exception would be useful.

chx’s picture

Priority: Normal » Critical

Given that this issue covers a broken upgrade path I deem it critical. To recap, we broke adding 'tags' to the cache table which thrown an exception. This was fixed in the merge issue but the upgrade path is still broken -- if you add a throw $e in DatabaseBackend::set then the cache_set in variables will immediately throw one and there are more, a lot more. The tags upgrade must be moved out of system and up to update prepare bootstrap but this is not visible due to the eaten exception.

chx’s picture

Note that the fails above are *exactly* about this! They are not new fails, they are existing failures unmasked by this patch.

Berdir’s picture

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.phpundefined
@@ -41,6 +41,14 @@ class DatabaseBackend implements CacheBackendInterface {
+   * Checks whether the database is available.
+   */
+  protected function isDatabaseAvailable() {
+    global $install_state;
+    return class_exists('Drupal\Core\Database\Database', FALSE) && (empty($install_state) || !empty($install_state['database_tables_exist']));

I assume the result of this check is wrong in case of the database cache unit tests. Unlike the upgrade path failures, those are actually new and introduced by this patch.

This would mean that global $install_state does in fact exist during a unit test, which currently doesn't make sense to me...

I am wondering if we can instead "inject" this information somehow from the outside, instead of having to check global variables here.

chx’s picture

Status: Needs work » Needs review
FileSize
26.24 KB

Status: Needs review » Needs work

The last submitted patch, 1792536_23.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
26.15 KB

Status: Needs review » Needs work

The last submitted patch, 1792536_25.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
25.75 KB

Status: Needs review » Needs work

The last submitted patch, 1792536_27.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
30.24 KB

Nasty buggers.

Status: Needs review » Needs work

The last submitted patch, 1792536_29.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
28.05 KB

With less debug.

Status: Needs review » Needs work

The last submitted patch, 1792536_31.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review

#31: 1792536_31.patch queued for re-testing.

Berdir’s picture

Hm, deadlock exceptions during cache set, multiple times in a row. Is it possible that these were there before but we've ignored them before?

It's always the same backtrace, as far as I can see.

exception 'PDOException' with message 'SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'system_list' for key 'PRIMARY'' in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php:58
Stack trace:
#0 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php(58): PDOStatement->execute(Array)
#1 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Connection.php(523): Drupal\Core\Database\Statement->execute(Array, Array)
#2 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Driver/mysql/Insert.php(34): Drupal\Core\Database\Connection->query('INSERT INTO {ca...', Array, Array)
#3 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Query/Merge.php(418): Drupal\Core\Database\Driver\mysql\Insert->execute()
#4 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Cache/DatabaseBackend.php(149): Drupal\Core\Database\Query\Merge->execute()
#5 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(252): Drupal\Core\Cache\DatabaseBackend->set('system_list', Array)
#6 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(105): system_list('module_enabled')
#7 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(919): module_list()
#8 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(830): module_hook_info()
#9 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(1000): module_implements('stream_wrappers')
#10 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(195): module_invoke_all('stream_wrappers')
#11 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(251): file_get_stream_wrappers()
#12 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(288): file_stream_wrapper_get_class(false)
#13 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(503): file_stream_wrapper_valid_scheme(false)
#14 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/simpletest/lib/Drupal/simpletest/TestBase.php(853): file_prepare_directory('sites/default/f...', 3)
#15 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/simpletest/lib/Drupal/simpletest/WebTestBase.php(650): Drupal\simpletest\TestBase->prepareEnvironment()
#16 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/system/lib/Drupal/system/Tests/Database/DatabaseTestBase.php(28): Drupal\simpletest\WebTestBase->setUp()
#17 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/simpletest/lib/Drupal/simpletest/TestBase.php(696): Drupal\system\Tests\Database\DatabaseTestBase->setUp()
#18 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(381): Drupal\simpletest\TestBase->run()
#19 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(22): simpletest_script_run_one_test('142', 'Drupal\system\T...')
#20 {main}

Status: Needs review » Needs work

The last submitted patch, 1792536_31.patch, failed testing.

Berdir’s picture

Oh, and this one seems to be what is now actually happening when we have a conflict between two parallel merge conflicts:

exception 'PDOException' with message 'SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'system_list' for key 'PRIMARY'' in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php:58
Stack trace:
#0 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php(58): PDOStatement->execute(Array)
#1 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Connection.php(523): Drupal\Core\Database\Statement->execute(Array, Array)
#2 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Driver/mysql/Insert.php(34): Drupal\Core\Database\Connection->query('INSERT INTO {ca...', Array, Array)
#3 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Query/Merge.php(418): Drupal\Core\Database\Driver\mysql\Insert->execute()
#4 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Cache/DatabaseBackend.php(149): Drupal\Core\Database\Query\Merge->execute()
#5 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(252): Drupal\Core\Cache\DatabaseBackend->set('system_list', Array)
#6 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(105): system_list('module_enabled')
#7 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(39): module_list('module_enabled')
#8 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/common.inc(4782): module_load_all()
#9 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/bootstrap.inc(2118): _drupal_bootstrap_code()
#10 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(372): drupal_bootstrap(7)
#11 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(22): simpletest_script_run_one_test('198', 'Drupal\text\Tes...')
#12 {main}

So as far as I can see, the IntegrityConstraintViolationException part isn't working?

chx’s picture

Status: Needs work » Needs review
FileSize
29.49 KB

I think the IntegrityConstraintViolationException part works but it rethrows the exception. Instead, I changed how Merge retries. It might or might not help.

chx’s picture

FileSize
29.55 KB

Documented $retry properly.

Status: Needs review » Needs work

The last submitted patch, 1792536_38.patch, failed testing.

chx’s picture

Status: Needs work » Needs review

#38: 1792536_38.patch queued for re-testing.

Edit: Locking framework tests had a single fail and that definitely looks unrelated so I retested.

Berdir’s picture

So If I understand this correctly, then the concerns from @pounard have been addressed by messing with global $conf where necessary? That looks a bit ugly, but cleaner at the same time, as CacheFactory does not need to care about it. And we can possibly de-uglify those parts when moving the cache stuff properly into the container. Not quire sure yet how, but we can look at it there.

Looks close to RTBC to me then, the only thing that bothers me a bit (and already did in the issue where we added it is the naming of system_update_8007_add_tag_column():

+++ b/core/includes/update.incundefined
@@ -195,6 +198,116 @@ function update_prepare_d8_bootstrap() {
+      foreach ($tables as $table) {
+        system_update_8007_add_tag_column($table);

Now that we actually moved the call to update.inc, should we move and rename the helper function as well?

+++ b/core/modules/views/lib/Drupal/views/ViewsDataCache.phpundefined
@@ -205,14 +205,18 @@ protected function processEntityTypes(array &$data) {
+    try {
+      if ($this->rebuildCache && !empty($this->storage)) {
+        // Keep a record with all data.
+        $this->set($this->baseCid, $this->storage);
+        // Save data in seperate cache entries.
+        foreach ($this->storage as $table => $data) {
+          $cid = $this->baseCid . ':' . $table;
+          $this->set($cid, $data);
+        }
       }
+    } catch (\Exception $e) {
+      // During testing the table is gone before this fires. Nasty.

I guess that would get fixed with #512026: Move $form_state storage from cache to new state/key-value system as well.

chx’s picture

FileSize
31.02 KB

Moved and renamed the helper to update_add_cache_columns. Sure, that'd be swell if that issue would happen but it's no reason to hold this up. Added @todo. The change notice was filed Dec 21 at http://drupal.org/node/1872554 and I have already updated it in anticipation of this commit so if this gets in while I am asleep , it can safely be moved to fixed :)

Status: Needs review » Needs work

The last submitted patch, 1792536_42.patch, failed testing.

Berdir’s picture

Another deadlock.

exception 'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction' in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php:58
Stack trace:
#0 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php(58): PDOStatement->execute(Array)
#1 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Connection.php(523): Drupal\Core\Database\Statement->execute(Array, Array)
#2 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Driver/mysql/Insert.php(34): Drupal\Core\Database\Connection->query('INSERT INTO {ca...', Array, Array)
#3 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Query/Merge.php(425): Drupal\Core\Database\Driver\mysql\Insert->execute()
#4 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Cache/DatabaseBackend.php(149): Drupal\Core\Database\Query\Merge->execute()
#5 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(252): Drupal\Core\Cache\DatabaseBackend->set('system_list', Array)
#6 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(105): system_list('module_enabled')
#7 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(919): module_list()
#8 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(830): module_hook_info()
#9 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(1000): module_implements('stream_wrappers')
#10 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(195): module_invoke_all('stream_wrappers')
#11 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(251): file_get_stream_wrappers()
#12 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(370): file_stream_wrapper_get_class(false)
#13 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(1560): file_stream_wrapper_get_instance_by_uri('sites/default/f...')
#14 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/file.inc(506): drupal_chmod('sites/default/f...')
#15 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/simpletest/lib/Drupal/simpletest/TestBase.php(853): file_prepare_directory('sites/default/f...', 3)
#16 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/simpletest/lib/Drupal/simpletest/WebTestBase.php(650): Drupal\simpletest\TestBase->prepareEnvironment()
#17 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/system/lib/Drupal/system/Tests/Common/JavaScriptTest.php(38): Drupal\simpletest\WebTestBase->setUp()
#18 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/simpletest/lib/Drupal/simpletest/TestBase.php(696): Drupal\system\Tests\Common\JavaScriptTest->setUp()
#19 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(381): Drupal\simpletest\TestBase->run()
#20 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(22): simpletest_script_run_one_test('93', 'Drupal\system\T...')
#21 {main}
Berdir’s picture

Status: Needs work » Needs review

#42: 1792536_42.patch queued for re-testing.

chx’s picture

FileSize
31.58 KB

I ran out of ideas on how not to deadlock short of nuking the transaction -- for now I am skipping MERGE altogether. Even if this happens to pass I will retest manually on QA a few times to make sure it does not deadlock.

chx’s picture

Probably we could just drop the transaction from merge itself, because -- without knowing the isolation level, it's pointless to run a transaction there.

chx’s picture

Forget it, the dec 21 commit already dropped the transaction but then why did we get a deadlock up on dec 31? Perhaps there's an outer transaction somewhere, we will see I submitted into a test issue both #42 and #46 like ten times, we will see whether either of them have deadlocks.

Edit: the results are in #1877372: Ignore this issue there is no doubt that #46 does not deadlock but #42 does. I would be happy if someone would tell me why is that.

chx’s picture

So, this is likely not to be an error with our Merge implementation much rather a mysql (maria?) bug. We investigated the deadlocks and they are not deadlocks.

------------------------
LATEST DETECTED DEADLOCK
------------------------
130102  4:07:35
*** (1) TRANSACTION:
TRANSACTION 7E28EC7A, ACTIVE 0 sec, process no 12597, OS thread id 140725946107648 inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 376, 2 row lock(s)
MySQL thread id 11084954, query id 5653107133 localhost drupaltestbot-my update
INSERT INTO cache_bootstrap (cid, serialized, created, expire, tags, checksum_invalidations, checksum_deletions, data) VALUES ('system_list', '1', '1357099655', '0', '', '0', '0', '...
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 87741737 page no 3 n bits 96 index "PRIMARY" of table "drupaltestbotmysql"."cache_bootstrap" trx id 7E28EC7A lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 7E28EC79, ACTIVE 0 sec, process no 12597, OS thread id 140725230679808 inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 376, 2 row lock(s)
MySQL thread id 11084810, query id 5653107132 localhost drupaltestbot-my update
INSERT INTO cache_bootstrap (cid, serialized, created, expire, tags, checksum_invalidations, checksum_deletions, data) VALUES ('system_list', '1', '1357099618', '0', '', '...
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 87741737 page no 3 n bits 96 index "PRIMARY" of table "drupaltestbotmysql"."cache_bootstrap" trx id 7E28EC79 lock mode S locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 87741737 page no 3 n bits 96 index "PRIMARY" of table "drupaltestbotmysql"."cache_bootstrap" trx id 7E28EC79 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (2)
------------

(2) could be granted an X lock and then (1) could be given one, there's no deadlock here. We also caught -- in the logs only -- deadlocks involving the semaphore INSERT (!).

We could go INSERT ON DUPLICATE KEY UPDATE -- that perhaps won't deadlock -- but how can we tell that the query / table is such that simplification is viable? It is only viable because there is only one unique key on cache tables and that's included in the key(). We can not load the schema because it's module defined and cache is earlier than modules. Are we ok with an explicit "simple" flag?

chx’s picture

FileSize
31.07 KB

Here's #42 with a "catch exceptions in set and do something sane in that case". It's no problems if set doesnt succeed if there's no stale data left behind. This patch has been submitted into #1877372-23: Ignore this issue for extensive testing.

Status: Needs review » Needs work

The last submitted patch, 1792536_50.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
31.37 KB

Best we can do , really. This is the same as #50 just catches the correct kind of exception and has a better comment.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me. Let's get this in, I think this is not going to result in more random failures than we already have due to having a catch there now. This will also simplify the cache in DIC issue, will look at fixing the installer there once this is in.

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.phpundefined
@@ -131,33 +125,36 @@ protected function prepareItem($cache, $allow_invalid) {
+    catch (DatabaseExceptionWrapper $e) {
+      // If set() failed for whatever reason, then try to delete() to avoid a
+      // stale cache. Removing this causes sporadic testbot failures due to
+      // what seems a bug in the InnoDB deadlock detection heuristics.

Wondering if we want to have a @todo here (or maybe just a follow-up issue) to explore other ways to fix this, like upgrading to a different mysql version.

Sorry if I was unclear in regards to that destruct. What I meant is that *after* this issue is in, I will re-roll [#1872554] to use the terminator, to remove that hack again + having another example/proof why we need it. So it's actually a good thing that we have to do it here ;)

David_Rothstein’s picture

Looks really good, although this part is begging for a code comment...

 function install_bootstrap_full(&$install_state) {
+  unset($GLOBALS['conf']['cache_classes']['cache']);
+  drupal_static_reset('cache');
chx’s picture

FileSize
714 bytes
31.66 KB

OMG David Rothstein praises my patch! Biggest reward possible. Comment, certainly.

sun’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/includes/install.core.inc
@@ -342,17 +342,8 @@ function install_begin_request(&$install_state) {
-  $conf['cache_classes'] = array('cache' => 'Drupal\Core\Cache\InstallBackend');
+  $conf['cache_classes'] = array('cache' => 'Drupal\Core\Cache\NullBackend');

In the installer, we should actually use the MemoryBackend until cache tables are set up. Can we replace this NullBackend with MemoryBackend?

+++ b/core/includes/update.inc
@@ -136,7 +136,10 @@ function update_prepare_d8_bootstrap() {
   // Bootstrap the database.
+  $GLOBALS['conf']['cache_classes'] = array('cache' => 'Drupal\Core\Cache\NullBackend');
   drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE);
+  unset($GLOBALS['conf']['cache_classes']['cache']);
+  drupal_static_reset('cache');

It looks like the intention was to move the unset() and static_reset to the end of update_prepare_d8_bootstrap(), so that the cache backend override is only removed after the cache tables have been upgraded.

That said, I really wonder whether it makes sense to have any kind of cache backend for update.php... I don't think we'd ever want to read or write cached data when running updates. So perhaps it would be best to just simply remove the unset+reset, and move the cache backend override to an earlier place in update.php (and add a short comment to explain it).

+++ b/core/includes/update.inc
@@ -195,6 +198,116 @@ function update_prepare_d8_bootstrap() {
+      if (!db_table_exists('cache_tags')) {
+        $table = array(
...
+        db_create_table('cache_tags', $table);
...
+      if (!db_table_exists('cache_config')) {
+        $spec = array(
...
+        db_create_table('cache_config', $spec);

Can't we read those schema definitions out of the existing system_schema() (instead of re-declaring them in update.inc)?

+++ b/core/includes/update.inc
@@ -195,6 +198,116 @@ function update_prepare_d8_bootstrap() {
+      require_once DRUPAL_ROOT . '/core/modules/system/system.install';
+      $tables = array(
+        'cache',
+        'cache_bootstrap',
+        'cache_block',
+        'cache_field',
+        'cache_filter',
+        'cache_form',
+        'cache_image',
+        'cache_menu',
+        'cache_page',
+        'cache_path',
+        'cache_update',
+      );
+
+      foreach ($tables as $table) {
+        update_add_cache_columns($table);
+      }

1) Why don't we simply drop and re-create those cache tables instead of manipulating them? (which would also be one step forward towards #1167144: Make cache backends responsible for their own storage)

2) Contributed modules with cache tables will need a similar update mechanism. How will we address that?

+++ b/core/lib/Drupal/Core/Database/Query/Merge.php
@@ -416,16 +423,16 @@ public function execute() {
       catch (IntegrityConstraintViolationException $e) {
-        // The insert query failed, maybe it's because a racing insert query
-        // beat us in inserting the same row. Retry the select query, if it
-        // returns a row, ignore the error and continue with the update
-        // query below.
-        if (!$select->execute()->fetchField()) {
+        // Retry only once.
+        if ($this->retry) {
           throw $e;
         }
+        $this->retry = TRUE;
+        return $this->execute();

Hm. The original logic was explained well. I'm not sure whether I can make sense of the new logic — why would an insert query suddenly succeed on a second try if it failed on the first try?

+++ b/core/modules/system/lib/Drupal/system/Tests/DrupalKernel/DrupalKernelTest.php
@@ -38,6 +38,7 @@ function testCompileDIC() {
+    $conf['cache_classes'] = array('cache' => 'Drupal\Core\Cache\NullBackend');

@@ -105,5 +106,6 @@ function testCompileDIC() {
+    unset($conf['cache_classes']);

Can we move these to setUp() + tearDown()?

We should also add a comment to document why this part of the kernel functionality is intentionally not tested. Is it even valid to skip the cache layer in this test?

Btw, that DrupalKernelTest looks more and more like it should actually extend DUTB...?

+++ b/core/modules/views/lib/Drupal/views/ViewsDataCache.php
@@ -205,14 +205,19 @@ protected function processEntityTypes(array &$data) {
+    } catch (\Exception $e) {
+      // During testing the table is gone before this fires. Nasty.
+      // @todo remove after http://drupal.org/node/512026.
     }

1) This links to #512026: Move $form_state storage from cache to new state/key-value system, which does not sound related?

2) Minor: catch should be on a newline.

sun’s picture

On a second thought, all of the additional merge query changes cannot be right..., because:

#1798732: Convert install_task, install_time and install_current_batch to use the state system essentially achieved the exact same as this patch, but without any of those other changes. That is, at least as far as the installer is concerned. The stack traces in this issue generally seem to indicate the test setup as common root origin. Since the test setup goes through the non-interactive installer and the former (but now removed) InstallBackend was used in the installer, there seems to be a close relationship.

FWIW, the backtrace in #34 contains calls into Cache\DatabaseBackend, which should not be contained in the first place. The trace starts in prepareEnvironment(), which means that there is a database, but it doesn't contain any tables. Within that setup code, the database cache cannot be used.

In the worst case, we're accessing the wrong database; i.e., the one of the parent site/test runner — which would apparently explain why the SQL error is an "integrity violation" (~8 threads hitting the same cache tables like hell in parallel) instead of a "table does not exist" (expected).

chx’s picture

Assigned: Unassigned » sun
Berdir’s picture

Can't we read those schema definitions out of the existing system_schema() (instead of re-declaring them in update.inc)?

I'm not sure if it's safe to do that, they are just cache tables (so we could always re-create), but at least for normal tables, that wouldn't work in case we need to do changes later (post 8.0) on.

2) Contributed modules with cache tables will need a similar update mechanism. How will we address that?

We tried to do that before, but it is extremely complicated due to database prefixes, that's we replaced it with a fixed list, for contrib, see http://drupal.org/node/1872554. We had exactly the same situation in 7.x as well.

Hm. The original logic was explained well. I'm not sure whether I can make sense of the new logic — why would an insert query suddenly succeed on a second try if it failed on the first try?

Because of the deadlocks, which we still can't fully explain but think it is a mysql bug, but "try again" was one of the answers somewhere in the mysql mailing list.

1) This links to #512026: Move $form_state storage from cache to new state/key-value system, which does not sound related?

It is related, just check the issue :)

2) Minor: catch should be on a newline.

Interesting, I wasn't aware of this (always did it on the same line) but it does in fact look as if most examples in core are on a new line. I haven't found any official documentation on this other than #576248: [policy] Coding Standards Update for PHP5 which just seems to be an issue suggesting it?

Berdir’s picture

@#57:

- Your issue does not seem to remove the try/catch from the database backend. Those deadlocks were made visible due to that change, they do exist right now but are eaten in there.

- Yes, I am confused by those setUp()/tearDown issues as well and we need to look into it, this was my first thought as well. However, any frequently accessed site could run into the same problems.

Berdir’s picture

Ok, here is an update.

- Changed installer to use MemoryBackend. Let's see if that works.
- Moved the cache reset to the bottom of that function, agree that it makes more sense logically. Can we discuss the "completely disable cache" during update.php in a separate issue? I agree that it could make sense, especially when you have stale cache data dating back to 7.x. I know that this has broken things before in weird ways.
- Tried to improve the documentation in Merge.php
- Moved the cache backend configuration to setUp() for the kernel tests. Removed the unset, this is taken care of by TestBase::tearDown() Just like the deadlocks, without this change, it already did cache set calls internally, they were just eaten away by the database cache backend. I do agree that this smells like DUTB, just like many other tests as well (e.g. plugin discovery) but I think that should be discussed in another issue as well.
- Improved the @todo, added a new line for catch.

chx’s picture

Assigned: sun » Unassigned
Status: Needs review » Reviewed & tested by the community

As for confusing backtraces or leakage, that's no mystery: prepareEnvironment runs before the prefix change so we are hitting the same tables N times. Another fake deadlock hits {semaphore} when tearDown::refreshVariables calls variable_initialize . The pounding of tables + this mysql bug results in the problem. The penultimate solution would be to explicitly generate caches when the change occurs instead of generating a stampede.

sun's concerns are addressed, back to RTBC. I am grateful to berdir for addressing those.

catch’s picture

Status: Reviewed & tested by the community » Needs review

There's no discussion here of the original bug the install backend was introduced to fix (deleting cache entries set by ajax requests made during the installer). That bug wasn't testable and I think we've removed the only request made by core for clean urls check. On phone so no link to original but both me and David Rothstein were in volved.

sun’s picture

@catch:
I performed extensive manual testing in #1798732: Convert install_task, install_time and install_current_batch to use the state system, which equally changes the installer to use the MemoryBackend, and I never ran into bogus/corrupted cache entry issues. Obviously, that patch changes the installer environment some more, but the net effect with regard to the cache backend change is essentially identical - as long as no database tables exist, the MemoryBackend is used (which isn't shared across requests, so even if there is an Ajax request, it doesn't matter), and as soon as they exist, every bootstrap in install.php reverts the cache backend override before any installer actions are executed and data is cached as usual. Therefore, I think this change is fine.

+++ b/core/includes/update.inc
@@ -298,6 +411,9 @@ function update_prepare_d8_bootstrap() {
     }
+  // Now remove the cache override.
+  unset($GLOBALS['conf']['cache_classes']['cache']);
+  drupal_static_reset('cache');
   }
 }

Due to an indentation/brackets mismatch, the override is not reverted when running update.php on D8.

catch’s picture

Status: Needs review » Needs work

@sun so you mean this line:

 function install_bootstrap_full(&$install_state) {
+  unset($GLOBALS['conf']['cache_classes']['cache']);
+  drupal_static_reset('cache');

That's probably fine then. CNW for the last bit of #64 though.

Berdir’s picture

Status: Needs work » Needs review
FileSize
478 bytes
32.65 KB

Also did another manual install with this patch applied and everything seems to have worked fine.

sun’s picture

Assigned: Unassigned » Crell

I still fear that the Merge query changes are working around a symptom, but do not fix the actual cause. Assigning to @Crell.

Due to the debugging info provided in this issue, it appears pretty clear to me that our test environment setup is broken. I'd strongly recommend to fix that first, before introducing code that cannot really explain itself but magically fixes a mystic bug, which might not actually exist in the first place.

Berdir’s picture

I agree that we need to look into setUp()/tearDown(), opened #1882584: Improve concurrent testing by avoiding unecessary persistent rebuilds for that. Feel free to adjust the category/priority.

However, I do not think that this problem is limited to tests. It's just one scenario that can cause it. Just like #937284: DEADLOCK errors on MergeQuery INSERT due to InnoDB gap locking when condition in SELECT ... FOR UPDATE results in 0 rows was originally reported by users who where doing multiple requests at the same time, or multiple processes generating content and similar use cases with many cache sets at the same time. Unlike the original deadlock issue, these exceptions were eaten in the caching layer, this issue makes them visible.

Crell’s picture

Status: Needs review » Needs work
+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -131,33 +125,36 @@ protected function prepareItem($cache, $allow_invalid) {
+    catch (DatabaseExceptionWrapper $e) {
+      // If set() failed for whatever reason, then try to delete() to avoid a
+      // stale cache. Removing this causes sporadic testbot failures due to
+      // what seems a bug in the InnoDB deadlock detection heuristics.
+      $this->delete($cid);

I am not comfortable working around a bug that we think may be there but isn't documented. If there's an open MySQL bug report we can reference, let's do. Otherwise it's guesswork.

(I'm fine with the DatabaseBackend gracefully ignoring write failures; that's entirely sane, but not for reasons of the MySQL bug, which is more related to the changes to Merge.)

+++ b/core/lib/Drupal/Core/Database/Query/Merge.php
@@ -124,6 +124,17 @@ class Merge extends Query implements ConditionInterface {
@@ -416,16 +427,17 @@ public function execute() {

@@ -416,16 +427,17 @@ public function execute() {
           $insert->useDefaults($this->defaultFields);
         }
         $insert->execute();
+        $this->secondTry = FALSE;
         return self::STATUS_INSERT;
       }
       catch (IntegrityConstraintViolationException $e) {
-        // The insert query failed, maybe it's because a racing insert query
-        // beat us in inserting the same row. Retry the select query, if it
-        // returns a row, ignore the error and continue with the update
-        // query below.
-        if (!$select->execute()->fetchField()) {
-          throw $e;
+        // Could be a temporary problem (lock), retry once.
+        if (!$this->secondTry) {
+          $this->secondTry = TRUE;
+          return $this->execute();
         }
+        // In case we already tried a second time, throw the exception.
+        throw $e;
       }
     }
     if ($this->needsUpdate) {

This can't work. If an exception is thrown on the second try, then it's an exception thrown within an exception context => That fatal useless PHP error.

That could probably be worked around,

Crell’s picture

The last comment was supposed to be:

That could probably be worked around, but I am still not comfortable with this approach without knowing more than "we think MySQL has a bug somewhere and this may work."

Berdir’s picture

That fatal useless PHP error.

That fatal useless error I think you are referring to happens when an exception is thrown and not catched in an exception/error handler (Exception thrown without stackstrace). This is different :)

Berdir’s picture

Discussed this with Crell in IRC.

Because we re-added the exception ignore part to DatabaseBackend, we don't really care about the Merge.php change anymore. So I removed that, we can continue discussing that in the merge deadlock issue if we need to.

Also removed the comment in DatabaseBackend if you prefer having no explanation over the only possible explanation :)

Status: Needs review » Needs work

The last submitted patch, install-backend-exceptions-1792536-72.patch, failed testing.

Berdir’s picture

Hm, so the above failure shows why we can't revert the changes in Merge.php, I've seen them before now that I think about it :)

exception 'PDOException' with message 'SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'system_list' for key 'PRIMARY'' in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php:58
Stack trace:
#0 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php(58): PDOStatement->execute(Array)
#1 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Connection.php(523): Drupal\Core\Database\Statement->execute(Array, Array)
#2 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Driver/mysql/Insert.php(34): Drupal\Core\Database\Connection->query('INSERT INTO {ca...', Array, Array)
#3 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Query/Merge.php(418): Drupal\Core\Database\Driver\mysql\Insert->execute()
#4 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Cache/DatabaseBackend.php(151): Drupal\Core\Database\Query\Merge->execute()
#5 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(252): Drupal\Core\Cache\DatabaseBackend->set('system_list', Array)
#6 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(105): system_list('module_enabled')
#7 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/module.inc(39): module_list('module_enabled')
#8 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/common.inc(4777): module_load_all()
#9 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/bootstrap.inc(2150): _drupal_bootstrap_code()
#10 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(372): drupal_bootstrap(7)
#11 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(22): simpletest_script_run_one_test('419', 'Drupal\system\T...')
#12 {main}

This is a PDOException and we don't catch those. So either we need to do that (which means it would still fail in other cases where merge is used directly) or go back to the patch from #66.

sun’s picture

Assigned: Crell » Unassigned
Status: Needs work » Needs review
FileSize
31.53 KB

Eh? Isn't that the entire point of catching the exception in Merge::execute()?

Berdir’s picture

@sun: I think we should instead catch DatabaseException, which is the interface that DatabaseWrapperException and the constraint one implement. I was incorrect about PDOException, that was just the wrapped exception.

Status: Needs review » Needs work

The last submitted patch, drupal8.cache-install.75.patch, failed testing.

Berdir’s picture

@sun's patch failed with the exception. As discussed in IRC, I think this happens in case of a stacked race condition where a third process deletes a key while the one that run in a race condition does the second select query and ends up confused as it still does not exist.

I think this should work. Catching the interface should cover everything thrown by the database layer. Uploading 3x times to see if there's some randomness.

Edit: Uh, I thought the PDO part was changed anyway, doesn't really belong here but also doesn't hurt I think.

David_Rothstein’s picture

In the midst of all this bigger stuff, my point in #54 about the need for a code comment didn't get addressed. (Actually @chx did address it, but wound up adding a code comment somewhere else - which is OK because it probably was needed there too :)

So attached is a reroll with just that change.

There's no discussion here of the original bug the install backend was introduced to fix (deleting cache entries set by ajax requests made during the installer). That bug wasn't testable and I think we've removed the only request made by core for clean urls check. On phone so no link to original but both me and David Rothstein were in volved.

I had actually done a quick manual test of the earlier patch to ensure that bug didn't reoccur, but didn't mention it (however, I didn't realize there were no more Ajax requests in the installer; if so, my test was meaningless).

In theory, it should be OK, though. The earlier bug basically occurred because in the latter stages of the installer, the installer was saving things to the database but not clearing existing caches that were associated with those things. The fix at the time made the latter stages of the installer actually clear caches (but never write to them). This patch makes it use the normal cache infrastructure entirely, so it should still have the same effect. There is, I guess, a slight risk since the switchover doesn't happen until install_bootstrap_full(), so the early bootstrap on the latter pages of the installer might not clear the database cache now when it's supposed to. But hopefully it's a small risk in practice...

I guess the only other question is whether having the installer write data to the cache is ever a problem. I can't think of too many issues though (maybe "install.php" winds up in a cache somewhere, but it seems unlikely)?

It's also a shame to lose all those tests we had for this behavior, but there's not much we can do about it until we reach a point where we're able to test the installer directly (the existing tests were only indirect).

Status: Needs review » Needs work

The last submitted patch, install-backend-exceptions-1792536-79.patch, failed testing.

webchick’s picture

I just committed #1882584: Improve concurrent testing by avoiding unecessary persistent rebuilds. Hitting re-test to see if it has any effect.

webchick’s picture

Status: Needs work » Needs review
Berdir’s picture

Awesome, looks like that really did help a lot.

I tested a patch without the try/catch 10 times, see #1877372-26: Ignore this issue. All runs passed, so here is that patch combined with the comment from #79.

While we could still keep the try/catch just to be on the safe side (making sure that we do not leave stale caches behind isn't a bad idea), I think should now also be ok to go with this patch. So we should be able to get either this or #79 in.

sun’s picture

Title: Remove the install backend and the allow exceptions in the default cache backend » Remove the install backend and stop catching exceptions in the default database cache backend
Status: Needs review » Reviewed & tested by the community
Issue tags: +API clean-up

This looks nice and clean and sexy now.

webchick’s picture

Assigned: Unassigned » catch

This one also looks like a catchpatch. :)

catch’s picture

Status: Reviewed & tested by the community » Fixed

This looks great. Nice to see the install controller go away finally.

I want to add those exceptions back in #1167144: Make cache backends responsible for their own storage but that's a different issue so committed/pushed to 8.x.

No one should ever have been using the install backend, so I don't think this needs a change notice.

pounard’s picture

Big win!

Automatically closed -- issue fixed for 2 weeks with no activity.