Problem/Motivation

Suppose you have some content entity type (foo) with an entity reference to nodes, and you have contrib or custom code that needs to update something about the referenced nodes whenever the host entity is updated. So you implement hook_foo_presave() or hook_foo_update(), and in that function you load the referenced nodes, do something to them, and save them.

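For illustration, a minimal sketch of such a hook (the "foo" entity type and the field_nodes reference field are hypothetical):

function mymodule_foo_update(\Drupal\Core\Entity\EntityInterface $foo) {
  // This runs inside the transaction that SqlContentEntityStorage::save()
  // opened for the host entity.
  foreach ($foo->get('field_nodes')->referencedEntities() as $node) {
    // Update something on the referenced node ...
    $node->set('sticky', 1);
    // ... and save it. Each nested save invalidates broad cache tags such as
    // 'node_list', which writes to the {cachetags} table while the outer
    // transaction is still open.
    $node->save();
  }
}
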
By design, all of this happens within a single database transaction: SqlContentEntityStorage::save() starts the transaction prior to invoking hook_foo_presave() and doesn't commit the transaction until after all hook_foo_update() implementations complete.

Meanwhile, when the first child node is saved, Entity::invalidateTagsOnSave() invalidates several broad tags ('node_list' and '4xx-response') that are not specific to the particular node, which causes DatabaseCacheTagsChecksum::invalidateTags() to send a merge (UPDATE) query to the corresponding records in the {cachetags} table.

This causes the database engine (e.g., InnoDB) to acquire a row lock on those records, which it doesn't release until the end of the transaction. However, if there are more child nodes that need to be saved, or other slow hook implementations that need to be performed, prior to the end of the transaction, then these locks could last 10, 20, or more seconds.

Now suppose you have concurrent users editing foo entities on this site. Even if they're editing different foo entities with different child nodes, they all end up competing for the same lock on the 'node_list' cache tag record. Depending on how long each of them has to wait, this can surface as lock wait timeouts that look like deadlocks, or in some cases even lead to a real deadlock.

Proposed resolution

Change DatabaseCacheTagsChecksum to delay updating the database with cache tag invalidations until just before the end of the transaction. However, also have its isValid() return FALSE while there are not-yet-committed invalidations. This way, other requests continue seeing the old value until the end of the transaction (which is already the case in HEAD, since that's how transactions work), while later code in the same request (e.g., other presave/update hooks) continues to see the invalidation as having already occurred (just as in HEAD).

If done right, I think this is fully backwards compatible, while reducing the time that a {cachetags} record lock is held to practically zero.

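Very roughly, the idea looks something like this (a simplified sketch, not the actual patch: the delayed-tags bookkeeping and property names are illustrative, while invalidateTags() and isValid() are the real CacheTagsChecksumInterface methods):

  public function invalidateTags(array $tags) {
    if ($this->connection->inTransaction()) {
      // Only record the invalidations; the {cachetags} write (and thus the
      // row lock) is deferred until the root transaction is about to commit.
      $this->delayedTags = array_merge($this->delayedTags, $tags);
      if (!$this->callbackRegistered) {
        $this->callbackRegistered = TRUE;
        $this->connection->registerPreRootTransactionCommitCallback(function () {
          $this->doInvalidateTags($this->delayedTags);
          $this->delayedTags = [];
          $this->callbackRegistered = FALSE;
        });
      }
      return;
    }
    $this->doInvalidateTags($tags);
  }

  public function isValid($checksum, array $tags) {
    // Pretend the delayed invalidations already happened, so code running
    // later in the same request/transaction sees those cache items as
    // invalid, exactly as it does in HEAD.
    if (array_intersect($tags, $this->delayedTags)) {
      return FALSE;
    }
    return $checksum == $this->calculateChecksum($tags);
  }
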
Remaining tasks

  • Write patch.
  • Review if the idea and implementation are sound.
  • Figure out what test coverage to add.

User interface changes

None.

API changes

A public method addition to Drupal\Core\Database\Connection for adding callbacks to run just prior to transaction commit.

Data model changes

Comment | File | Size | Author
#127 | 2966607-127-interdiff.txt | 2.47 KB | Berdir
#127 | 2966607-127.patch | 32.47 KB | Berdir
#125 | 2966607-125-interdiff.txt | 6.24 KB | Berdir
#125 | 2966607-125.patch | 32.45 KB | Berdir
#119 | 2966607-119.patch | 28.03 KB | Wim Leers
#119 | interdiff.txt | 1.83 KB | Wim Leers
#115 | 2966607.105_114.interdiff.txt | 9.84 KB | dww
#114 | 2966607-114-interdiff.txt | 606 bytes | Berdir
#114 | 2966607-114.patch | 27.19 KB | Berdir
#113 | 2966607-113-interdiff.txt | 1.88 KB | Berdir
#113 | 2966607-113.patch | 26.88 KB | Berdir
#111 | 2966607-111-interdiff.txt | 6.99 KB | Berdir
#111 | 2966607-111.patch | 26.86 KB | Berdir
#105 | 2966607-105-interdiff.txt | 915 bytes | Berdir
#105 | 2966607-105.patch | 26.79 KB | Berdir
#104 | 2966607-104-interdiff.txt | 2.09 KB | Berdir
#104 | 2966607-104.patch | 26.79 KB | Berdir
#102 | 2966607-102-interdiff.txt | 11.39 KB | Berdir
#102 | 2966607-102.patch | 26.6 KB | Berdir
#93 | 2966607-93-interdiff.txt | 2.62 KB | Berdir
#93 | 2966607-93.patch | 21.95 KB | Berdir
#88 | 2966607-88-interdiff.txt | 3.75 KB | Berdir
#88 | 2966607-88.patch | 21.47 KB | Berdir
#85 | 2966607-85-interdiff.txt | 14.78 KB | Berdir
#85 | 2966607-85.patch | 21.49 KB | Berdir
#79 | 2966607-79-interdiff.txt | 2.61 KB | Berdir
#79 | 2966607-79.patch | 18.73 KB | Berdir
#68 | deadlock-no-patch.txt | 27.25 KB | claudiu.cristea
#67 | error-deadlock.txt | 7.33 KB | claudiu.cristea
#36 | 2966607-36.patch | 19.54 KB | Wim Leers
#36 | interdiff.txt | 1.81 KB | Wim Leers
#35 | interdiff.txt | 1.13 KB | Wim Leers
#35 | 2966607-35.patch | 19.58 KB | Wim Leers
#34 | 2966607-34.patch | 18.49 KB | Wim Leers
#34 | interdiff.txt | 1.92 KB | Wim Leers
#31 | 2966607-31.patch | 18.46 KB | Wim Leers
#31 | interdiff.txt | 2.73 KB | Wim Leers
#22 | invalidate_cache_tags_at_end_of_root_transaction-22.patch | 19.55 KB | Wim Leers
#22 | interdiff.txt | 2.73 KB | Wim Leers
#20 | interdiff.txt | 5.29 KB | Wim Leers
#20 | invalidate_cache_tags_at_end_of_root_transaction-19.patch | 19.5 KB | Wim Leers
#6 | invalidate_cache_tags_at_end_of_root_transaction.patch | 8.78 KB | Wim Leers
#9 | interdiff.txt | 708 bytes | Wim Leers
#9 | invalidate_cache_tags_at_end_of_root_transaction-2966607-9.patch | 9.14 KB | Wim Leers
#10 | 2966607-10-test_only_FAIL.patch | 5.68 KB | Wim Leers
#10 | invalidate_cache_tags_at_end_of_root_transaction-2966607-10.patch | 14.76 KB | Wim Leers
#16 | interdiff.txt | 3 KB | Wim Leers
#16 | invalidate_cache_tags_at_end_of_root_transaction-16.patch | 15.35 KB | Wim Leers
#17 | interdiff.txt | 2.5 KB | Wim Leers
#17 | invalidate_cache_tags_at_end_of_root_transaction-17.patch | 15.45 KB | Wim Leers
#18 | interdiff.txt | 2.56 KB | Wim Leers
#19 | invalidate_cache_tags_at_end_of_root_transaction-19_tests_only_FAIL.patch | 10.06 KB | Wim Leers
#18 | invalidate_cache_tags_at_end_of_root_transaction-18.patch | 16.18 KB | Wim Leers

Comments

effulgentsia created an issue. See original summary.

effulgentsia’s picture

Issue summary: View changes
effulgentsia’s picture

Issue summary: View changes
effulgentsia’s picture

Issue summary: View changes
effulgentsia’s picture

Issue summary: View changes
Wim Leers’s picture

/me high-fives @effulgentsia. We talked this through together yesterday, and we pair-programmed remotely on a proof-of-concept patch. (I turned it into a functioning patch, @effulgentsia turned our discussion into a clear issue summary!)

Let's see what testbot says…

Status: Needs review » Needs work

The last submitted patch, 6: invalidate_cache_tags_at_end_of_root_transaction.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Wim Leers’s picture

Assigned: Unassigned » Wim Leers
Issue tags: +Needs tests

Only two tests failed, both of them in demo_umami. I am fairly certain that means this is actually going to work! I'll work on getting those failing tests to pass, and also on explicit test coverage for:

  1. reordering of UPDATE cachetags SET invalidations=invalidations + 1 WHERE tag = … queries to the end of the transaction
  2. cache reads in the middle of a transaction that do NOT depend on cache tags that have not yet been invalidated result in cache reads/hits
  3. cache reads in the middle of a transaction that DO depend on cache tags that have not yet been invalidated result in cache ignores/misses
Wim Leers’s picture

Status: Needs work » Needs review
FileSize
708 bytes
9.14 KB

The failures were happening during Umami's installation, specifically the thing that's unique about it (hence only Umami is failing): content import. I've been debugging this all day.

During content import, Umami creates a Term entity which has a path alias. Path aliases are stored in the path field type; their logic lives in \Drupal\path\Plugin\Field\FieldType\PathItem. In that class's ::postSave() method, we need to save the path -> alias mapping. The alias is already specified, but the original path still needs to be computed. To do that, it calls $entity->urlInfo()->getInternalPath(). This means we look up the canonical route for the Term entity to get the path component of the URL. This fails with a RouteNotFoundException with the message Route "entity.taxonomy_term.canonical" does not exist.

So, somehow, we're causing that route to not yet have been written. That's what we have to figure out and fix. And it seemingly is the only adversely affected code. Hopefully no custom or contrib code is similarly affected; the way routes are built/provided/read is rather complex.

After quickly noticing that the missing route actually is known at earlier moments in time, then debugging the route provider and builder for hours, and finding the seemingly related issues #2788087: Drupal 8: You have requested a non-existent service "router.route_provider.old", #2913912: URL generator may have a stale route provider during module installation and #2937015: Update drupal 7 functions to drupal 8 …, I finally found the root cause:

SQLSTATE[21S01]: Insert value list does not match column list: 1136 Column count doesn't match value count at row 1: INSERT INTO {cache_data} (cid, expire, created, tags, checksum, data, serialized) VALUES () ON DUPLICATE KEY UPDATE cid = VALUES(cid), expire = VALUES(expire), created = VALUES(created), tags = VALUES(tags), checksum = VALUES(checksum), data = VALUES(data), serialized = VALUES(serialized); Array
(
)

which also generates this in the MySQL query log:

		   11 Query	INSERT INTO test20492877cache_data (cid, expire, created, tags, checksum, data, serialized) VALUES () ON DUPLICATE KEY UPDATE cid = VALUES(cid), expire = VALUES(expire), created = VALUES(created), tags = VALUES(tags), checksum = VALUES(checksum), data = VALUES(data), serialized = VALUES(serialized)

Spot the error? VALUES () is wrong. This is due to the "don't do unnecessary writes" optimization that we added:

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -236,6 +236,11 @@ protected function doSetMultiple(array $items) {
+      // @see \Drupal\Core\Cache\CacheTagsChecksumInterface::CHECKSUM_WHILE_IN_TRANSACTION
+      if ($fields['checksum'] === CacheTagsChecksumInterface::CHECKSUM_WHILE_IN_TRANSACTION) {
+        continue;
+      }

… which can result in there then being zero rows to write, but the patch didn't yet catch that case. IOW: the root cause turned out to be something so absolutely trivial that it feels painful to have spent the better part of a day finding it. However, it was so very difficult to find because the exception was not showing up anywhere; it was being caught prematurely by \Drupal\Core\Routing\RouteProvider::preLoadRoutes():

        try {
          $result = $this->connection->query('SELECT name, route FROM {' . $this->connection->escapeTable($this->tableName) . '} WHERE name IN ( :names[] )', [':names[]' => $routes_to_load]);
          $routes = $result->fetchAllKeyed();

          $this->cache->set($cid, $routes, Cache::PERMANENT, ['routes']);
        }
        catch (\Exception $e) {
          $routes = [];
        }

That catch statement, and especially the fact that it does $routes = [];, is what made this so very damn hard to find!

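For reference, the fix is essentially a guard in DatabaseBackend::doSetMultiple() that bails out when the "avoid useless writes" optimization has skipped every item (simplified; the actual hunk is quoted in the review further down):

    // If the "don't do unnecessary writes" optimization skipped every item,
    // there is nothing left to insert; bail out instead of issuing an
    // INSERT ... VALUES () query.
    if (empty($values)) {
      return;
    }
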
If all is well, that should make this patch green! Next step: the 3 parts of explicit test coverage mentioned in #8.

Wim Leers’s picture

  1. reordering of UPDATE cachetags SET invalidations=invalidations + 1 WHERE tag = … queries to the end of the transaction

Whew, this was one of the hardest things to write tests for that I've seen in years.

To test this, we need to be able to get a list of all executed queries. So I set out to decorate the @database service. This does not work, because A) there are multiple static methods, and B) there is no interface, so you have to subclass \Drupal\Core\Database\Connection (meaning you have to manually override every method), and so on.

Eventually, I landed on something that works, requires little code, but does require a per-DB driver override. That's still totally manageable though.

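The trick, roughly, is a tiny per-driver connection subclass that records every query before delegating to the parent (a hypothetical sketch, not the actual test helper in the patch):

class LoggingConnection extends \Drupal\Core\Database\Driver\mysql\Connection {

  public static $loggedQueries = [];

  public function query($query, array $args = [], $options = []) {
    // Record the query so the test can later assert on the order in which
    // the cache tag invalidation queries were executed.
    static::$loggedQueries[] = $query;
    return parent::query($query, $args, $options);
  }

}
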
So, one of the 3 test cases described in #8 now has an explicit test, but only for MySQL for now. Still, an important piece of validation!

If someone else wants to take this on, feel free! Unassigning for the weekend.

Wim Leers’s picture

+++ b/core/modules/system/tests/modules/entity_test/entity_test.module
@@ -514,6 +514,15 @@ function entity_test_entity_form_display_alter(EntityFormDisplay $form_display,
+  if ($entity instanceof \Drupal\entity_test\Entity\EntityTest) {
+    \Drupal\user\Entity\User::create([
+      'name' => 'foobar',
+      'status' => 1,
+    ])->save();
+

I forgot to move this into a separate test module; it's getting very late here…

dawehner’s picture

+++ b/core/lib/Drupal/Core/Database/Connection.php
@@ -1191,6 +1198,22 @@ public function popTransaction($name) {
   /**
+   * Registers a pre-root transaction commit callback.
+   *
+   * @param callable $callback
+   *   The callback to invoke.
+   *
+   * @see \Drupal\Core\Database\Connection::doCommit()
+   *
+   * @internal
+   * @todo Do we want to support multiple?
+   * @todo Mature this into an official API.
...
+  public function registerPreRootTransactionCommitCallback(callable $callback) {
+    $this->preRootTransactionCommitCallback = $callback;
+  }

From reading \Drupal\KernelTests\Core\Database\TransactionTest a while ago, I remember we do have support for nested transactions, which for example is the use case of nested entity saving. I'm wondering whether this would cause the invalidation of the outer cache tags in the inner transaction commit?

Thinking about it a bit more: Maybe we need this callback on the transaction (\Drupal\Core\Database\Transaction) itself?

One thing I'm wondering though: Could we avoid touching the DB layer by making things a little bit more explicit and not touching every DB transaction automatically?

  public function save(EntityInterface $entity) {
    $transaction = $this->database->startTransaction();
    \Drupal::service('cache_tags.invalidator')->delayCacheTags();
    try {
     ...
     \Drupal::service('cache_tags.invalidator')->stopDelayCacheTags();
    }

I'm not sure this is a better way; the current proposal seems to do quite a bit of magic, though.

Berdir’s picture

A bunch of comments, mostly minor things. I guess it's a bit too early for that kind of review, but still posting it. I also only saw the comment above about the presave thing after writing this, but I guess there's still a bit of useful info here, so leaving it in.

  1. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumInterface.php
    @@ -19,6 +19,21 @@
       /**
    +   * The checksum returned a database transaction is in progress.
    +   *
    +   * Every cache backend SHOULD detect this and not write cache items with this
    +   * checksum. Not detecting this would not yield incorrect cache reads, but
    

    is there an "if" missing in the first sentence? doesn't look like a complete sentence.

  2. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumInterface.php
    @@ -19,6 +19,21 @@
    +  const CHECKSUM_WHILE_IN_TRANSACTION = -1;
    

    was wondering if a negative number could break an existing backend that is not expecting this, but it's actually defined as a string, so we should be fine.

  3. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -236,6 +236,11 @@ protected function doSetMultiple(array $items) {
     
    +      // @see \Drupal\Core\Cache\CacheTagsChecksumInterface::CHECKSUM_WHILE_IN_TRANSACTION
    +      if ($fields['checksum'] === CacheTagsChecksumInterface::CHECKSUM_WHILE_IN_TRANSACTION) {
    +        continue;
    

    Inline @see is afaik not valid and also not necessary as anything capable of parsing that (api.drupal.org, IDE) is also capable of parsing the reference below?

  4. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -247,6 +252,11 @@ protected function doSetMultiple(array $items) {
     
    +    // @see \Drupal\Core\Cache\CacheTagsChecksumInterface::CHECKSUM_WHILE_IN_TRANSACTION
    +    if (empty($values)) {
    +      return;
    

    same, @see not valid here, I guess this is more of a @todo write actual comment anyway? ;)

  5. +++ b/core/lib/Drupal/Core/Cache/DatabaseCacheTagsChecksum.php
    @@ -44,9 +51,35 @@ public function __construct(Connection $connection) {
    +  /**
    +   * Helper method for ::invalidateTags().
    +   */
    +  protected function doInvalidateTags(array $tags) {
    

    will also need better docs and @param

  6. +++ b/core/modules/system/tests/modules/entity_test/entity_test.module
    @@ -514,6 +514,15 @@ function entity_test_entity_form_display_alter(EntityFormDisplay $form_display,
    +
    +  if ($entity instanceof \Drupal\entity_test\Entity\EntityTest) {
    +    \Drupal\user\Entity\User::create([
    +      'name' => 'foobar',
    +      'status' => 1,
    +    ])->save();
    +
    +  }
    

We should IMHO wrap this in a state condition or so, so we don't create lots of users in all existing tests that save entities. Also, the instanceof will trigger for any subclass of EntityTest and we have at least a dozen of those. Use getEntityTypeId() instead?

Wim Leers’s picture

I'm wondering whether this would cause the invalidation of the outer cache tags, in the inner transaction commit?

That callback is only invoked for root transactions, not for nested transactions. We specifically do not want to do this for every transaction, because then we again increase deadlock probability.

Could we avoid touching the DB layer by making things a little it more explicit and not touching every DB transaction automatically?

Initially, @effulgentsia and I were contemplating two possible approaches: the one in the patch above, and one in which we make the Entity base class and SqlContentEntityStorage aware of it. But the latter would only solve the entity-specific case. That's definitely the most common case, but it's possible for long-running DB queries/transactions (and cache tag invalidations) to occur for things other than entities, and hence for the same deadlocks to occur. So we figured we might as well only propose the more general approach.

Status: Needs review » Needs work

The last submitted patch, 10: invalidate_cache_tags_at_end_of_root_transaction-2966607-10.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
3 KB
15.35 KB

Addressed #11. Should make lots (all?) tests pass.

Wim Leers’s picture

#16 passes tests, but doesn't actually verify yet that the nested entity save occurred.

Wim Leers’s picture

Actually, so far tests were using the memory-based cache backend. (This is specific to kernel tests.) Let's switch to the DB-based one.

Wim Leers’s picture

Here is the remaining test coverage, as described in #8. Plus some more cases.

It all passes locally, so should also pass on testbot :)

Wim Leers’s picture

Uhm … d.o just deleted the non-failing, tests+changes patch in #19. And it also deleted the interdiff. WTF :(

The last submitted patch, 19: invalidate_cache_tags_at_end_of_root_transaction-19_tests_only_FAIL.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Wim Leers’s picture

#13: Yep, bit too early, but doesn't matter.

  1. ✔️ Improved wording.
  2. 👍
  3. ✔️
  4. ✔️
  5. ✔️
  6. ✔️ Already did that in #16!
Wim Leers’s picture

And yay! We've now got some solid test coverage, which is passing, and all of core's existing tests are green. Looking forward to your reviews! :)

webchick’s picture

Priority: Normal » Major

Given the propensity for this problem to cause 30+ second page load times for people who run into it, escalating to Major.

Great work!! :D

Wim Leers’s picture

Given the propensity for this problem to cause 30+ second page load times for people who run into it, escalating to Major.

This does NOT cause >30 second page load times. Some sites have so much custom code running during an entity type's pre- and post-save hooks that running that code takes >30 seconds. It's the very fact that all that custom code takes so long to run that causes these deadlocks in the first place.
IOW: #24 reverses symptom and cause. Transaction duration is the cause (and arguably nested entity saves are the root cause); deadlock is the symptom.

Wim Leers’s picture

Posted a comment at #2347867-42: Race conditions with lock/cache using non-database storage - add a non-transactional database connection, asking people on that issue to please review this issue!

webchick’s picture

Fair enough. :) Make that: because the lack of this solution opens the door to 30+ second load times on sites, we should treat this as major, IMO. Feel free to downgrade if you disagree though.

Wim Leers’s picture

Right, this patch makes it easier for sites to get away with 30+ second response times, to allow all of their super complex logic to run.

Agreed with Major — this is super important for some sites!

Fabianx’s picture

Status: Needs review » Needs work

Quick review:

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -236,6 +236,11 @@ protected function doSetMultiple(array $items) {
+      // Avoid useless writes.
+      if ($fields['checksum'] === CacheTagsChecksumInterface::CHECKSUM_WHILE_IN_TRANSACTION) {
+        continue;
+      }

There is a problem with that.

Even though no-one in core does that, in theory a cache could be invalidated just by setting the new value.

This would break that, we at least need to delete the item (at commit time).

---

I feel that for a complete solution we need a little bit more robustness.

I also dislike that it feels like we special case the subscribing to commit events just for one case instead of a more generic subscriber using e.g. events (could be optimized for the NULL case). Also it feels like we might want to ensure that all caches do not need to implement that themselves.

e.g. a by default decorated cache.backend.*, which takes care of transaction state.

Repost of full proposal from the other issue:

While in a transaction:

- cache_get: Allow - if cache_prefix or cache tag had not been invalidated in this request [use simplified memcache key prefix algo, we could also be more strict and do invalidation by bin]
(else we run 100% cache-less, which is not really what we want even though most things are static caches anyway).

- cache_set: Disallow, treat like a record_cache_clear_all($cid), record
- cache_clear_all: Disallow, treat like a record_cache_clear_all($bin, $prefix)
- tag invalidation: Disallow, treat like record_cache_tag_invalidation

After the transaction is committed => commit all the cache clears, which is simple, as we just need to execute them.

That will solve the cache inconsistency for memcache / redis AND also reduce lock wait timeout exceeded problems within transactions as there will be no writes to the cache bins during the transaction.

I can see that this is feature creep, so proposal:

If we want to just move cache tag invalidation to the end of the transaction, that is fine. If we want to also avoid other writes => more work is needed.

And as long as that new interface to set the commit subscriber is marked 'private' / @internal we can also extend it in the future for the 100% solution.

Wim Leers’s picture

Status: Needs work » Needs review

There is a problem with that.

Even though no-one in core does that, in theory a cache could be invalidated just by setting the new value.

This would break that, we at least need to delete the item (at commit time).

Setting what new value? This cannot break anything AFAICT. The code you quoted is an optional optimization to avoid useless writes. We can safely remove that code you cited, and things would still work correctly, thanks to checksum validation.

I feel that for a complete solution we need a little bit more robustness.

What robustness is missing?

I also dislike that it feels like we special case the subscribing to commit events just for one case instead of a more generic subscriber using e.g. events (could be optimized for the NULL case). Also it feels like we might want to ensure that all caches do not need to to implement that themselves.

e.g. a by default decorated cache.backend.*, which takes care of transaction state.

Again, none of the cache backends have to implement anything; this is an optional optimization. Attached is a patch that removes the changes to DatabaseBackend to prove this.

Symfony event subscribers/event dispatching are very slow performance-wise. There can easily be many dozens of cache tag invalidations; dispatching events for all of them, iterating through event subscriber services to find them, etc. is most definitely going to be too expensive. People were already complaining about the array_unique(array_merge(…)) statements in \Drupal\Core\Cache\Cache (and they indeed show up in profiling); they even complained about the assert() calls added there.
Even assuming it wouldn't be a performance problem, I don't see what problem you're wanting to solve with it. Can you be more specific?

Repost of full proposal from the other issue:
[…]
I can see that this is feature creep, so proposal:

If we want to just move cache tag invalidation to the end of the transaction, that is fine. If we want to also avoid other writes => more work is needed.

Thank you for keeping this issue tightly scoped! That is indeed what this issue is about: it's only about moving cache tag invalidation DB queries to the end of the transaction. Avoiding other DB writes can happen in other issues.

Wim Leers’s picture

Attached is a patch that removes the changes to DatabaseBackend to prove this.

Here is that patch.

Fabianx’s picture

I have to apologize:

I misread the code and just took CHECKSUM_WHILE_IN_TRANSACTION literally (i.e. always return that checksum while in a transaction).

The code and approach is fine. I would however rename that to INVALID_CHECKSUM_WHILE_IN_TRANSACTION, because as a side effect this also solves the cache coherency problem for all other cache backends (including memcache and redis).

As long as the code only invalidates via cache tags and uses the usual get() => miss => set() pattern, they will be correct.

Because as the data is written with an invalid checksum, it will be invalid.

The only problem is that those cache items are essentially uncached while the transaction is ongoing - however, that only applies to the request in the transaction.

Lets quickly verify a race condition case (using memcache as cache backend) as well:

  • Process A starts a transaction and clears the 'rendered' cache tag [worst case], then for some odd reason re-calculates the "who's online" block, which now has an invalid checksum; process A continues for 30 more seconds within the transaction.
  • Process B reads the cache entry from memcache; it is invalid (-1 can never occur as a checksum [or maybe via overflow?]) and is treated as a cache miss.
  • Process B rebuilds the cache entry and writes it to memcache with the last known good checksum.
  • Process A reads the cache entry for some strange reason, and as it is within the reset tags, the entry is invalid and rebuilt again.

This is all correct. Even if the transaction aborts and is rolled back it continues to be correct.

Now for the case that memcache would support the optimization case:

  • Process A starts a transaction and clears the 'rendered' cache tag [worst case], then for some odd reason re-calculates the "who's online" block, but no cache is written; process A continues for 30 more seconds within the transaction.
  • Process B reads the cache entry from memcache; it is valid and treated as a cache hit.
  • Process A reads the cache entry for some strange reason, and as it is within the reset tags, the entry is invalid and rebuilt again, but again not set.

Both are correct, but memcache and redis are strongly advised to support the API of not setting items with an invalid checksum for performance reasons.

+1 to it - more detailed review later (if not someone else beats me to RTBC)

Fabianx’s picture

+++ b/core/lib/Drupal/Core/Database/Connection.php
@@ -1191,6 +1198,22 @@ public function popTransaction($name) {
   /**
+   * Registers a pre-root transaction commit callback.
+   *
+   * @param callable $callback
+   *   The callback to invoke.
+   *
+   * @see \Drupal\Core\Database\Connection::doCommit()
+   *
+   * @internal
+   * @todo Do we want to support multiple?
+   * @todo Mature this into an official API.
+   */
+  public function registerPreRootTransactionCommitCallback(callable $callback) {
+    $this->preRootTransactionCommitCallback = $callback;
+  }

I think we should support multiple out of the box, but keep this marked @internal. Because the change to support multiple is just 3 lines of code, I think it is worth it.

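Something like this, roughly (sketch):

  protected $preRootTransactionCommitCallbacks = [];

  public function registerPreRootTransactionCommitCallback(callable $callback) {
    $this->preRootTransactionCommitCallbacks[] = $callback;
  }

  // ... and in doCommit(), iterate instead of invoking a single callback:
  foreach ($this->preRootTransactionCommitCallbacks as $callback) {
    call_user_func($callback);
  }
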
Shipping with an internal API that only allows one setter at once will lead to frustration later down the road - one way or another - in my experience.

Wim Leers’s picture

#32:

I would however rename that to INVALID_CHECKSUM_WHILE_IN_TRANSACTION

Good call! Done.

The only problem is that those cache items are essentially uncached while the transaction is ongoing - however, that only applies to the request in the transaction.

Correct. And this is of course intentional. Because the transaction that the cache write is for has not yet been committed and could therefore be reverted.

[…] strongly advised to support the API of not setting items with an invalid checksum for performance reasons.

Correct.

+1 to it - more detailed review later (if not someone else beats me to RTBC)

❤️🎉

Wim Leers’s picture

Given that @FabianX is now +1 to the whole "invalid checksum while in transaction"-thing, also restoring the test coverage that I reverted in #30/#31 to convince him.

Wim Leers’s picture

#33:

Shipping with an internal API that only allows one setter at once will lead to frustration later down the road

Agreed. (Arguably just a single callback is fine because then we're sure only the code added by this patch can use it. But that's a weak argument.)

Done.

The last submitted patch, 34: 2966607-34.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

The last submitted patch, 35: 2966607-35.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Status: Needs review » Needs work

The last submitted patch, 36: 2966607-36.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Wim Leers’s picture

Is HEAD broken? It doesn’t seem to be. This is a really weird failure. #2969738: testQuickStartCommand failing on PHP5 last touched that test and was committed very recently.

Fabianx’s picture

Issue tags: +Needs change record

This should have a change record and likely also ping memcache / redis maintainers directly in the relevant queues as this is super important.

markgifford’s picture

I work with a large site with lots of nested entities that was generating cache tag deadlock errors, perhaps between 100 and 200 a day. An early version of this patch - sorry, unsure which exact version but I think probably the initial one - was deployed for testing. Since then, we haven't seen any cache tag related deadlock errors.

Wim Leers’s picture

Status: Needs work » Needs review

Re-testing #36. Hoping that the weirdness with Drupal\Tests\Core\Command\QuickStartTest has been fixed, either in core or in DrupalCI's infrastructure. This patch cannot possibly cause this. #2910883-51: Move all entity type REST tests to the providing modules was also impacted by that same weird failure.

Wim Leers’s picture

@markgifford THANK YOU! That is great to hear, and WONDERFUL news! Thanks for testing & reporting back!

Status: Needs review » Needs work

The last submitted patch, 36: 2966607-36.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

borisson_’s picture

The same test is still failing, so it is related to this patch. The test expects to see Congratulations, you installed Drupal!, but instead there is an error: There is already an active transaction.

This test already runs in a separate process, so that is not a solution to this problem.

Nick_vh’s picture

For the record - we have a site where we are seeing deadlocks on the {cachetags} table. We are going to give this patch a try as well and will report back if it has resolved our issue. It's a semi-decoupled site which uses a bunch of REST API calls. When there is some traffic on the site, it probably reproduces the initial case described above, where there are many similar requests at once and this cachetags deadlock occurs. All credit goes to @nickvanpraet for spotting this. None to me - but reporting for the sake of documentation.

Wim Leers’s picture

#46: The patch came back green before; this new test is merely installing Drupal yet is failing. So either it's a false negative (i.e. the test is bad), or it's a true negative and this installation is somehow installing Drupal differently (i.e. the test is good, and tests a hitherto unexercised code path).

#47: Excellent! Looking forward to seeing your results :) Hopefully they're as good as #42's :)

bendev’s picture

Hi Wim, thanks for the patch.

I also have a similar issue on a live website and want to test the patch.
(first I need to reroll it against 8.5.3 - we'll try to do it in the coming days and come back to you with the results)

Wim Leers’s picture

@bendev: 👍

bendev’s picture

@wim

It doesn't seem to work in my case. I still get the following errors but my problem is perhaps different:

Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {semaphore} (name, value, expire) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2); Array ( [:db_insert_placeholder_0] => state:Drupal\Core\Cache\CacheCollector [:db_insert_placeholder_1] => 9924356965b0ed832163486.65656809 [:db_insert_placeholder_2] => 1527699536.0902 ) in Drupal\Core\Cache\CacheCollector->updateCache() (line 233 of /var/www/myproject/web/core/lib/Drupal/Core/Cache/CacheCollector.php) #0 /var/www/myproject/web/core/lib/Drupal/Core/Database/Connection.php(657): Drupal\Core\Database\Connection->handleQueryException(Object(PDOException), 'INSERT INTO {se...', Array, Array) #1 /var/www/myproject/web/core/lib/Drupal/Core/Database/Driver/mysql/Connection.php(87): Drupal\Core\Database\Connection->query('INSERT INTO {se...', Array, Array) #2 /var/www/myproject/web/core/lib/Drupal/Core/Database/Driver/mysql/Insert.php(32): Drupal\Core\Database\Driver\mysql\Connection->query('INSERT INTO {se...', Array, Array) #3 /var/www/myproject/web/core/lib/Drupal/Core/Lock/DatabaseLockBackend.php(77): Drupal\Core\Database\Driver\mysql\Insert->execute() #4 /var/www/myproject/web/core/lib/Drupal/Core/ProxyClass/Lock/DatabaseLockBackend.php(75): Drupal\Core\Lock\DatabaseLockBackend->acquire('state:Drupal\\Co...', 30) #5 /var/www/myproject/web/core/lib/Drupal/Core/Cache/CacheCollector.php(233): Drupal\Core\ProxyClass\Lock\DatabaseLockBackend->acquire('state:Drupal\\Co...') #6 /var/www/myproject/web/core/lib/Drupal/Core/Cache/CacheCollector.php(327): Drupal\Core\Cache\CacheCollector->updateCache() #7 /var/www/myproject/web/core/lib/Drupal/Core/EventSubscriber/KernelDestructionSubscriber.php(51): Drupal\Core\Cache\CacheCollector->destruct() #8 [internal function]: Drupal\Core\EventSubscriber\KernelDestructionSubscriber->onKernelTerminate(Object(Symfony\Component\HttpKernel\Event\PostResponseEvent), 'kernel.terminat...', Object(Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher)) #9 /var/www/myproject/web/core/lib/Drupal/Component/EventDispatcher/ContainerAwareEventDispatcher.php(111): call_user_func(Array, Object(Symfony\Component\HttpKernel\Event\PostResponseEvent), 'kernel.terminat...', Object(Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher)) #10 /var/www/myproject/vendor/symfony/http-kernel/HttpKernel.php(88): Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher->dispatch('kernel.terminat...', Object(Symfony\Component\HttpKernel\Event\PostResponseEvent)) #11 /var/www/myproject/vendor/stack/builder/src/Stack/StackedHttpKernel.php(32): Symfony\Component\HttpKernel\HttpKernel->terminate(Object(Symfony\Component\HttpFoundation\Request), Object(Symfony\Component\HttpFoundation\Response)) #12 /var/www/myproject/web/core/lib/Drupal/Core/DrupalKernel.php(642): Stack\StackedHttpKernel->terminate(Object(Symfony\Component\HttpFoundation\Request), Object(Symfony\Component\HttpFoundation\Response)) #13 /var/www/myproject/web/index.php(22): Drupal\Core\DrupalKernel->terminate(Object(Symfony\Component\HttpFoundation\Request), Object(Symfony\Component\HttpFoundation\Response)) #14 {main}.

Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[42000]: Syntax error or access violation: 1305 SAVEPOINT savepoint_1 does not exist: ROLLBACK TO SAVEPOINT savepoint_1; Array ( ) in Drupal\Core\Entity\Sql\SqlContentEntityStorage->save() (line 827 of /var/www/myproject/web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php).

Drupal\Core\Entity\EntityStorageException: SQLSTATE[42000]: Syntax error or access violation: 1305 SAVEPOINT savepoint_1 does not exist: ROLLBACK TO SAVEPOINT savepoint_1; Array ( ) in Drupal\Core\Entity\Sql\SqlContentEntityStorage->save() (line 829 of /var/www/myproject/web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php).

PS I started from https://www.drupal.org/project/drupal/issues/2833539. This is how I came to this issue.

Berdir’s picture

That's a different case: your lock is around the semaphore table when writing a cache collector. I suggest you open a specific issue for that, because I think we can just ignore such an exception there; we don't care if we fail to write that cache.

You could also try the redis module, which provides a lock backend that replaces the semaphore table.

bendev’s picture

ok @berdir thanks for the clarification

I also used to get some errors on cache_entity. Since I applied this patch I haven't seen one anymore. I will keep watching...
Thanks

Drupal\Core\Database\DatabaseExceptionWrapper : SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {cache_entity}

I will follow your suggestion to open a new ticket or install redis

bendev’s picture

After a few days, still getting some deadlocks on the cache_entity table...
Don't know if this helps, but this always seems to be related to field collections in my case.

effulgentsia’s picture

Thanks for the update. That actually makes sense. This patch is solely focused on removing the deadlocks on the cachetags table. It would be odd if it had the side-effect of changing behavior with respect to deadlocks on any other tables. Those other deadlocks would need separate investigation and patches. I was actually a bit concerned when I read #53, since that would point to an unanticipated side-effect of the patch, so am somewhat relieved by #54, even though I'm sorry that due to non cachetags related deadlocks, you're not getting much benefit from this patch.

Grayle’s picture

Update for #47

Our use case was different to start with, but this patch seems to have fixed it nevertheless. I'll be honest, I've got no clue why. I didn't see any changes to what I suspect was the culprit.

Our issue was duplicate key entry exceptions for the cachetags table when separate requests (REST) invalidated the same cache tags at the same time. As far as I can tell, the locking system works, but once request A has inserted the new cachetag, request B's logic to determine whether to use an update or an insert was working off stale data and opted for an insert, resulting in the Duplicate Key exception.

But, like I said, this patch fixed it. I haven't seen that exception anymore since implementing the patch.

effulgentsia’s picture

@Grayle: thanks, that's fascinating. What database is that on? MySQL, PostgreSQL, or something else?

Grayle’s picture

@effulgentsia That's on MySQL. Like I said, I'm not sure that was the cause, but it seemed like the only explanation.

Anonymous’s picture

My assumption about the failure in #36:

QuickStartTest uses SQLite, and SQLite has problems with transaction support.

I tried #36 + #1120020-59, but it doesn't help yet :(

Only the error message changed:

- In Connection.php line 1163:
- There is already an active transaction 
+ In PDOConnection.php line 19:
+ SQLSTATE[HY000]: General error: 1 cannot start a transaction within a transaction
dtv_rb’s picture

Hello,

Is this patch supposed to also solve this issue: https://www.drupal.org/project/drupal/issues/1679344?

Because of this we are forced to use the https://www.drupal.org/project/cache_consistent module (dev version, nearly 2 years old and mostly unmaintained) on our D8 website.

Unfortunately we experience some weird side effects with this module (cache_consistent). E.g. content entity types are not getting properly installed when enabling a module that provides new content entity types.

But without cache_consistent we regularly run into race conditions due to our high traffic while using https://www.drupal.org/project/redis as cache backend.

Wim Leers’s picture

#56: You say you don't understand it, but your explanation shows that you do understand it :)

Our issue was duplicate key entry exceptions for the cachetags table when separate requests (REST) invalidated the same cache tags at the same time. As far as I can tell, the locking system works, but once request A has inserted the new cachetag, request B's logic to determine whether to use an update or an insert was working off stale data and opted for an insert, resulting in the Duplicate Key exception.

This analysis makes sense!

#59: but then why did that test pass before? Retest queued.
EDIT: Oh … because that test is new. It's pretty unfortunate that that test got added precisely at this time. Ah well :( That means

#60: See #32, where @Fabianx wrote:

The code and approach is fine. I would however rename that to INVALID_CHECKSUM_WHILE_IN_TRANSACTION, because as a side effect this also solves the cache coherency problem for all other cache backends (including memcache and redis).

anavarre’s picture

Issue tags: +Needs reroll
Wim Leers’s picture

Issue tags: -Needs reroll

No need for a reroll, this still applies cleanly to 8.6 HEAD. The only reason this patch is failing is the SQLite fail (see above) in the recently introduced SQLite-only test.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

fgm’s picture

Just for the record, since MongoDB will be getting a cache component in 8.x-2.1, I'd be glad to be warned if anything changes as a consequence of that issue so I can apply the change there too.

claudiu.cristea’s picture

@Wim Leers,

I can see the SQLite failure on our CI:

PDOException: SQLSTATE[HY000]: General error: 1 cannot start a transaction within a transaction

web/core/lib/Drupal/Core/Database/Driver/sqlite/PDOConnection.php:19
web/core/lib/Drupal/Core/Database/Connection.php:1163
web/core/lib/Drupal/Core/Database/Transaction.php:63
web/core/lib/Drupal/Core/Database/Connection.php:1079
web/core/lib/Drupal/Core/Database/Query/Insert.php:84
web/core/lib/Drupal/Core/Database/Driver/sqlite/Insert.php:21
web/core/lib/Drupal/Core/Database/Query/Merge.php:371
web/core/lib/Drupal/Core/Cache/DatabaseCacheTagsChecksum.php:98
web/core/lib/Drupal/Core/Cache/DatabaseCacheTagsChecksum.php:59
web/core/lib/Drupal/Core/Database/Connection.php:1242
web/core/lib/Drupal/Core/Database/Connection.php:1228
web/core/lib/Drupal/Core/Database/Connection.php:1197
web/core/lib/Drupal/Core/Database/Transaction.php:69
web/core/lib/Drupal/Core/EventSubscriber/MenuRouterRebuildSubscriber.php:70
web/core/lib/Drupal/Core/EventSubscriber/MenuRouterRebuildSubscriber.php:49
web/core/lib/Drupal/Component/EventDispatcher/ContainerAwareEventDispatcher.php:111
web/core/lib/Drupal/Core/Routing/RouteBuilder.php:192
web/core/lib/Drupal/Core/ProxyClass/Routing/RouteBuilder.php:83
web/core/lib/Drupal/Core/Extension/ModuleInstaller.php:317
web/core/lib/Drupal/Core/ProxyClass/Extension/ModuleInstaller.php:83
web/core/includes/install.inc:635
web/core/includes/install.core.inc:1044
web/core/includes/install.core.inc:671
web/core/includes/install.core.inc:549
web/core/includes/install.core.inc:117
web/core/lib/Drupal/Core/Test/FunctionalTestSetupTrait.php:309
web/core/tests/Drupal/Tests/BrowserTestBase.php:1051
web/core/tests/Drupal/Tests/BrowserTestBase.php:490
web/modules/custom/joinup_core/tests/src/Functional/JoinupRdfBrowserTestBase.php:27
claudiu.cristea’s picture

FileSize
7.33 KB

Unfortunately, the 3rd time I ran a test suite with this patch, on 8.5.6, I again got the deadlock error on cache invalidation. See the attached log.

claudiu.cristea’s picture

FileSize
27.25 KB

We are receiving the deadlock error on the cache tables very often on our CI infrastructure while testing with Behat. Our project is a big project with Behat tests running more than 1 hour, currently on Drupal 8.5.x. I tested the patch from #36; here are my findings:

Before patching

Without the patch, the error occurs at the end of a Behat test, usually when cleaning up the test users, nodes, etc. See deadlock-no-patch.txt for the trace of such an exception.

How I tested

I applied the patch and I ran the Behat test suites 15 times (more or less, 15 hours of testing).

Results

Out of 15 tries, 11 went without any error. The others were marked as green by our CI, but on inspecting the error log file I found the exception again; this time, however, it has moved to the shutdown phase. See error-deadlock.txt. The Organic Groups module, which we are using, registers a shutdown function to perform some cleanup. The initial exception has disappeared and we are receiving only the shutdown-phase exception.

Fabianx’s picture

While reviewing the memcache patch for cache tags, I remembered this issue:

I do think the patch should be changed to do the cache tag invalidation after the transaction has been committed instead of last during it.

That will not only remove the deadlocks, but also allow both memcache and redis to make use of the functionality when using their native cache tags backend (which do now exist).

As the cache comes first and the database comes second, it should always be safe to invalidate the cache immediately after the DB (it is not safe the other way round). It just means that updated data is read a few microseconds later after the transaction is committed.

And the DatabaseCacheTagsChecksum could decide to open a new transaction - if it really wanted to invalidate all tags at once, but it should never be necessary as for any given set of tags, there could be someone that invalidates the same tags without a transaction.

Potentially - as the SQLite error seems to imply that it somehow opens a transaction within a transaction - the bug might go away as well, just like that.

Berdir’s picture

I've wondered the same. I've also seen some weird race conditions with the redis backend where someone managed to read an invalidated node from the storage before the transaction was actually committed; I suspect some hook_node_update() implementation was doing something slow.

The problem could be that it's a behavior change as we'd move the cache invalidation from before the hooks to after.

Fabianx’s picture

Actually, it is not a behaviour change, especially when using the READ COMMITTED transaction isolation level (which is what people should be relying on and what core should support):

- Before:

Cache tag is invalidated during transaction, tagCache[] is reset

A cache_get() loads the data from the database; because we are in READ COMMITTED, the old value is read and the checksum matches, so the cache is valid; tagCache[] now returns the data from before the transaction was written.

[unless some other process invalidated it]

- After 1:

A cache_get() loads the data from the cache; because it is valid, the checksum matches
[unless some other process invalidated it]

Cache tag is invalidated during transaction at the end, tagCache[] is reset, tagCache[] now has no data

- After 2:

- A cache_get() loads the data from the cache during the transaction; because it is valid, the checksum matches
[unless some other process invalidated it]

- The transaction is completed

- Cache tag is invalidated after transaction, tagCache[] is reset, tagCache[] now has no data

Feels like current core behaviour is the most buggy as a cache_get after the transaction reliably would return the old data as valid only in the process that had completed the transaction (due to the tagCache).

I am okay if we make this configurable and you can still opt-out via I-really-want-to-run-repeatable-read-and-dont-care-about-deadlocks=1.

But any code relying on REPEATABLE READ with Drupal has probably other problems and then just would not be able to run memcache / redis without cache_consistent anyway. And cache_consistent would just do the cache writes after the transaction anyway together with some other consistency guarantees during the request, which would not even interfere with our cache tag invalidation at the end.

Therefore it is safe to do so, it would even fix a bug, and it does not matter whether we do it at the end of the successful transaction or immediately afterwards.

Some more background from the memcache issue: #2996615: Transaction support for cache (tags) invalidation (which references this one, so beware of endless loop)

Grayle’s picture

My DB-Fu isn't too great so these may all be dumb questions, but I thought we invalidated the tags in the same transaction so we could roll back the entire transaction if it fails. If we can't invalidate the cache, we shouldn't commit the new data because we'd still have the old data being served from cache.
But apparently that can already happen? So is it more or less likely that invalid data is stuck in cache and served if we invalidate tags in a new transaction, and what do we do if that transaction fails? Are we sacrificing consistency for concurrency?

Fabianx’s picture

> My DB-Fu isn't too great so these may all be dumb questions, but I thought we invalidated the tags in the same transaction so we could roll back the entire transaction if it fails

Actually, yes: as a side effect of using the same database connection, consistency is correct for cache invalidations. However, consistency is not guaranteed for cache reads during the transaction.

This patch moves the cache tag invalidation to the end, so it is written together with the transaction. [and for memcache / redis you still have a problem if the transaction is rolled back]

However we can also move it after the DB transaction is committed - unless the cache tags have some dependency on needing to be cleared together, e.g. not just order but a true dependency - but that seems unlikely.

> If we can't invalidate the cache, we shouldn't commit the new data because we'd still have the old data being served from cache.
> But apparently that can already happen? So is it more or less likely that invalid data is stuck in cache and served if we invalidate tags in a new
> transaction, and what do we do if that transaction fails? Are we sacrificing consistency for concurrency?

I think for 99% of the cases we would not need a new transaction to invalidate the cache tags [after the transaction], but just replay the sequence of invalidations we recorded during the transaction. [which is also better to avoid deadlocks] And for obvious reasons we would not invalidate caches if the transaction is rolled back.

There is already very weak consistency in Drupal's data model, so I feel this should not break anything, and as a bonus effect it would make memcache and redis work, too.

dpw’s picture

Just to describe my experience with patch #36 (Drupal 8.6.1). In a module, I had implemented hook_node_insert, and was running a lengthy process inside. While this was running, if I tried to save another node, I would get an error message about updating cachetags:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction: UPDATE {cachetags} SET invalidations=invalidations + 1

I tried making changes to MariaDB, including adding transaction-isolation = READ-COMMITTED in my.cnf, but this didn't resolve the issue.

After applying the patch in #36, this error went away, and I can update/save other nodes while my longish process runs (saving a new node). In the future, I may move the code out of hook_node_insert if I can find a way to queue up the process after the node is completely saved (i.e. some kind of job scheduler).

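For what it's worth, one way to defer that kind of work is Drupal's queue API. A rough sketch, with a hypothetical queue name that would need a matching QueueWorker plugin:

function mymodule_node_insert(\Drupal\node\NodeInterface $node) {
  // Hand the lengthy work off to a queue instead of running it inside the
  // entity save transaction; a QueueWorker plugin processes it later, for
  // example on cron.
  \Drupal::queue('mymodule_lengthy_process')->createItem(['nid' => $node->id()]);
}
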
Thanks for the patch!

Fabianx’s picture

We are working on a Drupal 7 backport of this and found another problem:

- Rollbacks are not taken into account

We believe the right API is:

- addRootTransactionEndCallback()
- $rootTransactionEndCallback($committed = TRUE)

That would guarantee that the callback is correctly executed and de-registered and not kept around for an unrelated transaction after a rollback.

It might even fix the one test failure as it seems a DB exception happens during that test.

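A rough sketch of what that would look like (names follow the proposal above; the surrounding Connection code is simplified):

  protected function invokeRootTransactionEndCallbacks($committed) {
    $callbacks = $this->rootTransactionEndCallbacks;
    // De-register first, so callbacks can never leak into a later, unrelated
    // transaction - which is exactly what can go wrong on rollback today.
    $this->rootTransactionEndCallbacks = [];
    foreach ($callbacks as $callback) {
      // TRUE when called from doCommit(), FALSE when called from rollBack().
      call_user_func($callback, $committed);
    }
  }
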
Berdir’s picture

I understand that this is still blocked on that test fail in the quick start test?

The fail can actually be reproduced locally; just apply the patch and run, for example:

php ./core/scripts/drupal -v install standard --site-name='Test site'

With the -v, you get to see the backtrace of the error:

Exception trace:
 PDO->beginTransaction() at /core/lib/Drupal/Core/Database/Connection.php:1172
 Drupal\Core\Database\Connection->pushTransaction() at /core/lib/Drupal/Core/Database/Transaction.php:63
 Drupal\Core\Database\Transaction->__construct() at /core/lib/Drupal/Core/Database/Connection.php:1088
 Drupal\Core\Database\Connection->startTransaction() at /core/lib/Drupal/Core/Database/Query/Insert.php:84
 Drupal\Core\Database\Query\Insert->execute() at /core/lib/Drupal/Core/Database/Driver/sqlite/Insert.php:21
 Drupal\Core\Database\Driver\sqlite\Insert->execute() at /core/lib/Drupal/Core/Database/Query/Merge.php:371
 Drupal\Core\Database\Query\Merge->execute() at /core/lib/Drupal/Core/Cache/DatabaseCacheTagsChecksum.php:98
 Drupal\Core\Cache\DatabaseCacheTagsChecksum->doInvalidateTags() at /core/lib/Drupal/Core/Cache/DatabaseCacheTagsChecksum.php:59
 Drupal\Core\Cache\DatabaseCacheTagsChecksum->preRootTransactionCommit() at n/a:n/a
 call_user_func() at /core/lib/Drupal/Core/Database/Connection.php:1251
 Drupal\Core\Database\Connection->doCommit() at /core/lib/Drupal/Core/Database/Connection.php:1237
 Drupal\Core\Database\Connection->popCommittableTransactions() at /core/lib/Drupal/Core/Database/Connection.php:1206
 Drupal\Core\Database\Connection->popTransaction() at /core/lib/Drupal/Core/Database/Transaction.php:69
 Drupal\Core\Database\Transaction->__destruct() at /core/lib/Drupal/Core/EventSubscriber/MenuRouterRebuildSubscriber.php:100
 Drupal\Core\EventSubscriber\MenuRouterRebuildSubscriber->menuLinksRebuild() at /core/lib/Drupal/Core/EventSubscriber/MenuRouterRebuildSubscriber.php:71
 Drupal\Core\EventSubscriber\MenuRouterRebuildSubscriber->onRouterRebuild() at n/a:n/a
 call_user_func() at /core/lib/Drupal/Component/EventDispatcher/ContainerAwareEventDispatcher.php:111
 Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher->dispatch() at /core/lib/Drupal/Core/Routing/RouteBuilder.php:192
 Drupal\Core\Routing\RouteBuilder->rebuild() at /core/lib/Drupal/Core/ProxyClass/Routing/RouteBuilder.php:83
 Drupal\Core\ProxyClass\Routing\RouteBuilder->rebuild() at /core/lib/Drupal/Core/Extension/ModuleInstaller.php:322
 Drupal\Core\Extension\ModuleInstaller->install() at /core/lib/Drupal/Core/ProxyClass/Extension/ModuleInstaller.php:83
 Drupal\Core\ProxyClass\Extension\ModuleInstaller->install() at /core/includes/install.inc:652
 drupal_install_system() at /core/includes/install.core.inc:1106
 install_base_system() at /core/includes/install.core.inc:709
 install_run_task() at /core/includes/install.core.inc:584
 install_run_tasks() at /core/includes/install.core.inc:125
 install_drupal() at /core/lib/Drupal/Core/Command/InstallCommand.php:203
 Drupal\Core\Command\InstallCommand->install() at /core/lib/Drupal/Core/Command/InstallCommand.php:89
 Drupal\Core\Command\InstallCommand->execute() at /vendor/symfony/console/Command/Command.php:251
 Symfony\Component\Console\Command\Command->run() at /vendor/symfony/console/Application.php:946
 Symfony\Component\Console\Application->doRunCommand() at /vendor/symfony/console/Application.php:248
 Symfony\Component\Console\Application->doRun() at /vendor/symfony/console/Application.php:148
 Symfony\Component\Console\Application->run() at /core/scripts/drupal:26

My guess is that \Drupal\Core\Database\Connection::pushTransaction() gets confused about whether or not it is still in a transaction. Because a new transaction is started as part of finishing the last one, it seems to think that it is no longer in one and then starts another.

Might not be trivial to fix, but if we do indeed switch to running those callbacks *after* the transaction then that would have the benefit of solving this problem on its own I think?

I can confirm that the install command runs through just fine when I switch the order in \Drupal\Core\Database\Connection::doCommit().
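
For illustration, "switching the order" means something like the following sketch (not the actual patch; the callback property name follows the naming discussed later in this thread): the commit happens first, and the registered callbacks only run afterwards, outside of any transaction.

protected function doCommit() {
  // Commit the real transaction first.
  if (!$this->connection->commit()) {
    throw new TransactionCommitFailedException();
  }
  // Only now run the registered callbacks, so any queries they issue (such as
  // the delayed cache tag invalidations) are no longer nested inside the
  // transaction that is being finished.
  $callbacks = $this->rootTransactionEndCallbacks;
  $this->rootTransactionEndCallbacks = [];
  foreach ($callbacks as $callback) {
    call_user_func($callback);
  }
}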

We are seeing some issues with redis where saving an entity invalidates the cache, but then another request loads it again from the cache before the transaction is finished (for example, due to slow hook_entity_update() implementations). I understand that this issue would provide the mechanism to only send out the cache tag invalidations to redis once the transaction is done.

Fabianx’s picture

We are seeing some issues with redis where saving an entity invalidates the cache, but then another request loads it again from the cache before the transaction is finished (for example, due to slow hook_entity_update() implementations). I understand that this issue would provide the mechanism to only send out the cache tag invalidations to redis once the transaction is done.

That is correct. For databases it will avoid deadlocks and for redis it will lessen cache concurrency problems.

Wim Leers’s picture

I understand that this is still blocked on that test fail in the quick start test?

Correct.

My guess is that \Drupal\Core\Database\Connection::pushTransaction() gets confused about whether or not it is still in a transaction.

If you have the bandwidth to dig in to this, that'd be great!

Berdir’s picture

Status: Needs work » Needs review
FileSize
18.73 KB
2.61 KB

Well, as I wrote, fixing the test fail is easy if we just execute the callbacks later, which means we also need to rename the method and variables.

I think that's what @Fabianx wrote but I'm not sure if this is everything.

I'm also removing the @internal which was proposed earlier. Per #76/#77, this is not just a core-internal thing but is also required for redis/memcache and other external cache systems to ensure consistent results that behave like the database backend during transactions.

neclimdul’s picture

Status: Needs review » Needs work

Passing! Neat!

The meat of the patch looks great. Just some nits and documentation observations:

  1. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumInterface.php
    @@ -18,6 +18,21 @@
    +   * - TBD: non-database-based implementations of this service?!?
    

    Looks like rough early docs that got left around. Seems like that should be removed or updated before committing.

  2. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -154,6 +154,13 @@
    +   * Pre-root (non-nested) transaction commit callbacks.
    

    Post-root?

  3. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1199,6 +1206,18 @@ public function popTransaction($name) {
    +  /**
    +   * Registers post-root transaction commit callback.
    +   *
    +   * @param callable $callback
    +   *   The callback to invoke.
    +   *
    +   * @see \Drupal\Core\Database\Connection::doCommit()
    +   */
    +  public function registerPostRootTransactionCommitCallback(callable $callback) {
    +    $this->postRootTransactionCommitCallbacks[] = $callback;
    +  }
    +
    

    What's a root transaction and why is this API important/useful? This is mostly from my initial read of the patch, because I was really clueless. Going back and reading the discussion and the summary a couple of times, it started to make sense how this solves the problem, but it feels like the docblock could be filled out a little to explain the API to developers considering using it, instead of just parroting the method name.

  4. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1223,6 +1240,21 @@ protected function popCommittableTransactions() {
    +   * Internal function: actual commit, invoke pre-commit callbacks.
    

    Just @internal? This docblock is odd.

  5. +++ b/core/modules/system/tests/modules/delay_cache_tags_invalidation/delay_cache_tags_invalidation.module
    @@ -0,0 +1,38 @@
    +}
    \ No newline at end of file
    

    newline.

neclimdul’s picture

One other quick observation: it seems these changes resolved #1120020: SQLite database locking errors cause fatal errors. If anyone has an insight as to why this fixed it, and what about this patch was forcing the error, it might be helpful to pass that along to help provide the test case needed to get that issue's tests written. :-D

Fabianx’s picture

In the D7 patch we also give the caller the opportunity to react to an aborted or failed transaction, and I think that is important:

#3004437: [D7] Add ability to register rootEndTransaction callbacks during transactions

Also naming (as usual):

Here we use:

registerPostRootTransactionCommitCallback

In D7-POC we use:

addRootTransactionEndCallback

I am not sure what is better, but the D7 naming feels less complex.

Berdir’s picture

+++ b/core/lib/Drupal/Core/Cache/DatabaseCacheTagsChecksum.php
@@ -43,10 +50,39 @@ public function __construct(Connection $connection) {
    */
   public function invalidateTags(array $tags) {
+    $in_transaction = $this->connection->inTransaction();
+    if ($in_transaction) {
+      if (empty($this->delayedTags)) {
+        $this->connection->registerPostRootTransactionCommitCallback([$this, 'preRootTransactionCommit']);
+      }
+      $this->delayedTags = Cache::mergeTags($this->delayedTags, $tags);
+    }
+    else {
+      $this->doInvalidateTags($tags);
+    }
+  }
+

Looking at this from a redis-maintainer perspective, I have two thoughts that are somewhat contradictory.

a) Wondering if a trait or base class for a cache tag checksum implementation would make sense.

That trait/base class could offer invalidateTags(), preRootTransactionCommit(), isValid(), and getCurrentChecksum(). The last two already exist exactly like this in redis, and for the first I would just need to rename my existing invalidateTags().

b) The problem with that, and with the current implementation in general, is that it needs the database service. Having to inject that into my backend is especially unfortunate because making it a required argument would also break any site that is using a hardcoded $settings['bootstrap_container_definition'] definition to use redis for the container cache bin.

So I definitely don't want to inject that, also because instantiating that service already opens the connection to the database.

If we make it a trait, then there could be a getDatabaseConnection() method; I guess redis would just call the container then (see the sketch below). And decoupling it from the connection is probably tricky: it could be a standalone service that the container calls, but then we would need to statically call out to that service from there.
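
A hypothetical sketch of that idea (the names are illustrative, based on the discussion above, not necessarily the final API): the trait keeps the delaying logic and only asks the concrete implementation for the connection and for the storage-specific invalidation, while a redis implementation fetches the connection lazily instead of having it injected.

trait CacheTagsChecksumTrait {

  /**
   * Returns the database connection whose transaction state is tracked.
   */
  abstract protected function getDatabaseConnection();

  /**
   * Writes the invalidations to the actual storage (database, redis, ...).
   */
  abstract protected function doInvalidateTags(array $tags);

}

class RedisCacheTagsChecksum {
  use CacheTagsChecksumTrait;

  protected function getDatabaseConnection() {
    // Fetched lazily from the container rather than injected, so hardcoded
    // $settings['bootstrap_container_definition'] setups keep working and no
    // database connection is opened until it is actually needed.
    return \Drupal::database();
  }

  protected function doInvalidateTags(array $tags) {
    // Increment the per-tag invalidation counters in Redis here.
  }

}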

pounard’s picture

It's always chicken and egg; the real problem is that the container is cached using a service it defines. Symfony's dumped container was all about solving this kind of problem. This is one of the saddest parts of Drupal.

Berdir’s picture

Status: Needs work » Needs review
FileSize
21.49 KB
14.78 KB

@pounard: I don't understand what you mean, the container cache bin is *not* in the container, it's a special hardcoded container that can be overridden. My point is that exactly *because* that exists, changing constructor arguments for my service would break that. My problem isn't chicken/egg, the database doesn't depend on redis. If anything, it is a chicken/chicken problem, aka two different chickens that should not be kept together because they will start a fight ;)

> Just @internal? This docblock is odd.

Yeah, it was copied from the method above; updated both to use @internal. Other feedback from #80 addressed.

> In the D7 patch we also give the caller the opportunity to re-act on an aborted or failed transaction and I think that is important:

I'm not so sure. My thinking is that we should instead unset the registered callbacks, because if the transaction is aborted, then the cache invalidation (or whatever it was that got delayed) also should not happen. That's what happens now if the invalidation occurs inside the transaction.

> addRootTransactionEndCallback

Yeah, that seems nicer; switched.

Other changes:
* Extracted a trait and already prepared a patch for redis based on it; implementing this is then trivial: #3018203: Support delayed cache tag invalidation.
* As mentioned, unsetting the callbacks in case of a rollback.
* Removed an extra "Test" from EndOfTransactionQueriesTestTest
* Changed the asserts from using a stdClass to assertNotEmpty(); less weird to update when we (finally) manage to return a typed object there, IMHO.

Wim Leers’s picture

This all looks great IMHO. Mostly some incomplete renaming problems.

  1. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumInterface.php
    @@ -29,7 +29,9 @@
    -   * - TBD: non-database-based implementations of this service?!?
    +   * - Non-database-based implementations of this service can delay tag
    +   *   invalidations until the transaction is committed to avoid
    +   *   race conditions.
    

    👍

  2. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -9,4 +9,92 @@
     trait CacheTagsChecksumTrait {
    

    +1 for this trait, and for already having a patch for at least 1 contrib module (at #3018203: Support delayed cache tag invalidation).

  3. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -9,4 +9,92 @@
    +   * The set of delayed tag invalidations.
    

    Nit: The set of cache tags whose invalidation is delayed.

    (I wrote this; my bad!)

  4. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -9,4 +9,92 @@
    +   * Invalidates all delayed tag invalidations.
    

    Nit: s/Invalidates/Executes/

    (Right now it reads as if we're invalidating invalidations.)

    (I wrote this; my bad!)

  5. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -9,4 +9,92 @@
    +   * Returns the database connection.
    

    +1 for this method on the trait, the explanation in #3018203-2: Support delayed cache tag invalidation makes perfect sense.

  6. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -9,4 +9,92 @@
    +  protected abstract function getDatabaseConnection();
    ...
    +  protected abstract function doInvalidateTags(array $tags);
    

    I don't think I ever realized traits could define abstract methods. Cool! 😎

  7. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -9,4 +9,92 @@
    +   *   The list of tags for which to invalidate cache items.
    

    s/list/set/

    (I wrote this; my bad!)

  8. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -155,11 +155,11 @@
    +  protected $postRootTransactionEndCallbacks = [];
    

    "post" and "end". Let's pick one of them.

  9. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1207,19 +1212,31 @@ public function popTransaction($name) {
    +  public function addRootTransactionEndCallback(callable $callback) {
    

    This gets it right.

  10. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1207,19 +1212,31 @@ public function popTransaction($name) {
    -   * Internal function: commit all the transaction layers that can commit.
    +   * Commit all the transaction layers that can commit.
    +   *
    +   * @internal
    
    @@ -1241,17 +1258,19 @@ protected function popCommittableTransactions() {
    -   * Internal function: actual commit, invoke pre-commit callbacks.
    +   * Do the actual commit, invoke pre-commit callbacks.
    +   *
    +   * @internal
    

    👌

pounard’s picture

@pounard: I don't understand what you mean, the container cache bin is *not* in the container, it's a special hardcoded container that can be overridden. My point is that exactly *because* that exists, changing constructor arguments for my service would break that. My problem isn't chicken/egg, the database doesn't depend on redis. If anything, it is a chicken/chicken problem, aka two different chickens that should not be kept together because they will start a fight ;)

Yes, you lost me exactly there :) It was only a tiny cute rant against exactly that: it's too complicated! Please don't pay attention, I just like to highlight the excessive complexity.

Berdir’s picture

Addressed the requested documentation and naming improvements.

The combination of this patch and the redis patch has been deployed to at least one production instance, where it seems to be working fine so far. However, our behat tests are acting up: somehow invalidation is only sometimes working properly; still investigating why.

I guess what we need now is feedback from Fabianx on my feedback/answers in #85 as well as the patch.

smccabe’s picture

Tested this with a lot of high-volume Drupal Commerce load testing and it resolved issues that come up during heavy order-processing transactions. I ran through ~100,000 orders with no issues. This was initially in relation to #2974874: Purge can create deadlocks in database, because it causes transactions to take longer to execute.

I won't set this to RTBC because it looks like there are some naming and documentation changes, but functionality wise, all good by me.

bdragon’s picture

Interim update from the D7 side (Fabian should be elaborating on this shortly):

OK, we can live with clearing instead of calling the callbacks on a rollback. Agreed that it doesn't really fit in as-is.

My argument for including it originally was as follows:

In a circumstance where we are tracking *non-database* state alongside the transaction, such as the list of tags we are going to invalidate in an external system as soon as the transaction finishes, being called for the rollback of a transaction is useful because we can decide to NOT do this invalidation after all, since the change was rolled back and never happened.

Not having this is not the end of the world in this case, because all it means is that we'll invalidate some extra stuff that didn't strictly need to be invalidated, but only if there was another transaction later in the request that also caused our code to register a callback again.

So yeah, the whole $commit = FALSE thing is more about making sure that internal state can follow the database transactional state properly.

Another suggestion: addRootTransactionEndCallback() should throw DatabaseTransactionNoActiveException if it is called while a transaction is not active. Since allowing it to be called outside a transaction would make it affect the next transaction to run, it would allow for subtle errors with no real benefit. I would definitely consider it a coding error to be calling it when not inside a transaction.
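
A minimal sketch of that guard (using core's TransactionNoActiveException as the Drupal 8 counterpart of the class named above; the exact exception in the committed patch may differ):

public function addRootTransactionEndCallback(callable $callback) {
  if (!$this->inTransaction()) {
    // Outside a transaction the callback would silently attach itself to
    // whatever transaction happens to start next, so treat this as a coding
    // error instead.
    throw new \Drupal\Core\Database\TransactionNoActiveException();
  }
  $this->rootTransactionEndCallbacks[] = $callback;
}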

And here's some feedback on the patch itself:

+  /**
+   * Do the actual commit, invoke pre-commit callbacks.
+   *
+   * @internal
+   */
+  protected function doCommit() {

Docblock should say *post*-commit callbacks.

+    if (!$this->connection->commit()) {
+      throw new TransactionCommitFailedException();
+    }
+    if (!empty($this->rootTransactionEndCallbacks)) {
+      foreach($this->rootTransactionEndCallbacks as $callback) {
+        call_user_func($callback);
+      }
+      $this->rootTransactionEndCallbacks = [];
+    }
+  }

It is very important that rootTransactionEndCallbacks is reset BEFORE calling the callbacks, because if one of the callbacks needs to do its own (self-contained) transaction, the callbacks will rerun recursively!

Suggest something more like:

+    if (!$this->connection->commit()) {
+      throw new TransactionCommitFailedException();
+    }
+    if (!empty($this->rootTransactionEndCallbacks)) {
+      $callbacks = $this->rootTransactionEndCallbacks;
+      $this->rootTransactionEndCallbacks = [];
+      foreach ($callbacks as $callback) {
+        call_user_func($callback);
+      }
+    }
+  }

Fabianx’s picture

Status: Needs review » Needs work

This looks great overall! I am glad we could sync D7 and D8 together. Some points from our review/comparison of the D7 and D8 patches today:

  1. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1131,6 +1138,11 @@ public function rollBack($savepoint_name = 'drupal_transaction') {
    +
    +    // Unset any registered callbacks, they should not be called if the
    +    // transaction itself was rolled back.
    +    $this->rootTransactionEndCallbacks = [];
    +
    

    Per #90 we should really think about calling the rootTransactionEndCallbacks with a parameter of 'FALSE'.

    For the cache tags this means that we can reset the cache tags.

    Otherwise we would not know that and would invalidate the tags from an earlier aborted transaction.

    While for this use case that is not a big deal, it could cause problems with a different use case in the future, and as we _fix_ the callback API here, it's important to think this through.

    Fortunately, FALSE can always be supported, since NULL vs. TRUE vs. FALSE is unambiguous.

  2. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1223,6 +1257,23 @@ protected function popCommittableTransactions() {
    +    if (!empty($this->rootTransactionEndCallbacks)) {
    +      foreach($this->rootTransactionEndCallbacks as $callback) {
    +        call_user_func($callback);
    +      }
    +      $this->rootTransactionEndCallbacks = [];
    

    As Brandon already said, the rootTransactionEndCallbacks need to be removed before this is called to avoid endless recursion.

  3. +++ b/core/lib/Drupal/Core/Database/Driver/mysql/Connection.php
    @@ -660,7 +658,7 @@ protected function popCommittableTransactions() {
    -            $this->connection->commit();
    +            $this->doCommit();
    

    Is it a problem that this can now throw an Exception, where it did not throw an Exception before?

  4. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1200,7 +1212,31 @@ public function popTransaction($name) {
    +  public function addRootTransactionEndCallback(callable $callback) {
    +    $this->rootTransactionEndCallbacks[] = $callback;
    +  }
    

    It might be a good idea to throw an Exception here if this is called while we are not in a transaction.

    The problem (as explained by Brandon) is that if we do not do that, you could register a callback before a transaction is started.

    In a way it could be seen as a "feature", but it feels wrong semantically.

anavarre’s picture

Issue tags: +MySQL deadlock
Berdir’s picture

Status: Needs work » Needs review
FileSize
21.95 KB
2.62 KB

Getting back to this.

* I added the $success argument and refactored the doCommit() method so we ensure there is no loop.
* Fixed some docs.
* #91.3: not sure, but I think if that throws an exception then things are really bad; I understand the whole chunk there only exists to suppress one specific exception.
* Added an exception when trying to add callbacks and there's no active transaction.

sdelbosc’s picture

Quick feedback on the patch from #88.
We used it on a project which was experiencing at least 10 occurrences per day of "General error: 1205 Lock wait timeout exceeded; try restarting transaction: UPDATE {cachetags} SET invalidations=invalidations + 1 WHERE tag = :db_condition_placeholder_0;" exceptions. They were happening from a custom REST endpoint which is called at least 30,000 times per day and updates some Drupal entities.
Since applying that patch we haven't seen any of these deadlocks.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Gogowitsch’s picture

A thank you to everyone for your work on this issue!

I have sad news: I applied the patch from #93 to my CI setup and ran the bash script from the related issue #2833539. I still get tons of these errors "1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {node_field_data}".

The patch is likely helpful for other situations and underlying causes, but not for my setup: Ubuntu 18.04.1, Drupal 8.6.10 running within Docker via Lando v3.0.0-rc.13, MySQL 5.7.25 within Docker, all with default settings.

Berdir’s picture

Unfortunate that this doesn't help you, but considering the table where it happens for you, that makes sense: this patch can only fix deadlocks related to the cache tables.

It might make sense to create a dedicated issue for that and provide as much information as possible: specifically, what exactly you are doing that causes the error, a backtrace of the error, the MySQL configuration, and so on.

bryanmanalo’s picture

This helped us a lot -- patch #93. We have a website with tons of translations. This makes concurrent editing faster.

Berdir’s picture

> The combination of this patch and the redis patch has been deployed to at least one production instance, where it seems to be working fine so far. However, our behat tests are acting up: somehow invalidation is only sometimes working properly; still investigating why.

Since then, we have deployed this to several other large newspaper sites (all in combination with redis, so we're not actually testing the mysql fix here, but using this API to fix a similar problem when using redis for caching). Still, it has been stable for us and we haven't encountered any issues.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Let’s get this in. While I can only speak for the D7 backport, that one has been working great.

Patch looks good to me now.

Please give me and bdragon credit for this work.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -0,0 +1,105 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function invalidateTags(array $tags) {
    ...
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function getCurrentChecksum(array $tags) {
    ...
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function isValid($checksum, array $tags) {
    

    These are not inheriting any docs. We haven't really come to a good solution for the "trait implementing an interface's docs" thing yet, but this solution doesn't really work.

  2. +++ b/core/lib/Drupal/Core/Database/Connection.php
    @@ -1223,6 +1263,26 @@ protected function popCommittableTransactions() {
    +  /**
    +   * Do the actual commit, invoke post-commit callbacks.
    +   *
    +   * @internal
    +   */
    +  protected function doCommit() {
    +    $success = $this->connection->commit();
    +    if (!empty($this->rootTransactionEndCallbacks)) {
    +      $callbacks = $this->rootTransactionEndCallbacks;
    +      $this->rootTransactionEndCallbacks = [];
    +      foreach($callbacks as $callback) {
    +        call_user_func($callback, $success);
    +      }
    +    }
    +
    +    if (!$success) {
    +      throw new TransactionCommitFailedException();
    +    }
    +  }
    

    This method makes me think quite hard. What does it mean if $success = $this->connection->commit(); returns TRUE, but one of the callbacks changes it to FALSE and the TransactionCommitFailedException is thrown? The commit has succeeded. Also, it feels odd that the callbacks can change $success.

    I think maybe we should only throw TransactionCommitFailedException if the commit fails. And throw a different exception if one of the rootTransactionEndCallbacks fails.

    The other thing that gives me pause is that we continue to call callbacks even when $success is FALSE.

    The other thing is that we're in a loop here and we call the rootTransactionEndCallbacks on the first layer processed by \Drupal\Core\Database\Connection::popCommittableTransactions() - and I don't think this is the root transaction at all - it's the opposite of root :)

Berdir’s picture

1. Looks like "Implements ..." is a fairly common docblock pattern for traits, for example \Drupal\Core\Form\FormStateValuesTrait.

2. Several things here..

> What does it mean if $success = $this->connection->commit(); returns TRUE but one of the callbacks changes it to FALSE

As discussed, $success is passed by value and it's not possible for a callback to change it, so we're good in that regard.

> The other that gives me pause is that we continue to call callbacks even when $success if FALSE.

That's why we added the $success parameter; that makes sense and is tested. If the transaction fails, we don't invalidate the cache tags, but we do reset the list so we don't invalidate them for no reason in a later transaction.

What I did realize, though, is that an explicit rollback() should also call it with FALSE; as a result, we can then actually test this. I also improved coding standards in the tests quite a bit and made them work for sqlite and postgresql too.
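
For illustration, the checksum side of that contract could look roughly like this (a sketch only; the actual method name and details are whatever the patch ends up with):

public function rootTransactionEndCallback($success) {
  if ($success) {
    // The transaction was committed, so the delayed invalidations can now be
    // written out safely.
    $this->doInvalidateTags($this->delayedTags);
  }
  // On a failed commit or an explicit rollback the queued tags are simply
  // dropped: the data they belong to was never persisted, so invalidating
  // them in a later transaction would serve no purpose.
  $this->delayedTags = [];
}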

> The other thing is that we're in a loop here and we call the rootTransactionEndCallbacks on the first layer processed by \Drupal\Core\Database\Connection::popCommittableTransactions() - and I don't think this is the root transaction at all - it's the opposite of root :)

Hm, I don't get this part. We call doCommit() only on the final root transaction: we go through them in reverse, and if there are none left we call it, otherwise we just release the savepoint. That's not changed; we just put that part into a separate method so we can call the callbacks too, since it is called from more than one place.

Berdir’s picture

Status: Needs review » Needs work

Obviously the postgresql test failed; I should have tested that.

Berdir’s picture

Status: Needs work » Needs review
FileSize
26.79 KB
2.09 KB

Fixing the pgsql test. It was a stupid mistake, but pgsql sends statement objects through query(), which is tricky, so I'm skipping those now.

Berdir’s picture

Berdir’s picture

jibran’s picture

Status: Needs review » Reviewed & tested by the community

#101 is addressed so setting it back to RTBC.

catch’s picture

This is a really nice approach to the problem - both moving the invalidations to the end of the transaction and also the invalid checksum logic. I think we need a change record for the new trait etc. so holding off commit for that.

dww’s picture

I created a stub CR:

https://www.drupal.org/node/3071740

Could use a little help fleshing out why anyone other than the core cache / DB subsystem maintainers would care about this. ;)

Cheers,
-Derek

catch’s picture

Status: Reviewed & tested by the community » Needs work

DrupalCS is complaining about a lot of code style violations. I started changing these locally to fix on commit, but then ran into this typo in LoggedStatementsTrait, which is more than I really want to fix on commit.

  1. +++ b/core/modules/system/tests/modules/database_statement_monitoring_test/src/pgsql/Connection.php
    @@ -0,0 +1,13 @@
    +use Drupal\database_statement_monitoring_test\LoggedStatemementsTrait;
    

    LoggedStatemementsTrait

  2. +++ b/core/modules/system/tests/modules/delay_cache_tags_invalidation/delay_cache_tags_invalidation.info.yml
    --- /dev/null
    +++ b/core/modules/system/tests/modules/delay_cache_tags_invalidation/delay_cache_tags_invalidation.module
    
    +++ b/core/modules/system/tests/modules/delay_cache_tags_invalidation/delay_cache_tags_invalidation.module
    +++ b/core/modules/system/tests/modules/delay_cache_tags_invalidation/delay_cache_tags_invalidation.module
    @@ -0,0 +1,42 @@
    
    @@ -0,0 +1,42 @@
    +<?php
    +
    

    Missing an @file

Added a sentence to the change record explaining the relevance to contrib.

Berdir’s picture

Status: Needs work » Needs review
Issue tags: -Needs change record updates
FileSize
26.86 KB
6.99 KB

Uh, what a mess, sorry about that. I wish those coding standard issues were more visible :)

I also updated the change record: it is more than just a performance improvement for contrib cache backends; we've seen pretty frequent and problematic cache invalidation race conditions on some of our larger sites with Redis, and this fixes them.

Status: Needs review » Needs work

The last submitted patch, 111: 2966607-111.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

Status: Needs work » Needs review
FileSize
26.88 KB
1.88 KB

Ok, one test fail was a stupid mistake, not sure how that happened.

The other one is new and also happens with the previous patch. Not quite sure what changed, but creating the entity loads the entity field definitions, and that checks the cache tags. That happens outside of the transaction and is correctly not pushed to the end.

Berdir’s picture

Fixing the last code style issue, missed that one.

dww’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
9.84 KB

Here's the interdiff between #105 (last RTBC) and #114. All looks good to me. CR is done. Tests are all green, and there are no code style warnings at https://www.drupal.org/pift-ci-job/1368749.

Back to RTBC.

Thanks, all!
-Derek

p.s. I haven't closely reviewed the entire patch, just this interdiff. I'm relying on the reviews from @Fabianx, @alexpott and others on the bulk of the work.

Wim Leers’s picture

I also updated the change record: it is more than just a performance improvement for contrib cache backends; we've seen pretty frequent and problematic cache invalidation race conditions on some of our larger sites with Redis.

  • Same for Acquia customers, see #42 and #94.
  • Same for large Drupal Commerce installs, see #89.
  • Same for Dropsolid customers, see #47.

This is primarily a scalability improvement, because the original naïve implementation (in HEAD) is fine for low-traffic and/or fast-transaction use cases.

(I know many people on this issue already know this, but it's worth reiterating after we've been down in the weeds for so long, especially just before commit.)

mpp’s picture

Off topic answer to @berdir's comment in #111:

Uh, what a mess, sorry about that. I wish those coding standard issues were more visible :)

Have a look at codeclimate; it can highlight (and fix) code sniffer (and many other) issues in your git repo (see example: https://github.com/MPParsley/og/pull/29/files). Wish we had this for Drupal on Gitlab...

mondrake’s picture

Status: Reviewed & tested by the community » Needs work

With this patch, some tests are failing when run on a non-core database driver:

1) Drupal\KernelTests\Core\Cache\EndOfTransactionQueriesTest::testEntitySave
Error: Class '\Drupal\database_statement_monitoring_test\dbal\Connection' not found
/home/travis/drupal8/core/lib/Drupal/Core/Database/Database.php:371
/home/travis/drupal8/core/lib/Drupal/Core/Database/Database.php:166
/home/travis/drupal8/core/lib/Drupal/Core/Config/BootstrapConfigStorageFactory.php:40
/home/travis/drupal8/core/lib/Drupal/Core/Config/BootstrapConfigStorageFactory.php:31
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:1408
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:1617
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:1252
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:910
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:476
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:745
/home/travis/drupal8/core/tests/Drupal/KernelTests/KernelTestBase.php:344
/home/travis/drupal8/core/tests/Drupal/KernelTests/KernelTestBase.php:235
/home/travis/drupal8/core/tests/Drupal/KernelTests/Core/Cache/EndOfTransactionQueriesTest.php:34

2) Drupal\KernelTests\Core\Cache\EndOfTransactionQueriesTest::testEntitySaveRollback
Error: Class '\Drupal\database_statement_monitoring_test\dbal\Connection' not found
/home/travis/drupal8/core/lib/Drupal/Core/Database/Database.php:371
/home/travis/drupal8/core/lib/Drupal/Core/Database/Database.php:166
/home/travis/drupal8/core/lib/Drupal/Core/Config/BootstrapConfigStorageFactory.php:40
/home/travis/drupal8/core/lib/Drupal/Core/Config/BootstrapConfigStorageFactory.php:31
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:1408
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:1617
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:1252
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:910
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:476
/home/travis/drupal8/core/lib/Drupal/Core/DrupalKernel.php:745
/home/travis/drupal8/core/tests/Drupal/KernelTests/KernelTestBase.php:344
/home/travis/drupal8/core/tests/Drupal/KernelTests/KernelTestBase.php:235
/home/travis/drupal8/core/tests/Drupal/KernelTests/Core/Cache/EndOfTransactionQueriesTest.php:34

3) Drupal\Tests\page_cache\Functional\PageCacheTagsIntegrationTest::testPageCacheTags
sort() expects parameter 1 to be array, null given
/home/travis/drupal8/core/modules/system/tests/src/Functional/Cache/AssertPageCacheContextsAndTagsTrait.php:95
/home/travis/drupal8/core/modules/page_cache/tests/src/Functional/PageCacheTagsIntegrationTest.php:121

Is it possible to skip the tests if there is no Drupal\database_statement_monitoring_test\{driver} class defined?

Wim Leers’s picture

@mondrake: Excellent find! 👍

Agreed, and done :) Apparently I introduced this pattern in #10.

I think it's important that third-party database drivers can also run this test. But I do see a way to do that, and the "skip test" message explains how. I've also updated the change record to document this: https://www.drupal.org/node/3071740/revisions/view/11516586/11522393.
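
For reference, the skip presumably boils down to something like this in the test's setUp() (a sketch; the exact lookup and message are whatever the committed test does):

$driver = \Drupal\Core\Database\Database::getConnection()->driver();
$class = "Drupal\\database_statement_monitoring_test\\{$driver}\\Connection";
if (!class_exists($class)) {
  // Contrib drivers can opt in by providing their own statement-monitoring
  // Connection subclass in this namespace.
  $this->markTestSkipped("No statement-monitoring Connection class available for the '{$driver}' driver.");
}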

mondrake’s picture

Status: Needs review » Reviewed & tested by the community

Thank you @Wim Leers!

Back to RTBC.

I feel like some testing for the changes in the DB API is missing - there is coverage in the Cache testing, but nothing that tests the root end transaction callbacks in isolation in a @group Database test. But that can be a followup?
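
For that follow-up, such an isolated test could be as small as this sketch (illustrative only; the class and method names here are made up):

/**
 * @group Database
 */
class RootTransactionEndCallbackTest extends \Drupal\KernelTests\KernelTestBase {

  public function testCallbackRunsAfterCommit() {
    $connection = \Drupal\Core\Database\Database::getConnection();
    $called = NULL;
    $transaction = $connection->startTransaction();
    $connection->addRootTransactionEndCallback(function ($success) use (&$called) {
      $called = $success;
    });
    // Still inside the transaction: the callback must not have run yet.
    $this->assertNull($called);
    // Committing the root transaction runs the callback with TRUE.
    unset($transaction);
    $this->assertTrue($called);
  }

}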

dww’s picture

Issue tags: +Needs followup

+1 to #120 happening as a follow-up, not holding this up. Tagging as such. @mondrake: would you be willing to create said issue as a child and remove the tag again?

Thanks,
-Derek

mondrake’s picture

larowlan’s picture

First look at this patch, thanks all it looks very thorough and a lot of work has gone into it.

+++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
@@ -0,0 +1,105 @@
+      unset($this->invalidatedTags[$tag]);

invalidatedTags doesn't exist on this trait? Should we be hoisting it up so that this is a drop-in solution?

There are some postgres fails on the test above that don't appear to be in HEAD, but also don't appear to be related.

catch’s picture

Status: Reviewed & tested by the community » Needs work

Agreed with @larowlan in #123, just looks like an oversight to me?

Berdir’s picture

Nice catch. I pondered whether I should move that line down into the database implementation, but then we'd need to do that awkward dance of aliasing a trait method under another name and calling that.

So I went ahead and moved both the tagCache and invalidatedTags properties and the corresponding logic into the trait. Instead of calculateChecksum(), there is now an abstract method getTagInvalidations() for just the storage-specific part.

I think the approach makes sense and redis will be able to use it nicely. The part I'm not 100% sure about is how I'm now doing the invalidatedTags check in invalidateTags(), before the delayed check, but I think it's fine. Also, I'm sure naming/docs can be improved.
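
To illustrate the split, the storage-specific part for the database implementation presumably reduces to something like this (a sketch; the exact query is whatever the patch contains):

protected function getTagInvalidations(array $tags) {
  // Only the lookup of the per-tag invalidation counters stays in the
  // database-specific class; the checksum calculation and the delayed
  // invalidation handling now live in CacheTagsChecksumTrait.
  return $this->connection->query(
    'SELECT tag, invalidations FROM {cachetags} WHERE tag IN ( :tags[] )',
    [':tags[]' => $tags]
  )->fetchAllKeyed();
}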

Wim Leers’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -9,6 +9,15 @@
    +   * @var array
    
    @@ -86,6 +116,52 @@ public function isValid($checksum, array $tags) {
    +   * @param array $tags
    

    Übernit: these could be string[]. Can be fixed on commit.

  2. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -16,6 +25,13 @@ trait CacheTagsChecksumTrait {
    +   * @var array
    

    Übernit: could be int[]. Can be fixed on commit.

  3. +++ b/core/lib/Drupal/Core/Cache/CacheTagsChecksumTrait.php
    @@ -86,6 +116,52 @@ public function isValid($checksum, array $tags) {
    +  /**
    +   * Fetches invalidation counts for cache tags.
    ...
    +  abstract protected function getTagInvalidations(array $tags);
    

    🤔 Would getTagInvalidationCounts() be slightly clearer?

Berdir’s picture

Thanks for the review. Some of the code snippets are missing any context, so I wasn't 100% sure which ones you meant exactly. I updated some, but $invalidatedTags for example is a bit special; it would be bool[].

3. Yeah, that makes more sense with the descriptions. I added the method first and only then wrote the description for it.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

jibran’s picture

Status: Needs review » Reviewed & tested by the community

#123 is addressed so back to RTBC.

Wim Leers’s picture

Agreed with RTBC: the #123 nits are fixed, I've given detailed reasons backed up by real-world data in #116, and the problem surfaced by @mondrake in #118 has been addressed.

🚢🙏

catch’s picture

Version: 8.9.x-dev » 8.8.x-dev
Status: Reviewed & tested by the community » Fixed

Committed dbca960 and pushed to 9.0.x, then cherry-picked to 8.9.x and 8.8.x. Thanks!

  • catch committed 46a6dbe on 9.0.x
    Issue #2966607 by Wim Leers, Berdir, claudiu.cristea, dww, Fabianx,...
catch’s picture

  • catch committed abf0b95 on 8.9.x
    Issue #2966607 by Wim Leers, Berdir, claudiu.cristea, dww, Fabianx,...

  • catch committed f3703ff on 8.8.x
    Issue #2966607 by Wim Leers, Berdir, claudiu.cristea, dww, Fabianx,...
Wim Leers’s picture

m4olivei’s picture

This is wonderful! Just wanted to note that we deployed the patch in #127 to 8.7.7 in production last week, together with the corresponding patch to memcache in #2996615: Transaction support for cache (tags) invalidation, and we've seen our issue disappear. In our case editors were running into an issue best described by #1679344: Race condition in node_save() when not using DB for cache_field.

Wim Leers’s picture

Thanks for informing all (97! 😲) followers of this issue of that, @m4olivei! 👍

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.