Problem/Motivation

menu_rebuild() / Router::rebuild() has additional race conditions which aren't accounted for by the lock.

1. Processes that fail to acquire a lock, have the menu_rebuild_needed variable / state and no menu masks in memory. If they subsequently call menu_get_item() / any function on the router that checks for rebuild, that will trigger yet another menu_rebuild() / Router::rebuild().

2. Additionally, there's a window between variable_initialize() / state() retrieval and menu_get_item() / Router::[public] where another process may have fully completed a menu_rebuild() / Router::rebuild() and freed the lock. In this case we don't need to rebuild the menu, but we do anyway.

#1 is a serious race condition - it can cause menu_rebuild() / Router::rebuild() to occur over and over again - as one process acquires the lock, other processes lock_wait() / Lock::wait(), then rebuild, then acquire the lock, then more process lock_wait() / Lock::wait() etc. - proper lock stampede.

#2 is more of an optimization to make the current stampede protection more robust.

Both problems exist in 8.x too despite the code having changed a lot. They have been shown in above code with the / for D8 as the alternative implementation.
Its really just the naming that has changed.

Proposed resolution

Move the logic away from a lazy-rebuild model into a explicit rebuild model, which means, there is no global (across requests) flag, to trigger a rebuild.
Instead the router is marked as to be rebuild only in the current process. At the end of the request ($kernel->terminate()) the rebuild is triggered.

That way we have an important performance improvement (or at least avoiding regressions) with this patch:

This leads to the following additional results:

Remaining tasks

(done #203) Agree that the performance improvements are worth it.

(todo) #211 @alexpott "Looks like the following CR will need to be revised https://www.drupal.org/node/2185947 - I think once this lands it should just point to the new CR..."

User interface changes

none.

API changes

RouteBuilderIndicator + Interface + service is removed.

Original report

Everytime menu_rebuild() is called, we get a crazy amount of waits on INSERT INTO menu_router and a bunch of DELETE FROM menu_router. I believe there is some sort of race condition for rebuilding menus. The only way I can kill it is to kill off a bunch of processes. I would imagine that setting a lock for the first user to rebuild that table would be a simple enough fix, but I wanted to open a ticket and see if anyone else had come up with a solution first.

Please help, this is crashing our site: http://www.divx.com/ every time we update a menu or clear the cache.

Thanks!

CommentFileSizeAuthor
#228 optimize_the_route-356399-228.patch67.97 KBsidharrell
#219 interdiff.txt10.32 KBklausi
#218 356399.patch68.05 KBklausi
#204 356399-204.patch59.34 KBdawehner
#197 356399.patch64.75 KBklausi
#195 356399-193.patch63.33 KBdawehner
#193 interdiff.txt1.4 KBdawehner
#193 356399-193.patch63.33 KBdawehner
#191 356399.patch64.48 KBklausi
#187 356399.patch59.5 KBklausi
#186 interdiff.txt1.38 KBdawehner
#183 interdiff.txt3.78 KBdawehner
#183 356399-183.patch59.07 KBdawehner
#178 interdiff.txt565 bytesdawehner
#178 356399-178.patch56.7 KBdawehner
#177 interdiff.txt2.91 KBdawehner
#177 356399-177.patch56.53 KBdawehner
#175 interdiff.txt10.37 KBdawehner
#175 356399-174.patch56.93 KBdawehner
#169 interdiff.txt10.97 KBdawehner
#169 356399-169.patch59.92 KBdawehner
#158 optimize_the_route-356399-158.patch64.33 KBsidharrell
#158 interdiff-3563991-156-158.txt1.71 KBsidharrell
#156 optimize_the_route-356399-156.patch64.34 KBsidharrell
#156 interdiff-3563991-154-156.txt1.26 KBsidharrell
#154 interdiff-3563991-152-154.txt1.04 KBsidharrell
#154 optimize_the_route-356399-154.patch63.53 KBsidharrell
#150 interdiff.txt1.07 KBdawehner
#150 3563991-150.patch67.96 KBdawehner
#148 356399-148.patch67.97 KBdawehner
#146 interdiff-2.txt3.87 KBdawehner
#146 356399-146.patch5.29 KBdawehner
#144 interdiff.txt1.2 KBdawehner
#144 356399-144.patch67.79 KBdawehner
#141 interdiff.txt429 bytesdawehner
#141 356399-141.patch66.59 KBdawehner
#139 356399-139.patch66.52 KBdawehner
#137 2372507-64.patch54.41 KBdawehner
#126 356399-126.patch66.8 KBAnonymous (not verified)
#126 356399-126-interdiff.txt3.2 KBAnonymous (not verified)
#124 356399-124-interdiff.txt0 bytesAnonymous (not verified)
#124 356399-124.patch64.28 KBAnonymous (not verified)
#119 356399-119-interdiff.txt1.98 KBAnonymous (not verified)
#119 356399-119.patch59.79 KBAnonymous (not verified)
#116 356399-117.patch58.48 KBAnonymous (not verified)
#116 356399-117-interdiff.txt9.79 KBAnonymous (not verified)
#115 356399-115.patch56.48 KBAnonymous (not verified)
#115 356399-115-interdiff.txt4.64 KBAnonymous (not verified)
#110 356399-110.patch54.37 KBAnonymous (not verified)
#107 356399-107-interdiff.txt1.47 KBAnonymous (not verified)
#107 356399-107.patch45.08 KBAnonymous (not verified)
#105 356399-105.patch44.29 KBAnonymous (not verified)
#105 356399-105-interdiff.txt4.7 KBAnonymous (not verified)
#100 356399-99-interdiff.patch7.17 KBAnonymous (not verified)
#100 356399-99.patch41.42 KBAnonymous (not verified)
#98 356399-97-interdiff.txt9.01 KBAnonymous (not verified)
#98 356399-97.patch34.68 KBAnonymous (not verified)
#96 356399-96.patch26.46 KBAnonymous (not verified)
#94 uprofiler.png163.92 KBdawehner
#94 548d8f71423d4.drupal-perf.uprofiler.txt1.52 MBdawehner
#92 356399-92-do-not-test.patch12.49 KBAnonymous (not verified)
#89 interdiff.txt1.5 KBdawehner
#89 356399-89.patch4.43 KBdawehner
#85 interdiff.txt1.54 KBdawehner
#85 356399-85.patch3.63 KBdawehner
#81 interdiff.txt4.03 KBdawehner
#81 356399-79.patch4.26 KBdawehner
#79 interdiff-69-79.txt3 KBmartin107
#79 router_rebuild-356399-79.patch3.82 KBmartin107
#69 router_rebuild-356399-69-interdiff.txt3.27 KBBerdir
#69 router_rebuild-356399-69.patch3.57 KBBerdir
#60 router_rebuild-356399-60.patch2.54 KBnlisgo
#60 interdiff-58-60.txt542 bytesnlisgo
#58 interdiff-56-58.txt489 bytesnlisgo
#58 router_rebuild-356399-58.patch2.36 KBnlisgo
#56 interdiff-51-56.txt1.85 KBnlisgo
#56 router_rebuild-356399-56.patch2.27 KBnlisgo
#51 356399-51.patch1.15 KBdawehner
#49 356399-49.patch1.42 KBdawehner
#44 356399_interdiff.txt734 bytescatch
#44 356399_menu_rebuild-44.patch2.42 KBcatch
#42 356399_menu_rebuild-41.patch2.67 KBcatch
#41 356399_menu_rebuild-41.patch2.69 KBcatch
#37 356399_menu_rebuild-36.patch2.69 KBcatch
#36 356399_menu_rebuild-36.patch2.69 KBcatch
#35 356399_menu_rebuild-33.patch2.96 KBcatch
#33 356399_menu_rebuild-33.patch2.84 KBcatch
#32 356399_menu_rebuild-25.patch3.4 KBcatch
#30 356399_menu_rebuild-25.patch2.19 KBcatch
#25 356399_menu_rebuild-25.patch2.94 KBcatch
#14 d7-menu_rebuild-fix.patch1.18 KBJosh Waihi
#12 d7-menu_rebuild-fix.patch1.17 KBJosh Waihi
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

slantview’s picture

So here is what I have been able to discover so far

menu_rebuild is called.

menu_rebuild clears the cache. Now our 400+ requests per second all decide they need to rebuild the menu_router table since they couldn't pull the menu from cache.

everybody gets in on the party at line 2199 and starts to rebuild.

at line 2288 they all DELETE FROM {menu_router} simultaneously and now menu_router is empty... sort of.

whoever is first starts to INSERT around line 2370 and now we have DELETE FROM and INSERT INTO happening at the same time.

the end result is 600-700 connections stuck in the db trying to figure out how to serve their request and a bunch of waiting processes on the web servers.

now we stack up the web requests until our load balancer starts dumping them into our sorry server and everybody gets our beautiful site maintenance page.

so my theory is that we need to lock before we rebuild and make sure we don't clear cache until the initial menu is built form whoever has the lock so that we can keep serving the stale menu cache until the menu is done rebuilding.

i'm working on code for this right now.

anyone else got an idea?

slantview’s picture

Also, there is another ticket at #238760: menu_router table truncated and site goes down that claims that this is fixed in Drupal 6.3, but we are on Drupal 6.8 and this is still broken.

Damien Tournoud’s picture

Status: Active » Closed (duplicate)

This have been identified a long time ago, and I believe this is one of the biggest scalability issue in Drupal 6 at this time. The solution is indeed to implement a locking framework, as I suggested long ago in #251792: Implement a locking framework for long operations.

pwolanin’s picture

Version: 6.8 » 7.x-dev
Status: Closed (duplicate) » Postponed (maintainer needs more info)

Is this really a dup? It sounds like a bug related to the fix we put in for the cases where the rebuild failed. This does indeed seem pretty critical. We also know that using drupal variable is broken (also subject to race conditions) in terms of a locking/semaphore.

David Strauss’s picture

Subscribe.

robertDouglass’s picture

Subscribe.

slantview’s picture

The fix for divx.com was to switch the menu_router table to MyISAM and to use a locking mechanism for this rebuild. I believe that this issue is dependent on #251792: Implement a locking framework for long operations. I don't see how you can really fix this issue without it.

donquixote’s picture

I really don't see the need for the ridiculously high number of queries happening during one menu_rebuild.

I would propose to rewrite the menu_rebuild function, see
Rewrite the function menu_rebuild

Basic idea:
- One big query to load the menu_links and menu_router tables into php arrays.
- Do some computations.
- Write the changes to the DB, if any.

EDIT:
That's too simple, because menu_links can be too big for memory. The trick is to load only a part of menu_links into memory, then it should work.

jmpalomar’s picture

Subscribe.

pwolanin’s picture

These are the 2 relevant issues I think would help with the problems:
#251792: Implement a locking framework for long operations
#193366: execute various rebuilds when modules are submitted

Please contribute to those.

catch’s picture

Status: Postponed (maintainer needs more info) » Closed (duplicate)
Josh Waihi’s picture

Issue summary: View changes
Status: Closed (duplicate) » Needs review
FileSize
1.17 KB

I'm reopening this issue because it seems the "duplicate" has gone off track from what this issue was about and this is for Drupal 7 rather than Drupal 6.

I believe the core issue around menu_rebuild is its poor safeguards from rebuilding the menu when it doesn't need to. When menu_rebuild() is called it tries to acquire a lock. If it can't acquire the lock. It waits, assumes the menu has been rebuilt and returns. It misses completing two extra tasks. One is to set menu_rebuild_needed to FALSE in the local process (using $conf) to prevent menu_rebuild being called again without menu_rebuild_needed explicitly being set correctly. The second is to reload menu_masks which may have changed during the menu rebuild.

Failure to do either of these two things will force another menu_rebuild attempt on the same process if menu_get_item is called again. This can often be the case for taxonomy menus or rendering the output of l() among other things. It can lead to many attempts in a single process trying to rebuild the menu and each time, it may encounter a lock, wait, then continue. I've seen this utilize usleep for almost a minute in XHProf data. The more traffic, the more this race condition is experienced and the longer it takes for the lock to eventually free up.

This patch fixes this issue, though I can imagine there may be some contention around the workaround for menu_masks.

Status: Needs review » Needs work

The last submitted patch, 12: d7-menu_rebuild-fix.patch, failed testing.

Josh Waihi’s picture

FileSize
1.18 KB

Lets hope this patch works.

mgifford’s picture

Status: Needs work » Needs review

newtoid queued 14: d7-menu_rebuild-fix.patch for re-testing.

mgifford’s picture

Is this also going to be a problem in Drupal 8?

Whats the best way to test this new patch?

Code looks simple enough and well documented.

newtoid’s picture

Just had a previously patched and working site lock up and stop. Drupal 7.31.

mgifford’s picture

@newtoid - Were you using the patch in #14?

In which case, this patch may not be sufficient....

newtoid’s picture

@mgifford Thanks for the reply. Yes that was the patch... Is there more?

Fabianx’s picture

+1 Lovely find for a menu_rebuild_needed stampede.

This can take a site down ...

catch’s picture

Title: menu_rebuild causes site to lock up » Router rebuild lock_wait() condition can result in rebuild later in the request (race condition)
Version: 7.x-dev » 8.0.x-dev
Status: Needs review » Needs work
Issue tags: +Needs backport to D7

We may have the same problem in 8.x yes.

   if (!$this->lock->acquire('router_rebuild')) {
      // Wait for another request that is already doing this work.
      // We choose to block here since otherwise the routes might not be
      // available, resulting in a 404.
      $this->lock->wait('router_rebuild');
      return FALSE;
    }

If the state value has already been checked during the request, this won't reset it, so another call to rebuildIfNeeded() would attempt a rebuild again.

Caching was added to the state system in #1786490: Add caching to the state system.

Menu masks also got ported to 8.x, but that is now an internal implementation detail of the database router, so not sure the best way to reset it. Easiest would be to re-init the state system if we get to this condition.

As Josh anticipated I'm not sure about the workaround for menu_masks in the 7.x patch - what about forcing a variable_init() again instead?

tim.plunkett’s picture

Fabianx’s picture

There is another problem here and here it gets tricky:

Under certain conditions e.g. using memcache or a MEMORY table for semaphore, the lock can be released _before_ the transaction is actually committed.

So what should happen is:

if (!acquire_lock()) {
  lock_wait();
  // Reset variables, load from cache.
  variable_reset();
  // Return early.
  return FALSE;
}

// Check if someone else has written menu already, this can happen if request was started, and old variables loaded, but then menu rebuild completed.
variable_reset();

// If there is still a menu rebuild needed, continue

if (!variable_get('menu_rebuild_needed', FALSE)) {
  release_lock();
  return FALSE;
}

// Actually rebuild the menu
$transaction = db_transaction();

$rebuild_menu

try {
  // ...
}
catch (Exception) {
  $transaction->rollback();
  // @todo This is a problem as now all the waiting processes will get old entries or 404.
  //            However if menu_rebuild fails we probably have other problems ...
  return FALSE;
}

$transaction->commit();

 // The lock was acquired outside a transaction, so should be released outside a transaction.
lock_release();
return TRUE;

This pseudo-code should take care of all cases I can think of.

catch’s picture

Status: Needs work » Needs review
FileSize
2.94 KB

Updated patch based on #24, hoping the comments are descriptive enough about the various cases.

Moving to 7.x temporarily for the bot.

catch’s picture

Version: 8.0.x-dev » 7.x-dev

Status: Needs review » Needs work

The last submitted patch, 25: 356399_menu_rebuild-25.patch, failed testing.

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 25: 356399_menu_rebuild-25.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
2.19 KB

grumble grumble... sorry for the noise.

Status: Needs review » Needs work

The last submitted patch, 30: 356399_menu_rebuild-25.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
3.4 KB
catch’s picture

OK cleaned up now it's green, apologies again for the noise.

Fabianx’s picture

+++ b/includes/menu.inc
@@ -456,7 +456,19 @@ function menu_get_item($path = NULL, $router_item = NULL) {
+      $variables = variable_initialize();
+      if (!isset($variables['menu_rebuild_needed']) && !empty($variables['menu_masks'])) {
+        // Ensure this process doesn't attempt to rebuild the menu again later.
+        unset($GLOBALS['conf']['menu_rebuild_needed']);
+        // Ensure $conf['menu_masks'] is set correctly.
+        $GLOBALS['conf']['menu_masks'] = $variables['menu_masks'];
+      }
+      else {
+        menu_rebuild();
+      }

@@ -2712,9 +2724,25 @@ function menu_rebuild() {
+    $variables = variable_initialize();
+    if (!empty($variables['menu_masks']) && empty($variables['menu_rebuild_needed'])) {
+      unset($GLOBALS['conf']['menu_rebuild_needed']);
+      $GLOBALS['conf']['menu_masks'] = $variables['menu_masks'];
+      return FALSE;
+    }
+    // If we get here and menu_masks was not set, then it is possible a menu
+    // rebuild was triggered between the lock_wait() returning and the variables
+    // being reloaded, or that the process rebuilding the menu was unable to
+    // complete successfully. A missing menu_masks variable could result in a
+    // 404, so re-run the function.
+    else {
+      return menu_rebuild();
+    }

These two code snippets look almost the same:

Would it be possible to consolidate in a helper:

_menu_try_rebuild() or _menu_check_rebuild()

Even though comments would need to be a little more generic and a choice be taken regarding isset() vs. empty(), which would probably be good anyway.

catch’s picture

Issue summary: View changes
FileSize
2.96 KB

Updated the issue summary and a bit more cleanup.

catch’s picture

With the helper suggested in #34.

catch’s picture

Missed an underscore.

Fabianx’s picture

  1. +++ b/includes/menu.inc
    @@ -453,10 +453,10 @@ function menu_get_item($path = NULL, $router_item = NULL) {
    +      if (!_menu_check_rebuild()) {
    +        menu_rebuild();
    +      }
    

    Can we reverse logic?

    Feels strange ...

  2. +++ b/includes/menu.inc
    @@ -2693,6 +2693,21 @@ function menu_reset_static_cache() {
     /**
    + * Checks that a menu rebuild is necessary and performs it if so.
    + */
    

    Comment is misleading ...

  3. +++ b/includes/menu.inc
    @@ -2712,9 +2727,15 @@ function menu_rebuild() {
    -    return FALSE;
    -  }
     
    +    if (!_menu_check_rebuild()) {
    +      // If we get here and menu_masks was not set, then it is possible a menu
    +      // being reloaded, or that the process rebuilding the menu was unable to
    +      // complete successfully. A missing menu_masks variable could result in a
    +      // 404, so re-run the function.
    +      return menu_rebuild();
    +    }
    +  }
    

    We still need to return FALSE here after the check.

The last submitted patch, 36: 356399_menu_rebuild-36.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 37: 356399_menu_rebuild-36.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
2.69 KB
catch’s picture

Fixed the comment pointed out in #38 too.

Fabianx’s picture

  1. +++ b/includes/menu.inc
    @@ -453,10 +453,10 @@ function menu_get_item($path = NULL, $router_item = NULL) {
       if (!isset($router_items[$path])) {
    -    // Rebuild if we know it's needed, or if the menu masks are missing which
    -    // occurs rarely, likely due to a race condition of multiple rebuilds.
         if (variable_get('menu_rebuild_needed', FALSE) || !variable_get('menu_masks', array())) {
    

    nit - I don't think we need to remove this comments.

    They are still valid.

  2. +++ b/includes/menu.inc
    @@ -2712,9 +2727,16 @@ function menu_rebuild() {
       }
    -
       $transaction = db_transaction();
    

    nit - I don't think we need to remove whitespace here.

  3. +++ b/includes/menu.inc
    @@ -2736,6 +2758,12 @@ function menu_rebuild() {
    diff --git a/sites/default/default.settings.php b/sites/default/default.settings.php
    
    diff --git a/sites/default/default.settings.php b/sites/default/default.settings.php
    old mode 100644
    new mode 100755
    

    nit - This is left over cruft.

We are at nits, this is RTBC for 7.x in general.

catch’s picture

Fixed the nits.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC once tests pass!

Then we need to move to 8.x, get work there done and in a few weeks / months can finally get this into D7.

I really wished the backport policy would not apply to criticals ...

catch’s picture

Version: 7.x-dev » 8.0.x-dev

Up to 8.x.

catch’s picture

Status: Reviewed & tested by the community » Needs work
mikeytown2’s picture

Looks like in D8 this is in RouteBuilder::rebuild().

dawehner’s picture

Status: Needs work » Needs review
FileSize
1.42 KB

IN D8 the logic is a bit different for the menu masks. We nowhere actually check that the state exists.

Fabianx’s picture

Status: Needs review » Needs work

This now needs a reroll:

https://www.drupal.org/node/2352641

But should also be easier.

We need to always do two checks basically:

- 1. After the lock - DONE! :)
- 2. And then also in rebuildIfNeeded() we should check after the first fast state check if we really still need to rebuild ...

This can happen if a request comes in very late and the rebuild is almost finished, but not quite.

dawehner’s picture

Status: Needs work » Needs review
FileSize
1.15 KB

@Fabianx
Is there a reason you want to do that logic inside rebuildIfNeeded() and not in rebuild()? Couldn't there be code
which calls rebuild() itself?

Status: Needs review » Needs work

The last submitted patch, 51: 356399-51.patch, failed testing.

Fabianx’s picture

Yes, there could be ...

function rebuild() {
  if (!checkRebuildIsNeeded()) {
    return FALSE;
  }
// ...
}

should do the trick.

Fabianx’s picture

Shouldn't the state now use the new route.builder replacement state thingy, where you also do the setRebuildNeeded() part?

I mean: Instead of $this->state ...

nlisgo’s picture

Attempting to restore a working patch.

nlisgo’s picture

nlisgo’s picture

Status: Needs work » Needs review
nlisgo’s picture

Add method doc

Fabianx’s picture

Status: Needs review » Needs work

Now we only need #53 added and we are done :)

Thanks so much for your work!

nlisgo’s picture

The suggested fix in #53 causes all of the tests in RouteBuilderTest to fail.

nlisgo’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 60: router_rebuild-356399-60.patch, failed testing.

Fabianx’s picture

Fascinating ...

That might be due to the state service resetting the flag before calling rebuild?

Will take a look at the patch in context!

Thanks for your work!

catch’s picture

Component: menu system » routing system
Fabianx’s picture

From all that I can see the patch should be correct. Weird ...

nlisgo’s picture

Maybe the tests are broken :)

Berdir’s picture

My guess is this breaks manually called rebuild() calls. See the first failing test, CommentDefaultFormatterCacheTagsTest, calls rebuild() manually.

it is a kernel test, so I guess nobody triggers rebuildNeeded(), which means that it will not actually do anything.

This seems like a problem? If I manually, explicitly call rebuild(), then it should just do it and not do some checks if it's needed?

Fabianx’s picture

@Berdir: Doh, right!

Yeah, we need to move that logic to rebuildIfNeeded() instead.

Thanks!

Berdir’s picture

Status: Needs work » Needs review
FileSize
3.57 KB
3.27 KB

Moved it, also did some renaming and documentation improvements. Still don't like the naming there, I'd rather have a specific method for this on the router builder indicator instead of a protected wrapper. And if posssible, avoid talking about cache, that seems like an implementation detail. Thoughts?

Fabianx’s picture

Maybe move the isRebuildNeededUncached as isRebuildNeededNow or such to the Router builder indicator itself?

xjm’s picture

So the issue summary describes the D7 issue but not the issue in D8. @catch suggests in #22 that this is equally a problem in Drupal 8.

Fabianx’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update
Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

I am putting to RTBC tentatively:

- The naming is not perfect
- The location might be better suited to have the isRebuildNeededCached() in the Route Builder indicator.

But besides that (which someone needs to take a decision on), this is ready.

So it would be great to get some core committer feedback.

catch’s picture

I think it makes sense to move this to the route builder indicator, seems a better spot for it and looks like both Fabianx and Berdir feel like that as well.

Apart from isRouterRebuildNeededNow() I can only think of isRouterRebuildReallyNeeded() or isRouterRebuildStillNeeded() which seem OK for accuracy but the former is veering close to a tongue twister.

Fabianx’s picture

Status: Reviewed & tested by the community » Needs work

CNW based on #74, lets move it to the route builder indicator and choose a good name.

Fabianx’s picture

Issue tags: +Novice

Tagging novice - if someone wants to give that rather simple re-factor a try.

catch’s picture

I'm leaning towards isRouterRebuildStillNeeded() a bit - what we're doing there is checking if the router rebuild is still needed since the last time it was checked.

xjm’s picture

Issue tags: +Triaged D8 critical
martin107’s picture

Looking at the RouterBuilderIndicatorInterface

isRebuildNeed()
setRebuildDone()

it is clear the "rebuild" is acting on the Router

otherwise these should be isRouterRebuildNeeded and setRouterRebuildDone

So I have shortened the last suggestion to isRebuildStillNeeded()

and the function call will look like this

if ($this->routeBuilderIndicator->isRebuildStillNeeded()) {

martin107’s picture

Status: Needs work » Needs review
dawehner’s picture

FileSize
4.26 KB
4.03 KB

Just decided to move this critical a little bit further.

Ups, I worked on it in the meantime as well, but also fixed the unit test and made a small readability improvement:

+  public function isRebuildStillNeeded() {
+    // To absolutely ensure that the router rebuild is required, reset the cache
+    // in case they were set by another process.
+    $this->resetCache();
+    if ($this->isRebuildNeeded()) {
+      return TRUE;
+    }
+    return FALSE;
+  }

This can be just return $this->isRebuildNeeded()

I do agree with

isRouterRebuildStillNeeded

, given that its more describing what is going on.
Another idea I had was

ensureRouterRebuildNeeded

but I didn't like the non-boolean aspect of it.

martin107’s picture

Issue tags: -Novice

Novice tasks complete.

The last submitted patch, 79: router_rebuild-356399-79.patch, failed testing.

Fabianx’s picture

Status: Needs review » Needs work
  1. +++ b/core/lib/Drupal/Core/Routing/RouteBuilderIndicator.php
    @@ -34,6 +34,13 @@ public function __construct(StateInterface $state) {
    +  public function resetCache() {
    +    $this->state->resetCache();
    +  }
    +
    

    I don't think we need to keep the helper function ...

  2. +++ b/core/lib/Drupal/Core/Routing/RouteBuilderIndicator.php
    @@ -52,4 +59,14 @@ public function setRebuildDone() {
    +    // To absolutely ensure that the router rebuild is required, reset the cache
    +    // in case they were set by another process.
    +    $this->resetCache();
    

    Can use directly:

    $this->state->resetCache();

  3. +++ b/core/lib/Drupal/Core/Routing/RouteBuilderIndicatorInterface.php
    @@ -15,6 +15,11 @@
    +   * Reset the router builder indicator cache.
    +   */
    +  public function resetCache();
    

    Or do we still need this from the outside?

  4. +++ b/core/tests/Drupal/Tests/Core/Routing/RouteBuilderTest.php
    @@ -293,6 +297,9 @@ public function testRebuildIfNecessary() {
         // This will not trigger a rebuild.
         $this->assertFalse($this->routeBuilder->rebuildIfNeeded());
    +
    +    // This will not trigger a rebuild, because the indicator is not set at all.
    +    $this->assertFalse($this->routeBuilder->rebuildIfNeeded());
    

    Isn't that twice the same test?

dawehner’s picture

Status: Needs work » Needs review
FileSize
3.63 KB
1.54 KB

Thank you @Fabianx for the quick review!

1-3

You are absolutly right. This function is introduced by this patch, so its not called from anywhere in HEAD.
I can't think of a valid usecase, its a pure implementation detail that state has static caching, so in case someone
has that edge case, that someone could reset the static caching of $state.

Isn't that twice the same test?

No its, not. Let's have a look at a bit more context:

+    $this->routeBuilderIndicator->expects($this->exactly(3))
                 ->method('isRebuildNeeded')
-                ->will($this->onConsecutiveCalls(TRUE, FALSE));
+                ->will($this->onConsecutiveCalls(TRUE, TRUE, FALSE));
+
+    $this->routeBuilderIndicator->expects($this->exactly(2))
+      ->method('isRebuildStillNeeded')
+      ->will($this->onConsecutiveCalls(TRUE, FALSE));

So we call rebuildIfNeeded() 3 times, with one time having the need for a real rebuild, and once not.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

This looks a lot cleaner than previous patches, thanks!

Looks good to go to me, not 100% sure on the name and the documentation of the new method but have no better ideas, so let's get some feedback from @alexpott or @catch :)

catch’s picture

I came up with isRebuildStillNeeded() so I'm fine with the name.

Patch looks good in general now.

Leaving RTBC - I shouldn't commit this one since I did a decent amount of work on the original 7.x patch here.

Dries’s picture

Status: Reviewed & tested by the community » Needs work

The code looks a bit funny:

    if ($this->routeBuilderIndicator->isRebuildNeeded()) {
     if (!$this->routeBuilderIndicator->isRebuildStillNeeded()) {
       return FALSE;
     }

That code path could effectively translate into:

$this->routeBuilderIndicator->isRebuildNeeded();
$this->routeBuilderIndicator->state->resetCache();
$this->routeBuilderIndicator->isRebuildNeeded();

Which calls into question why have and call a function whose result we won't believe. At a minimum, I think we need documentation as to why this is necessary. Ideally, we'd able to make the code more intuitive.

I also think the newly introduced recursion in rebuild() could be a bad implementation; if there are many processes, the recursion could really blow up the call stack with nested calls. If a process is never able to get the lock (possible at least in theory), it would keep recursing forever until the call stack runs out of memory. I'm used to seeing this kind of locking implemented using a while-loop to avoid the recursion.

dawehner’s picture

Assigned: Unassigned » effulgentsia
Status: Needs work » Needs review
FileSize
4.43 KB
1.5 KB

@berdir, @xjm, @dawehner discussed this issue and how do document the behaviour what is going on.
The problems mentioned in your last part of the review won't appear, given that its much easier to break with too many mysql
connections for example.

Assigning to @effulgentsia to review our new docs.

Anonymous’s picture

i don't like this patch.

- the state flag is written to outside the lock, so the process rebuilding the router info can unset the flag just after another process sets it, with undefined results for an undefined period
- the first thing we should try to do is make the rebuild less expensive. at a first approximation, caching the .yml info seems like a fairly obvious candidate
- we're still taking the likely most frequent case (readers) and mashing them against the lock system. seems like it would be better to make the writers fight for the lock, because they are (99% of the time) a) less frequent, and b) more expensive requests

catch’s picture

Discussed #90 a bit with beejeebus in irc.

Switching rebuilds to happen on write (and keeping the lock protection) seems like we should definitely try to do it for 8.x

I'm not sure it''ll be possible to backport that to 7.x - variable_set('menu_rebuild_needed') is a part of the API. However we have #44 running successfully in production on the site where we ran into this, and it's fine if we end up with two different solutions in two different versions.

Anonymous’s picture

FileSize
12.49 KB

here's a rough patch that caches the yml stuff in getRouteDefinitions(), and makes the module and theme systems call ->rebuild() instead of setting a flag.

on my VM, i see RouteBuilder::getRouteDefinitions() go from ~70ms to ~15ms. would love to see someone else run this and insert some simple microtime measurements in getRouteDefinitions() and report back.

should i open a separate issue for caching the .yml stuff separate from the other changes?

effulgentsia’s picture

Assigned: effulgentsia » Unassigned

Given #90-#92, unassigning myself for now. Please reassign if there's something you'd like my input on, though.

dawehner’s picture

@beejeebus
Note: Its not only the parsing of the YML files, but we also have to dump the entries into the database,
and rebuild parts of the menu trees, as they become potentially invalid ...
We certainly need numbers to be able to judge something here.

Here is a an example run, which mostly had a router rebuild. Note: This is a crapy slow machine in general.

Out of those the event dispatcher dominates (... this is mostly spend in rebuild the menu links and rebuild the routes of views).
There is also quite some costs to dump the routes for itself (Drupal\Core\Routing\MatcherDumper::dump).

dawehner’s picture

Issue tags: +Ghent DA sprint

.

Anonymous’s picture

FileSize
26.46 KB

ok, lets see how much i broke.

this removes Drupal\Core\Routing\RouteBuilderIndicator and all usage of it, updates the RouteBuilder and ThemeManager unit tests.

Status: Needs review » Needs work

The last submitted patch, 96: 356399-96.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
34.68 KB
9.01 KB

ok, this patch rips out all usage of rebuildIfNeeded, and replaces those calls with rebuild. mostly. there seem to be places where we do a dance around rebuildIfNeeded() where we can just remove it. i hope. see the changes in RouteProvider::getRouteCollectionForRequest().

looks like RouterRebuildSubscriber can just die, but i left it in for now.

Status: Needs review » Needs work

The last submitted patch, 98: 356399-97.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
41.42 KB
7.17 KB

this patch fixes the install issues with #98, and replaces Drupal::service('router.builder_indicator')->setRebuildNeeded() with Drupal::service('router.builder')->rebuild().

lulz at menu_ui.install, that may be able to just die, left it as is for now.

Status: Needs review » Needs work

The last submitted patch, 100: 356399-99-interdiff.patch, failed testing.

The last submitted patch, 100: 356399-99.patch, failed testing.

The last submitted patch, 100: 356399-99.patch, failed testing.

The last submitted patch, 100: 356399-99.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
4.7 KB
44.29 KB

oookay, removed a few ->rebuild() calls that happened on every request, and the RouterRebuildSubscriber().

also made sure to depend on system and create the router table in some of our franken-unit-tests, which were ostensibly doing route stuff without it.

Status: Needs review » Needs work

The last submitted patch, 105: 356399-105.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
45.08 KB
1.47 KB

remove references to RouterRebuildSubscriber(), and some leftover rubbish in MenuRouterRebuildSubscriber.

Anonymous’s picture

while i wait for that, i'm poking at the next bits that may be cacheable in RouteBuilder::rebuild().

the theory behind caching the yml is that (aside from dev mode) it shouldn't change outside of module install / uninstall / update, which has well known places to invalidate the cache.

once the yml parsing is replace by a cache->getMultiple(), the next thing that shows up as slow is this OO goop:

  foreach ($this->getRouteDefinitions() as $routes) {
      // The top-level 'routes_callback' is a list of methods in controller
      // syntax, see \Drupal\Core\Controller\ControllerResolver. These methods
      // should return a set of \Symfony\Component\Routing\Route objects, either
      // in an associative array keyed by the route name, which will be iterated
      // over and added to the collection for this provider, or as a new
      // \Symfony\Component\Routing\RouteCollection object, which will be added
      // to the collection.
      if (isset($routes['route_callbacks'])) {
        foreach ($routes['route_callbacks'] as $route_callback) {
          $callback = $this->controllerResolver->getControllerFromDefinition($route_callback);
          if ($callback_routes = call_user_func($callback)) {
            // If a RouteCollection is returned, add the whole collection.
            if ($callback_routes instanceof RouteCollection) {
              $collection->addCollection($callback_routes);
            }
            // Otherwise, add each Route object individually.
            else {
              foreach ($callback_routes as $name => $callback_route) {
                $collection->add($name, $callback_route);
              }
            }
          }
        }
        unset($routes['route_callbacks']);
      }
      foreach ($routes as $name => $route_info) {
        $route_info += array(
          'defaults' => array(),
          'requirements' => array(),
          'options' => array(),
        );

        $route = new Route($route_info['path'], $route_info['defaults'], $route_info['requirements'], $route_info['options']);
        $collection->add($name, $route);
      }
    }

i have a perhaps-wishful-thinking hunch that this should have the same lifecycle. is that remotely right? can the return value of this change outside of module install / update / uninstall events?:

          $callback = $this->controllerResolver->getControllerFromDefinition($route_callback);

Status: Needs review » Needs work

The last submitted patch, 107: 356399-107.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
54.37 KB

added a bunch of boring test stuff to fix tests that faux-exercise the route rebuild code paths without creating a route table. lulz.

also fix missing use for RouteBuilder in ThemeController.

i think at least KernelTestBaseTest will still fail.

Status: Needs review » Needs work

The last submitted patch, 110: 356399-110.patch, failed testing.

dawehner’s picture

In general its important to keep in mind the original issue which basically introduced the indicator: #2164367: Rebuild router as few times as possible per request

catch’s picture

I did a quick bit of archaeology on the menu_rebuild_needed variable, this race condition was rediscovered by Fabianx and me on a live Drupal 7 site recently so the Drupal 7 issue here is extremely real - it's not introduced by the 8.x listener. I've also seen the symptoms of it before in NewRelic without having actually tracked it down on other sites (since it always eventually self-repairs after a few minutes and various contrib modules rebuild menus sometimes).

The variable was originally added in #202955: Access denied after install - menu_rebuild() calls - this was to allow a menu rebuild to be triggered when in 'maintenance mode' (where you might not have a complete module list) so that it ran outside of maintenance mode. Not an optimization then, just a workaround.

Then field UI started to use it to optimize test runs since menu_rebuild() was getting triggered multiple times per request.

Then we've gradually added it elsewhere up until #2164367: Rebuild router as few times as possible per request which standardized that pattern.

So I think it'd be a good change to switch to setting a flag (in memory only, not state) that triggers a menu rebuild at the end of request, and that will satisfy not wanting to rebuild more than once per request. Where the rebuild has to happen mid-request, can just trigger that directly.

For Drupal 7 I think we'll have to leave the variable in, and probably use the 7.x version of this patch, but perhaps we could also set $conf['menu_rebuild_needed'] and add something in drupal_exit() looking for it in many cases that core uses this, to try to reduce the chances of a read stampede there as well.

Anonymous’s picture

i think it's worth adding back a flag to rebuild on shutdown.

we should be able to do that in a non-racy, per-process way.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
4.64 KB
56.48 KB

this should fix most of the fails due to the router table either not being there or already being there.

Anonymous’s picture

FileSize
9.79 KB
58.48 KB

added setRebuildNeeded() to route builder interface, made RouteBuilder implement destructable interface, call setRebuildNeeded() from non-module|theme places instead of rebuild.

should get us back to one rebuild per request most of the time.

The last submitted patch, 115: 356399-115.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 116: 356399-117.patch, failed testing.

Anonymous’s picture

FileSize
59.79 KB
1.98 KB

this one might be green.

Anonymous’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 119: 356399-119.patch, failed testing.

bnobleman queued 119: 356399-119.patch for re-testing.

The last submitted patch, 119: 356399-119.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
64.28 KB
0 bytes

mkay, so views created/altered programmatically within the test parent process, then accessed in a child via a http request, throw errors. because that sort of code is racy and evil.

Status: Needs review » Needs work

The last submitted patch, 124: 356399-124.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
3.2 KB
66.8 KB

aaaaaand more rebuilds in tests.

Anonymous’s picture

Assigned: Unassigned » effulgentsia

ok, now this is green, and has the 'do one rebuild for all non-module event' code in it, passing back to effulgentsia for a review.

dawehner’s picture

Note: one reason why we had that extra service is documented in https://www.drupal.org/node/2352641#comment-9236537 but this is no longer the case.

  1. +++ b/core/core.services.yml
    @@ -553,17 +553,11 @@ services:
       router.builder:
         class: Drupal\Core\Routing\RouteBuilder
    ...
    +    arguments: ['@router.dumper', '@lock', '@event_dispatcher', '@module_handler', '@controller_resolver', '@access_manager.check_provider', '@cache.default']
         tags:
    ...
    +      - { name: needs_destruction }
    

    Note: router.builder would be a perfect candidate for #1973618: DIC: Lazy instantiation of service dependencies (ProxyManager for "proxy services") ... we don't want to load the lock router.dumper etc. ... on every non-page-cache request

  2. +++ b/core/lib/Drupal/Core/Extension/ModuleInstaller.php
    @@ -356,6 +356,9 @@ public function uninstall(array $module_list, $uninstall_dependents = TRUE) {
    +      // @todo: make this a method on the RouteBuilder.
    +      \Drupal::service('cache.default')->delete('routebuilder:yml:' . $module);
    

    It is sad that we can't just use the extension cache tag here.

  3. +++ b/core/lib/Drupal/Core/Menu/LocalTaskManager.php
    @@ -189,9 +189,6 @@ public function getLocalTasksForRoute($route_name) {
    -        // Maybe some code asked to rebuild the routes, so rebuild the router
    -        // as we rely on having proper existing routes in dynamic local tasks.
    -        $this->routeBuilder->rebuildIfNeeded();
    

    Couldn't this result in a race condition, that something wants to rebuild, and so the local tasks end up with old tasks?

  4. +++ b/core/lib/Drupal/Core/Routing/RouteBuilder.php
    @@ -211,33 +219,52 @@ public function getCollectionDuringRebuild() {
    +  protected function getRouteDefinitions() {
    

    What about name it "getStaticRouteDefinitions" so its a little clearer, what is going on?

  5. +++ b/core/modules/config/src/Tests/DefaultConfigTest.php
    @@ -45,6 +45,7 @@ class DefaultConfigTest extends KernelTestBase {
    +    $this->installSchema('system', 'router');
    

    ... just curious here ... don't we just need the router in every kernel test now?

Anonymous’s picture

Assigned: effulgentsia »
Status: Needs review » Needs work

128.1 - should we add a note to that issue? or open a separate follow up?

128.2 - discussed this with dawehner in IRC. i'd like to keep the per-module clearing, because it will lead to better hit rates. however, i agree with him that it's kinda ugly - better implementation ideas most welcome.

128.3 - yes, but this is "normal". this patch brings the router builder cache back in line with most others - there's a window where a reader can get the "old" router information, just like every system all throughout Drupal with a cache. the "next" request, however, will get the new data. i don't think there's anything we need to do here - i think that's a better approach than blocking all requests and hammering the lock system.

128.4 - i like that suggestion, will put that in the next reroll.

128.5 - maybe? i'm not sure, to be honest i was just banging away at the frankentest stuff to make it go green.

Anonymous’s picture

discussed the caching yaml part of this with berdir, i think we can remove it and rely on the more general yaml caching here being added in #2395143: YAML parsing is very slow, cache it with FileCache.

dawehner’s picture

@beejeebus
What should we do about this issue?
The patch before you got involved solved a problem, and especially would allow us to get a fix for that problem into Drupal 7.

We can easily open up a new issue discussing your ideas.

catch’s picture

The earlier version of the patch is still right for 7.x - we still have to support menu_rebuild_needed there whatever happens.
I'm fine with the end-of-request write in Drupal 8 though, definitely simplifies things. YAML caching definitely should not happen here though.

If necessary we could fork the issue between 7 and 8 since they'll be completely different patches.

Anonymous’s picture

Assigned: » Unassigned

sorry for blocking progress.

related, i still seem to have assign-issue karma when i shouldn't. i'm not in maintainers.txt, so that should be removed, thanks.

dawehner’s picture

Does someone agree that we should go with the previous solution, which at least opens up solving the D7 problem as well
and then try out the ideas of @beejeebus in another issue?

catch’s picture

I'd be tempted to split the issue - move this to 7.x (and note it's applied and working on the 7.x client site where we rediscovered this) then open another issue for beejeebus' approach for 8.x (still critical).

If it turns out the end of request rebuild doesn't work we've still got the 8.x patch from here, but looking through the commit history for the current situation we got here almost entirely by accident so stepping back from that seems good.

dawehner’s picture

Title: Router rebuild lock_wait() condition can result in rebuild later in the request (race condition) » Optimize the route rebuilding process to rebuild on write
Related issues: +#2425259: Router rebuild lock_wait() condition can result in rebuild later in the request (race condition)
dawehner’s picture

Status: Needs work » Needs review
FileSize
54.41 KB

So for now this is just a reroll, but I have a couple of ideas how to improve things.
Note: One thing which should certianly try is to mark the RouteBuilder service as lazy. You should almost never

Berdir’s picture

Status: Needs review » Needs work

Wrong patch, this is the system_path stuff :)

dawehner’s picture

Status: Needs work » Needs review
FileSize
66.52 KB

Ha, I was indeed wondering about the filesize for a quick moment, but then simply stopped complaining about it.

Status: Needs review » Needs work

The last submitted patch, 139: 356399-139.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
66.59 KB
429 bytes

Let's first get the patch green again.

Status: Needs review » Needs work

The last submitted patch, 141: 356399-141.patch, failed testing.

Berdir’s picture

  1. +++ b/core/lib/Drupal/Core/Extension/ModuleInstaller.php
    @@ -222,7 +222,6 @@ public function install(array $module_list, $enable_dependencies = TRUE) {
             // Clear plugin manager caches and flag router to rebuild if requested.
             \Drupal::getContainer()->get('plugin.cache_clearer')->clearCachedDefinitions();
    -        \Drupal::service('router.builder_indicator')->setRebuildNeeded();
     
             // Set the schema version to the number of the last update provided by
             // the module, or the minimum core schema version.
    @@ -296,6 +295,7 @@ public function install(array $module_list, $enable_dependencies = TRUE) {
    
    @@ -296,6 +295,7 @@ public function install(array $module_list, $enable_dependencies = TRUE) {
     
         // If any modules were newly installed, invoke hook_modules_installed().
         if (!empty($modules_installed)) {
    +      \Drupal::service('router.builder')->rebuild();
           $this->moduleHandler->invokeAll('modules_installed', array($modules_installed));
         }
     
    

    This could be problematic.

    I recently got rid of a forced router rebuild after every module in my install profile (because there was an old call to it on default_content, it resulted in a 30% performance improvement: https://github.com/larowlan/default_content/pull/32#issuecomment-73381037). And that was *with* my ApcFileCache, that caches parsed yaml files. The expensive part is dynamic stuff like field_ui and views.

    Yes, this only rebuilds at the end of the request if I understand correctly, but if you use the UI installer, that's still a lot of rebuilds.

  2. +++ b/core/lib/Drupal/Core/Extension/ModuleInstaller.php
    @@ -376,6 +376,9 @@ public function uninstall(array $module_list, $uninstall_dependents = TRUE) {
    +      // @todo: make this a method on the RouteBuilder.
    +      \Drupal::service('cache.default')->delete('routebuilder:yml:' . $module);
    

    If we use the ApcFileCache for the parsing then we can drop it from here as suggested by @catch and don't need manual cache clears.

  3. +++ b/core/modules/content_translation/content_translation.admin.inc
    @@ -343,6 +343,6 @@ function content_translation_form_language_content_settings_submit(array $form,
       \Drupal::entityManager()->clearCachedDefinitions();
    -  \Drupal::service('router.builder_indicator')->setRebuildNeeded();
    +  \Drupal::service('router.builder')->setRebuildNeeded();
    
    +++ b/core/modules/content_translation/content_translation.module
    @@ -477,7 +477,7 @@ function content_translation_language_configuration_element_submit(array $form,
         \Drupal::entityManager()->clearCachedDefinitions();
    -    \Drupal::service('router.builder_indicator')->setRebuildNeeded();
    +    \Drupal::service('router.builder')->setRebuildNeeded();
       }
    

    I'm not sure why some places call setRebuildNeeded() and most now call rebuild().

  4. +++ b/core/modules/system/src/Tests/Theme/ThemeSettingsTest.php
    @@ -34,6 +34,7 @@ class ThemeSettingsTest extends KernelTestBase {
       protected function setUp() {
         parent::setUp();
    +    $this->installSchema('system', array('router'));
         // Theme settings rely on System module's system.theme.global configuration.
         $this->installConfig(array('system'));
    

    That we need this in so many kernel tests is annoying, what exactly is triggering this?

  5. +++ b/core/modules/user/src/Tests/Views/AccessRoleTest.php
    @@ -30,13 +30,14 @@ class AccessRoleTest extends AccessTestBase {
         );
         $view->save();
    +    $this->container->get('router.builder')->rebuild();
    

    Shouldn't a views save do this automatically if needed?

  6. +++ b/core/tests/Drupal/Tests/Core/Routing/NullRouteBuilder.php
    @@ -6,17 +6,17 @@
    -
    -  public function rebuildIfNeeded() {
    +  public function setRebuildNeeded() {
       }
    

    Hm, the method still exists here, left-over or do we really still have both?

dawehner’s picture

Status: Needs work » Needs review
FileSize
67.79 KB
1.2 KB

That we need this in so many kernel tests is annoying, what exactly is triggering this?

This particular example is trigged by the rebuild in ThemeHandler

Shouldn't a views save do this automatically if needed?

Well, in HEAD we call setRebuildNeeded() in View::postSave() => views_invalidate_cache() ...
The problem for these tests are probably now the one which lead to the architecture we have now. #2164367: Rebuild router as few times as possible per request
So for example in that case we save the view, which does not trigger the rebuild. In a normal request we potentially end up in a race condition that the terminate is not done yet, but another query is already executed in the test process.

Note: This is just a bunch of obvious fixes ... nothing serious.

Status: Needs review » Needs work

The last submitted patch, 144: 356399-144.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
5.29 KB
3.87 KB

Fixed the remaining failures, and addressed #1 and #2

Status: Needs review » Needs work

The last submitted patch, 146: 356399-146.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
67.97 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch, 148: 356399-148.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
67.96 KB
1.07 KB

Let's see whether things pass again when we rebuild the router again in the installer.

Status: Needs review » Needs work

The last submitted patch, 150: 3563991-150.patch, failed testing.

sidharrell’s picture

Status: Needs work » Needs review
FileSize
63.57 KB

rolled #150 against current HEAD.

Status: Needs review » Needs work

The last submitted patch, 152: optimize_the_route-356399-152.patch, failed testing.

sidharrell’s picture

Status: Needs work » Needs review
FileSize
63.53 KB
1.04 KB

move along, nothing to see here.
I did try to make an interdiff 150-152 but it broke.

Status: Needs review » Needs work

The last submitted patch, 154: optimize_the_route-356399-154.patch, failed testing.

sidharrell’s picture

Status: Needs work » Needs review
FileSize
1.26 KB
64.34 KB

found the source of the fatal error.

Status: Needs review » Needs work

The last submitted patch, 156: optimize_the_route-356399-156.patch, failed testing.

sidharrell’s picture

Status: Needs work » Needs review
FileSize
1.71 KB
64.33 KB

fixed some fails

Status: Needs review » Needs work

The last submitted patch, 158: optimize_the_route-356399-158.patch, failed testing.

dawehner’s picture

@sidharrell
I think we should avoid rebuilding the routes completly in submit handlers, are you sure that we need to add them for those cases?

sidharrell’s picture

in field_ui_entity_form_mode_presave and field_ui_entity_view_mode_presave? I changed it there cause it kept the FieldUIRouteTest from blowing up. I have no idea whether it is or is not the correct thing to do. I'm anxiously awaiting any kind of feedback, cause I know I have tons to learn about the whole system.
I was surprised that the change the other way in ThemeHandler didn't blow anything up.

Fabianx’s picture

So just a quick understanding of this patch:

- We have 'eventual consistency' taking into account that requests coming while the rebuilding request is running see the old router.
- If there is no route stored at all, the route is build just once by whoever gets the lock - so no build stampede.
- We have almost direct consistency for when the ->rebuild() is called directly, requests see the new route once the rebuild is finished.

Works for me!

sidharrell’s picture

talking over this with @dawehner in irc, we think the rebuildIfNeeded function on RouteBuilder can go away.

After talking, got to thinking:
It's not used anywhere in all of core and it really doesn't make sense with this internal change, to have the API oriented in such a way as to have client code be concerned at all with whether or not the routing table needs to be rebuilt RIGHT NOW. The client code should just maybe notify that they made a change that requires a rebuild. When that rebuild occurs, client don't care.

The form submit rebuild for the field_ui test it's probably just best to eat right now. I'm much more concerned with the results of Drupal\Tests\Core\Routing\RouteBuilderTest and Drupal\system\Tests\Routing\RouteProviderTest. Since this whole change is for a routing issue, those really should (if TDD really is the bee's knees) go to the heart of the issue.

Fabianx’s picture

#163: Fair, what about:

->rebuild()
->rebuildNow() // for tests and special cases.

instead?

With the understanding that as soon as an url() is created the router is rebuild, else just as a thing in the request.

sidharrell’s picture

I was actually looking at rebuild() earlier when banging my head against it, and it doesn't check for whether a rebuild is needed or not, it just does it, afaict. So a rebuildNow isn't necessary, client code can just call rebuild().
I may be (probably am) wrong, but I think I understand a little better now than when I started. The idea is two-fold.
Step 1 is to lower the number of times the routing table is rebuilt on each request, for a performance improvement. And that lowers the probablility of a race condition, but does not altogether eliminate it, which is why we need:
Step 2, fix the possibility of a race condition by some kind of locking mechanism.

Step 2 is where I get kinda fuzzy and have to consult wikipedia. Being a WP dev, we don't concern ourselves with such things.
"A race condition occurs when two or more threads can access shared data and they try to change it at the same time"
So each request-response lives in it's own thread, right? So are we talking about two apache(or nginx, whatevs) threads, each bootstrapping and executing, and what exactly is the shared data? The routing table stored in the DB, object cacheing? Or am I being to literal and detail oriented, and need to just let it go to the 'cache' or some other such abstraction?

Fabianx’s picture

We already have 2) the locking is in place and working fine. The race condition happened because we had a state() before that was set outside the race condition.

This eliminates state() as its only set for the current request, so should be fine.

My suggestion was to rename setRebuildNeeded() to rebuild() and rebuild() to rebuildNow(), but thats not important.

dawehner’s picture

My suggestion was to rename setRebuildNeeded() to rebuild() and rebuild() to rebuildNow(), but thats not important.

In case we do that, we should be aware, that existing code calling to rebuild() might do it on purpose, so its an API break which is IMHO more complex to debug than just a simple renamed function.

sidharrell’s picture

I'm officially stumped on the remaining fails. I'll keep poking at it, and following the issue, but I'm gonna try to find some lower hanging fruit.

dawehner’s picture

Status: Needs work » Needs review
FileSize
59.92 KB
10.97 KB

Some work on the failures.

dawehner’s picture

I recently got rid of a forced router rebuild after every module in my install profile (because there was an old call to it on default_content, it resulted in a 30% performance improvement: https://github.com/larowlan/default_content/pull/32#issuecomment-73381037). And that was *with* my ApcFileCache, that caches parsed yaml files. The expensive part is dynamic stuff like field_ui and views.

Does that mean modules which run stuff which potential could need a new router, like MenuLInkContent::preSave() should call RouteBuilder::rebuildIfNeeded() as well as default_content?
With that we would move the lazyness from the Route Provider into the calling code, which is argueably bad, but at least inserts don't happen often.

If we use the ApcFileCache for the parsing then we can drop it from here as suggested by @catch and don't need manual cache clears.

Just to be clear, the current patch indeed gets rid of the cache again and instead rely on the cache of YamlDiscovery

I'm not sure why some places call setRebuildNeeded() and most now call rebuild().

Well, places which previously called ->setRebuildNeeded should be still able to call setRebuildNeeded if we apply the logic mentioned above. Its tricky, but I think its okayish to apply additional forced rebuilds in tests.

That we need this in so many kernel tests is annoying, what exactly is triggering this?

The rebuild in ModuleInstaller::install() as far as I understand? MatcherDumper could be silent if the table does not exist.
Note: #2371709: Move the on-demand-table creation into the database API would have solved that nicely, even if hidden.

Hm, the method still exists here, left-over or do we really still have both?

Well, code still wants to trigger a rebuild at the end of the request, doesn't it?

Shouldn't a views save do this automatically if needed?

As maybe written already, it just sets the router to be rebuilt, which happens at the end of the request, which simply doesn't apply in a kernel test or a test in general.

Fabianx’s picture

+++ b/core/lib/Drupal/Core/Routing/RouteBuilder.php
@@ -227,9 +219,20 @@ public function rebuildIfNeeded() {
+    if (!isset($this->yamlDiscovery)) {
+      $this->yamlDiscovery = new YamlDiscovery('routing', $this->moduleHandler->getModuleDirectories());
     }
+    return $this->yamlDiscovery->findAll();

I don't think this is possible for the installer calling router->setRebuild() several times, but then the wrong ModuleDirectories are 'cached' in the yamlDiscovery object.

The helper is nice though.

If we think it helps a lot could compare before after moduleDirs and only create a new instance if that has changed ...

--

Overall the patch looks good.

dawehner’s picture

Issue tags: +needs profiling

Let's ensure that we don't make the installer slower as before.

I don't think this is possible for the installer calling router->setRebuild() several times, but then the wrong ModuleDirectories are 'cached' in the yamlDiscovery object.

What do you mean with possible? It happens at least the for UI installer already, because for every next batch request you have in HEAD the router marked,
and so we do a rebuild. Note: This helper method exists already in HEAD and was only introduced for better testability.
In real life a module install triggers a container rebuild, which triggers a new instance of the router builder anyway, so you afaik can never have wrong data as part of the YamlDiscovery object.

Status: Needs review » Needs work

The last submitted patch, 169: 356399-169.patch, failed testing.

Fabianx’s picture

Oh, that is a good reasoning. As long as the module handler is not synthetic, which I am sure it is not, this will work fine.

Thanks for the explanation!

dawehner’s picture

Status: Needs work » Needs review
FileSize
56.93 KB
10.37 KB

Let's fix the test failures and try to get rid of as many of the RouteBuilder->rebuild() calls. On top,
keep the existing rebuildIfNeeded calls and don't replace it with rebuild()

klausi’s picture

Status: Needs review » Needs work
  1. +++ b/core/lib/Drupal/Core/Routing/RouteBuilder.php
    @@ -107,22 +108,26 @@ class RouteBuilder implements RouteBuilderInterface {
    +public function setRebuildNeeded() {
    +    $this->rebuildNeeded = TRUE;
    +}
    +
    

    Indentation should be 2 spaces.

  2. +++ b/core/lib/Drupal/Core/Routing/RouteBuilder.php
    @@ -216,8 +218,8 @@ public function rebuildIfNeeded() {
       /**
        * {@inheritdoc}
        */
    -  public function setRebuildNeeded() {
    -    $this->routeBuilderIndicator->setRebuildNeeded();
    +  public function destruct() {
    +    $this->rebuildIfNeeded();
       }
    

    is this meant to be the PHP object destructor __destruct()? Then the @inheritdoc does not really fit?

    So we don't have any test coverage for this yet?

  3. +++ b/core/modules/block/src/Tests/BlockHiddenRegionTest.php
    @@ -53,6 +53,7 @@ public function testBlockNotInHiddenRegion() {
     
    +
         // Install "block_test_theme" and set it as the default theme.
    

    Unrelated change?

  4. +++ b/core/modules/system/src/Tests/Condition/CurrentThemeConditionTest.php
    @@ -22,6 +22,11 @@ class CurrentThemeConditionTest extends KernelTestBase {
    +  protected function setUp() {
    +     parent::setUp();
    +     $this->installSchema('system', array('router'));
    +  }
    

    doc block missing, indentation should be 2 spaces.

  5. +++ b/core/modules/system/src/Tests/Module/ModuleImplementsAlterTest.php
    @@ -16,6 +16,13 @@
     
    +  public static $modules = array('system');
    

    same here, need @inheritdoc

  6. +++ b/core/modules/system/src/Tests/System/InfoAlterTest.php
    @@ -18,6 +18,11 @@ class InfoAlterTest extends KernelTestBase {
     
    +  protected function setUp() {
    

    And here.

dawehner’s picture

Status: Needs work » Needs review
FileSize
56.53 KB
2.91 KB

Thank you for your quick review!!

Indentation should be 2 spaces.
doc block missing, indentation should be 2 spaces.
Unrelated change?
same here, need @inheritdoc
And here.

I'm sorry I should have just seen it

is this meant to be the PHP object destructor __destruct()? Then the @inheritdoc does not really fit?

So we don't have any test coverage for this yet?

We don't want to use the __destruct() as it has been causing issues in the past. Instead we use the \Drupal\Core\DestructableInterface which provides the destruct() method.

dawehner’s picture

FileSize
56.7 KB
565 bytes

Thank you for klausi for helping on the documentation of the destruct() method.

klausi queued 178: 356399-178.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 178: 356399-178.patch, failed testing.

klausi’s picture

Issue summary: View changes
+++ b/core/tests/Drupal/Tests/Core/Routing/RouteBuilderTest.php
@@ -258,42 +247,6 @@ public function testRebuildWithProviderBasedRoutes() {
-
-  /**
-   * Tests \Drupal\Core\Routing\RouteBuilder::rebuildIfNeeded() method.
-   */
-  public function testRebuildIfNecessary() {

Why is this test method removed completely? It should be modified to remove the routeBuilderIndicator, but otherwise stay the same?

And did you forget to remove the RouteBuilderIndicator class and RouteBuilderIndicatorInterface ? I don't think it makes sense to keep them for API compatibility, since they are now dead classes?

Otherwise looks almost ready.

klausi’s picture

Issue summary: View changes
dawehner’s picture

Status: Needs work » Needs review
FileSize
59.07 KB
3.78 KB

Why is this test method removed completely? It should be modified to remove the routeBuilderIndicator, but otherwise stay the same?

You are absolute right, this was probably removed temporarily because setRebuildIfNeeded was nota always there.

And did you forget to remove the RouteBuilderIndicator class and RouteBuilderIndicatorInterface ? I don't think it makes sense to keep them for API compatibility, since they are now dead classes?

Good point. Let's remove it!

jibran’s picture

Status: Needs review » Needs work

I think the only thing remaining here is profiling. Everything else look good.

dawehner’s picture

Status: Needs work » Needs review

Please keep it on needs review, otherwise people don't review it.

dawehner’s picture

FileSize
1.38 KB

These are tests running standard install.

Without patch

       27.99 real        20.72 user         1.22 sys
       25.02 real        20.64 user         1.17 sys
       24.62 real        20.35 user         1.12 sys

With patch

       32.93 real        26.23 user         1.41 sys
       35.04 real        28.09 user         1.60 sys
       34.33 real        27.59 user         1.52 sys

Tried to adapt the code a bit, see interdiff, but this did not helped at all:

       36.35 real        27.75 user         1.52 sys
       34.20 real        27.38 user         1.52 sys
       33.72 real        27.06 user         1.48 sys
klausi’s picture

FileSize
59.5 KB

klausi opened a new pull request for this issue.

klausi’s picture

@dawehner: Hm, so that looks bad. What exactly did you test? The whole install process? with drush si?

Experimenting a bit here with only setting the rebuild flag during install, let's see what goes wrong.

Status: Needs review » Needs work

The last submitted patch, 187: 356399.patch, failed testing.

dawehner’s picture

@dawehner: Hm, so that looks bad. What exactly did you test? The whole install process? with drush si?

Yeah, those are basic drush si calls ... with standard ... but you have to keep in mind, by doing it explicit, the other requests hitting the site,
won't have to deal with it any longer, which makes it easier for sites to recover.

@berdir suggested a different idea to test this.
Constantly request Drupal with for example ab.
While doing that enable a module or save a view, and keep looking at the load.
On HEAD, the lock should block those requests, with the patch, the rebuilding should be done explicitly, and the other requests just hit the previous router.

Experimenting a bit here with only setting the rebuild flag during install, let's see what goes wrong.

Ideally I think the logic could be:

  • Don't rebuild by default
  • Rebuild, in case any hook_install()/hook_modules_installed() ... etc. related code needs routing information.
    But I think doing that right, is tricky.
klausi’s picture

Status: Needs work » Needs review
FileSize
64.48 KB

klausi pushed some commits to the pull request.

For an interdiff please see the list of recent commits.

Status: Needs review » Needs work

The last submitted patch, 191: 356399.patch, failed testing.

dawehner’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
63.33 KB
1.4 KB

Oh wow. @klausi, do you think its okay to require people to rebuild the router manually, in case they do something with the router in their hook_install() code?

Reverted a hunk made by klausi, which make a) the patch size bigger and b) broke the unit tests. I don't think we need this change.

Another try to run the installer (time drush si -y):

Head

       24.79 real        20.28 user         1.11 sys
       25.41 real        20.38 user         1.11 sys
       24.71 real        20.48 user         1.12 sys

Patch

       24.12 real        19.96 user         1.07 sys
       25.30 real        20.21 user         1.08 sys
       24.12 real        20.07 user         1.09 sys
       24.74 real        20.62 user         1.13 sys

Status: Needs review » Needs work

The last submitted patch, 193: 356399-193.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
Issue tags: +Random test failure
FileSize
63.33 KB

This feels like a random test failures, let's reupload the patch.

Status: Needs review » Needs work

The last submitted patch, 195: 356399-193.patch, failed testing.

klausi’s picture

Status: Needs work » Needs review
FileSize
64.75 KB

klausi pushed some commits to the pull request.

For an interdiff please see the list of recent commits.

klausi’s picture

Status: Needs review » Needs work

OK, so as dawehner said I implemented a behavior change with this patch: routes are not automatically rebuild during the module install process.

That way we have two important performance improvements (or at least avoiding regressions) with this patch:
* No more lock stampede where multiple requests try to rebuild routes.
* No multiple route rebuilding during module installation

Disadvantages of this patch:
* Edge case: modules that rely on updated route information in hook_modules_installed() will have to call \Drupal::service('router.builder')->rebuildIfNeeded(); themselves.
* Some test cases need to call \Drupal::service('router.builder')->rebuild() themselves to make sure that the route information is up to date before they make requests.
* "drush en mymodule" is broken now, because drush will also have to do a \Drupal::service('router.builder')->rebuild(); ==> but this can easily be fixed in drush.

dawehner’s picture

Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Random test failure

Updated the issue summary.

I think solving the big problem of the lock is IMHO worth the disadvantages described in the issue summary.

dawehner’s picture

For benchmarking I tried the following approach.

In one terminal run: ab -n 1000 -c 10 http://d8.dev/
In another terminal run: drush ev '$view = \Drupal\views\Entity\View::load("frontpage"); $view->save();' in HEAD
or drush ev '$view = \Drupal\views\Entity\View::load("frontpage"); $view->save(); \Drupal::service("router.builder")->rebuild(); ' with the branch here.

Sadly both lead to a apache segfault.

dawehner’s picture

Alright, this time on another machine.

Went with ab -n 100 -c 4 http://d8.dev (no xdebug)

HEAD, without a view save: ~6 seconds taken for the tests
HEAD, with view saving: ~9 seconds taken for the tests
PATCH, without a view save: ~6 seconds taken for the tests
PATCH, with view save: ~7 seconds taken for the tests.

Interesting is probably the standard derivation for HEAD with view save and PATCH with view save, as it shows, how slow the individual requests are:

HEAD:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:   222  378 582.5    238    3476
Waiting:      206  355 574.6    217    3418
Total:        222  378 582.6    238    3477


PATCH:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0       3
Processing:   221  287 130.8    240    1053
Waiting:      204  264 125.9    219    1017
Total:        221  287 130.7    240    1053

Note: The $view->save() triggers not only a router rebuild, but also some other cache rebuilds.

Wim Leers’s picture

Wow, an 80% reduction in standard deviation!

Seems like some awesome improvements are happening here :)

catch’s picture

I'd prefer to make the opt-in route rebuilding a follow-up even if this makes things slower initially.

It seems worth looking at, but prior to that change this issue is exchanging an issue that brings production sites down in Drupal 7 at least for a slower installer, which is an OK trade-off.

If we want to make the API change to speed up the installer and multiple-module installs generally, that could also be a good change but seems separate.

dawehner’s picture

FileSize
59.34 KB

Fair. Took my old patch and applied some doc changes on top of it.

Let's discuss that on #2451665: Don't rebuild the route on ModuleInstaller::install() (30% installer speedup)

klausi’s picture

Issue summary: View changes
Status: Needs review » Reviewed & tested by the community

Ok, let's continue with the 30% installer performance regression here and work on that in #2451665: Don't rebuild the route on ModuleInstaller::install() (30% installer speedup).

RTBC, assuming the bot comes back green.

dawehner’s picture

Issue tags: -needs profiling

I guess we no longer needs profiling, we know its slower :P

Fabianx’s picture

Issue tags: +Performance regression
JulienThomas’s picture

Sorry for the "out of topic" question as I go back on the Drupal7 port. Is the speed issue only related to Drupal 8 or does it apply to 7 too?

catch’s picture

It's only relevant to the Drupal 8 patch.

JulienThomas’s picture

Thanks. when reviewing code I was thinking that but wanted confirmation. Great!

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs change record updates, +Needs change record

Looks like the following CR will need to be revised https://www.drupal.org/node/2185947 - I think once this lands it should just point to the new CR...
Also I think we need a CR for this change.

alexpott’s picture

Also I'm not certain that we should be removing the lightweight indicator service see #2352641: Break router.builder dependency - are we sure that this is the right thing to do?

alexpott’s picture

+++ b/core/lib/Drupal/Core/Routing/RouteBuilder.php
@@ -86,6 +80,13 @@ class RouteBuilder implements RouteBuilderInterface {
+   * Flag that indiciates if we should rebuild at the end of the request.

Misspelling of indicates

biperch queued 193: 356399-193.patch for re-testing.

biperch queued 195: 356399-193.patch for re-testing.

The last submitted patch, 195: 356399-193.patch, failed testing.

The last submitted patch, 193: 356399-193.patch, failed testing.

klausi’s picture

Status: Needs work » Needs review
FileSize
68.05 KB

klausi pushed some commits to the pull request.

For an interdiff please see the list of recent commits.

klausi’s picture

FileSize
10.32 KB

Regarding #2352641: Break router.builder dependency: that issue was about a service dependency circle between route builder and route provider - this patch changes route provider to not depend on route builder at all. We just forgot to remove that dependency, I did that now with the latest patch, yay!

The change notice at https://www.drupal.org/node/2185947 will only have to be revised slightly, it basically stays the same.

The CR for this issue should just mention that the RouteBuilderIndicator service is removed and that router rebuild now are never triggered at the beginning of requests when routes are looked up.

Attached is also an interdiff to dawehner's patch before (for the Github haters :-P )

Anonymous’s picture

re. #212 - yes, i do think removing that code is the right thing to do. i'm happy to work on a CR for this patch, given i pushed for the change in direction at #90.

i think this is ready, but i've been away from it for ages, so i'll leave the RTBC for someone else.

dawehner’s picture

Just wrote one, feel free to improve it.

https://www.drupal.org/node/2452735

klausi’s picture

CR looks good, made a small improvement.

@dawehner: can you sanity-check my patch and then set this to RTBC again?

klausi’s picture

Issue tags: -Needs change record
Anonymous’s picture

Status: Needs review » Reviewed & tested by the community

dawehner++

/me dares to be brave

dawehner’s picture

CR looks good, made a small improvement.

Good improvements!

Went through the patch with storm and it looked still fine.

YesCT’s picture

Issue summary: View changes

.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 218: 356399.patch, failed testing.

sidharrell’s picture

Status: Needs work » Needs review
FileSize
67.97 KB

straight reroll of 219 against head.

sidharrell’s picture

Status: Needs review » Reviewed & tested by the community

sorry, meant #218.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

This issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Committed 40d57b3 and pushed to 8.0.x. Thanks!

alexpott’s picture

Issue tags: -Needs backport to D7

Published the CR. Can someone update https://www.drupal.org/node/2185947.

  • alexpott committed 40d57b3 on 8.0.x
    Issue #356399 by dawehner, beejeebus, catch, sidharrell, klausi, nlisgo...
klausi’s picture

Fabianx’s picture

Yes! Thanks for getting this in!

dawehner’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

apaderno’s picture

Issue tags: -Performance regression