Problem/Motivation

I am profiling an application in production and it looks like always sorting cache tags/context whenever a new dependency is added has an unexpected side-effect on page performance, sometimes it can be considered a performance killer. (See cache_tags_sort_perfomance_issue.png below).

I raised the question on Slack why it is necessary, why on-demand sorting is not enough?

It seems the order of cache _contexts_ matters when cache ids are generated from them (see \Drupal\Core\Render\RenderCache::createCacheID()), but is there any other place in core (except tests) where the order changes anything? Update: it seems there is none.

As @berdir highlighted in #14, core components can also benefit from this change. Cacheability is always calculated/collected, yet sometimes it is not used.

Proposed resolution

Sort cache tags/context on-demand (only where it is actually needed).

(In 10.x version consider using lazy collections instead of arrays.)

Remaining tasks

1. Investigate if keeping cache tags in sorted order is inevitable or on-demand sorting can work. Based on the current state of the MR974, on-demand can work.
2. Investigate if keeping cache contexts in sorted order is inevitable or on-demand sorting can work. Based on the current state of the MR974, on-demand can work.

User interface changes

None.

API changes

However odd it sounds, possibly removing sorting can be considered a breaking change. TBD

Data model changes

None.

Release notes snippet

Cache contexts and tags are not sorted anymore when they are merged.

Issue fork drupal-3225328

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mxr576 created an issue. See original summary.

mxr576’s picture

Issue summary: View changes

Fingers crossed that only those tests will fail that also assert on the order of elements.

mxr576’s picture

Issue tags: +Performance

Getting better and better... :)

Berdir’s picture

Following, will have a look.

I've seen this too and I agree that we can definitely improve things. Especially in big tables, the merging and so on that is going on is immense.

That said, blackfire tends to massively over-report quick function calls that are called many times. The time you can gain from it is likely a lot less impressive than what blackfire says. That said, even just getting rid of the noise in blackfire reports will make it easier to identify possibly . And it does cost something.

mxr576’s picture

@berdir, thanks for having this on your list! I completely agree with what you wrote, yes, this could be a micro-optimization, I also would like to see if this patch changes anything on the site where I captured this screenshot. Although, even if it is a micro-optimization, I do not see why functions/methods that are called "mergeXY" should do more than "merge".

mxr576’s picture

Okay, so now I have numbers... drum roll!

Time view (before/after):
time view

CPU view (comparision):
CPU view

I have only captured the relevant parts, the page is loading 32% less time and uses 32% fewer resources than before. I do not think we have changed anything else since the previous release other than adding this patch that would contribute to this improvement, but for this issue what matters is what you can see on the screenshots above.

mxr576’s picture

Btw, I guess PHPBench could be also used to profile these (and other) methods in Drupal core. I am not sure if anybody is using that for core development. sort() becomes much slower and more resource hungry as the dataset grows and becomes more random.

longwave’s picture

I think the comment and/or assertion needs updating in EntityUnitTest, otherwise this is looking promising.

Should we also try to figure out why that other test doesn't work with assertEqualsCanonicalizing?

Berdir’s picture

I would recommend you compare with something like plain ApacheBench to get a better understanding of the real difference and make sure you disable both blackfire and xdebug extensions. Because those absolute numbers are pretty wild, is that page really *that* slow?

And if you have enough cache tags to to have a problematically large dataset for sort, then you have a other problems too. Because querying the checksum backend to verify that they are still up to date is going to be far slower than a sort. It's more likely to be many, many calls to sort with a small array.

longwave’s picture

159,000 calls to Cache::mergeTags() is quite a lot. I don't know how many tags there are on average here, but some sample runs of sort() alone with different data set sizes:

$ time php -r '$a = range(0, 100); for ($i = 0; $i < 159000; $i++) sort($a);'

real	0m0.497s
user	0m0.488s
sys	0m0.008s

$ time php -r '$a = range(0, 200); for ($i = 0; $i < 159000; $i++) sort($a);'

real	0m1.177s
user	0m1.157s
sys	0m0.020s

$ time php -r '$a = range(0, 500); for ($i = 0; $i < 159000; $i++) sort($a);'

real	0m3.428s
user	0m3.404s
sys	0m0.024s

$ time php -r '$a = range(0, 1000); for ($i = 0; $i < 159000; $i++) sort($a);'

real	0m8.019s
user	0m7.999s
sys	0m0.020s

If we can eliminate all these then it would seem to be a good improvement, even in the 100 tags case.

mxr576’s picture

> Because those absolute numbers are pretty wild, is that page really *that* slow?

Unfortunately, yes, otherwise I would go down this long road for patching a fundamental piece of core :)

My guess is only 1-5% of Drupal >8 sites you can see this performance issue caused by sorting because most sites when rendering a huge list of something that has complex cacheability are either:
* using a pager, what we could also, but that would not work with the expected design
* exposing this information on an admin UI where proper cacheability information "does not matter" so it is not calculated (at least I see a tendency here and there for this)
* using lazy builders to speed up the page
* or go with (progressively) decoupled for the same reason as the previous

Possibly we also go with (progressively) decoupled later, but for now, this patch (and other project-specific changes) improved the speed of the page in question drastically.

Thanks @longwave, #11 proves perfectly the problem with sort().

mxr576’s picture

Next, I am answering PR comments inline, but in general, what I was experienced while I was fixing these tests that several tests asserted on the order the cache tags/context but I believe they must not. They tested actual implementation instead of expected behavior because in a given context it should/does not matter if cache tags are bar config:baz entity:foo or entity:foo bar config:baz , what matters is that cacheability is set/inherited/calculated properly and the expected cache tags or contexts are available/set.

Berdir’s picture

> They tested actual implementation instead of expected behavior because in a given context it should/does not matter if cache tags are bar config:baz entity:foo or entity:foo bar config:baz , what matters is that cacheability is set/inherited/calculated properly and the expected cache tags or contexts are available/set.

Agreed and IMHO several of my commands go in that direction. (Exception are headers, where we still sort and should also test it like that IMHO) That we change the expected order instead of asserting in an order-agnostic way, like you did with many of the tests. The difference is that so far, tests could assume an alphabetic order and hardcode that, now they need to be explicitly not explicit about the order ;)

> * exposing this information on an admin UI where proper cacheability information "does not matter" so it is not calculated (at least I see a tendency here and there for this)

Note: Cacheability metadata is always collected, even if we decide that we don't cache it. regular render cache is active everywhere, only dynamic page cache explicitly excludes admin pages. But that's just the final decision on whether or not it _uses_ that info. It's there.

So one page where you should see a benefit with just core would be the permission table or views tables with a lot of rows.

mxr576’s picture

> Note: Cacheability metadata is always collected, even if we decide that we don't cache it. regular render cache is active everywhere, only dynamic page cache explicitly excludes admin pages. But that's just the final decision on whether or not it _uses_ that info. It's there.

Yes, yes, yes, you are absolutely correct, I did not want to go into details about this here, yes cacheability is collected but paths starting with `/admin` are not cached and I have a feeling/experience that due to that some devs do not add the proper cacheability information to the result... which becomes a huge problem when a page's path changes from `/admin/foo` to `foo`. But as I said, I really would like to stick with the main topic here :)

Berdir’s picture

Maybe you misunderstood me. I'm trying to point out that this will actually help on /admin pages too and might have a measurable impact on certain pages out of the box, without having the same rather special use case/scenario as you do. Which will make it easier get this in, if it's not just for pages that maybe shouldn't be built the way they are.

mxr576’s picture

Yes, I did not see that... (no excuses) Thanks!

mxr576’s picture

Issue summary: View changes

The new PR update is also all green \o/.
Updated the issue description and maybe it is time to start thinking about if this is considered an API change or not, can it land in 9.x or only in 10.x?

catch’s picture

We don't count data structures as API, and the order of cache tags/contexts is barely a data structure, so this could land in a minor release. Caveat that I haven't actually reviewed the patch yet.

mxr576’s picture

meh, \Drupal\Tests\Core\Render\RendererTestBase::assertRenderCacheItem() is still going to fail, first because I have forgotten to fix it, second it is challenging fixing that because assertEquals() asserts on array keys too... although assertEqualsCanonicalizing() does too much.

mxr576’s picture

Title: Keeping cache tags/contexts sorted is a performance killer » Improve page performance by sorting cache contexts/tags on-demand
Issue summary: View changes
mxr576’s picture

mxr576’s picture

Issue tags: +Needs change record

Just spotted this in a contrib, so even if this lands in 9.x (which I would be glad) then a CR could be necessary because contribs were also adopted the same (bad) patterns
https://git.drupalcode.org/project/twig_tweak/-/blob/3.1.2/tests/src/Ker...

longwave’s picture

Status: Needs review » Reviewed & tested by the community

All review points have been addressed; ultimately this is a simple change to the runtime code with lots of changes to supporting tests, so over to core committers to see what they think.

  • catch committed 6951f4d on 9.3.x
    Issue #3225328 by mxr576, Berdir, longwave: Improve page performance by...
catch’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: -Needs change record

Patch looks good to me, I added a change record to help out any failing contrib tests.

While it's tricky to quantify the exact impact of removing function calls on raw performance due to profiling exaggerating the savings, removing hundreds or thousands of function calls from requests also removes noise from profiling - making it easier to spot other issues. So it's generally worth doing either way.

Committed 6951f4d and pushed to 9.3.x. Thanks!

Wim Leers’s picture

Very nice!

We intentionally did this back in #2429257: Bubble cache contexts to optimize for DX until proven otherwise. It was proven otherwise here, in pretty extreme situations.

Also: TIL assertEqualsCanonicalizing()!

mxr576’s picture

\o/ Thank you!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.