During the 8.x cycle we've introduced several known performance regressions compared to Drupal 7, which we need to resolve before release so that Drupal 8 isn't slower than Drupal 7.

This doesn't mean every single regression needs to be individually optimized away. In some cases that might be necessary; in others, trying to micro-optimize won't be worth the extra effort.

However, there are places where we've introduced something with the intention that it will let us make performance or scalability enhancements elsewhere (blocks as sub-requests, for example). Introducing the performance regression without getting the performance feature at the end of it puts us in an unhappy place.

I'm opening this as a meta-issue to track those regressions as they get committed, along with the issues that are attempting to resolve them. I'm making this a critical task because I'm not prepared to release Drupal 8 with obvious performance regressions compared to Drupal 7. It was bad enough from 6 to 7, and we shouldn't do that again.

Working spreadsheet

For up-to-date information on work being done on performance and caching, see the Drupal 8 performance issue spreadsheet.

High priority issues

Several new core APIs have lost optimizations from Drupal 7 and earlier, where multiple objects could be loaded with a single request to the database or cache (e.g. compare CMI to variable_init()). The following issues attempt to add some kind of multiple-load, pre-loading or CacheCollector support to those systems:

Routes: #2058845: Pre-load non-admin routes (related: menu tree caching #1805054: Cache localized, access filtered, URL resolved, (and rendered?) menu trees)
Configuration objects: #1880766: Preload configuration objects
State lookup: #1786490: Add caching to the state system
Plugin discovery caching: #2114319: Lots of cache requests from plugin discovery
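To make the multiple-load idea concrete, here is a minimal Python sketch (illustration only, not Drupal's actual CacheCollector API). A hypothetical `CountingBackend` stands in for the database or cache and counts round trips. A hypothetical `PreloadingCollector` records which keys a request used, so that the next request can fetch all of them with one `get_multiple()` call instead of one `get()` per object.

```python
from typing import Dict, Iterable, List, Optional


class CountingBackend:
    """Hypothetical key-value backend that counts round trips (queries)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)
        self.queries = 0

    def get(self, key: str) -> Optional[str]:
        self.queries += 1  # one round trip per key
        return self.data.get(key)

    def get_multiple(self, keys: Iterable[str]) -> Dict[str, str]:
        self.queries += 1  # one round trip however many keys are requested
        return {k: self.data[k] for k in keys if k in self.data}


class PreloadingCollector:
    """Hypothetical collector that preloads the keys earlier requests used."""

    def __init__(self, backend: CountingBackend, known_keys: List[str]) -> None:
        self.backend = backend
        self.used: List[str] = []
        # One batched load for everything recorded by previous requests.
        self.values: Dict[str, str] = (
            backend.get_multiple(known_keys) if known_keys else {}
        )

    def get(self, key: str) -> Optional[str]:
        if key not in self.used:
            self.used.append(key)  # recorded so the next request can preload it
        if key in self.values:
            return self.values[key]
        value = self.backend.get(key)  # miss: fall back to a single lookup
        if value is not None:
            self.values[key] = value
        return value


config = {f"config.{i}": f"value {i}" for i in range(27)}
backend = CountingBackend(config)

first = PreloadingCollector(backend, known_keys=[])
for name in config:
    first.get(name)
print(backend.queries)  # 27: one query per object

backend.queries = 0
second = PreloadingCollector(backend, known_keys=first.used)
for name in config:
    second.get(name)
print(backend.queries)  # 1: a single preload served every lookup
```

The trade-off is that the preloaded set can contain keys a given page doesn't need. In exchange, the number of round trips stops growing with the number of objects used.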

As a generic performance improvement, there is also a focus on enabling render caching by default for all entities. This is being tracked in the D8 cacheability tag.

Other general performance issues

Use the Performance tag to find any other performance issues. Filtering by 'critical' or 'major' should find the most serious ones.

Issues where some of the more serious regressions were originally committed

This is an incomplete list, but it would be useful to track specific commits that made performance worse, so feel free to add them here. However, a lot of performance issues are introduced by new APIs and only become measurable as core is converted to the new API and backwards-compatibility layers are removed.
#1571632: Convert regional settings to configuration system
#1290694: Provide consistency for attributes and classes arrays provided by template_preprocess()
#1599108: Allow modules to register services and subscriber services (events)
#1535868: Convert all blocks into plugins
#916388: Convert menu links into entities
#636454: Cache tag support (the minimum lifetime removal and also #1848968: Too many checksum tag queries executed by the cache backend)
#1786490: Add caching to the state system
#1272870: No semantics for nested comments / bad for screen-readers
#1696640: Implement API to unify entity properties and fields
#2102777: Allow theme_links to use routes as well as href

Comments

Title: Resolve known performance regressions in Drupal 8 » [meta] Resolve known performance regressions in Drupal 8

Clarifying the title.

Just as a thresholds check, the following are preventing features from being committed to D8 atm:

#1187726: Follow-up: Add caching for configuration / rework config object loading (Was: Memory usage and i/o from config objects)
#1578090: Benchmark/profile kernel
#1743590: Isolated Block Rendering

All are related to performance. Is it possible we can reduce one or more to a "major" task (only 83 of those atm) and hinge them off this critical, rather than taking up 4 slots in the critical task queue?

I bumped #1578090: Benchmark/profile kernel down to major.

The config and wscci/scotch issues deserve to be critical by themselves IMO since they're major new functionality that's seriously slowing things down atm but which both ought to be nicely fixable.

Fair enough; thanks for being flexible! :)

Added property API.

Updated issue summary.

Updated issue summary.

Added #1786490: Add caching to the state system to the list.

Edit: Closed the previous issue as a duplicate of that one.

Updated issue summary.

Current HEAD executes 120 SQL queries on a front page load when logged in as admin:

  1. 45 for cache backend CacheBackendInterface::getMultiple() calls, including 27 for config CachedStorage::read()
  2. 34 for cache backend CacheBackendInterface::checksumTags()
  3. 15 for the key-value store getMultiple() (actually triggered by get()), mostly from menu API functions
  4. The rest come from assorted API pieces triggering queries while building the page

This is just a brief synthesis of some Xdebug traces I made.
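A quick tally of the breakdown above (Python, illustration only). The 26 queries left for "the rest" are derived by subtraction, not measured separately.

```python
# Figures from the Xdebug-trace breakdown (front page, logged in as admin).
total_queries = 120
cache_get_multiple = 45  # includes 27 for config CachedStorage::read()
checksum_tags = 34       # CacheBackendInterface::checksumTags()
key_value = 15           # key-value getMultiple(), mostly via the menu API

accounted = cache_get_multiple + checksum_tags + key_value
rest = total_queries - accounted
cache_backend_share = (cache_get_multiple + checksum_tags) / total_queries

print(accounted, rest)               # 94 26
print(f"{cache_backend_share:.0%}")  # 66%
```

About two-thirds of the queries go through the cache backend, which is why batching cache reads and reducing checksum lookups are the obvious first targets.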

DatabaseStorageController::getFieldDefinitions() could surely use some optimization, most likely in the form of some well-deserved caching.

Added two Twig-related issues now that Twig is in.

Updated issue summary.

Working a bit on the config and state caches. On a front page with 10 nodes from users with profile pictures, I'm seeing 204 queries (102 cache, 48 state).

With the latest patches from #1187726: Follow-up: Add caching for configuration / rework config object loading (Was: Memory usage and i/o from config objects) and #1786490: Add caching to the state system, I'm down to 91 (37 cache, zero state).

Also, the checksumTags() bug was identified and fixed a few days ago. It now accounts for only 10 of the queries.
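For readers unfamiliar with tag checksums, here is a simplified Python sketch of one common scheme. It is not Drupal's actual cache backend code, and it is not necessarily how the fix above works. Each cache item stores a checksum of its tags' invalidation counters, and a read is only valid if the current checksum still matches. Because every read needs the current counters, querying them per lookup multiplies queries. Keeping them in a per-request static cache (the hypothetical `memo` below) is one way to cut that down.

```python
from typing import Dict, Iterable, Optional, Tuple


class TagStore:
    """Hypothetical table of per-tag invalidation counters; counts queries."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {}
        self.queries = 0

    def load(self, tags: Iterable[str]) -> Dict[str, int]:
        self.queries += 1  # one query per call, however many tags
        return {t: self.counters.get(t, 0) for t in tags}

    def invalidate(self, tag: str) -> None:
        self.counters[tag] = self.counters.get(tag, 0) + 1


class TaggedCache:
    """Cache whose items are valid only while their tag checksum is unchanged."""

    def __init__(self, tag_store: TagStore) -> None:
        self.tag_store = tag_store
        self.items: Dict[str, Tuple[str, Tuple[str, ...], int]] = {}
        self.memo: Dict[str, int] = {}  # per-request static cache of counters

    def checksum(self, tags: Iterable[str]) -> int:
        tag_tuple = tuple(tags)
        missing = [t for t in tag_tuple if t not in self.memo]
        if missing:  # only tags not seen yet in this request cost a query
            self.memo.update(self.tag_store.load(missing))
        # Counters only grow, so any invalidation changes the sum.
        return sum(self.memo[t] for t in tag_tuple)

    def set(self, cid: str, data: str, tags: Iterable[str]) -> None:
        tag_tuple = tuple(tags)
        self.items[cid] = (data, tag_tuple, self.checksum(tag_tuple))

    def get(self, cid: str) -> Optional[str]:
        item = self.items.get(cid)
        if item is None:
            return None
        data, tags, stored = item
        return data if self.checksum(tags) == stored else None

    def invalidate(self, tag: str) -> None:
        self.tag_store.invalidate(tag)
        self.memo.pop(tag, None)  # drop the stale counter from the memo


store = TagStore()
cache = TaggedCache(store)
for i in range(10):
    cache.set(f"node:{i}", f"rendered {i}", ["node_list", f"node:{i}"])
hits = [cache.get(f"node:{i}") for i in range(10)]
print(store.queries)  # 10: the reads reused memoized counters

cache.invalidate("node:3")
print(cache.get("node:3"), cache.get("node:4"))  # None rendered 4
```

The catch with a per-request memo is that invalidations made by other processes during the request are not seen until the next request. Whether that is acceptable is a design decision for the real backend.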

Updated issue summary.

So, #1535868: Convert all blocks into plugins slowed down HEAD by 15%. I'm attempting to get some of that back in #1880766: Preload configuration objects.

Updated issue summary.

Added blocks as a regression.

Updated issue summary.

According to Alex's findings at #914382-145: Contextual links incompatible with render cache, D8 is now about 500% slower than stock D7. We should start ramping up our efforts in this area.

#10 was viewing the front page with 5 node teasers. I just ran some numbers again (ab -c1 -n100) with no front page content at all (just hitting the front page immediately after a Standard profile install).

7.x HEAD: 61ms
8.x HEAD: 222ms (+264%)

If someone gets a substantially different ratio on a different machine, please share.
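To make numbers from different machines easier to compare, here is a tiny helper (Python, illustration only) that reports the slowdown both as a factor and as a percentage. It reproduces the +264% above.

```python
def regression(d7_ms: float, d8_ms: float) -> str:
    """Format a D7 -> D8 timing change as a slowdown factor and percentage."""
    factor = d8_ms / d7_ms
    return f"{factor:.2f}x ({factor - 1:+.0%})"


print(regression(61, 222))  # 3.64x (+264%)
```

Sharing the factor rather than raw milliseconds removes most of the machine-to-machine variation.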

I looked at the front page as well and also saw a huge difference.

The biggest chunk I found was all EntityNG. 50ms alone spent in the magic methods.

Yes: right now BC mode is in use, which means there is an extra mapping layer on *each* entity property read or write. We need to move on with conversions so that we can remove it.

Category: task » bug

222ms makes this a bug...

#1855260: Page caching broken by accept header-based routing is going to bite us if we don't get that figured out.

Some more info:

- I tried it again today, and got 230ms. Not sure if the extra 8ms is due to HEAD changes since #11, or random factors on my computer.
- If I hard-code a return FALSE at the top of _node_add_access(), that drops it down to 185ms. That's a way to isolate the effects of #1979094: Separate create access operation entity access controllers to avoid costly EntityNG instantiation and lets us look for the other causes of regression.
- If in my settings.php, I uncomment the $settings['class_loader'] = 'apc'; line, that drops it down to 166ms. Yay for there being an easy way to remove autoloader inefficiency!
- 382 PHP files are loaded to show an anonymous home page with no content, and that's even with the early return in _node_add_access() mentioned above. Yowza! I thought that simply requiring that many files was a huge factor, but it turns out not to be. Changing my apc.stat configuration to 0 shaved off 5ms, and timing a script that just requires those files came to only another ~10-20ms. I need to rewrite and rerun the script to get a more precise number, and will post it when I do, but the good news is that simply loading all that extra Symfony code and OOP Drupal code isn't where our biggest problems are.

Opened #1983114: Make the autoloader swappable, ideally we'll allow for contrib to provide alternative autoloaders.

Also I hope everyone who thinks autoloading is done for performance reasons reads #17 ten times and repents.

Not sure if it is relevant or not, but when we moved composer.json to the top level, we didn't rebuild the composer autoloader.
So all the paths specified in it are wrong.
You can see that in #1959660: Replace xpath() with WebTestBase::cssSelect() by leveraging Symfony CssSelector, which is the first issue since then to add a new entry to composer.json and run composer update.
Could be a factor?

@effulgentsia: I still can't reproduce your numbers, not even remotely. Can you provide some more information about your setup?

With uid 1 and no nodes on the front page, I get "Executed 137 queries in 10.14 ms. Queries exceeding 5 ms are highlighted. Page execution time was 134.94 ms. Memory used at: devel_boot()=4.61 MB, devel_shutdown()=13.81 MB, PHP peak=14 MB." That varies a bit, but not a lot. ab on the front page is 96.918ms; a 404 page is 54ms.

- Is this a laptop, with/without power plugged in? (I have huge differences with and without power, @dawehner for example didn't)
- Is xhprof/xdebug enabled?
- How many queries, how long do they take? I do have a somewhat optimized mysql configuration and my queries are quite fast, given the number of them.
- When you test the front page, that means we still have to load and execute the view, and that's a considerably higher overhead than the old node_default_page() which was just a single query. A lot of that is one time overhead and is less and less relevant as you display more views/content. Might make more sense to compare a page that hasn't changed that much, e.g. 404.
- I'm also not seeing a big difference when I add the return FALSE to node_access(), possibly due to the entity field definitions cache that was committed today.
- Can you check how #1786490: Add caching to the state system and #1971158-15: Follow-up: Add loadMultiple() and listAll() caching to (cached) config storage affect those numbers? The second one only gets interesting with a lot of config files and configuration, so you will probably not see a big difference with it, but it's huge with real, large sites.

Here's what I came up with this morning for a default front page with no content:

http://rpubs.com/msonnabaum/d8d7_response_times

That's with both xdebug and xhprof disabled, just throwing the output of microtime into a csv.

So there's clearly some xhprof overhead we're seeing, but it goes both ways. The difference is still rather staggering.

We could also compare Drupal 6 + views with an empty node view, to Drupal 7 front page, which would allow us to quantify more of the non views related changes (of course views is changed also in D7, but I think the performance profile is probably still pretty similar).

I have also used the login form as a comparative benchmark in the past - it does a bit more work than a 404.

See https://github.com/symfony/symfony/pull/8081, which adds a 4% performance gain in the class loader.

http://drupal.org/node/1427826 contains instructions for updating the issue summary with the summary template.

Added the 'menu links as entities' issue.

Added #2002094 to performance improvements

Added #2002104 to performance improvements

Added #2002108 to performance improvements

Added #2002222 to perf improvements

I did some benchmarking yesterday comparing Drupal 7.22 with D8 dev. You can see more details here: http://www.netstudio.gr/en/blog/early-drupal-7-vs-drupal-8-performance-c....

In fact, I found this issue through a comment on the above blog post.

Added #2029075: Configuration translation step in the installation takes a reeeeaaallly long time when installing in a non-English language to the issue summary. For non-English installs, this might be the most noticeable performance regression of all.

Updated issue summary.

FWIW, on my machine (PHP 5.3, APC 3.1.9), I had

  • 61ms for 7.x right after standard install, no caching
  • 7ms for 7.x after enabling the page cache
  • 132ms for D8 right after standard install, no caching
  • 42ms for D8 with the page cache and the normal classloader
  • 33ms for D8 with the page cache and the APC classloader

fgm's metrics really look devastating.

But if we don't fear the results of a realistic comparison, we should really define a number of representative configurations (e.g., first impression, basic site, typical site, feature-rich site, data-intensive site) plus a few targets plus two or three concurrency levels, and then start profiling all of these both continuously and automatically.

If we don't define fair profiling configurations ourselves, slightly simplistic comparisons that don't take D7 contrib into account (like the ones by Yannis or fgm, or ones by completely inexperienced people) will make performance parity an impossible goal, and might ultimately hurt our reputation regarding performance.
All the more so now that Symfony2 has hit the headlines for being an exceptionally slow framework. It would be nice to demonstrate that we're selectively leveraging "the best" from different PHP frameworks and are not bound to be even slower.

In the end, I'd really like to see a graph that nicely displays how performance improves from week to week, and in a few cases the D8 configuration would outperform the D7 one, in others it would stay behind, but altogether it would remain comparable. That should be our goal.

I'm currently working on automating Drupal 8 builds via Chef and Vagrant, and I should be done with that today or tomorrow. At that point I'll be building out a basket of representative D7 vs. D8 performance test sites over the coming weeks. My current targets include:

  • Stock install: Out of the box D8 vs. D7, no content on the site.
  • Brochure site: Mostly static D8 vs. D7+now-in-core modules. Basic Views, Panels, and content types.
  • Custom site: Larger, cache-backed site with some roles, configuration, custom blocks, etc.
  • Dynamic site: The above, but with dynamic content/conditions driven by simulated user registration, content creation, batch runs of VBOs, etc. Behind Varnish, and ideally leveraging ESI support (not sure how far along D8 is with this, though).

It'd be cool to get feedback about how to structure the environments, what modules to base them off of, what tests to run, and so on.
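One way to keep such a basket of sites comparable over time is to enumerate the whole test matrix up front, so that every D7 run has a matching D8 run. The Python sketch below crosses the four targets listed above with both core versions and a few concurrency levels. The concurrency values and the `Run` record are hypothetical placeholders, not agreed choices.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

TARGETS = ["stock install", "brochure site", "custom site", "dynamic site"]
VERSIONS = ["7.x", "8.x"]
CONCURRENCY = [1, 10, 50]  # hypothetical levels; "two or three" were suggested


@dataclass(frozen=True)
class Run:
    target: str
    version: str
    concurrency: int


def build_matrix() -> List[Run]:
    """Every target x version x concurrency combination, in a stable order."""
    return [Run(t, v, c) for t, v, c in product(TARGETS, VERSIONS, CONCURRENCY)]


matrix = build_matrix()
print(len(matrix))  # 24
print(matrix[0])    # Run(target='stock install', version='7.x', concurrency=1)
```

A fixed, ordered matrix makes week-to-week graphs straightforward: each cell is one time series, and the D7 and D8 cells for the same target sit side by side.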

@Eronarn:
That really sounds awesome!
Off the top of my head, I can't say exactly which configurations would be the most relevant and correct, but we should have at least one multilingual configuration that extensively uses Entity translation, i18n and all the additional stuff we don't need anymore in D8.

Generally, we should leverage some of the more popular contrib modules that have been included in D8 core or which aren't necessary anymore. These come to mind immediately:
WYSIWYG + CKEditor, Date module, Entity API, Entity reference, Entity Translation, Views, Profile2, Context, Administration Menu, Diff, RESTful API... what else?

[edit:] Removed Profile2 from the list - it's yet to be included: #1668292: Move simplified Profile2 module into core

#31

what else?

George Clooney.

#31

Thanks for the reminder about translation. That definitely wasn't on my radar, but is an important consideration. I will include a frontend performance monitoring component of this, so it should also be feasible to monitor node editing performance, including WYSIWYG.

Has anyone heard of Drush Make being ported to D8? Drush itself is fine, but Drush Make doesn't seem very functional right now. I could just tarball an entire site, but it'd be nice to have something more easily versioned that I can point people to.

It would be great to have a benchmark target (or targets) that could be used for different benchmarking and instrumentation activities. I think the "Dynamic site" is probably the biggest win, since it is the kind of site that causes most scaling challenges (in addition to just page load performance). Brochure-type sites rarely have scaling challenges in my experience (although .

Given the rate of D8 development, I was assuming a script to configure the content structure and populate dummy content is pretty much a requirement - I doubt a database snapshot will last for long before schema changes break it. Not sure I understand using drush make with D8 yet - are there sufficient stable & API chasing contrib modules to make this worthwhile?

I'd prefer scripting using Drush Make plus some post-processing setup script leveraging Drush because that means the build is more standardized and easier to contribute to. It's not a requirement by any means, just intended to make it easier for people other than me to contribute to the build (pretty annoying to do if it's a huge git tree with all of core in it). If anyone has other suggestions, I'm totally open.

I don't think there are many ported contrib D8 modules yet (it's a pretty miserable process - I did this for Tracelytics/TraceView for DrupalCon and it already needs extensive rewrites). However, I want this to be something that will be run over the course of several months (probably will start off with the alpha releases but maybe move to nightlies if enough people are interested in setting up nodes), and I'm hopeful that we'll see more D8 contrib alphas and betas by then.

EDIT: Note that drush already works with cron, devel generate, etc. in D8. So that part of the scripting will be trivial.

Awesome - totally agree that a scripted setup is what we need - probably devel generate will need some love. I think the make file will pretty much just be 2 lines that point at core (for D8 anyway, at least to start with), but it will do the job just fine :)

I wonder if it would be best to split performance test targets out as a separate issue (if there isn't an existing one), since this is supposed to be meta.

hello! PHP 5.5 / Apache 2.4.4 here

Fresh install of D8 and D7 (logged in):

Overall summary                       D7                   D8
Total Incl. Wall Time (microsecs)     132,380              256,260
Total Incl. CPU (microsecs)           128,000              248,000
Total Incl. MemUse (bytes)            15,651,360           11,777,208
Total Incl. PeakMemUse (bytes)        15,821,408           11,790,360
Number of function calls              8,868                56,956

Like the memory usage? :)
(I wasn't able to disable Xdebug; I got some nice segfaults once I did :P It's still the buggy 2.4.4, so this probably affects the results.)

I just did this for fun, mostly to check PHP 5.5 and the Zend opcode cache. I found it interesting, so I posted it. I know that we can't really compare vanilla D8 and D7, and that tests should be run with some content.
I'm only posting it because I found the memory usage interesting (which suggests that the OOP and autoloading work seems to pay off).

Apache 2.4.4 here

With Apache 2.4.4, I'm guessing that the req/s you're seeing here isn't Drupal delivering the page, it's the built-in cache of Apache 2.4 (similar to nginx or Varnish). I know different servers will get different results, but there's no way Drupal/PHP/MySQL is going to deliver 7800 req/s, even with Drupal's page cache.

Edit: I removed the ab results; Apache obviously lies and they make no sense anyway. I just left the memory usage result, which was the only point I wanted to make :)

If doing profiling, it's best to enable the APC ClassLoader in settings.php first - that's a known performance regression of dozens if not hundreds of milliseconds with an existing workaround.

FWIW, I did various measurements for my presentation at DevDays Dublin, and here are the differences for an anonymous home page (10k hits, concurrency 10):

Default classloader:

Time taken for tests: 46.318 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Requests per second: 215.90 [#/sec] (mean)
Time per request: 46.318 [ms] (mean)
Time per request: 4.632 [ms] (mean, across all concurrent requests)
Transfer rate: 1689.04 [Kbytes/sec] received

APC Classloader:

Time taken for tests: 32.602 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Requests per second: 306.73 [#/sec] (mean)
Time per request: 32.602 [ms] (mean)
Time per request: 3.260 [ms] (mean, across all concurrent requests)
Transfer rate: 2399.61 [Kbytes/sec] received

So this does indeed mean a 14ms (30%) speedup over the default classloader.
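The arithmetic behind the 14ms (30%) figure, as a short Python check using only the ab output above. It also shows that ab's second "Time per request" line is the first divided by the concurrency level, and that the same change reads as roughly 42% more requests per second.

```python
# Mean "Time per request" values (ms) from the two ab runs (-n 10000 -c 10).
default_loader_ms = 46.318
apc_loader_ms = 32.602
concurrency = 10

saved_ms = default_loader_ms - apc_loader_ms
print(f"{saved_ms:.1f} ms saved")                      # 13.7 ms saved
print(f"{saved_ms / default_loader_ms:.0%} less time")  # 30% less time
# ab's "across all concurrent requests" line divides by the concurrency level.
print(f"{default_loader_ms / concurrency:.3f} ms")      # 4.632 ms
print(f"{306.73 / 215.90 - 1:.0%} more requests/s")     # 42% more requests/s
```

Both percentages describe the same improvement; which one to quote depends on whether you care about latency or throughput.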

Instead of, or in addition to, APC, perhaps Zend OPcache should be tested, as the PHP team included OPcache rather than APC in PHP core (meaning no one will use APC after 5.5 :D).
https://blogs.oracle.com/opal/entry/using_php_5_5_s
For example, I already installed php-opcache easily with yum for PHP 5.3.

Indeed on PHP 5.5 this is a must. If I understand correctly, there might not even be an official release of APC for 5.5, while OpCache is provided as a default, something APC never was.

The APC classloader uses the user cache provided by APC. This is completely different from opcode caching, and Zend OpCache has no equivalent.

If Zend OpCache is used as the opcode cache, you can't also have APC installed to use the user cache, but there's the new APCu fork, which just provides the user cache and is compatible.
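To illustrate why the APC class loader needs a user (data) cache rather than an opcode cache, here is a minimal Python analog. It is a sketch, not Symfony's or Drupal's code. The loader memoizes the result of resolving a class name to a file path, so the filesystem search runs once and later requests only do a cache lookup. A plain dict stands in for the APC/APCu user cache. The resolution rule is a simplified, hypothetical PSR-0-style mapping, and `Drupal\Core\Foo` is a made-up class.

```python
import posixpath
from typing import Callable, Dict, List, Optional


class CachingClassLoader:
    """Hypothetical class-to-file resolver memoized in a shared user cache."""

    def __init__(
        self,
        roots: List[str],
        user_cache: Dict[str, Optional[str]],
        exists: Callable[[str], bool],
    ) -> None:
        self.roots = roots
        self.user_cache = user_cache  # stands in for the APC/APCu user cache
        self.exists = exists
        self.filesystem_checks = 0

    def find_file(self, class_name: str) -> Optional[str]:
        if class_name in self.user_cache:  # hot path: one cache lookup
            return self.user_cache[class_name]
        # Cold path: simplified PSR-0-style mapping (namespace separators only).
        relative = class_name.replace("\\", "/") + ".php"
        path: Optional[str] = None
        for root in self.roots:
            candidate = posixpath.join(root, relative)
            self.filesystem_checks += 1
            if self.exists(candidate):
                path = candidate
                break
        self.user_cache[class_name] = path  # remembered for later requests
        return path


apc: Dict[str, Optional[str]] = {}  # shared across simulated requests
files = {"core/lib/Drupal/Core/Foo.php"}  # hypothetical file layout
roots = ["core/vendor", "core/lib"]

request1 = CachingClassLoader(roots, apc, files.__contains__)
print(request1.find_file("Drupal\\Core\\Foo"))  # core/lib/Drupal/Core/Foo.php

request2 = CachingClassLoader(roots, apc, files.__contains__)
request2.find_file("Drupal\\Core\\Foo")
print(request1.filesystem_checks, request2.filesystem_checks)  # 2 0
```

The cached data must outlive a single request, which an opcode cache alone does not provide. That is why APCu (or another shared user cache) is still needed alongside OPcache.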

Interesting to have a fork reducing functionality, that's not so common. It is available on PECL:
- http://pecl.php.net/package/APCu

...and the code is on Github:
- https://github.com/krakjoe/apcu

ISTR seeing a discussion on php-internals about how the APC allocator was specifically optimized for its opcode caching tasks and was not a good fit for user caching, because of the difference in cache access patterns. The existing commits do not appear to have changed that logic; they focus on removing opcode-related features and general cleanup.

Also see #2023325-15: Classloader isn't swappable for my design of a class loader that is not stupidly inefficient and supports namespace introspection.

@Damien Tournoud: the stream wrapper idea is nice in theory, but implementing one means writing lots of low-level methods, even though most of them are individually rather simple. The very fact that they are needed suggests lots of inter-method (hence user-space) calls.

This is a much more involved interface than a class loader, and I do not see what other parts of our code base could make use of this particular name space to justify the extra code involved. Of course, as always, this would need to be benchmarked against the alternatives.

@fgm: you are talking about the cold path. The hot path has zero userspace intervention, which is the whole point.

Removing 1187726; beejeebus said it's no longer relevant.

Removed 2033501 since that issue isn't directly related to a performance regression and the benefit is questionable.

In the interest of keeping this issue focussed so we can work from it, I went ahead and removed some issues from the summary that were either no longer relevant or not directly related to a performance regression in d8.

Removing 1800286 since it's not a D8 regression.

Updated issue summary.

Updated issue summary

Issue tags: +Prague Hard Problems

Updated issue summary.

Updated issue summary.

Updated issue summary.

Adding another issue.

FYI, a small D8 Performance team has started meeting weekly to make progress on these issues. We use this Google Doc to help us prioritize, assign, etc. You can see issues that we recently fixed in that doc. If anyone wants to join the meeting, please contact me.

@moshe, I'd like to be involved, time zones permitting.
