Opening this issue for general benchmarks / profiling of HEAD - in the hope we can find some of the major bottlenecks before code freeze when it's still possible to get big API changes in.

To get started, I set up a Drupal 6 site, then upgraded it to D7 - with 1,000 nodes.

Front page, 10 nodes, ab -c1 -n500

D6: 8.80 #/sec

D7: 7.86 #/sec

node/1
D6: 15.54 #/sec

D7: 12.31 #/sec

webgrind screenshots for /node attached.

I'd really like to see some benchmarks on db_insert() / node_save() as well.

Files:
- #47: parser.patch (749 bytes) - catch
- #10: bottle.patch (9.11 KB) - catch
- webgrind-7.png (1.21 MB) - catch
- webgrind-6.png (910.34 KB) - catch

Comments

Subscribing.

/me subscribes

Here's a fun one: at the moment, core does the equivalent of rendering /admin (almost) on every page request. #520364: system_admin_menu_block_access() makes no sense

Does HEAD mean that the committed D7 performance patches are included? If so, D7 is still 10% to 20% slower than D6. That's an improvement over the benchmarks kbahey ran many moons ago (http://2bits.com/articles/performance-benchmarking-drupal-512-drupal-66-...), but it's still a compelling reason to look for further performance improvements. Subscribing.

HEAD means the latest from CVS, so yes all committed patches are taken into account.

#521838: Clean up drupal_get_schema_versions()

@catch: I thought D7 was now faster than D6 on the front page view. Did we break something in the process?

Status: Active » Needs work
File: bottle.patch (9.11 KB)

Some of these patches look a bit like micro-optimizations, and I didn't post ab benchmarks for them individually, so here they are combined. The ab results will look less impressive than the profiler numbers, because we're measuring the entire HTTP request rather than just PHP, but it's still about 7% faster overall.

Front page, 10 nodes, ab -c1 -n500

HEAD:

Time taken for tests:   69.576 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      9290500 bytes
HTML transferred:       9058000 bytes
Requests per second:    7.19 [#/sec] (mean)
Time per request:       139.152 [ms] (mean)
Time per request:       139.152 [ms] (mean, across all concurrent requests)
Transfer rate:          130.40 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   131  139  20.1    135     526
Waiting:      126  134  19.3    130     501
Total:        132  139  20.1    135     526

Patch:

Time taken for tests:   64.616 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      9290500 bytes
HTML transferred:       9058000 bytes
Requests per second:    7.74 [#/sec] (mean)
Time per request:       129.232 [ms] (mean)
Time per request:       129.232 [ms] (mean, across all concurrent requests)
Transfer rate:          140.41 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   123  129   9.2    126     202
Waiting:      118  124   9.1    121     195
Total:        123  129   9.2    126     202

subscribe! Um. Yikes! :)

Another registry-related one #528856: Optimize module_implements().

A few patterns emerging:

1. Calling a function that takes 'no time at all' is still a lot slower than not calling it, because the function call itself is relatively expensive once you're making 50 calls that all return exactly the same thing. There are a few places in core where we run defined('MAINTENANCE_MODE') 50 or 100 times inside the same function, and that can account for 50% of the function's total execution time; replacing those 50 calls with one call plus a static saves most of that cost.
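The pattern from point 1 can be sketched like this (a hypothetical helper, not actual core code):

```php
<?php
// Hypothetical example: cache the result of a cheap-but-repeated check
// in a static so the underlying call happens only once per request.
function example_in_maintenance_mode() {
  static $maintenance;
  if (!isset($maintenance)) {
    // This runs once per request instead of once per call.
    $maintenance = defined('MAINTENANCE_MODE');
  }
  return $maintenance;
}
?>
```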

2. In 99% of cases where drupal_function_exists() is called, the function already exists. A couple of patches change:

<?php
if (drupal_function_exists($function)) {
?>

to
<?php
if (function_exists($function) || drupal_function_exists($function)) {
?>

which avoids the nested function call whenever the function already exists. See, for example, #528896: Small optimization to module_invoke_all()

3. If drupal_static() is being used for more than one variable in a function, we should look at using

<?php
$var = &drupal_static(__FUNCTION__, array(
  'var1' => 'foo',
  'var2' => 'bar',
));
?>

instead of

<?php
$var1 = &drupal_static(__FUNCTION__);
$var2 = &drupal_static(__FUNCTION__ . ':foo', FALSE);
?>

See #523298: Rationalize drupal_static() usage in drupal_lookup_path() for more.

Sub. Really, really great and important work catch.

Loading 300 comments is now twice as slow as loading 300 nodes in HEAD. I was about to blame the rendering and fieldable comments patches, but they were both benchmarked, and webgrind suggests it's not those at all:

Nodes:

Document Path:          /
Document Length:        229372 bytes
Concurrency Level:      1
Time taken for tests:   528.784 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      114918500 bytes
HTML transferred:       114686000 bytes
Requests per second:    0.95 [#/sec] (mean)
Time per request:       1057.568 [ms] (mean)
Time per request:       1057.568 [ms] (mean, across all concurrent requests)
Transfer rate:          212.23 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   990 1057  64.6   1038    1481
Waiting:      978 1043  62.8   1025    1461
Total:        990 1058  64.6   1038    1481

Comments:

Server Hostname:        d7.7
Server Port:            80
Document Path:          /node/6
Document Length:        685615 bytes
Concurrency Level:      1
Time taken for tests:   120.859 seconds
Complete requests:      65
Failed requests:        0
Write errors:           0
Total transferred:      44595200 bytes
HTML transferred:       44564975 bytes
Requests per second:    0.54 [#/sec] (mean)
Time per request:       1859.363 [ms] (mean)
Time per request:       1859.363 [ms] (mean, across all concurrent requests)
Transfer rate:          360.34 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  1724 1847  81.2   1836    2111
Waiting:     1703 1825  80.7   1814    2088
Total:       1724 1847  81.2   1836    2111
Percentage of the requests served within a certain time (ms)
  50%   1833
  66%   1866
  75%   1903
  80%   1925
  90%   1956
  95%   1992
  98%   2016
  99%   2111
100%   2111 (longest request)

NB: the attached webgrind screenshots were taken logged out (with no page caching, obviously), since when logged in the additional links on comments (delete, edit, reply) add a lot of overhead via url(), drupal_attributes(), check_plain() and others.

#537110: Optimize theme_username()
#537094: Optimize template_preprocess() (300% speedup)

@Subscribers, this queue could really do with a bit less yellow and a bit more green ;)

Catch, you forgot to attach screenshots in #16 :)

Perhaps catch is doing benchmarks with the xdebug profiler enabled. If so, this certainly adds a little time to each function call. To be safe, it would be best if the xdebug extension was not even present. The act of observing may be perturbing the subject. Heisenberg alert!

OK, here are some clean, isolated benchmarks, following Crell's method from http://www.garfieldtech.com/blog/magic-benchmarks

APC on, xdebug off.

<?php
error_reporting(E_ALL | E_STRICT);
define('ITERATIONS', 2000000);

$start = microtime(true);
for ($i = 0; $i < ITERATIONS; ++$i) {
}
$stop = microtime(true);
echo "no-op: " . ($stop - $start) . " seconds" . PHP_EOL;

$start = microtime(true);
for ($i = 0; $i < ITERATIONS; ++$i) {
  function_exists('count');
}
$stop = microtime(true);
echo "function_exists(): " . ($stop - $start) . " seconds" . PHP_EOL;

$start = microtime(true);
for ($i = 0; $i < ITERATIONS; ++$i) {
  static $function;
  if (!isset($function)) {
    $function = function_exists('count');
  }
}
$stop = microtime(true);
echo "static/if/isset: " . ($stop - $start) . " seconds" . PHP_EOL;
?>

no-op: 0.442554950714 seconds
function_exists(): 1.27987313271 seconds
static/if/isset: 0.947004795074 seconds

So calling the function at all takes 50% of the time. This is consistent with #523058: Optimize check_plain(), where just replacing a function call with inline code sped things up 50%.

Doing a static/if/isset instead of the function call each time brings the total call cost down by around 25%; again, more or less consistent with the profiling I've been doing on these other issues.

There's not much via Google about why this is the case, but this post at least has an explanation, rather than just putting it down to voodoo like most others:

1) When working with strings, if you need to check that a string is of a certain length, you'd understandably want to use the strlen() function. This function is pretty quick, since it performs no calculation and merely returns the already-known length of the string available in the zval structure (the internal C struct used to store variables in PHP). However, because strlen() is a function, it is still somewhat slow: the function call requires several operations, such as a lowercase conversion and a hashtable lookup, followed by the execution of the function itself. In some instances you can improve the speed of your code by using an isset() trick.

Ex.
if (strlen($foo) < 5) { echo "Foo is too short"; }
vs.
if (!isset($foo{5})) { echo "Foo is too short"; }

Calling isset() happens to be faster than strlen() because, unlike strlen(), isset() is a language construct and not a function, meaning its execution requires neither a function lookup nor a lowercase conversion. This means you have virtually no overhead on top of the actual code that determines the string's length.

http://ilia.ws/archives/12-PHP-Optimization-Tricks.html
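For anyone who wants to reproduce this locally, here's a quick harness in the same style as the benchmark above. Numbers will vary by machine, and `$foo{5}` is the PHP 5 string-offset syntax used in the quote:

```php
<?php
// Rough micro-benchmark of strlen() vs. the isset() string-offset trick.
// Assumes PHP 5, where the $str{N} offset syntax is still valid.
$foo = 'abcd';

$start = microtime(true);
for ($i = 0; $i < 1000000; ++$i) {
  if (strlen($foo) < 5) {}
}
echo 'strlen(): ' . (microtime(true) - $start) . " seconds" . PHP_EOL;

$start = microtime(true);
for ($i = 0; $i < 1000000; ++$i) {
  if (!isset($foo{5})) {}
}
echo 'isset():  ' . (microtime(true) - $start) . " seconds" . PHP_EOL;
?>
```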

Here's comments vs. nodes without the xdebug profiler enabled. I usually leave the xdebug extension loaded but the profiler off. xdebug has some overhead just from being loaded, but it looks to me like the relative values are more or less equivalent.

300 nodes:

Concurrency Level:      1
Time taken for tests:   372.207 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      114918500 bytes
HTML transferred:       114686000 bytes
Requests per second:    1.34 [#/sec] (mean)
Time per request:       744.414 [ms] (mean)
Time per request:       744.414 [ms] (mean, across all concurrent requests)
Transfer rate:          301.51 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   681  744 641.0    702   15001
Waiting:      664  726 633.6    685   14822
Total:        681  744 641.0    702   15001

Comments:

Document Path:          /node/6
Document Length:        685615 bytes
Concurrency Level:      1
Time taken for tests:   613.541 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      343040000 bytes
HTML transferred:       342807500 bytes
Requests per second:    0.81 [#/sec] (mean)
Time per request:       1227.083 [ms] (mean)
Time per request:       1227.083 [ms] (mean, across all concurrent requests)
Transfer rate:          546.01 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  1127 1227 317.5   1194    8164
Waiting:     1105 1204 315.9   1171    8109
Total:       1127 1227 317.5   1194    8164

If function calls are toxic, is there any interest in my hooks compiler? It uses the registry to compile all potential hooks to direct function calls.

Most hook implementations aren't too much of an issue, since the results are statically cached in a lot of situations anyway - i.e. node_load() might get called 600 times when viewing a lot of comments, but since it's all on the same nid, hook_node_load() gets called once. Something like custom_url_rewrite() as a hook, or similar, is likely to break that, though. Where is this hooks compiler?

Title: Find bottlenecks in HEAD - meta issue » php optimisation
Category: task » feature

Hi,

I posted some comments about PHP optimisation and was told they were duplicates, because that is handled in this issue.
Well, since I am not familiar with Drupal development, I am not sure what to do.

I have found many function calls inside a loop declaration. Like: foreach(func() as $item){}. func() is now called any loop.....
Then there is the single vs. double quote issue. I loved double quoting, but it seems to be slower.
There are more things (see my post and the article), but these are the most important, I think.

More info in my post: #538732: Function calls in Loops etc. (php optimisation)

Greetz, Joly

Title: php optimisation » Find bottlenecks in HEAD - meta issue
Category: feature » task

oops

"foreach(func() as $item){}. func() is now called any loop"
That's just not true, func() is called once. The article you refer to is about for($i = 0; $i < count($array); $i++), not foreach.

uh.... no, cant be > trying at once....
oops... no!... please! no! ... really?! arrcchh!!!!!
I was going to prove it.... But my prove, did prove me being very wrong :-(
woopss.... swallow... aaarrrrcchhhhh
slap! auch!

ok, learned something!
Thanks!!!

But I think I'd still prefer not to use functions inside any loop declaration. If for handles functions this way, who says a later PHP version won't start doing the same for other loops as well? Both are loops, so it isn't logical for them to handle functions differently.
while() also calls its function on every iteration, and makes use of that as a feature. So again, foreach seems to be different from the rest.

---------------------------------------------------------------------
function test_array() {
  static $counter = 1;
  vfDebug('calling array: ' . $counter);
  ++$counter;
  return array(1, 2, 3, 4, 5);
}
foreach (test_array() as $item) {
  vfDebug('inside foreach: ' . $item);
}

shows:
calling array: 1
inside foreach: 1
inside foreach: 2
inside foreach: 3
inside foreach: 4
inside foreach: 5

@joly: If PHP changed this behaviour, it would have to be in a major version (PHP 6 or PHP 7), and we don't need to think that far ahead. Also, foreach loops over a value; that the value happens to be returned from a function doesn't matter to it, which is very different from for or while.

Single quotes are already in the coding standards, see http://drupal.org/coding-standards#quotes - any patch which tidies up instances and doesn't require escaping with backslashes is likely to be accepted.

On general micro-optimizations, my own view is that any optimization we can make in the critical path (code that runs on common pages and is called a lot of times) is worth having; if it's not in the critical path, I don't think it's worth the effort of trying to change.

@joly: Not only that, but foreach is designed to be different. The while and for loops each have a "condition", whereas foreach does not. This allows you to modify the array being iterated over without causing too much confusion. Here is the for() equivalent of a foreach():

<?php
foreach ($input as $index => $item) {
  // ...
}

// The equivalent with for():
$from = $input;
$key = 'index';
$value = 'item';
for ($$value = current($from), $$key = key($from); $$key !== NULL; $$value = next($from), $$key = key($from)) {
  // ...
}
?>

Often functions like path_to_theme() or variable_get() are used more than once inside one function.
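A small illustration of that point, with hypothetical function names: hoist the repeated call into a local variable so it only runs once.

```php
<?php
// Hypothetical example: call variable_get() once and reuse the result,
// instead of calling it on every iteration.
function example_render_items(array $items) {
  $output = '';
  // One call, stored in a local, rather than one call per item.
  $items_per_row = variable_get('example_items_per_row', 4);
  foreach ($items as $item) {
    $output .= example_render_item($item, $items_per_row);
  }
  return $output;
}
?>
```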

I was wondering if we could also get benchmarks for Drupal 6. I believe Drupal 7 is slower than Drupal 6, so if we can get the Drupal 6 numbers, maybe we could set those as our goal for this optimization, i.e. Drupal 7 shouldn't be slower than Drupal 6 out of the box.

What do you think?

I did some here a while ago, obviously Drupal 7 has changed since then: http://civicactions.com/blog/2009/may/19/drupal_6_vs_drupal_7_performanc...

Priority: Normal » Critical

Bumping to critical.

See http://spreadsheets.google.com/pub?key=t9ln9MYX0RCR4VlH3l4D0ZA&output=html for latest benchmark data.

See #607244: Decrease performance impact of contextual links for most recent serious performance regression.

Subscribe.

How about caching the whole node_load() process?

Catching up here.. there's a separate task regarding caching node_load (#111127: Cache node_load) which spun off a module (http://drupal.org/project/entitycache).

These benchmarks are very helpful for getting both an aggregate and a micro view of D7's performance. I also am glad that we've isolated the contextual links' performance degradation.

However, it still feels like we're chewing at the edges of D7's biggest performance problems. Is the slowdown really death by a thousand cuts, or are we missing some huge elephant in the room (Field API storage, DBTNG?) that's causing the bulk of our issues? Anyone know how to dissect this?

Some DB related internal benchmarking is happening here: #550124: Remove prepared statement caching

Right now it confuses me.

IMO it's death by a thousand cuts, unless there's a huge, massively critical performance issue somewhere that's undetectable by xdebug profiling. That's possible, but it seems unlikely.

The biggest elephant in the room is body-as-field / #438570: Field API conversion leads to lots of expensive function calls, which caused a 30-40% slowdown all by itself. However, even in isolation, that issue was shown to be the result of a very long tail of performance costs at various points in the request, with no obvious place to optimize.

Field API storage can't be causing the slowdown, because we load fields from the field cache on 99% of requests (especially when benchmarking, since caches should be primed in a sane benchmark, and there's no real traffic to clear them). That cache is also fetched efficiently, even with database caching: one query per multiple-load, which on most requests means at most 2-3 cache_get()s.

dbtng is definitely slow at building db_select() queries, and PDO itself may be slow, but #340169: Identical db_select() queries re-built during a page request just got moved to Drupal 8, so that's not a clear place to optimize, and it's not mind-bogglingly slow either. We also did a good job of reducing the sheer number of pointless database queries (well, up until contextual links got in anyway), which reduces the PHP overhead of building/parsing/executing queries on top of the actual database gains.

To be clear: I bumped db_select() duplication to D8 because I suspect it's not a huge difference in practice, and I don't know that we can do anything without an API change, which we cannot do at this point. If, however, it turns out to be a major problem and/or Drieschick will still allow the necessary changes, I'm fine with bringing it back down to D7.

I suspect what we'd have to do there is make SelectQuery objects recyclable. Currently I believe they cannot be reused, as execute() is a destructive operation. Although the preExecute() work that happened in August may change that. I suppose it's worth investigating if it seems like it would be worth the time. catch, do you know how much time is spent on those?

Er, right, and the other problem with SelectQuery is that half of the placeholders are dynamically generated and the other half are not, so it would be really hard to re-specify values. Fixing that would almost certainly require an API change, even if we could.

I don't think db_select() matters too much, since most of the time it's used for list-building queries or multiple loads, both of which are once-per-page operations. The only problem may be incorrect usage in contrib.

Another one: chx reckons this will dramatically speed up node_save() - #108818: Add transactions to key _save routines.

File: parser.patch (749 bytes)

Moshe found http://derickrethans.nl/xdebug_and_tracing_memory_usage.php

Here are the results from the front page with one node on it, default profile, user 1, devel module enabled. Obviously it'd be good to test other pages too - the script is very, very easy to use.

My xdebug settings fwiw:

zend_extension="/usr/lib/php5/20060613/xdebug.so"
xdebug.profiler_enable_trigger = on
xdebug.profiler_output_dir = /home/catch/www/cachegrind
xdebug.remote_enable = 1
xdebug.remote_port = 9000
xdebug.remote_host = localhost
xdebug.profiler_output_name = cachegrind.out.%R
xdebug.profiler_append = TRUE
xdebug.auto_trace = 1
xdebug.trace_format = 1

although the trace files end up in /tmp for me.

The table is ordered by 'memory own', i.e. how much memory each function consumes itself.

Showing the 30 most costly calls sorted by 'memory-own'.
                                               Inclusive        Own
function                               #calls  time     memory  time     memory
-------------------------------------------------------------------------------
drupal_load                                37  0.0140   799248  0.0055   535792
_registry_check_code                        7  0.0032   475792  0.0010   321200
_drupal_bootstrap_full                      1  0.0414  1721632  0.0016   303904
has_krumo                                   1  0.0038   301152  0.0033   300848
drupal_bootstrap                            3  0.0717  3154272  0.0012   295224
_drupal_bootstrap_database                  1  0.0006   258424  0.0005   248528
module_load_include                         7  0.0024   196928  0.0012   194248
require_once                               31  0.0055   437816  0.0026   114032
unserialize                                 8  0.0004    86696  0.0004    86696
Database::openConnection                    1  0.0012    80344  0.0003    73944
system_list                                 3  0.0092    64296  0.0016    51344
ob_start                                    1  0.0000    41016  0.0000    41016
drupal_static                              63  0.0041    31560  0.0028    31560
require                                     4  0.0007   149464  0.0002    19848
module_implements                           8  0.0032    86808  0.0006    13824
PDOStatement->fetchObject                   6  0.0002    10760  0.0002    10760
PDOStatement->execute                      11  0.0056    10592  0.0056    10592
include_once                               34  0.0066   212480  0.0024    10344
module_list                                 4  0.0097    74128  0.0002     9560
_drupal_bootstrap_variables                 1  0.0058   131056  0.0002     8552
_drupal_bootstrap_page_cache                1  0.0088   402848  0.0003     8496
variable_get                               39  0.0009     8320  0.0009     8320
drupal_get_filename                        85  0.0039     7408  0.0029     7408
PDO->prepare                                7  0.0008     6784  0.0003     5920
MergeQuery_mysql->__toString                1  0.0002     7552  0.0001     5744
drupal_environment_initialize               1  0.0005     7312  0.0002     5416
DatabaseConnection->prefixTables           11  0.0011    10304  0.0006     5248
define                                    174  0.0035     5200  0.0035     5200
strtr                                      11  0.0003     5056  0.0003     5056
drupal_settings_initialize                  1  0.0018    12600  0.0004     4784

I've also attached a patch to xdebug/api which makes it work on PHP 5.2. It only works with memory-own sorting, but that's good enough for our purposes.

How does drupal_load() use that much memory? The only thing I can come up with is that the memory usage of files included via include/include_once is attributed to drupal_load(). require/require_once do appear separately in your top list, however.

Yeah, I'm not sure about that either; I was wondering if actually calling require_once adds to memory usage during the life of the function, or something. I think it's unlikely we have a real memory leak here.

This confirms my worries about system_list(): we currently pull everything out of the system table, then throw half of it away. We need to either give up on doing that in one query, or cache what we actually need and use that on most requests instead.
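The caching half of that could look roughly like this. This is only a sketch; the function name and cache bin are assumptions, and the real patch in #626688 may well differ:

```php
<?php
// Rough sketch: cache only the columns system_list() actually needs,
// instead of re-querying the whole {system} table on every request.
function example_system_list($type) {
  $lists = &drupal_static(__FUNCTION__);
  if (!isset($lists)) {
    if ($cache = cache_get('example_system_list')) {
      $lists = $cache->data;
    }
    else {
      // Pull just name, type and filename, keyed by type for cheap lookup.
      $result = db_query('SELECT name, type, filename FROM {system} WHERE status = 1 ORDER BY weight, name');
      $lists = array();
      foreach ($result as $record) {
        $lists[$record->type][$record->name] = $record->filename;
      }
      cache_set('example_system_list', $lists);
    }
  }
  return isset($lists[$type]) ? $lists[$type] : array();
}
?>
```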

@catch: It looks like the total memory usage is ~3.2 MB (if we assume that drupal_bootstrap is inclusive of most everything). Rather than worry about the 51 KB that system_list() is taking up, make a page with 50 or 100 nodes and see how the functions scale. If you want to see a huge memory problem, check out SimpleTest: drupal_install_modules() inside setUp() consistently runs out of memory on my system for test suites with a lot of sub-tests.

Index: modules/simpletest/drupal_web_test_case.php
===================================================================
RCS file: /cvs/drupal/drupal/modules/simpletest/drupal_web_test_case.php,v
retrieving revision 1.171
diff -u -b -r1.171 drupal_web_test_case.php
--- modules/simpletest/drupal_web_test_case.php 18 Nov 2009 04:56:17 -0000      1.171
+++ modules/simpletest/drupal_web_test_case.php 18 Nov 2009 16:45:30 -0000
@@ -407,7 +407,9 @@
     foreach (get_class_methods(get_class($this)) as $method) {
       // If the current method starts with "test", run it - it's a test.
       if (strtolower(substr($method, 0, 4)) == 'test') {
+$this->error('PreSetup '.$method.' '.memory_get_usage(), 'User notice', _drupal_get_last_caller(debug_backtrace()));
         $this->setUp();
+$this->error('PostSetup '.$method.' '.memory_get_usage(), 'User notice', _drupal_get_last_caller(debug_backtrace()));
         try {
           $this->$method();
           // Finish up.

@jrchamp: with APC, vanilla HEAD runs at around 5-7 MB per Apache process for me, which sounds about right. There are obviously memory-intensive operations we could look at, like menu_rebuild(), or lots of nodes on a page, etc. I was just pleased to get this working in the first place ;) (and it's good to see there's nothing overly stupid in there, too).

How about caching system_list() ?

#626688: Add caching for system_list() ;) no patch yet, though. We'd definitely see a memory improvement, because we could cache an array of just what we actually need, so we wouldn't load the whole table into memory, and there'd be less foreach-ing too.

It's probably already been thought about (#615822-29: Benchmarking / profiling of D7), but I do notice that there is *much* more code in D7 than D6 (yay for functionality). bootstrap.inc (2,300 lines) has doubled in size (quadrupled since 4.6) and now has more lines than common.inc had in 4.6; common.inc is now 6,500 lines. Although modules are now split into separate files, the base include files that have been present since 4.x have grown hugely, even with the proliferation of core include files. Is this worth looking at?

gpk: I've been ranting about our ever-increasing code weight for quite some time now. :-) Unfortunately, there's no way to break up some of those mega-includes sanely without massive refactoring. We need to either:

1) Establish that you will *always* need to call something like drupal_library_include() for the files your function needs (in every function) to ensure that the files you need get lazy-loaded just-in-time, and then refactor our core code into a lot more small, independent pieces.

2) Refactor the core libraries to be object-oriented so that we can have PHP's autoload support do the lazy loading for us.

3) Suck it up and assume that everyone has an opcode cache, so code weight becomes a non-issue.

Personally I favor #2, but at this stage of Drupal 7, #3 is about all we can do. For Drupal 8, we *must* make either #1 or #2 a priority system-wide. Of the two, #2 is a common, easy-to-implement approach with a lot of side benefits in exchange for refactoring things to be OO, while #1 is the "simple and obvious" solution that just adds epicycles to the system and is going to come crashing down on us sooner or later. (Not that I have a bias between those two, not at all. :-))

But that's Drupal 8. We can't do that level of major refactoring in Drupal 7, I don't think. Not when we're a week away from alpha.
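For option #2, the lazy loading PHP gives you for free with OO code looks roughly like this. Illustrative only: the file layout is invented, and Drupal 7's registry-based autoloader differs in the details:

```php
<?php
// Illustrative sketch: register an autoloader so a class's file is only
// read from disk the first time the class is actually used.
function example_autoload($class) {
  // Map e.g. 'SelectQuery' to 'includes/SelectQuery.inc' (hypothetical layout).
  $file = 'includes/' . $class . '.inc';
  if (file_exists($file)) {
    require_once $file;
  }
}
spl_autoload_register('example_autoload');

// No code from SelectQuery.inc is parsed until the class is first used:
// $query = new SelectQuery(...);
?>
```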

Subscribing now.

I vote for solution 3). Everyone uses or should use a bytecode cache.

@scroogie: That's not a discussion to be had in this thread, as we can't make major changes at this point anyway. That said, congrats, you've just said that Drupal should never run on a shared host.

subscribing

Status: Needs work » Closed (duplicate)

Marking as a duplicate of #615822: Benchmarking / profiling of D7 which has more up-to-date information.