I've been profiling drupal and found that a slight change to module_hook boosts site performance measurably. It's a very small change, but in my opinion it's worth it.

Benchmarks can be found on my blog:
http://papasian.org/2007/01/11/drupal-module-performance/

Comments

ChrisKennedy’s picture

Title: Performance: increase in speed in module_hook() by caching function_exists() output. » Performance: cache function_exists() output in module_hook()
Version: 5.0-rc2 » 5.x-dev
Status: Needs review » Needs work

It's best to include, or at least summarize, benchmarks in the issue entry for both convenience and posterity rather than needlessly making everyone visit your blog. The benchmarks shows a performance improvement of about 2%.

How much does the performance improvement depend on the number of modules you have enabled (and how many functions they define)? You could even use PHP's get_defined_functions() to see how the performance gain varies with the exact number of defined functions for a few test setups (e.g. small, medium, and large number of modules enabled).

Also, your patch is difficult to apply in its current form. See http://drupal.org/patch for info on how to create a basic cvs patch. The IF statement needs a space in it too (see the coding standards).

dannyp’s picture

Status: Needs work » Needs review
StatusFileSize
new623 bytes

Mea culpa - was in a bit of a rush last night and didn't have access to a working copy, so I wasn't looking forward to typing twice. Sorry if that inconvienenced anyone.

That benchmark was on out of the box drupal 5rc2. The more modules present, the bigger the expected performance increase.

Good call on the whitespace change - I've incorporated it and generated a diff against 1.93.

RobRoy’s picture

Status: Needs review » Needs work

That last } has a trailing white space, need to remove that.

Also, IMO all functions that implement static caching need to implement a $reset parameter to reset the static cache and Doxygen. For example, we may programmatically disable a module then run some other code that thinks this hook still exists (omitting a module_exists() call). We would have to add code to module_disable to reset this static cache. See http://drupal.org/node/106015 for a bit more explanation.

ChrisKennedy’s picture

Good point, a reset parameter should be added.

moshe weitzman’s picture

i have no idea how php internals work, but can't see a benefit here. php knows in its symbol table about all functions that exist. you can't delete a function once it has been loaded, can you? that synbol table has got to be as fast as our static array. i see that benchmarks prove otherwise, but i'd like to sanity check.

ByteEnable’s picture

This is really really low hanging fruit. The benchmark numbers show a delta of only 300ms, but yet the author claims linear correlation with number of modules loaded but offers no proof. IMHO, bogus so far.

Byte

dannyp’s picture

Actually, the msec/first-response is about two standard deviations apart between tests.

without:
msecs/first-response: 54.8222 mean, 64.249 max, 53.426 min, 0.814898 stddev
msecs/first-response: 54.7124 mean, 60.375 max, 53.48 min, 0.636029 stddev

And with it:
msecs/first-response: 53.6751 mean, 58.663 max, 52.455 min, 0.681214 stddev
msecs/first-response: 53.6477 mean, 58.638 max, 52.5 min, 0.508483 stddev

Which does mean that they're statistically different.

It's a fair point about dynamically defining (or undefining) modules, so I think the rest is moot, but there was certainly a performance gain to be had here. I've attached the trace that I used with kcachegrind - if you'll notice, 5.07% of the time of the request ends up getting spent in module_hook, of which 4.61% of the time is spent in module_hook itself.

That's really interesting, because it means that the call to function_exists is cheap, but that the call to module_hook is not - perhaps it's the concatenation of the module and hooknames together, times 329, plus the overhead of a PHP function?

dannyp’s picture

StatusFileSize
new563.73 KB

Sorry, here's the trace.

m3avrck’s picture

Very interesting patch. Clearly the benchmarks are showing improvement which is great.

Moshe, I believe the speedup can be attested to the fact that isset() is extremely fast. This is apparent when compared to function_exists(). That would explain the speedup, and prove the sanity check :-)

dannyp’s picture

Actually, my bet for the mechanism of the speed increase is that it actually has nothing to do with function_exists, rather, it's the time it takes to allocate memory and copy "$module_$hook" into it (and then to free it upon return) - I am no expert on PHP internals, but it would strike me as something that would require ('the equivilent of' if PHP does internal memory management, else 'an actual') malloc and free call and the time required to copy the string.

Apparently, based on the benchmarks and the profiling indicating that function_exists isn't where the time is being spent, that's more expensive than using the hashing and array functions to dereference the result from a multidimensional array.

I don't have the time to test it, especially given that it doesn't seem like it will be able to be used because modules can apparently be defined at runtime, but to test this hypothesis we could use a static array and then use as the key "$module_$hook" and see if the performance gain is lost.

m3avrck’s picture

Building upon this patch, we should repeat this logic in the theme layer, for example:

    if (function_exists($theme_engine .'_init')) {
      call_user_func($theme_engine .'_init', $themes[$theme]);
    }
function theme_get_function($function) {
  global $theme, $theme_engine;

  // Because theme() is called a lot, calling init_theme() only to have it
  // smartly return is a noticeable performance hit. Don't do it.
  if (!isset($theme)) {
    init_theme();
  }

  if (($theme != '') && function_exists($theme .'_'. $function)) {
    // call theme function
    return $theme .'_'. $function;
  }
  elseif (($theme != '') && isset($theme_engine) && function_exists($theme_engine .'_'. $function)) {
    // call engine function
    return $theme_engine .'_'. $function;
  }
  elseif (function_exists('theme_'. $function)){
    // call Drupal function
    return 'theme_'. $function;
  }
  return FALSE;
}

If we apply this same logic in the theme layer, we could *significantly* improve the performance of Drupal. Without a doubt, the theme area is the slowest area in Drupal. This trick that Danny points out, reused in the theme layer could show some significant performance boost.

m3avrck’s picture

Any updates? This patch seems pretty solid and worthwhile...

dannyp’s picture

I think it's near dead, because the functionality of defining a function in a node has to be maintained.

The alternative is to only cache the output if the function does exist (functions won't fall out of existance) but I think the slight performance gain we could have had here would be lost even more by doing so. If someone who has more time than me would like to play with that and see, go ahead.

owen barton’s picture

I can't reproduce any performance increase at all here:

ab -c1 -n500 on the homepage, default devel content generated and some blocks enabled.

First test:

Without patch 16.75 #/sec:
Mean 59, SD 3.4

With patch 16.63 #/sec:
Mean 59, SD 3.3

Second test:

Without patch 16.86 #/sec:
Mean 58, SD 3.4

With patch 16.64 #/sec:
Mean 59, SD 3.2
marcingy’s picture

Version: 5.x-dev » 8.x-dev

Bumping version

valthebald’s picture

Status: Needs work » Active
Issue tags: +Ancient
StatusFileSize
new657 bytes

For impatient: patch seems non-relevant for current states of PHP/Drupal
Detailed version:

I have rerolled suggested patch for D8 and tested if it still relevant.
Server environment:
php 5.3.8 with suhosin patch (Debian testing), Apache 2.2.21
Drupal 8: 8.x-dev snapshot d78f6a6 - Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein, csevb10, alex_b, xjm, Steve Dondley: Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests).

Apache bench: ab 2.3 (bundled in Debian)

Tests were performed for the following configurations:
apc on, drupal cache off, CSS/JS aggregation off
apc on, drupal cache on, CSS/JS aggregation on
apc off, drupal cache off, CSS/JS aggregation off
apc off, drupal cache on, CSS/JS aggregation on

with drupal cache off, ab was run with -c1 -n500 parameters
with drupal cache on, ab was run with -c10 -n5000 parameters

after each configuration change, drupal cache was cleared (via admin/config/development/performance), and 5 identical runs of ab were performed.
First run was ignored to compensate cache filling effect, remaining 4 series combined to produce average result.

Finally, test results (less is better):

Config without patch with patch
APC on, cache off 26 +/- 1.5 26 +/- 1.6
APC on, cache on 11 +/- 4.8 10.5 +/- 4.8
APC off, cache off 124 +/- 4.2 124 +/- 3.7
APC off, cache on 39.8 +/- 14.5 40 +/- 14.4

As you see, results differ within standard deviance.
My guess is that later versions of PHP made internal symbol table more effective.
I suggest to mark this "won't fix", please advise

owen barton’s picture

Status: Active » Closed (won't fix)
damien tournoud’s picture

The real reason is that the implementation cache kicks in: only direct hook invocations go through module_hook() in Drupal 7+.