Figure out why D7 is so much slower than D6

catch - October 27, 2009 - 12:53
Project:Drupal
Version:7.x-dev
Component:base system
Category:task
Priority:critical
Assigned:Unassigned
Status:active
Issue tags:Performance
Description

Decided to compare node/1 between Drupal 7 and Drupal 6. To give some semblance of a level playing field, I put return array(); at the top of menu_contextual_links() since that's our single biggest recent performance regression, and we should have a killswitch / permission for it soon which will do much the same thing. Note the dbs were different, different modules enabled etc., this is just to get an idea

Highlights:
7: 764 different functions called in 335 milliseconds
6: 412 different functions called in 197 milliseconds

I've attached the cachegrind files themselves too, so people can stick them in webgrind / kcachegrind themselves.

AttachmentSizeStatusTest resultOperations
6.png133.77 KBIgnoredNoneNone
7.png254.73 KBIgnoredNoneNone
cachegrind.out_.6.txt477.57 KBIgnoredNoneNone
cachegrind.out_.7.txt961.75 KBIgnoredNoneNone

#1

catch - October 27, 2009 - 13:00

grr, managed to get cache miss for schema in the 7 cachegrind. fresh screenshot and grind file attached.

AttachmentSizeStatusTest resultOperations
7.png576.08 KBIgnoredNoneNone
cachegrind.out_.7.txt961.38 KBIgnoredNoneNone

#2

catch - October 27, 2009 - 13:40

pretty callgraphs from index.php
6.png

7.png

AttachmentSizeStatusTest resultOperations
6.png65.83 KBIgnoredNoneNone
7.png111.62 KBIgnoredNoneNone

#3

catch - October 27, 2009 - 13:43

So looking at all this, all I can see is the same story I've been banging on about for months:

1. PDO is slow so we should reduce the number of database queries - not only to reduce strain on the database, but also PHP overhead building and executing the queries.

2. We have a long tail of function calls which take between 1-10ms each over what we used to have in D6, these add up to a much slower page request overall.

#4

webchick - October 27, 2009 - 18:53

Hm. It seems to me that a lot of the fault lies with the render API, unless I'm misreading the graphs.

#5

yched - October 27, 2009 - 19:20

Well, we put a whole lot more stuff behind drupal_render() in D7, so it's hard to compare.

#6

catch - October 28, 2009 - 02:42

Yeah our entire block rendering process is within the page building now, the graphs really look like two different applications to me.

Either way I'm running out of ideas just staring at webgrind tables, so expect more pretty graphs soon :)

#7

catch - October 28, 2009 - 07:10

/me loves kcachegrind.

AttachmentSizeStatusTest resultOperations
screenshot_007.png228.03 KBIgnoredNoneNone

#8

JoshuaRogers - October 29, 2009 - 23:37

Subscribe

#9

Crell - October 30, 2009 - 20:17

catch, I don't see any part of the DB system listed in the graphic flow. In fact the only part of PHP that is listed is call_user_func_array(), which we know is annoyingly slow.

That said, reducing DB queries is a good thing in general, no question.

#10

catch - October 31, 2009 - 02:22

Crell - that's because it's only a high level overview - we have enough indirection that producing a full callgraph would be pretty hard.

I annotated the top 22/23 expensive functions on node/1 in a new webgrind screenshot. However I really recommend anyone asking detailed questions on this thread to get kcachegrind or webgrind set up locally and poke around yourselves - I'd much rather spend 30 minutes in irc helping people get set up and interpret results than taking screenshots.

Just a note on this screenshot - while queries are only 10% of execution time in webgrind, that's likely due to the overhead of producing the callgrind itself - I visited the same example page with devel query logging on, and it said:

Executed 67 queries in 71.27 milliseconds. Page execution time was 281.13 ms.

Which puts it at around 1/3rd.

Yet another important note on the screenshot - these are only the top 22 functions (for self time), between them they're less than 200ms, the total execution time was 438ms - meaning over 50% of time is spent in the long tail of function calls.

webgrind-annotated.png

AttachmentSizeStatusTest resultOperations
webgrind-annotated.png140.43 KBIgnoredNoneNone

#11

casey - October 31, 2009 - 10:20

Like yched said at #5 there's a whole lot more functionality in D7. For comparison-benchmarks with D7, D6 should at least include CCK (especially at node views).

#12

sun - October 31, 2009 - 21:44

1) First thing that looks a bit strange to me is the block of functions and % for block building.

2) Second, I'd like to see the innards of this still very big cufa() block. Hence, the attached patch. Please note that, again, only field API functions pass up to 7 arguments, and while this patch is primarily meant to see a bit more here, I really really wonder whether we shouldn't implement + restrict the number of arguments just like we did for drupal_alter() now. That could not only help in performance profiling, but is most likely also faster, and on top of that, it would introduce some more consistency.

AttachmentSizeStatusTest resultOperations
drupal.unveil-cufa.patch2.51 KBIdleFailed: Failed to install HEAD.View details | Re-test

#13

sun - October 31, 2009 - 21:57

sorry, did a silly mistake.

AttachmentSizeStatusTest resultOperations
drupal.unveil-cufa.3.patch2.55 KBIdlePassed: 14648 passes, 0 fails, 0 exceptionsView details | Re-test

#14

CorniI - October 31, 2009 - 23:43

@#11: There's a principle, it says, you shouldn't pay for stuff you don't use...
@issue: There's (probably) no major performance sink hidden anywhere, so just extending code freeze for performance related stuff doesn't really work. This really would require more abstract design work than just throwing some static caches here and there in. But as long as the main focus is on features, with i think 3 exceptions or so to get more (really cool!) features in, this doesn't really work.
When you go to a higher level, this is kind of #smallcore, because some people don't care if they can store there fields contents (node bodys...) in a CDN or not, they just want to have their blog running, and if it runs slow, because the flexibility of the Field API _costs_ performance, even in this simple case, drupal sucks for them.
Not that Drupal getting slower with each major version would be new: #206570: GHOP #127: Benchmark 5.x vs. 6.x
Another approach would be to use code generation, simple text-copy inlining for often-used and smaller functions (say, check_plain), micro optimizations like TRUE=>true, there are quite some examples out their.
And i'm sure that the php performance of call_user_func_array() could be improved, if someone would step up and profile php and then submit a patch to PHP, the same goes for PDO, but PDO is (according to some posts on php-internals) pretty abandoned, so that's probably more complicated...

#15

catch - November 1, 2009 - 03:06

@Cornil - this has nothing to do with smallcore - you can't have nodes use field API or not use it for node body and also have a consistent API.

Both fields and the database layer are also subsystems where we trade some performance for (at least the potential of) greater scalability, let alone features - it makes sense to make that trade-off in some situations. i.e. with dbtng we have more or less eliminated LOWER() queries, and Crell just filed a bunch of patches for master-slave support.

The point of this thread and #513984: Find bottlenecks in HEAD - meta issue is to understand where those trade-offs are, what their impact is - something which is sadly very, very rarely done before patches get committed, and much harder to do afterwards (although we did do it for body as field). Then we can look at mitigating that impact, or trying to make gains elsewhere.

For example if PDO is adding a lot of overhead for each actual MySQL query - then it could make sense to try 1-1 exchanges for direct database queries which are in the critical path - obvious ones in HEAD would be places like module_list(), some filter queries which are shown on any page with a formatted textarea. These would be no-ops in HEAD, but allow any site with a pluggable caching backend to eliminate not only the cost of the database hit, but also the cost of PDO itself.

Additionally, Sun's patch is to look at what c_u_f_a() is hiding, not to try to factor it out (although we did successfully do that for drupal_alter() a few weeks ago). New pretty graph with that applied (and return array() at the top of menu_contextual_links()).

nocontextualinksnocufa.png

AttachmentSizeStatusTest resultOperations
cachegrind.out_.nomenucontextuallinksorcufa.txt1.12 MBIgnoredNoneNone
nocontextualinksnocufa.png176.41 KBIgnoredNoneNone
headnocontextuallinks.png578.4 KBIgnoredNoneNone

#16

peterx - November 2, 2009 - 01:20

Comparing D7 with D6+CCK will not work for sites that do not currently use CCK. They write their own modules and can optimise the performance of those modules or cost out the lack of performance. They need to know the cost of the bit they cannot optimise, core, and decide to upgrade hardware or donate time working on core performance or just leave the upgrade for a year.

Most of the systems I look at hit database performance issues before CPU. APC makes code load and execute cheap. Databases perform beautifully in test systems because the tables are cached in memory by the database. You then put the system into production and there is not enough memory to cache all the active data. The uncached database call is 100 times more expensive. Adding cache inside Drupal may not help because it pushes something else out of memory.

The best contribution I could make would be to look at a trace of all the SQL calls. Is anyone recording the SQL?

I am looking at the first use of D7 in a couple of weeks for projects starting early 2010. Is anyone looking at the SQL before then? Perhaps a child thread on SQL research and keep this thread for code research?

#17

catch - November 2, 2009 - 01:29

Your best bet for SQL calls is devel module - install HEAD, install devel HEAD, turn on query logging, generate some content.

In general that's improved over D6 - we added indexes where needed, and reduced a lot of overall hits to the database with node_load_multiple() (now entity loader), and a path alias whitelist and cache. There's two main areas still to look at - some of the queries which are still unindexed, mainly statistics, forum etc. and more aggressively removing queries which are made on every page / most pages.

#18

catch - November 2, 2009 - 02:47

In response to Dries on twitter:

http://twitter.com/Dries/status/5342711918

edit: on irc, it was pointed out that the parameters of this test weren't completely clear from the comment.

What I did, was a completely clean install of each version, no CCK - just to test the cost of baseline bootstrapping and page rendering. This shows that this baseline is currently twice as expensive in Drupal 7, as Drupal 6 - I think this is in large part down to some horrible bugs introduced by d7ux patches like dashboard and shortcuts.

I will also try to run these again with no optional modules enabled, and with CCK installed, to give D7 a bit more of a fair run. But the idea that the only difference between D7 and D6 is that 6 doesn't have CCK installed is patently false, as you can see extremely clearly from the cachegrinds attached.

/endedit

I installed Drupal 6 and Drupal 7 - but without any generated content at all, then hit the front page with ab -c1 -n1000

D6:

Concurrency Level:      1
Time taken for tests:   45.300 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6023000 bytes
HTML transferred:       5457000 bytes
Requests per second:    22.08 [#/sec] (mean)
Time per request:       45.300 [ms] (mean)
Time per request:       45.300 [ms] (mean, across all concurrent requests)
Transfer rate:          129.84 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:    35   45   8.5     41      67
Waiting:       35   45   8.5     41      67
Total:         35   45   8.5     41      67

D7:

Concurrency Level:      1
Time taken for tests:   82.862 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6144000 bytes
HTML transferred:       5657000 bytes
Requests per second:    12.07 [#/sec] (mean)
Time per request:       82.862 [ms] (mean)
Time per request:       82.862 [ms] (mean, across all concurrent requests)
Transfer rate:          72.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    72   83  11.0     79     193
Waiting:       72   83  10.9     79     193
Total:         73   83  11.0     80     194

AttachmentSizeStatusTest resultOperations
cachegrind.out_.d6.txt252.9 KBIgnoredNoneNone
cachegrind.out_.d7.txt590.88 KBIgnoredNoneNone

#19

catch - November 2, 2009 - 03:22

Disabled all optional modules in D7 (includes shortcuts, toolbar, dashboard - some of which have outstanding critical performance issues)

Concurrency Level:      1
Time taken for tests:   69.814 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6144000 bytes
HTML transferred:       5657000 bytes
Requests per second:    14.32 [#/sec] (mean)
Time per request:       69.814 [ms] (mean)
Time per request:       69.814 [ms] (mean, across all concurrent requests)
Transfer rate:          85.94 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    59   70  12.2     66     299
Waiting:       59   70  12.2     66     299
Total:         59   70  12.2     66     299

Not a great deal of change - especially compared to 20 reqs/second

#20

catch - November 2, 2009 - 05:11

CCK vs. Fields benchmark, which is probably more what Dries wanted to see.

I took the following steps:

In D6, enabled CCK and text modules, removed the 'body' field from 'story', added a CCK text field_body to replace it, posted a node with just a title and a body.

In D7, all core optional modules still disabled (including some which are enabled on the D6 install), I just posted an article with a title and a body.

Then I ran ab -c1 -n1000 http://example.com/node/1 on each install.

D6:

Concurrency Level:      1
Time taken for tests:   60.457 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      5429000 bytes
HTML transferred:       4863000 bytes
Requests per second:    16.54 [#/sec] (mean)
Time per request:       60.457 [ms] (mean)
Time per request:       60.457 [ms] (mean, across all concurrent requests)
Transfer rate:          87.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    52   60   8.7     57     110
Waiting:       51   60   8.7     57     109
Total:         52   60   8.7     57     110

HEAD:

Document Path:          /node/1
Document Length:        6324 bytes

Concurrency Level:      1
Time taken for tests:   93.700 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6811000 bytes
HTML transferred:       6324000 bytes
Requests per second:    10.67 [#/sec] (mean)
Time per request:       93.700 [ms] (mean)
Time per request:       93.700 [ms] (mean, across all concurrent requests)
Transfer rate:          70.99 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    84   94   7.9     92     142
Waiting:       84   93   7.9     91     142
Total:         84   94   7.9     92     142

#21

nicholasThompson - November 2, 2009 - 10:32

What is the result of Devel's debugging on those pages (num queries, query time, PHP exec time, Mem use, etc)?

#22

catch - November 2, 2009 - 11:37

This is the node/1 page CCK single text field, xdebug disabled, APC on:

D6:
Executed 34 queries in 18.99 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 60.69 ms.

HEAD:
Executed 49 queries in 43.78 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 123.79 ms.

HEAD with contextual links disabled:
Executed 36 queries in 27.24 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 89.86 ms.

However I get deviation of well over 30ms for each of these - just picked what looked to be the median of 10 or so requests each.

#23

sime - November 2, 2009 - 14:01

Do we need to look past bootstrap? I get significantly different results for D6 and D7 when i exit before the active handler. (Attached the full results).

D6

Concurrency Level:      1
Time taken for tests:   9.177291 seconds
Complete requests:      1000
Requests per second:    108.96 [#/sec] (mean)
Time per request:       9.177 [ms] (mean)

D7

Concurrency Level:      1
Time taken for tests:   14.947638 seconds
Complete requests:      1000
Requests per second:    66.90 [#/sec] (mean)
Time per request:       14.948 [ms] (mean)

AttachmentSizeStatusTest resultOperations
bootstrap_only_required_d6_d7.txt3.04 KBIgnoredNoneNone

#24

sime - November 2, 2009 - 14:03

I suppose it's only 5 ms, only a part of the full page difference.

#25

sime - November 2, 2009 - 15:33

A further comparison of D6/D7 with some simple modifications. To start with I:
-- disabled d7 contextual links
-- uninstalled every module possible
-- disabled all blocks in D6.

Then I did the following comparisons. The following categories correspond to the graph and the attached benchmark data. I've also attached the spreadsheet summary.

  1. "Bootstrap" - exit before menu_execute_active_handler
  2. "Node/1" - create 1 node with title/body.
  3. "User" - just the /user page
  4. "Node/1+block" - node/1 plus I enabled blocks in D7 and added login block to both.
  5. "Node/1+fields" - disabled blocks again and added CCK to D6 (content, noderef, userref, text, number, optionwidgets) and extended node/1 in both sites to have 1 plain textfield and 1 plain numeric field.

AttachmentSizeStatusTest resultOperations
ab-summary.ods17.19 KBIgnoredNoneNone
ab-ms-request.png19.63 KBIgnoredNoneNone
node-1-block_d6_d7.txt3.09 KBIgnoredNoneNone
node-1_only_required_d6_d7.txt3.08 KBIgnoredNoneNone
user_only_required_d6_d7.txt3.08 KBIgnoredNoneNone

#26

sime - November 2, 2009 - 15:34

Next I reverted to the basic setup (no blocks, no optional modules) and I installed devel. Refreshing about 10 times in each site, there was no meaningful difference between the time spent in database queries.

#27

sun - November 2, 2009 - 16:00

Nice work, sime!

I'd especially be interested in call graphs for the /user page (ideally with that cufa() patch, too). There, we have blocks + a "basic" renderable array (a form) in the main system block. Hence, comparing /user to /user should be the most comparable page in the system. I'd even say that we should use it as primary target for performance benchmarking, analysis, and patches.

Please also note that we should get #607244: Decrease performance impact of contextual links committed ASAP.

#28

int - November 2, 2009 - 16:18

Subscribe

#29

moshe weitzman - November 2, 2009 - 16:45

I'd like to see some data with opcode cache. Time spent parsing php is usually substantial. We can dig up some old patches if that turns out to be a significant contributor.

#30

catch - November 2, 2009 - 16:48

@moshe - all my benchmarks are done with APC enabled. Or do you mean without?

#31

sime - November 2, 2009 - 16:49

@moshe, yes I should have said that my data is running on MAMP with APC.

#32

int - November 2, 2009 - 17:01

I think that it would be beneficial if in testing results we see the last core time execution and with the patch time execution.

Like this:

Testbed results
fix_performance_2.patch passed (15s less) Passed: 14630 passes, 0 fails, 0 exceptions Detailed results

15 secounds less time testing that without this patch. or more: (1:13s more)

Isn't one exhaustive benchmark, but I think that is useful.

#33

catch - November 2, 2009 - 17:08

User callgrind. All modules except for block disabled.

AttachmentSizeStatusTest resultOperations
user.png145.8 KBIgnoredNoneNone

#34

catch - November 2, 2009 - 17:10

Found the graph settings, here's a more detailed graph :p

AttachmentSizeStatusTest resultOperations
catchfoundthegraphsettingsinkcachegrind.png307.09 KBIgnoredNoneNone

#35

effulgentsia - November 2, 2009 - 17:12

subscribing

#36

scroogie - November 2, 2009 - 17:28

Very interesting. Subscribing.

#37

Crell - November 2, 2009 - 17:46

We should also see about a temporary patch to split up _drupal_bootstrap() so that we can see which parts of bootstrap are more expensive. Right now with the switch statement we can't tell if one part is considerably more expensive than another.

#38

sun - November 2, 2009 - 18:27
Status:active» needs review

Nice idea, Crell.

That, we could even consider to not do temporarily, because we're apparently calling a couple of functions that are plain indirection there. For that sake, I want to know what the bot thinks about the patch this time. ;)

AttachmentSizeStatusTest resultOperations
drupal.unveil-bootstrap-cufa.patch12.05 KBIdlePassed: 14633 passes, 0 fails, 0 exceptionsView details | Re-test

#39

moshe weitzman - November 2, 2009 - 18:28
Status:needs review» active

A permanent patch. Once we are afraid to make a function call, we are lost. This adds less than 10 calls per request. Temprary patches would be more appropriate for highly hit functions like drupal_render(), theme(), check_Plain(), url()

#40

webchick - November 2, 2009 - 18:39
Status:active» needs review

Marking needs review so that the bot can have a crack.

#41

Crell - November 2, 2009 - 19:00

sun, can you roll a version with just the bootstrap changes and not the cufa() change? Sounds like that could be a for-reals patch.

#42

sun - November 2, 2009 - 20:20

If we really want to do that, then we should open a separate issue for it, because there are a couple of things that would need to be fleshed out properly. This issue should solely focus on meta discussion, finding the causes for the slowdown, and comparing apples with apples properly. We already made good progress on that, and we should continue to do so.

#43

chx - November 3, 2009 - 12:36

sun, if we want to compare apples to apples then we indeed need a temporary patch that separates out the bootstrap stages so we can find out wth is so slow. From looking at sime's graph I am eyeing PDO and I am not happy.

#44

catch - November 3, 2009 - 12:39

Yeah the bootstrap change sounds great, but let's do that in a separate issue.

While we're on D6 vs. D7 comparisons, here's inclued output for the /user page as anonymous.

The actual structure looks fine, but in terms of actual quantity of included files things have gone up a fair bit.

Hat tip to: http://talks.php.net/show/drupal08/14 http://pecl.php.net/package/inclued and http://www.chehodgins.com/php/sorting-out-your-php-includes-using-inclued/

AttachmentSizeStatusTest resultOperations
user_inclued.png909.95 KBIgnoredNoneNone
user_inclued_d6.png626.64 KBIgnoredNoneNone

#45

sime - November 3, 2009 - 13:57

The other day I was timing each require_once ... in _drupal_bootstrap_full(). I was surprised that pager.inc took significantly longer to "include" than menu.inc, even though menu.inc has much more code.

I found that the bottleneck in pager.inc was in the class extension at the top of pager.inc (class PagerDefault extends SelectQueryExtender { ... }). If I take the class out completely, the include time for the file is again reduced.

I just thought it was interesting. I plan to look further, but I have very bad profiling skills.

#46

sun - November 3, 2009 - 15:27

#47

sun - November 3, 2009 - 15:52

@sime: Interestingly, in #97293: Drupal 7: Tablesort Cleanup I already found out that we could include pager.inc + tablesort.inc entirely conditionally via autoloading. Due to the new query extenders, those functions are only ever used and required when a select query is extended. The patch removes the permanent loading of tablesort.inc on all pages, which completely works fine (except that totally wonky modules like forum.module may need to load it on their own, because it's doing totally wonky stuff in at rendering time, using cached query results). That, however, I'd call an edge-case, and I think that modules like that can happily load the code on their own.

#48

Crell - November 3, 2009 - 16:05

Including a class shouldn't be any more expensive than the equivalent amount of functions, except that it may trigger an autoload. Then we have an extra SQL query in the autoload plus whatever other file needs to be loaded (select.inc). I suspect that's what is making it take longer.

As sun said in #47, we can probably remove most if not all explicit includes of those files and let autoload handle it. That way, pager.inc won't be loaded until after select.inc has already been loaded (because you need the select query before you can extend()), so that should even out the numbers. It probably won't change the amount of time taken, though, unless we can optimize autoload some more, which is on my todo list. I need to dig into that this week. Blargh.

#49

peterx - November 3, 2009 - 18:36

#48 If all the code to autoload was in /includes/, your autoload would not need a database lookup to auto include. Drupal core would lose a bit of structure. No other disadvantage.

#50

Crell - November 3, 2009 - 19:09

@peterx: I do not even understand what you're saying. Because Drupal is so modular it is not even remotely possible to use the common Class_Name -> Directory/Name mapping pattern, which has all kinds of other flaws anyway. That's not even on the table. It's also off topic for this thread. :-) I was just pointing out that sun is correct that we don't need to explicitly load pager.inc 99% of the time.

#51

sime - November 4, 2009 - 05:45

I satisfied myself that there was nothing suspicious in #45.

I did a big hack of D6 and D7, piping all includes (where possible) to a function where I could measure the time take to include the file. Call it poor-mans-profiler.

So there are about 15 extra includes in D7 on logged in /user. On my machine with APC simply including these files adds up to at least 2ms, even 3ms. In some of my tests this represented 10% of the average page load.

Can anyone confirm that includes, even with opcode cache, have this impact? Because even if we don't wish to merge inc files, we should flag it simply as "one of the reasons D7 is slower than D6".

#52

sime - November 4, 2009 - 07:20

Put another way: if I can show that we gain, say, 1% speedup on #smallcore, by merging the field.module's .inc files - is that worth pursuing?

#53

catch - November 4, 2009 - 07:34

I'd very much like to see a before/after comparison - and it's not vast amounts of work to make a patch for that. It also somewhat impacts attempts to further split module files into includes for memory / non-APC savings -which is hotly debated and pushes us in the opposite direction.

I've also noticed we spend some time just including .tpl.php files - but not sure if that's the actual include, or the way in which we do it.

The hard thing though would be measuring impact - we need with apc/no apc, with apc.stat on and off, possibly different operating systems. I've tried to find reliable benchmarks just on things like require vs. require_once and there's precious little information about that on the internet, and what there is is inconclusive and contradictory.

#54

peterx - November 8, 2009 - 03:55

How wide can you go as far as changes?

Looking at D6, not D7, there are things that leap out and probably have not changed for D7. I enclose one example that I would compare on D7 if it fits in the range of viable changes. A page load starts with 0.72 ms:
SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 AND bootstrap = 1 ORDER BY weight ASC, filename ASC
followed a short time later by 0.84 ms:
SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC

From a performance quick gain perspective, you would replace the first query with the second query and pass the result of the first query to the code that requires the second query. Both queries are in module_list making the change relatively easy. Update: This is now at http://drupal.org/node/623992 Reduce {system} database hits

Another example. The following query is executed 6 times. That appears to be once for content and once for every block. Whoever calls cache_get 6 times with the same query should cache the result in memory.
SELECT data, created, headers, expire, serialized FROM cache_nodewords WHERE cid = 'tags_info'
Further down the page the same query is repeated another 6 times.
SELECT data, created, headers, expire, serialized FROM cache_nodewords WHERE cid = 'tags_info'

#55

sime - November 5, 2009 - 02:17

Follow-up of #52 and #53. Just want to rule them out.

I focussed on Field module which currently loads 6 .inc files on every page. I combined the files together and then benchmarked a number of times. No discernible difference in performance. (I shut down Firefox and other apps to try and avoid anomalies.)

#56

mattyoung - November 5, 2009 - 07:18

.,.

#57

yched - November 5, 2009 - 10:34

/me is happy to not have to merge field's .inc into a "War and Peace"-sized file.

#58

peterx - November 7, 2009 - 05:42

Is anyone testing with path aliases on a big menu?
http://drupal.org/node/625898 and http://drupal.org/node/625968 speed up menu creation in D6 by reducing database accesses. The menu system is changed in D7. If the same multiple alias lookup occurs as in D6, the menu system could use the path change to get all the aliases in one call then apply the aliases in one hit. That should reduce calls to the path functions. The stats for database accesses from devel are:
Before: Executed 595 queries in 207 milliseconds.
After: Executed 158 queries in 95.37 milliseconds.
The database calls went down by 437 but menu still called path 437 times to get aliases. A change to menu could eliminate those calls.

#59

catch - November 7, 2009 - 05:54

@peterx - we implemented a path whitelist and system path cache in Drupal 7 #456824: drupal_lookup_path() speedup - cache system paths per page. and #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() which eliminated a very large number of database lookups for paths. However that sounds like a third approach which wouldn't necessarily conflict with what's already in there.

#60

peterx - November 7, 2009 - 06:22

@catch Thank you for the link to #456824: drupal_lookup_path() speedup - cache system paths per page. I searched path for similar issues and forgot to search menu. #456824 should reduce the system calls for the navigation menu. My change is aimed at the primary menu when used with drop downs and lots of entries. They can share the same cache.

I do not see changes in #456824 for menu. Does that mean menu still calls drupal_lookup_path many times in D7? My big primary menu appears to call drupal_get_path_alias() and that is the call that could be reduced to one, getting an array back when the input path is an array. I have not traced back from drupal_lookup_path or drupal_get_path_alias to where it is activated in menu.

#61

catch - November 7, 2009 - 06:34

So the way it works in D7 is this:

All consumers of drupal_lookup_path() call it one by one.

We avoid database lookups by

1. Only hitting the database for whitelisted paths - if you never create an alias for a path at admin/* it'll never look for one either - this list is generated dynamically when new paths are added / removed.
2. Caching all the paths looked up per page when there's a cache miss, so that these can be queried with a single IN() query - once the cache is primed, individual cache misses are looked up one by one.

So between your menu patches, I think it would make sense for drupal_lookup_path() to additionally accept an array of system paths (from menu, possibly elsewhere) - this would then reduce the number of path lookups where the system paths cache hasn't been primed, or where it's gone stale. However due to the optimizations which are already in there, it'd be less dramatic of a change to the number of queries on most pages - if we can do it without an actual change in the API though, then it seems like a good addition to what's already there.

#62

peterx - November 8, 2009 - 01:16

@catch D6 menu change worked out in #625968: Speed up path alias lookups in menu links. For the use case of the primary menu linking to nodes with aliases, drupal_lookup_path is called only once for the 400 links. menu passes the alias as part of the menu item and url() does not have to call drupal_lookup_path. The change might translate to D7.
Before: Executed 616 queries in 224.26 milliseconds.
After: Executed 178 queries in 97.14 milliseconds.

#63

peterx - November 8, 2009 - 01:19

@catch D6 menu change worked out in http://drupal.org/node/625968. For the use case of the primary menu linking to nodes with aliases, drupal_lookup_path is called only once for the 400 links. menu passes the alias as part of the menu item and url() does not have to call drupal_lookup_path. The change might translate to D7.
Before: Executed 616 queries in 224.26 milliseconds.
After: Executed 178 queries in 97.14 milliseconds.

#64

casey - November 17, 2009 - 15:48

@catch could you give us an updated graph of the one you provided in #34? I also would like to see such graphs for a node view and node edit. thnx.

#65

peterx - November 17, 2009 - 21:42

I looked at a D6 node edit with devel and cache_get ran the following query six times, sometimes back to back. Might be an area to look.
SELECT data, created, headers, expire, serialized FROM cache_nodewords WHERE cid = 'tags_info'

#66

sun - November 17, 2009 - 21:46

@peterx: Could you please stop derailing this issue with a nodewords issue? Please file a bug report here: http://drupal.org/project/nodewords

#67

catch - November 18, 2009 - 01:45

Three graphs attached. I'll repeat the offer made previously to get people up and running with this stuff themselves in irc.

AttachmentSizeStatusTest resultOperations
front.png255.82 KBIgnoredNoneNone
node3.png267.05 KBIgnoredNoneNone
node3edit.png301.58 KBIgnoredNoneNone

#68

Jeremy - November 25, 2009 - 00:51

Subscribe.

#69

yched - November 25, 2009 - 00:59

Linking to #438570-55: Field API conversion leads to lots of expensive function calls which has a proof-of-concept patch for enhancing field rendering performance (in short: formatters as render arrays instead of formatters as theme functions - see comment #52 over there for more details).

Help needed for benchmarks, because my setup notably sucks at perf measures.

#70

peterx - November 25, 2009 - 05:40

front.png shows drupal_session_commit turning into 37 PDO executes. DatabaseConnection->query calls DatabaseStatementBase->execute 37 times. It looks like it should be one insert or one update followed by one insert. If this is session cleaning out 36 old sessions, it could be deferred to some sort of shut down task so it is not part of the page response.

#71

carlos8f - November 25, 2009 - 05:59

subscribe.

#72

catch - November 25, 2009 - 06:49

@peterx

DatabaseConnection->query calls DatabaseStatementBase->execute 37 times

That just means that 37 queries were executed on the page in total, but the most expensive one was from drupal_session_commit() - which would be right, since that's an insert query and the rest are selects.

 
 

Drupal is a registered trademark of Dries Buytaert.