Download & Extend

A site with statistics module enabled is much slower in serving cached pages in D7 than in D6

Project:Drupal core
Version:7.x-dev
Component:base system
Category:bug report
Priority:critical
Assigned:Unassigned
Status:closed (fixed)
Issue tags:D7 upgrade path, Performance

Issue Summary

Spin-off issue from #320331-133: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks.

hook_url_inbound_alter() is great, but surely there must be some way to not have it destroy the performance gains from page caching when using a module that just needs to know $_GET['q'] within hook_boot() or hook_exit().

Comments

#1

tagging

#2

subscribing.

#3

What does hook_url_inbound_alter() need to make it run successfully, apart from drupal_alter() being available?

#4

Mostly it was drupal_alter() and the assumption that since this is the opposite of hook_url_outbound_alter(), it should run the same bootstrap. We could make a copy of it and put it into bootstrap.inc (bootstrap_alter). Also in the test function we had used user_load_by_name() but I guess that could be converted to a simpler query: SELECT uid FROM {users} WHERE name = :name instead of relying on user.module being loaded. I guess if a module implementation of hook_url_inbound_alter() needed a full bootstrap it could do so itself and we can add BOOTSTRAP_PATH back in.

Other changes required to revert removing the bootstrap:
- Add 'url_inbound_alter' to http://api.drupal.org/api/function/bootstrap_hooks/7
- Add to the hook documentation that a full bootstrap is not available.

#5

I'd also like to see more benchmarks with the current code and statistics module enabled. Ones I did indicated little difference. Recent ones indicate huge difference.

#6

@5: my hunch in #320331-121: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks was that the page cache probably wasn't turned on for your tests.

Re 4:
> - Add 'url_inbound_alter' to http://api.drupal.org/api/function/bootstrap_hooks/7
Given that the 'url_inbound_alter' hook is only required during a cached page response if a module (e.g. statistics) has upped the bootstrap phase to DRUPAL_BOOTSTRAP_PATH, can we distinguish between boot/exit/watchdog and url_inbound_alter? Otherwise we could be loading modules unnecessarily. The bootstrap column in {system} is already a full-blown int so there's certainly room to store the info..

#7

It would needed to be added to bootstrap_hooks since it's invoked prior to all modules loaded. Otherwise modules that do implement url_inbound_alter() but not the other bootstrap hooks would never be run.

#8

I was thinking that in many cases this could cause such modules to be loaded unnecessarily, however possibly it would require too big an API change at this stage to finesse the loading of these modules. Same could also be said for modules implementing hook_watchdog - in 99% of cached page requests they will not be needed.

#9

Priority:critical» normal

Not critical. Please read and understand Priority levels of Issues, thanks.

#10

Priority:normal» critical

To be honest, I don't entirely understand the description of "critical" on that page, but if "critical" means we should not release Drupal 7 until this is resolved, then this issue is critical as per #320331-126: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks and remaining discussion on that issue. Turning url inbound rewriting from a handler to an alter hook resulted in a site with statistics module enabled having 500% or worse performance degradation for serving cached pages, and according to webchick, this is unacceptable. This issue can be marked fixed if we roll that other one back. But I don't accept that that's the only path towards fixing this.

#11

Yes, we either fix this, or we roll back hook_url_*_alter(). Note that the hooks also cause potentially infinite recursion if you call url() inside taxonomy_term_load() per #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter().

#12

Title:Full bootstrap is required for $_GET['q'] to be initialized: makes page caching incompatible with statistics module» A site with statistics module enabled is much slower in serving cached pages in D7 than in D6

Better issue title I think, and one that got me thinking about an alternate way to solve the problem.

I don't think it's necessary for hook_boot() to have a fully normalized $_GET['q']. I think it's reasonable for us to say that the process of normalizing $_GET['q'] requires module involvement, and therefore doesn't happen until full bootstrap. I'm not too keen on making hook_url_inbound_alter() a bootstrap level hook (#6 - #8), unless some other compelling use-case forces us into that.

The issue with statistics module is more focused to hook_exit() during cached pages. This should have a normalized $_GET['q']. So, how about we add a "path" field to the "cache_page" table, and cache the normalized path along with the page content? Then either within drupal_serve_page_from_cache() or right after calling it from _drupal_bootstrap_page_cache(), we set $_GET['q'] to that cached value.

Thoughts? If it sounds good, anyone up for rolling such a patch?

#13

Status:active» needs review

So, how about we add a "path" field to the "cache_page" table...

Um, I guess we can't add fields to cache tables. So...

How about this?

AttachmentSizeStatusTest resultOperations
cached_page_info-692044-13.patch8.83 KBIdlePASSED: [[SimpleTest]]: [MySQL] 17,894 pass(es).View details

#14

We already have something for ESI tie in in drupal_render_cache_set(), couldn't the same pattern be used here? http://drupal.org/node/651902

So far we have two critical bugs open due to hook_url_*_alter() - I think it should just be rolled back to be honest, consequences were not thought through, all the fixes create pain.

#15

So far we have two critical bugs open due to hook_url_*_alter()

What's the other one? #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() is not marked critical.

#16

I would rather drop statistics module from core than take away url_alter hook for all developers. logging all page views in mysql is a pretty bad idea, and we should just acknowledge that. it can live on in contrib for folks who can't seem to find google analytics or apache logs or any other alternative.

#17

Quoting Moshe in #16: logging all page views in mysql is a pretty bad idea, and we should just acknowledge that. it can live on in contrib for folks who can't seem to find google analytics or apache logs or any other alternative.

I completely agree with this statement, having wondered repeatedly how Drupal's Statistics module fits into the general web access statistics world.
While I am only superficially familiar with Drupal's API and other internals, I am keenly aware of performance issues caused by disk access patterns. The less the better, and dropping Statistics from core would seem to fit the bill.

#18

To be honest, I think this is getting way to complex. What about rolling back hook_url_*_alter() ?

#19

What about rolling back hook_url_*_alter() ?

If we have to, we have to. I'm -1 for it, but I can appreciate the arguments for doing so. Let's realize we're actually talking about two hooks here: hook_url_inbound_alter() and hook_url_outbound_alter(). This issue points to a weakness of hook_url_inbound_alter(), though I think #13 is an okay solution, but I haven't had a chance yet to investigate what is suggested in #14.

I suspect if we have to roll back hook_url_inbound_alter(), then we need to roll back hook_url_outbound_alter() too. To do that, we would need to remove existing core implementations of that hook. For forum_url_outbound_alter(), we need to do that anyway, and have a patch for it: #699440: Add bundle support to entity_uri callback to remove performance overhead of forum_url_outbound_alter() (regression from D6). That would leave locale_url_outbound_alter().

#20

I agree with Moshe and others that we should keep hook_url_inbound/outbound_alter(), mainly from a strategic perspective.

Hopping through the API and bootstrap process, I'd see two potential solutions:

1) Add a {cache_page}.data column to store additional information (here, $_GET['q'], drupal_get_title()) with each entry. If done properly, that would most likely be interesting and helpful for other use-cases. Potential data handling (getter/setter) functions: hook_boot() and hook_exit(). Alternatively, or additionally, cache_get() and cache_set() could and probably should invoke a hook_cache_get|set_BIN().

2) Revert the removal of DRUPAL_BOOTSTRAP_PATH and turn hook_url_inbound_alter() into a bootstrap hook. Not sure why that was removed in the first place. There may be other use-cases other than statistics that require knowledge of the current system path prior to DRUPAL_BOOSTRAP_FULL.

#21

@Dries - in #16 I suggested that the hook is more important to the framework than statistics module. Are you in disagreement based on your suggestion in #18?

#22

At the risk of sticking my nose in ...

> the hook is more important to the framework than statistics module
part of the tension here is I think that large/busy sites have different requirements and performance issues from small/quieter sites. At the moment as far as I can make out Drupal is trying to work for both, which creates certain challenges.

> live on in contrib for folks who can't seem to find google analytics or apache logs or any other alternative.
I use both of those, but they have different uses than statistics module. And I imagine that alternatives (i.e. contrib modules yet to be written..??) would have the same problem with the bootstrap. The point about statistics module is that you just turn it on if you need it. However I agree it might ultimately have a better time in contrib - it's not had much love of late.

Another workaround might be to store the requested URL in {accesslog} which could then be used to get the normalised path for a cached response. However this wouldn't provide the additional benefits for other potential use cases (#20 option 1).

#23

I like sun's option #2. Add back DRUPAL_BOOTSTRAP_PATH and make hook_url_inbound_alter() a bootstrap hook. Then we don't need to make any schema changes just to accommodate statistics.module

We absolutely need to fix #699440: Add bundle support to entity_uri callback to remove performance overhead of forum_url_outbound_alter() (regression from D6) whether we were to rollback the alter hooks or not, so that in itself isn't an argument for a rollback.

http://api.drupal.org/api/function/locale_url_outbound_alter/7 is a proper use of the hook IMO - or at least afaik the code is extremely similar to the Drupal 6 code (which I can't locate at the moment).

So if these two are fixed, I'm in favour of leaving the hook in - the issue less the hook as such, more that we converted too many special cases to it for which it's not really suited.

#24

I'd be on-board with either of the two options in #20. While I think #13/#20.1 is cool, there's something to be said for invoking the principle of "solve the issue in the simplest way possible". Therefore, #20.2 is probably the better way to go. It was suggested in #4, and I don't even remember why I didn't like the idea back then. I think that like gpk (#6), I was concerned about the consequences of extra bootstrap modules. But, on thinking about it more, there really shouldn't be very many modules that implement hook_url_inbound_alter(). So +1 to that approach. #13/#20.1 can wait for another use-case, perhaps not until D8 (though maybe in D7 contrib).

#25

Status:needs review» needs work

CNW since a patch for that is needed.

#26

just roll back the !@#$%^&* hook. there was a good reason it never was a hook and it should still not be a hook. it's pointless to be a hook and it causes a major performance hit. what's so hard?

#27

Status:needs work» needs review

Why not simply removing this old crap that is {cache}.headers (that is used *only* for page cache), and simply implement the page cache in a more reasonable manner.

This allows us to pass some additional meta-data in the cached page, solving the issue at hand.

AttachmentSizeStatusTest resultOperations
692044-cached-page-metadata.patch9.97 KBIdlePASSED: [[SimpleTest]]: [MySQL] 0 pass(es).View details

#28

This thing is like weed, it goes everywhere. Removed more 'headers'.

AttachmentSizeStatusTest resultOperations
692044-cached-page-metadata.patch13.77 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,336 pass(es).View details

#29

Very nice work. Thats how you slay a critical!

Waiting for bot and perhaps more feedback before rtbc.

#30

Agreed. This is awesome. The patch leads to some bugs in statistics_exit() for cached pages. I'm working on adding tests for that. I have confidence that it will be easy to make those tests pass.

#31

With the test and minor code additions to make the test pass.

AttachmentSizeStatusTest resultOperations
692044-cached-page-metadata-31.patch19.2 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,371 pass(es).View details

#32

Status:needs review» needs work

The last submitted patch, 692044-cached-page-metadata-31.patch, failed testing.

#33

Status:needs work» needs review

#31: 692044-cached-page-metadata-31.patch queued for re-testing.

#34

Looks good to me, thanks for the additional tests and the fixes.

#35

Status:needs review» reviewed & tested by the community

#36

Looks great here too. Will need to document the API/schema change but it's a small one, and the schema change won't cause actual breakage anyway.

#37

Status:reviewed & tested by the community» needs work

+++ includes/bootstrap.inc 18 May 2010 01:10:40 -0000
@@ -1134,7 +1138,7 @@ function drupal_serve_page_from_cache(st
-      // $cache->data is already gzip'ed, so make sure zlib.output_compression
+      // $cache->data['body'] is already gzip'ed, so make sure zlib.output_compression
       // does not compress it once more.

Exceeds 80 chars.

+++ includes/bootstrap.inc 18 May 2010 01:10:40 -0000
@@ -1142,11 +1146,12 @@ function drupal_serve_page_from_cache(st
+  // Print the page.
+  print $cache->data['body'];

+++ includes/common.inc 18 May 2010 01:10:41 -0000
@@ -4518,28 +4518,32 @@ function drupal_page_set_cache() {
+        'body' => ob_get_clean(),

Wouldn't 'html', 'markup', or 'content' be a more appropriate array key instead of 'body'?

+++ modules/statistics/statistics.module 18 May 2010 01:10:45 -0000
@@ -51,9 +51,23 @@ function statistics_help($path, $arg) {
+  // If this is running for a cached page using the page_cache_without_database
+  // configuration, then we may not have bootstrapped the system variables yet.
+  // Websites running non-database page caches will almost certainly not have
+  // the statistics module enabled, but just in case, bootstrap the system
+  // variables. This wipes out most if not all of the performance gain from a
+  // non-database page cache, but if the statistics module is enabled, then it
+  // is also likely to have the 'statistics_enable_access_log' variable set, in
+  // which case, we'll be bootstrapping to the session phase anyway. In summary,
+  // we do not invoke an expensive full bootstrap, but for any site that does
+  // not want to bootstrap beyond DRUPAL_BOOTSTRAP_PAGE_CACHE for cached pages,

Awesome explanation + information. However, can we shorten all sentences, while keeping the information?

"When serving cached pages with the 'page_cache_without_database' configuration, system variables need to be loaded. This is a major performance decrease for non-database page caches, but with Statistics module, it is likely to also have 'statistics_enable_access_log' enabled, in which case we need to bootstrap to the session phase anyway."

+++ modules/statistics/statistics.module 18 May 2010 01:10:45 -0000
@@ -51,9 +51,23 @@ function statistics_help($path, $arg) {
+  // do not enable the statistics module. ¶

1) The "Statistics" module (proper name).

2) Trailing white-space.

+++ modules/statistics/statistics.module 18 May 2010 01:10:45 -0000
@@ -51,9 +51,23 @@ function statistics_help($path, $arg) {
+    // We need this to use the arg() function.
+    require_once DRUPAL_ROOT . '/' . variable_get('path_inc', 'includes/path.inc');

I wonder whether we shouldn't move arg() into bootstrap.inc or something, too? It's a few lines only, and we already have a group of path-related functions in bootstrap.inc it could be moved into. (IIRC, not having arg() available early was a WTF for me a couple of times already in the past)

Alternatively, we could simply inline $arg = explode('/', $_GET['q']); here.

I'd prefer to do the former though (move arg()).

+++ modules/update/update.install 18 May 2010 01:10:45 -0000
@@ -167,3 +167,10 @@ function update_update_7000() {
\ No newline at end of file

Missing newline.

Powered by Dreditor.

#38

Status:needs work» needs review

New version, integrating all of @sun's comments, except one: I stick with 'body', which is the HTTP terminology (headers / body).

AttachmentSizeStatusTest resultOperations
692044-cached-page-metadata.patch21.15 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,382 pass(es).View details

#39

+++ includes/bootstrap.inc
@@ -1134,19 +1134,24 @@ function drupal_serve_page_from_cache(stdClass $cache) {
+  // Restore the metadata that was cached with the page.
+  $_GET['q'] = $cache->data['path'];
+  drupal_set_title($cache->data['title'], PASS_THROUGH);

I wonder whether we shouldn't move this into drupal_page_get_cache() or _drupal_bootstrap_page_cache(), so as to make both available to hook_boot() and hook_exit()?

Powered by Dreditor.

#40

I need to give this a closer review to verify the smaller details but from a conceptual level this looks pretty clever.

It would be interesting to see some performance results.

I'm OK with 'body' -- that seems to be official term according to the HTTP specification. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.

#41

With #39. So tempting to put it into drupal_page_get_cache() for symmetry with drupal_page_set_cache(), but global side-effects within a "get" function just seems wrong. Therefore, put it in _drupal_bootstrap_page_cache().

AttachmentSizeStatusTest resultOperations
692044-cached-page-metadata-41.patch22.32 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,388 pass(es).View details

#42

Status:needs review» reviewed & tested by the community

Thanks, looks good.

AFAICS, there shouldn't be any measurable positive or negative performance impact from this patch.

#43

If there wasn't, then this issue wouldn't be critical - ideally someone should benchmark cached pages with statistics module on and off. Agreed that without statistics on this ought to be no change though.

#44

Benchmarks of anonymous user visiting the front page after a clean "standard profile" install (no content):

Statistics Off
without page cache: HEAD = 41.4ms, Patch = 41.3ms
with page cache: HEAD = 6.6ms, Patch = 6.6ms

Statistics On (access log and count content views enabled)
without page cache: HEAD = 43.0ms, Patch = 42.8ms
with page cache: HEAD = 32.8ms, Patch = 8.5ms

Each number is the "ab" (Apache Benchmark) average of 500 requests.

#45

Status:reviewed & tested by the community» fixed

Reviewed and looks very solid. Committed to CVS HEAD. Thanks.

#46

Status:fixed» needs review
Issue tags:+D7 upgrade path

Some upgrade issues:

In update_prepare_d7_bootstrap() we are creating table cache_bootstrap with headers field. We then drop it in an update hook. Better to just create it correctly in the first place.

Also, cache_block needs to have its header field dropped.

AttachmentSizeStatusTest resultOperations
remove_more_cache_headers.patch2.58 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,425 pass(es).View details

#47

Status:needs review» reviewed & tested by the community

Good catch!

#48

Project:Drupal core» HEAD to HEAD
Version:7.x-dev» <none>
Component:base system» Code
Priority:critical» normal
Status:reviewed & tested by the community» needs work

Committed to CVS HEAD. Moving to HEAD to HEAD project as this will break HEAD to HEAD upgrades.

#49

Project:HEAD to HEAD» Drupal core
Version:<none>» 7.x-dev
Component:Code» base system
Priority:normal» critical
Status:needs work» fixed

Let's just open a new issue: #803574: update for A site with statistics module enabled is much slower in serving cached pages in D7 than in D6

#50

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

nobody click here