Closed (fixed)
Project:
Drupal core
Version:
7.x-dev
Component:
base system
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
21 Jan 2010 at 23:12 UTC
Updated:
24 Jan 2019 at 01:26 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
effulgentsia commentedtagging
Comment #2
catchsubscribing.
Comment #3
gpk commentedWhat does hook_url_inbound_alter() need to make it run successfully, apart from drupal_alter() being available?
Comment #4
dave reidMostly it was drupal_alter() and the assumption that since this is the opposite of hook_url_outbound_alter(), it should run the same bootstrap. We could make a copy of it and put it into bootstrap.inc (bootstrap_alter). Also in the test function we had used user_load_by_name() but I guess that could be converted to a simpler query:
SELECT uid FROM {users} WHERE name = :nameinstead of relying on user.module being loaded. I guess if a module implementation of hook_url_inbound_alter() needed a full bootstrap it could do so itself and we can add BOOTSTRAP_PATH back in.Other changes required to revert removing the bootstrap:
- Add 'url_inbound_alter' to http://api.drupal.org/api/function/bootstrap_hooks/7
- Add to the hook documentation that a full bootstrap is not available.
Comment #5
dave reidI'd also like to see more benchmarks with the current code and statistics module enabled. Ones I did indicated little difference. Recent ones indicate huge difference.
Comment #6
gpk commented@5: my hunch in #320331-121: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks was that the page cache probably wasn't turned on for your tests.
Re 4:
> - Add 'url_inbound_alter' to http://api.drupal.org/api/function/bootstrap_hooks/7
Given that the 'url_inbound_alter' hook is only required during a cached page response if a module (e.g. statistics) has upped the bootstrap phase to DRUPAL_BOOTSTRAP_PATH, can we distinguish between boot/exit/watchdog and url_inbound_alter? Otherwise we could be loading modules unnecessarily. The bootstrap column in {system} is already a full-blown int so there's certainly room to store the info..
Comment #7
dave reidIt would needed to be added to bootstrap_hooks since it's invoked prior to all modules loaded. Otherwise modules that do implement url_inbound_alter() but not the other bootstrap hooks would never be run.
Comment #8
gpk commentedI was thinking that in many cases this could cause such modules to be loaded unnecessarily, however possibly it would require too big an API change at this stage to finesse the loading of these modules. Same could also be said for modules implementing hook_watchdog - in 99% of cached page requests they will not be needed.
Comment #9
sun.core commentedNot critical. Please read and understand Priority levels of Issues, thanks.
Comment #10
effulgentsia commentedTo be honest, I don't entirely understand the description of "critical" on that page, but if "critical" means we should not release Drupal 7 until this is resolved, then this issue is critical as per #320331-126: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks and remaining discussion on that issue. Turning url inbound rewriting from a handler to an alter hook resulted in a site with statistics module enabled having 500% or worse performance degradation for serving cached pages, and according to webchick, this is unacceptable. This issue can be marked fixed if we roll that other one back. But I don't accept that that's the only path towards fixing this.
Comment #11
catchYes, we either fix this, or we roll back hook_url_*_alter(). Note that the hooks also cause potentially infinite recursion if you call url() inside taxonomy_term_load() per #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter().
Comment #12
effulgentsia commentedBetter issue title I think, and one that got me thinking about an alternate way to solve the problem.
I don't think it's necessary for hook_boot() to have a fully normalized $_GET['q']. I think it's reasonable for us to say that the process of normalizing $_GET['q'] requires module involvement, and therefore doesn't happen until full bootstrap. I'm not too keen on making hook_url_inbound_alter() a bootstrap level hook (#6 - #8), unless some other compelling use-case forces us into that.
The issue with statistics module is more focused to hook_exit() during cached pages. This should have a normalized $_GET['q']. So, how about we add a "path" field to the "cache_page" table, and cache the normalized path along with the page content? Then either within drupal_serve_page_from_cache() or right after calling it from _drupal_bootstrap_page_cache(), we set $_GET['q'] to that cached value.
Thoughts? If it sounds good, anyone up for rolling such a patch?
Comment #13
effulgentsia commentedUm, I guess we can't add fields to cache tables. So...
How about this?
Comment #14
catchWe already have something for ESI tie in in drupal_render_cache_set(), couldn't the same pattern be used here? http://drupal.org/node/651902
So far we have two critical bugs open due to hook_url_*_alter() - I think it should just be rolled back to be honest, consequences were not thought through, all the fixes create pain.
Comment #15
effulgentsia commentedWhat's the other one? #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() is not marked critical.
Comment #16
moshe weitzman commentedI would rather drop statistics module from core than take away url_alter hook for all developers. logging all page views in mysql is a pretty bad idea, and we should just acknowledge that. it can live on in contrib for folks who can't seem to find google analytics or apache logs or any other alternative.
Comment #17
christophweber commentedQuoting Moshe in #16: logging all page views in mysql is a pretty bad idea, and we should just acknowledge that. it can live on in contrib for folks who can't seem to find google analytics or apache logs or any other alternative.
I completely agree with this statement, having wondered repeatedly how Drupal's Statistics module fits into the general web access statistics world.
While I am only superficially familiar with Drupal's API and other internals, I am keenly aware of performance issues caused by disk access patterns. The less the better, and dropping Statistics from core would seem to fit the bill.
Comment #18
dries commentedTo be honest, I think this is getting way to complex. What about rolling back hook_url_*_alter() ?
Comment #19
effulgentsia commentedIf we have to, we have to. I'm -1 for it, but I can appreciate the arguments for doing so. Let's realize we're actually talking about two hooks here: hook_url_inbound_alter() and hook_url_outbound_alter(). This issue points to a weakness of hook_url_inbound_alter(), though I think #13 is an okay solution, but I haven't had a chance yet to investigate what is suggested in #14.
I suspect if we have to roll back hook_url_inbound_alter(), then we need to roll back hook_url_outbound_alter() too. To do that, we would need to remove existing core implementations of that hook. For forum_url_outbound_alter(), we need to do that anyway, and have a patch for it: #699440: Add bundle support to entity_uri callback to remove performance overhead of forum_url_outbound_alter() (regression from D6). That would leave locale_url_outbound_alter().
Comment #20
sunI agree with Moshe and others that we should keep hook_url_inbound/outbound_alter(), mainly from a strategic perspective.
Hopping through the API and bootstrap process, I'd see two potential solutions:
1) Add a {cache_page}.data column to store additional information (here, $_GET['q'], drupal_get_title()) with each entry. If done properly, that would most likely be interesting and helpful for other use-cases. Potential data handling (getter/setter) functions: hook_boot() and hook_exit(). Alternatively, or additionally, cache_get() and cache_set() could and probably should invoke a hook_cache_get|set_BIN().
2) Revert the removal of DRUPAL_BOOTSTRAP_PATH and turn hook_url_inbound_alter() into a bootstrap hook. Not sure why that was removed in the first place. There may be other use-cases other than statistics that require knowledge of the current system path prior to DRUPAL_BOOSTRAP_FULL.
Comment #21
moshe weitzman commented@Dries - in #16 I suggested that the hook is more important to the framework than statistics module. Are you in disagreement based on your suggestion in #18?
Comment #22
gpk commentedAt the risk of sticking my nose in ...
> the hook is more important to the framework than statistics module
part of the tension here is I think that large/busy sites have different requirements and performance issues from small/quieter sites. At the moment as far as I can make out Drupal is trying to work for both, which creates certain challenges.
> live on in contrib for folks who can't seem to find google analytics or apache logs or any other alternative.
I use both of those, but they have different uses than statistics module. And I imagine that alternatives (i.e. contrib modules yet to be written..??) would have the same problem with the bootstrap. The point about statistics module is that you just turn it on if you need it. However I agree it might ultimately have a better time in contrib - it's not had much love of late.
Another workaround might be to store the requested URL in {accesslog} which could then be used to get the normalised path for a cached response. However this wouldn't provide the additional benefits for other potential use cases (#20 option 1).
Comment #23
catchI like sun's option #2. Add back DRUPAL_BOOTSTRAP_PATH and make hook_url_inbound_alter() a bootstrap hook. Then we don't need to make any schema changes just to accommodate statistics.module
We absolutely need to fix #699440: Add bundle support to entity_uri callback to remove performance overhead of forum_url_outbound_alter() (regression from D6) whether we were to rollback the alter hooks or not, so that in itself isn't an argument for a rollback.
http://api.drupal.org/api/function/locale_url_outbound_alter/7 is a proper use of the hook IMO - or at least afaik the code is extremely similar to the Drupal 6 code (which I can't locate at the moment).
So if these two are fixed, I'm in favour of leaving the hook in - the issue less the hook as such, more that we converted too many special cases to it for which it's not really suited.
Comment #24
effulgentsia commentedI'd be on-board with either of the two options in #20. While I think #13/#20.1 is cool, there's something to be said for invoking the principle of "solve the issue in the simplest way possible". Therefore, #20.2 is probably the better way to go. It was suggested in #4, and I don't even remember why I didn't like the idea back then. I think that like gpk (#6), I was concerned about the consequences of extra bootstrap modules. But, on thinking about it more, there really shouldn't be very many modules that implement hook_url_inbound_alter(). So +1 to that approach. #13/#20.1 can wait for another use-case, perhaps not until D8 (though maybe in D7 contrib).
Comment #25
effulgentsia commentedCNW since a patch for that is needed.
Comment #26
chx commentedjust roll back the !@#$%^&* hook. there was a good reason it never was a hook and it should still not be a hook. it's pointless to be a hook and it causes a major performance hit. what's so hard?
Comment #27
damien tournoud commentedWhy not simply removing this old crap that is
{cache}.headers(that is used *only* for page cache), and simply implement the page cache in a more reasonable manner.This allows us to pass some additional meta-data in the cached page, solving the issue at hand.
Comment #28
damien tournoud commentedThis thing is like weed, it goes everywhere. Removed more 'headers'.
Comment #29
moshe weitzman commentedVery nice work. Thats how you slay a critical!
Waiting for bot and perhaps more feedback before rtbc.
Comment #30
effulgentsia commentedAgreed. This is awesome. The patch leads to some bugs in statistics_exit() for cached pages. I'm working on adding tests for that. I have confidence that it will be easy to make those tests pass.
Comment #31
effulgentsia commentedWith the test and minor code additions to make the test pass.
Comment #33
effulgentsia commented#31: 692044-cached-page-metadata-31.patch queued for re-testing.
Comment #34
damien tournoud commentedLooks good to me, thanks for the additional tests and the fixes.
Comment #35
yesct commentedComment #36
catchLooks great here too. Will need to document the API/schema change but it's a small one, and the schema change won't cause actual breakage anyway.
Comment #37
sunExceeds 80 chars.
Wouldn't 'html', 'markup', or 'content' be a more appropriate array key instead of 'body'?
Awesome explanation + information. However, can we shorten all sentences, while keeping the information?
"When serving cached pages with the 'page_cache_without_database' configuration, system variables need to be loaded. This is a major performance decrease for non-database page caches, but with Statistics module, it is likely to also have 'statistics_enable_access_log' enabled, in which case we need to bootstrap to the session phase anyway."
1) The "Statistics" module (proper name).
2) Trailing white-space.
I wonder whether we shouldn't move arg() into bootstrap.inc or something, too? It's a few lines only, and we already have a group of path-related functions in bootstrap.inc it could be moved into. (IIRC, not having arg() available early was a WTF for me a couple of times already in the past)
Alternatively, we could simply inline
$arg = explode('/', $_GET['q']);here.I'd prefer to do the former though (move arg()).
Missing newline.
Powered by Dreditor.
Comment #38
damien tournoud commentedNew version, integrating all of @sun's comments, except one: I stick with 'body', which is the HTTP terminology (headers / body).
Comment #39
sunI wonder whether we shouldn't move this into drupal_page_get_cache() or _drupal_bootstrap_page_cache(), so as to make both available to hook_boot() and hook_exit()?
Powered by Dreditor.
Comment #40
dries commentedI need to give this a closer review to verify the smaller details but from a conceptual level this looks pretty clever.
It would be interesting to see some performance results.
I'm OK with 'body' -- that seems to be official term according to the HTTP specification. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.
Comment #41
effulgentsia commentedWith #39. So tempting to put it into drupal_page_get_cache() for symmetry with drupal_page_set_cache(), but global side-effects within a "get" function just seems wrong. Therefore, put it in _drupal_bootstrap_page_cache().
Comment #42
sunThanks, looks good.
AFAICS, there shouldn't be any measurable positive or negative performance impact from this patch.
Comment #43
catchIf there wasn't, then this issue wouldn't be critical - ideally someone should benchmark cached pages with statistics module on and off. Agreed that without statistics on this ought to be no change though.
Comment #44
effulgentsia commentedBenchmarks of anonymous user visiting the front page after a clean "standard profile" install (no content):
Statistics Off
without page cache: HEAD = 41.4ms, Patch = 41.3ms
with page cache: HEAD = 6.6ms, Patch = 6.6ms
Statistics On (access log and count content views enabled)
without page cache: HEAD = 43.0ms, Patch = 42.8ms
with page cache: HEAD = 32.8ms, Patch = 8.5ms
Each number is the "ab" (Apache Benchmark) average of 500 requests.
Comment #45
dries commentedReviewed and looks very solid. Committed to CVS HEAD. Thanks.
Comment #46
jbrown commentedSome upgrade issues:
In update_prepare_d7_bootstrap() we are creating table cache_bootstrap with headers field. We then drop it in an update hook. Better to just create it correctly in the first place.
Also, cache_block needs to have its header field dropped.
Comment #47
damien tournoud commentedGood catch!
Comment #48
dries commentedCommitted to CVS HEAD. Moving to HEAD to HEAD project as this will break HEAD to HEAD upgrades.
Comment #49
pwolanin commentedLet's just open a new issue: #803574: update for A site with statistics module enabled is much slower in serving cached pages in D7 than in D6
Comment #52
mfbDoes anyone happen to remember why statistics module needs to bootstrap to the session phase in statistics_exit()? If the request didn't already bootstrap to session phase - i.e. it's being served from cache - then this would generate a "bogus" session ID that can be logged by statistics module. But this doesn't seem particularly useful.
In recent versions of PHP, initializing session after serving the page throws a PHP warning, which I was researching in #3009351: "session_id(): Cannot change session id". I don't use statistics module so not clear what it's trying to achieve.