This is a first attempt at per-path path caching, but a much dumber approach than #223075: Adaptive path caching.
We cache two things:
1. Per $_GET['q'] cache the system paths for that page.
This is set to CACHE_TEMPORARY because it's likely to get outdated fairly quickly, and because it's just a single cache entry per page so cheap to rebuild. This is taken from information already collected in the $map static from drupal_lookup_path() and is set in drupal_page_footer()
2. Cache each path alias individually by language.
This avoids duplicating storage, and we're getting the paths out of cache in one query due to cache_get_multiple(). These cache entries will need to be cleared in path_set_alias() - not done that yet.
In drupal_lookup_path() - we fetch all the system paths which were on that page when we last cached them, and use cache_get_multiple() to fetch all the path aliases which match those. That means all paths which actually have aliases are dealt with in two database queries.
Anything not fetched from cache, goes through the normal drupal_lookup_path() query.
This works out as a query less per page per path alias which appears on that page. The only consequence of the system_path cache getting stale is that less paths will be fetched from cache - nothing will actually get broken.
I'm thinking about caching which system paths are no-ops as well, but again not there yet, just posting what I have so far.
Comments
Comment #1
catchBetter version.
Handles cache clearing in path_set_alias() - all cache entries matching the system path for the alias are cleared, as are all per-page cache entries for system paths.
Path tests are passing, manual testing suggests it's working well too - i.e. no stale entries when updating a path.
When we try to fetch a path from cache and don't get a record back, we don't try to get it again - this is because clearing the system_path cache entries in path_set_alias should mean we never miss a new path alias when it's added. It's now 2 queries for all cached paths on a page, and no no-op queries. The only extra queries now will be cache_set().
10 nodes, front page, 5 nodes have aliases.
HEAD:
82 queries
Patch:
54 queries
Comment #2
catchand the patch.
Comment #3
catchSame thing except this time we don't cache path aliases at all. The only cache maintained is an array of system paths per page. 57 instead of 54 queries for some reason, but only a maximum of 1 insert per page.
Comment #4
catchComment #5
catchgrr, missed one thing.
What happens now is this:
In drupal_page_footer() we cache the system paths for the current page. This is set as CACHE_TEMPORARY so it'll get cleared regularly, it's a cheap cache to rebuild - one cache_set() per page. The system paths are already available in $map and we just grab it from drupal_static().
The first time drupal_lookup_path() gets called for a page, we fetch all aliases corresponding to these paths in a single query and stuff them into $map.
If the paths for that page have changed since we cached, or if we don't have a cache of system paths for that page, then we just do the usual drupal_lookup_path() query to fetch the other aliases one by one. The more stale the cache gets, the more queries we have to do, but that's the only symptom.
Comment #6
catchouch. Now without cache_get_multiple() since that's not necessary any more.
Last patch of the night.
Comment #8
catchFixed the notice, added lots of documentation.
Comment #9
Dries CreditAttribution: Dries commentedThis looks nice and simple. It is the right approach, IMO.
1. Benchmarks would be useful.
2. I don't see when/where we clear the cache.
Comment #10
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedI am not seeing that this would help a terrible lot for non-admins.
If #223075: Adaptive path caching is deemed too complicated, we could go back to #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() or (preferably ;) #100301: Page level path alias caching
The latter patch is part of the advcache module and thus probably used by some sites.
Comment #11
catchchx reviewed in irc and thought we'd fall down on #446346: url_aliases storage is not normalized. We actually don't because the paths are queried by ASC and then array keys are overwritten, but I hadn't clarified this in the code comments, so updated for that.
@Dries:
1. I couldn't get path generation working in devel HEAD - either need to track that down or fix up a D6 site with lots of paths to upgrade. But yes this definitely needs benchmarks.
2. The current plan is by setting CACHE_TEMPORARY, we allow each cache record to grow stale and get cleared up by the normal garbage collection process in cache_get(). I think we could then pair this approach with #186638: Add a 'keep until' column for cache tables which would allow contrib to provide a UI for a minimum cache lifetime.
I went for this over explicitly cache clearing because the only problem with a stale cache is a gradually increasing number of drupal_lookup_path() queries rather than stale data being displayed. I think it's better to have a few pages with additional queries because their caches are stale than have to rebuild the cache all in one go.
This assumes we can rely on CACHE_TEMPORARY to do reliable garbage collection of course. We also might want to make it REQUEST_TIME + 24 hours or something similar. I don't quite grok exactly what lasts how long and how in cache garbage collection, and there's a few bug reports outstanding on it as well, so could do with feedback on how best to deal with that.
Comment #12
catchkilles - could you explain why you think it won't help non-admins much?
I think 24 hours is a better expiry, so done it like that.
Comment #14
catchGenerated 10,000 nodes with path aliases and taxonomy terms in Drupal 6. Then upgraded this to HEAD, then patched. Took the opportunity to benchmark D6 as well :)
Did devel output when logged in as user/1, then ab as an anonymous user with no page caching. All on the front page. Also did some siege testing on 5 node paths.
D6 vs. HEAD shows we're saving 50% of database time (but losing a bunch of it in PHP it looks like since node/n is still noticeably slower going by siege). Nothing to do with the patch but good to know.
HEAD vs. patch shows 41 less queries for an admin, and about 6% increase in requests per second for anonymous users with no page caching.
With siege I also did a 2 minute benchmark with approx. 100 cache misses (by truncating cache_path 20 times during the benchmarks). Even with around 1 in 6 requests missing the cache there's still a small performance improvement, suggesting the cache isn't overly expensive to generate.
Devel says:
Comment #15
catchPaths weren't getting cached when visiting a page via a path alias due to the way drupal_lookup_path() populates $map during the request.
Patch fixes that bug but that shows we need basic tests, so CNW until those are done. Doesn't affect benchmarks above because that was using node/1 etc. as the paths to visit.
Comment #16
catchAdded a basic test just to confirm the cache gets set per page.
Back to CNR.
Comment #17
catchwithout debug.
Comment #18
moshe weitzman CreditAttribution: moshe weitzman commentedCode looks good. Noticed a verb problem and a typo in "We assume that expect that aiases will be needed for the majority of these paths during"
Like the page cache, does it make sense to have a flag where modules can disable path cache write for the current page. For example, if an admin sees the page it might not be very representative. I don't think it is that needed, but I'll just throw it out there.
Comment #19
Dries CreditAttribution: Dries commentedThat look like good results to me -- although I'm worried about the additional PHP overhead that you mentioned. Something to look into as well.
What I like about this patch, is its simplicity.
Comment #20
catchI'm not sure we need a don't cache flag, you could do something tricky with drupal_static() and the $system_path variable as it currently stands if you really needed it.
Fixed that typo.
Also found a page with more paths to see what our best-case gains are:
Node with 300 comments. This is an extreme example because every comment has edit / delete / reply on it - but that's the same as a long drupal.org issue for any site maintainer.
Devel output logged in as user 1:
HEAD:
Executed 1262 queries in 483.03 milliseconds.
Patch:
Executed 341 queries in 219.88 milliseconds.
Benchmarks with an anonymous user with view and post comments permission.
I did two benchmarks - one with the cache entry generated by uid 1 - so that's 900 paths in the cache entry, and 900 paths to query. And the other with the cache entry generated by uid 0 (not sure how many paths exactly but more like 350 or so). It's a bit slower to retrieve the huge cache entry and do the bigger query, but not so much that I think we need to add any additional checks in. I'd rather combine this with the blacklist/whitelist patch to remove the number of paths we look up or cache at all.
HEAD:
0.80 reqs/sec
Patch with bloated uid 1 generated cache entry:
0.88 reqs/sec
Patch with uid 0 generated cache entry:
0.92 reqs/sec
So the main thing remaining is whether we want to port the patch killes pointed to to D7 to do a side-by-side comparison. I'm not sure I'm up for the porting job but happy to run benches.
Comment #21
catchBy PHP overhead I meant between Drupal 6 and 7 rather than between HEAD and the patch, but yeah we need to track that stuff down a lot more.
The database savings here compared to overall page generation time for this patch look pretty consistent to me - i.e. on the normal front page from #14 we save around 5-20ms in the database - out of a total of 140ms per request (on my laptop) - which is pretty close to the c.7% improvement for the whole page - given the devel output has a much higher margin of error.
While I'm here though, here's webgrind screenshots on drupal_lookup_path() on the same node with 300 comments . These are both with the patch - one with the cache primed, one without. Self time is down by 50ms, incl. time is down by 1200ms (apparently).
Also included the cache_set() / drupal_cache_system_paths() section from the uncached page - that's not free but not ridiculous - and again it's caching all 900 paths on that page, so worst case for cache generation.
Comment #23
catchCan't reproduce locally so sending for a re-test.
Comment #24
catchTrying to organise the complete mess which is 'my issues'.
Comment #25
BerdirThat should be something like:
I *think* we can use $map[$path_language] = ...->fetchAllKeyed() here...
Same as above...
Comment #26
catchMade those changes, and yeah fetchAllKeyed() works :)
Comment #27
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedIt seems that I was a bit mistaken in #10. I probably confused "system paths" with "admin paths".
Comment #28
Dries CreditAttribution: Dries commented- No need to prefix things with 'path:system:' as you're using a dedicated table.
- Put the
->fetch*
on the same line with the closing ).Otherwise looks RTBC to me.
Comment #29
catchThe prefix was cruft left over from the first iteration of the patch when we had both system paths and aliases sharing the cache, that's not longer the case so it just needed to go. I put ->fetch() at the end of lines.
Comment #30
catchDries asked for a CHANGELOG.txt entry.
Comment #31
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks, catch!
Comment #32
catchdrupal_init_path() is called in bootstrap.inc, bootstrap.inc runs on update.php brokenness ensues.
In the hope that this will be my last comment on this otherwise lovely issue, see you over at #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() which I think we should revive to complement this (smaller cache entries, less paths to query when there's no cache, less paths in the big IN() when there is a cache).
Comment #33
webchickThe idea of a crufty ol' variable that sticks around for all eternity by virtue of having run update.php doesn't fill me with glee, but it is consistent with how we solved this same sort of issue in D6.
Committed to HEAD.
Comment #34
catch@webchick, were we to remove Drupal 6 updates in Drupal 7. We could remove the D6 variable. Then we could do the same in Drupal 8 for the Drupal 7 one.
Comment #36
stefan.r CreditAttribution: stefan.r commentedDrupal 6 port. I had to change array_diff_key into array_diff though?
Comment #37
lilou CreditAttribution: lilou commentedComment #38
lilou CreditAttribution: lilou commentedComment #39
ajayg CreditAttribution: ajayg commentedIs there a way we could get 6.x automatic tests run against this patch?
Comment #40
catchThere are no 6.x automatic tests in, and the testing framework is only set up for D7.
Comment #41
claudiu.cristeaDoes it work in D6?
Comment #42
EvanDonovan CreditAttribution: EvanDonovan commented@claudiu.cristea: The path cache patch in #36 works on Drupal 6. It offers a pretty significant reduction in queries once the paths are in the cache. I think this is also in Pressflow Drupal, if I'm not mistaken.
Comment #43
EvanDonovan CreditAttribution: EvanDonovan commentedActually, it looks like I'm getting an error with the patch (on D6):
Does anyone have any ideas as to why that might be happening?
Comment #44
scroogie CreditAttribution: scroogie commentedThe problem might be that their are quotes missing around the %s of $language in the first database query.
Comment #45
EvanDonovan CreditAttribution: EvanDonovan commentedThanks. I'll look into that - however, the query is coming from Drupal core (locale module), I believe, and never had issues before I applied this patch. Also, it doesn't happen consistently - on some pages it loads fine. I'll see what Devel query logging can tell me.
Comment #46
scroogie CreditAttribution: scroogie commentedThe query that you reported above comes from path.inc and was introduced in this patch.
Comment #47
EvanDonovan CreditAttribution: EvanDonovan commented@scroogie: You were correct, thanks! I was wrong about the query coming from locale module - it was actually coming from line 99 of path.inc, which had been changed by the patch. I had to make the following change:
From
to
Can someone reroll the patch? I'm not much good at that.With that change, I think it's ready for D6.
Comment #48
EvanDonovan CreditAttribution: EvanDonovan commentedActually, found another issue. 6.14 has some system_update functions in it that necessitate the update in the patch to be renumbered as system_update_6054().
After that, it should be good. I'm running it on my live site right now.
Comment #49
rmjiv CreditAttribution: rmjiv commentedSubscribing
Comment #50
rmjiv CreditAttribution: rmjiv commentedI'm pretty sure I've identified a bug in the D6 patch (#36). This problem lies with this line:
Since $system_paths is an array, db_query never replaces the $path_language argument and that clause always ends up as "...AND language in ('','')". Obviously this breaks any language specific path aliasing.
A quick fix would be something like this:
Comment #51
catchThis should be using db_placeholders().
Comment #52
rmjiv CreditAttribution: rmjiv commentedI didn't re-roll the patch because our system isn't on 6.14, but here is the if block for more context:
Comment #53
dyke it CreditAttribution: dyke it commentedsubscribing
Comment #54
droberge CreditAttribution: droberge commentedI've created the latest SVN patch for 6.14 based on the git patch from http://tag1consulting.com/patches/path-cache (patch file: http://tag1consulting.com/cgi-bin/gitweb.cgi?p=6.x-perf-patches.git/.git...)
Comment #55
Dave Reid@droberge Please don't change critical meta data like that. Ever.
Comment #56
Dave ReidComment #57
droberge CreditAttribution: droberge commentedSorry
Comment #58
Dave ReidMoving to 'path system' component.
Comment #59
mrfelton CreditAttribution: mrfelton commentedAttached patch takes patch from #54, adds the update hook in system.install from #36 and renumbers it to 6056.
EDIT: works for 6.16
Comment #60
mcurry CreditAttribution: mcurry commentedsubscribing
Comment #61
treksler CreditAttribution: treksler commentedsubscribe
Comment #62
jboeger CreditAttribution: jboeger commentedSubscribing.
Comment #63
catchUnassigning myself, this is in Pressflow in Drupal 6, which is enough for me.
Comment #64
andypostFollow-up for D7 #863318: Wrong sort order of aliases for different languages
Comment #65
carlescliment CreditAttribution: carlescliment commentedcache_path.patch queued for re-testing.
Comment #67
dgtlmoon CreditAttribution: dgtlmoon commentedsubscribing
Comment #68
yonailo CreditAttribution: yonailo commented+1 subscribing
Comment #69
dgtlmoon CreditAttribution: dgtlmoon commentedHow does pressflow implement it? can we use their patch?
Comment #70
VladSavitsky CreditAttribution: VladSavitsky commented...
Comment #71
VladSavitsky CreditAttribution: VladSavitsky commentedI have modified patch to be used for Drupal 6.20 and have tested with Pressflow 6.20.97.
PS. I've done a dummy work. This patch is not necessary for Pressflow because Pressflow has modules path_alias_cache and cookie_cache_bypass modules. Just enable them and have fun.
Comment #72
mrfelton CreditAttribution: mrfelton commentedI'm not sure what was going on with the patch in #71 but it seems to add a new function called _drupal_lookup_path_direct which is not used anywhere. Here is yet another updated patch for Drupal 6.22, based on the patch in #59.
Comment #73
mrfelton CreditAttribution: mrfelton commentedI guess this is never going to get committed, so whatever. But, we still have sites running on Drupal 6 that we can't or dont want to migrate to Pressflow for whatever reason, and this is a considerable performance hit on some sites. Attach patch is updated to work with latest D6 in git and resolves a slight error in the previous patch that came about due to the fact that this bloody patch has been re-rolled so many times. It's hard to get it right every time.
Comment #74
dgtlmoon CreditAttribution: dgtlmoon commented@mrfelton : great work!
Comment #75
ispboy CreditAttribution: ispboy commentedmark~
Comment #76
Benq CreditAttribution: Benq commented#73: 456824.73-drupal-lookup-path-slow-query_for-6.22.patch queued for re-testing.
Comment #77
ajayg CreditAttribution: ajayg commentedLooks something was missing in your step as the patch verification system doesn't seems to be invoked on your patch.
Comment #78
ChrisLaFrancis CreditAttribution: ChrisLaFrancis commentedsubstring_index() in drupal_path_alias_whitelist_rebuild() is MySQL-specific. I'm using split_part() with PostgreSQL as a workaround, but a database-independent method would be nice.
Comment #79
rahulbile CreditAttribution: rahulbile commentedPatch against 6.28.
Comment #80
ajayg CreditAttribution: ajayg commentedNice to see this getting passed against 6.28. Time to commit?
Comment #82
catchMoving this back to 'fixed' against Drupal 7 since it was committed there.