This is a first attempt at per-path path caching, but a much dumber approach than #223075: Adaptive path caching.

We cache two things:

1. Per $_GET['q'] cache the system paths for that page.

This is set to CACHE_TEMPORARY because it's likely to get outdated fairly quickly, and because it's just a single cache entry per page so cheap to rebuild. This is taken from information already collected in the $map static from drupal_lookup_path() and is set in drupal_page_footer()

2. Cache each path alias individually by language.
This avoids duplicating storage, and we're getting the paths out of cache in one query due to cache_get_multiple(). These cache entries will need to be cleared in path_set_alias() - not done that yet.

In drupal_lookup_path() - we fetch all the system paths which were on that page when we last cached them, and use cache_get_multiple() to fetch all the path aliases which match those. That means all paths which actually have aliases are dealt with in two database queries.

Anything not fetched from cache, goes through the normal drupal_lookup_path() query.

This works out as a query less per page per path alias which appears on that page. The only consequence of the system_path cache getting stale is that less paths will be fetched from cache - nothing will actually get broken.

I'm thinking about caching which system paths are no-ops as well, but again not there yet, just posting what I have so far.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Priority: Normal » Critical
Status: Needs work » Needs review

Better version.

Handles cache clearing in path_set_alias() - all cache entries matching the system path for the alias are cleared, as are all per-page cache entries for system paths.

Path tests are passing, manual testing suggests it's working well too - i.e. no stale entries when updating a path.

When we try to fetch a path from cache and don't get a record back, we don't try to get it again - this is because clearing the system_path cache entries in path_set_alias should mean we never miss a new path alias when it's added. It's now 2 queries for all cached paths on a page, and no no-op queries. The only extra queries now will be cache_set().

10 nodes, front page, 5 nodes have aliases.

HEAD:
82 queries

Patch:
54 queries

catch’s picture

FileSize
11.59 KB

and the patch.

catch’s picture

FileSize
10.81 KB

Same thing except this time we don't cache path aliases at all. The only cache maintained is an array of system paths per page. 57 instead of 54 queries for some reason, but only a maximum of 1 insert per page.

catch’s picture

Title: Cache path aliases » drupal_lookup_path() speedup - cache system paths per page.
catch’s picture

FileSize
10.76 KB

grr, missed one thing.

What happens now is this:

In drupal_page_footer() we cache the system paths for the current page. This is set as CACHE_TEMPORARY so it'll get cleared regularly, it's a cheap cache to rebuild - one cache_set() per page. The system paths are already available in $map and we just grab it from drupal_static().

The first time drupal_lookup_path() gets called for a page, we fetch all aliases corresponding to these paths in a single query and stuff them into $map.

If the paths for that page have changed since we cached, or if we don't have a cache of system paths for that page, then we just do the usual drupal_lookup_path() query to fetch the other aliases one by one. The more stale the cache gets, the more queries we have to do, but that's the only symptom.

catch’s picture

FileSize
6.11 KB

ouch. Now without cache_get_multiple() since that's not necessary any more.

Last patch of the night.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
6.43 KB

Fixed the notice, added lots of documentation.

Dries’s picture

This looks nice and simple. It is the right approach, IMO.

1. Benchmarks would be useful.

2. I don't see when/where we clear the cache.

killes@www.drop.org’s picture

I am not seeing that this would help a terrible lot for non-admins.

If #223075: Adaptive path caching is deemed too complicated, we could go back to #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() or (preferably ;) #100301: Page level path alias caching

The latter patch is part of the advcache module and thus probably used by some sites.

catch’s picture

FileSize
10.95 KB

chx reviewed in irc and thought we'd fall down on #446346: url_aliases storage is not normalized. We actually don't because the paths are queried by ASC and then array keys are overwritten, but I hadn't clarified this in the code comments, so updated for that.

@Dries:

1. I couldn't get path generation working in devel HEAD - either need to track that down or fix up a D6 site with lots of paths to upgrade. But yes this definitely needs benchmarks.

2. The current plan is by setting CACHE_TEMPORARY, we allow each cache record to grow stale and get cleared up by the normal garbage collection process in cache_get(). I think we could then pair this approach with #186638: Add a 'keep until' column for cache tables which would allow contrib to provide a UI for a minimum cache lifetime.

I went for this over explicitly cache clearing because the only problem with a stale cache is a gradually increasing number of drupal_lookup_path() queries rather than stale data being displayed. I think it's better to have a few pages with additional queries because their caches are stale than have to rebuild the cache all in one go.

This assumes we can rely on CACHE_TEMPORARY to do reliable garbage collection of course. We also might want to make it REQUEST_TIME + 24 hours or something similar. I don't quite grok exactly what lasts how long and how in cache garbage collection, and there's a few bug reports outstanding on it as well, so could do with feedback on how best to deal with that.

catch’s picture

FileSize
10.99 KB

killes - could you explain why you think it won't help non-admins much?

I think 24 hours is a better expiry, so done it like that.

catch’s picture

FileSize
115 bytes
2.34 KB

Generated 10,000 nodes with path aliases and taxonomy terms in Drupal 6. Then upgraded this to HEAD, then patched. Took the opportunity to benchmark D6 as well :)

Did devel output when logged in as user/1, then ab as an anonymous user with no page caching. All on the front page. Also did some siege testing on 5 node paths.

D6 vs. HEAD shows we're saving 50% of database time (but losing a bunch of it in PHP it looks like since node/n is still noticeably slower going by siege). Nothing to do with the patch but good to know.

HEAD vs. patch shows 41 less queries for an admin, and about 6% increase in requests per second for anonymous users with no page caching.

With siege I also did a 2 minute benchmark with approx. 100 cache misses (by truncating cache_path 20 times during the benchmarks). Even with around 1 in 6 requests missing the cache there's still a small performance improvement, suggesting the cache isn't overly expensive to generate.

Devel says:

D6:
Executed 119 queries in [between 90 - 130ms]

HEAD (9 page requests, all caches primed):
Executed 94 queries in 53.35 milliseconds. 
Executed 94 queries in 46.23 milliseconds.
Executed 97 queries in 45.84 milliseconds. 
Executed 94 queries in 50.11 milliseconds.
Executed 94 queries in 50.24 milliseconds.
Executed 94 queries in 49.62 milliseconds.
Executed 94 queries in 44.93 milliseconds.
Executed 94 queries in 43.7 milliseconds.
Executed 94 queries in 46.27 milliseconds.

Patch (9 page requests all caches primed):
Executed 53 queries in 40.45 milliseconds.
Executed 53 queries in 41.73 milliseconds.
Executed 53 queries in 44.2 milliseconds.
Executed 53 queries in 43.1 milliseconds. 
Executed 53 queries in 38.85 milliseconds.
Executed 56 queries in 33.77 milliseconds. 
Executed 53 queries in 33.02 milliseconds.
Executed 53 queries in 38.91 milliseconds. 
Executed 53 queries in 41.33 milliseconds.
ab -c1 -n500 http://example.com/ says:

D6:
5.96 [#/sec]
5.92 [#/sec]

HEAD:
6.34 [#/sec]
6.31 [#/sec]

Patch:
6.85 [#/sec]
6.87 [#/sec]
siege -c 3 -t 120s -f url_list.txt

D6 (same paths as HEAD):
5.24 trans/sec

HEAD:
4.82 trans/sec

Patch (with primed cache):
5.04 trans/sec

Patch (with empty cache at start):
4.94 trans/sec

Patch (truncating cache_path 20 times while running siege to simulate 100 cache misses):
4.90 trans/sec
catch’s picture

Status: Needs review » Needs work
FileSize
11.68 KB

Paths weren't getting cached when visiting a page via a path alias due to the way drupal_lookup_path() populates $map during the request.

Patch fixes that bug but that shows we need basic tests, so CNW until those are done. Doesn't affect benchmarks above because that was using node/1 etc. as the paths to visit.

catch’s picture

Status: Needs work » Needs review
FileSize
8.7 KB

Added a basic test just to confirm the cache gets set per page.

Back to CNR.

catch’s picture

FileSize
8.09 KB

without debug.

moshe weitzman’s picture

Code looks good. Noticed a verb problem and a typo in "We assume that expect that aiases will be needed for the majority of these paths during"

Like the page cache, does it make sense to have a flag where modules can disable path cache write for the current page. For example, if an admin sees the page it might not be very representative. I don't think it is that needed, but I'll just throw it out there.

Dries’s picture

That look like good results to me -- although I'm worried about the additional PHP overhead that you mentioned. Something to look into as well.

What I like about this patch, is its simplicity.

catch’s picture

FileSize
8.08 KB

I'm not sure we need a don't cache flag, you could do something tricky with drupal_static() and the $system_path variable as it currently stands if you really needed it.

Fixed that typo.

Also found a page with more paths to see what our best-case gains are:

Node with 300 comments. This is an extreme example because every comment has edit / delete / reply on it - but that's the same as a long drupal.org issue for any site maintainer.

Devel output logged in as user 1:

HEAD:
Executed 1262 queries in 483.03 milliseconds.

Patch:
Executed 341 queries in 219.88 milliseconds.

Benchmarks with an anonymous user with view and post comments permission.

I did two benchmarks - one with the cache entry generated by uid 1 - so that's 900 paths in the cache entry, and 900 paths to query. And the other with the cache entry generated by uid 0 (not sure how many paths exactly but more like 350 or so). It's a bit slower to retrieve the huge cache entry and do the bigger query, but not so much that I think we need to add any additional checks in. I'd rather combine this with the blacklist/whitelist patch to remove the number of paths we look up or cache at all.

HEAD:
0.80 reqs/sec

Patch with bloated uid 1 generated cache entry:
0.88 reqs/sec

Patch with uid 0 generated cache entry:
0.92 reqs/sec

So the main thing remaining is whether we want to port the patch killes pointed to to D7 to do a side-by-side comparison. I'm not sure I'm up for the porting job but happy to run benches.

catch’s picture

FileSize
51.5 KB
54.99 KB
44.19 KB

By PHP overhead I meant between Drupal 6 and 7 rather than between HEAD and the patch, but yeah we need to track that stuff down a lot more.

The database savings here compared to overall page generation time for this patch look pretty consistent to me - i.e. on the normal front page from #14 we save around 5-20ms in the database - out of a total of 140ms per request (on my laptop) - which is pretty close to the c.7% improvement for the whole page - given the devel output has a much higher margin of error.

While I'm here though, here's webgrind screenshots on drupal_lookup_path() on the same node with 300 comments . These are both with the patch - one with the cache primed, one without. Self time is down by 50ms, incl. time is down by 1200ms (apparently).

Also included the cache_set() / drupal_cache_system_paths() section from the uncached page - that's not free but not ridiculous - and again it's caching all 900 paths on that page, so worst case for cache generation.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review

Can't reproduce locally so sending for a re-test.

catch’s picture

Assigned: Unassigned » catch

Trying to organise the complete mess which is 'my issues'.

Berdir’s picture

Status: Needs review » Needs work
+          $result = db_query("SELECT src, dst FROM {url_alias} WHERE src IN(:system) AND language IN(:language, '') ORDER BY language ASC", array(
+                      ':system' => $system_paths,
+                      ':language' => $path_language));

That should be something like:

            $result = db_query("SELECT src, dst FROM {url_alias} WHERE src IN(:system) AND language IN(:language, '') ORDER BY language ASC", array(
              ':system' => $system_paths,
              ':language' => $path_language
            ));
+          foreach ($result as $record) {
+            $map[$path_language][$record->src] = $record->dst;
+          }

I *think* we can use $map[$path_language] = ...->fetchAllKeyed() here...

+        $alias = db_query("SELECT dst FROM {url_alias} WHERE src = :src AND language IN(:language, '') ORDER BY language DESC", array(
+          ':src' => $path,
+          ':language' => $path_language))
+          ->fetchField();

Same as above...

catch’s picture

Status: Needs work » Needs review
FileSize
8.05 KB

Made those changes, and yeah fetchAllKeyed() works :)

killes@www.drop.org’s picture

It seems that I was a bit mistaken in #10. I probably confused "system paths" with "admin paths".

Dries’s picture

- No need to prefix things with 'path:system:' as you're using a dedicated table.

- Put the ->fetch* on the same line with the closing ).

Otherwise looks RTBC to me.

catch’s picture

FileSize
7.95 KB

The prefix was cruft left over from the first iteration of the patch when we had both system paths and aliases sharing the cache, that's not longer the case so it just needed to go. I put ->fetch() at the end of lines.

catch’s picture

FileSize
8.68 KB

Dries asked for a CHANGELOG.txt entry.

Dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. Thanks, catch!

catch’s picture

Status: Fixed » Needs review
FileSize
2.17 KB

drupal_init_path() is called in bootstrap.inc, bootstrap.inc runs on update.php brokenness ensues.

In the hope that this will be my last comment on this otherwise lovely issue, see you over at #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() which I think we should revive to complement this (smaller cache entries, less paths to query when there's no cache, less paths in the big IN() when there is a cache).

webchick’s picture

Status: Needs review » Fixed

The idea of a crufty ol' variable that sticks around for all eternity by virtue of having run update.php doesn't fill me with glee, but it is consistent with how we solved this same sort of issue in D6.

Committed to HEAD.

catch’s picture

@webchick, were we to remove Drupal 6 updates in Drupal 7. We could remove the D6 variable. Then we could do the same in Drupal 8 for the Drupal 7 one.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

stefan.r’s picture

FileSize
7.34 KB

Drupal 6 port. I had to change array_diff_key into array_diff though?

lilou’s picture

Version: 7.x-dev » 6.x-dev
lilou’s picture

Status: Closed (fixed) » Needs review
ajayg’s picture

Is there a way we could get 6.x automatic tests run against this patch?

catch’s picture

There are no 6.x automatic tests in, and the testing framework is only set up for D7.

claudiu.cristea’s picture

Does it work in D6?

EvanDonovan’s picture

@claudiu.cristea: The path cache patch in #36 works on Drupal 6. It offers a pretty significant reduction in queries once the paths are in the cache. I think this is also in Pressflow Drupal, if I'm not mistaken.

EvanDonovan’s picture

Actually, it looks like I'm getting an error with the patch (on D6):

Warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ' '') ORDER BY language ASC' at line 1 query: SELECT src, dst FROM um_url_alias WHERE src IN('admin/settings/performance','rdf/schema','rdf','admin/build/block/configure/locale/0','admin','admin/settings', 'admin/settings/actions','admin/reports/status/run-cron','admin/reports/updates/check','user','admin/review','admin/content','admin/build','admin/user','admin/modules','civicrm','admin/store','admin/reports','admin/rules','logout','admin/content/add','admin/content/audio_import','admin/review/all','admin/review/tagger','admin/review/moderate','admin/review/moderate-orgs','admin/review/moderate-opp in 
/home/techmi5/public_html/urbanministryjobs/includes/database.mysql.inc on line 128

Does anyone have any ideas as to why that might be happening?

scroogie’s picture

The problem might be that their are quotes missing around the %s of $language in the first database query.

EvanDonovan’s picture

Thanks. I'll look into that - however, the query is coming from Drupal core (locale module), I believe, and never had issues before I applied this patch. Also, it doesn't happen consistently - on some pages it loads fine. I'll see what Devel query logging can tell me.

scroogie’s picture

The query that you reported above comes from path.inc and was introduced in this patch.

EvanDonovan’s picture

@scroogie: You were correct, thanks! I was wrong about the query coming from locale module - it was actually coming from line 99 of path.inc, which had been changed by the patch. I had to make the following change:

From

$result = db_query("SELECT src, dst FROM {url_alias} WHERE src IN($placeholders) AND language IN(%s, '') ORDER BY language ASC", $system_paths, $path_language); 

to

$result = db_query("SELECT src, dst FROM {url_alias} WHERE src IN($placeholders) AND language IN('%s', '') ORDER BY language ASC", $system_paths, $path_language); 

Can someone reroll the patch? I'm not much good at that.With that change, I think it's ready for D6.

EvanDonovan’s picture

Actually, found another issue. 6.14 has some system_update functions in it that necessitate the update in the patch to be renumbered as system_update_6054().

After that, it should be good. I'm running it on my live site right now.

rmjiv’s picture

Subscribing

rmjiv’s picture

I'm pretty sure I've identified a bug in the D6 patch (#36). This problem lies with this line:

 $result = db_query("SELECT src, dst FROM {url_alias} WHERE src IN($placeholders) AND language IN('%s', '') ORDER BY language ASC", $system_paths, $path_language);

Since $system_paths is an array, db_query never replaces the $path_language argument and that clause always ends up as "...AND language in ('','')". Obviously this breaks any language specific path aliasing.

A quick fix would be something like this:

$args = $system_paths;
$args[] = $path_language;
$result = db_query("SELECT src, dst FROM {url_alias} WHERE src IN($placeholders) AND language IN('%s', '') ORDER BY language ASC", $args);
catch’s picture

Status: Needs review » Needs work

This should be using db_placeholders().

rmjiv’s picture

I didn't re-roll the patch because our system isn't on 6.14, but here is the if block for more context:

        if ($cache = cache_get($cid, 'cache_path')) {
          // Now fetch the aliases corresponding to these system paths.
          // We order by ASC and overwrite array keys to ensure the correct
          // alias is used when there are multiple aliases per path.
          $system_paths = $cache->data;
          $placeholders = db_placeholders($system_paths, 'varchar');
          $args = $system_paths;
          $args[] = $path_language;
          $result = db_query("SELECT src, dst FROM {url_alias} WHERE src IN($placeholders) AND language IN('%s', '') ORDER BY language ASC", $args
          );
          while ($record = db_fetch_object($result)) {
            $map[$path_language][$record->src] = $record->dst;
          }
          // Keep a record of paths with no alias to avoid querying twice.
          $no_aliases[$path_language] = array_flip(array_diff($system_paths, array_keys($map[$path_language])));
        }
dyke it’s picture

subscribing

droberge’s picture

Title: drupal_lookup_path() speedup - cache system paths per page. » new patch
Version: 6.x-dev » 6.14
Priority: Critical » Normal
Status: Needs work » Active
FileSize
9.61 KB

I've created the latest SVN patch for 6.14 based on the git patch from http://tag1consulting.com/patches/path-cache (patch file: http://tag1consulting.com/cgi-bin/gitweb.cgi?p=6.x-perf-patches.git/.git...)

Dave Reid’s picture

Title: new patch » rupal_lookup_path() speedup - cache system paths per page.
Version: 6.14 » 6.x-dev
Priority: Normal » Critical
Status: Active » Needs work

@droberge Please don't change critical meta data like that. Ever.

Dave Reid’s picture

Title: rupal_lookup_path() speedup - cache system paths per page. » drupal_lookup_path() speedup - cache system paths per page.
droberge’s picture

Sorry

Dave Reid’s picture

Component: menu system » path.module

Moving to 'path system' component.

mrfelton’s picture

Status: Needs work » Needs review
FileSize
10.64 KB

Attached patch takes patch from #54, adds the update hook in system.install from #36 and renumbers it to 6056.

EDIT: works for 6.16

mcurry’s picture

subscribing

treksler’s picture

subscribe

jboeger’s picture

Subscribing.

catch’s picture

Assigned: catch » Unassigned
Priority: Critical » Normal

Unassigning myself, this is in Pressflow in Drupal 6, which is enough for me.

andypost’s picture

carlescliment’s picture

Issue tags: -Performance, -caching

cache_path.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +Performance, +caching

The last submitted patch, 456824-drupal-lookup-path-slow-query.patch, failed testing.

dgtlmoon’s picture

subscribing

yonailo’s picture

+1 subscribing

dgtlmoon’s picture

How does pressflow implement it? can we use their patch?

VladSavitsky’s picture

...

VladSavitsky’s picture

I have modified patch to be used for Drupal 6.20 and have tested with Pressflow 6.20.97.

PS. I've done a dummy work. This patch is not necessary for Pressflow because Pressflow has modules path_alias_cache and cookie_cache_bypass modules. Just enable them and have fun.

mrfelton’s picture

Status: Needs work » Needs review
FileSize
10.87 KB

I'm not sure what was going on with the patch in #71 but it seems to add a new function called _drupal_lookup_path_direct which is not used anywhere. Here is yet another updated patch for Drupal 6.22, based on the patch in #59.

mrfelton’s picture

I guess this is never going to get committed, so whatever. But, we still have sites running on Drupal 6 that we can't or dont want to migrate to Pressflow for whatever reason, and this is a considerable performance hit on some sites. Attach patch is updated to work with latest D6 in git and resolves a slight error in the previous patch that came about due to the fact that this bloody patch has been re-rolled so many times. It's hard to get it right every time.

dgtlmoon’s picture

@mrfelton : great work!

ispboy’s picture

mark~

Benq’s picture

ajayg’s picture

Looks something was missing in your step as the patch verification system doesn't seems to be invoked on your patch.

ChrisLaFrancis’s picture

substring_index() in drupal_path_alias_whitelist_rebuild() is MySQL-specific. I'm using split_part() with PostgreSQL as a workaround, but a database-independent method would be nice.

rahulbile’s picture

Patch against 6.28.

ajayg’s picture

Nice to see this getting passed against 6.28. Time to commit?

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.

catch’s picture

Version: 6.x-dev » 7.x-dev
Status: Closed (outdated) » Fixed

Moving this back to 'fixed' against Drupal 7 since it was committed there.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.