See http://drupal4hu.com/node/124 for a description. Tables are:

CREATE TABLE `path_alias_statistics_map` (
  `page_path` varchar(255) NOT NULL,
  `language` varchar(255) NOT NULL,
  `path` varchar(255) NOT NULL,
  `alias` varchar(255) NOT NULL
);
 CREATE TABLE `path_alias_map` (
  `page_path` varchar(255) NOT NULL,
  `language` varchar(255) NOT NULL,
  `path` varchar(255) NOT NULL,
  `alias` varchar(255) NOT NULL,
  `counter` int(10) unsigned NOT NULL default '0'
) ;

I still need to add a timestamp to the statistics table and zap only entries older than a given timeframe instead of throwing out the whole table. I still need to check whether it works at all and need to write the init part in path.inc... but I tested the queries in cron and they were rather promising. I just dump this here -- release early, release often :)

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

pwolanin’s picture

subscribe

Wim Leers’s picture

Subscribing.

Sure as hell more advanced than my blacklist/whitelist approach: http://drupal.org/node/106559 :)

neclimdul’s picture

Nice, I've always liked an idea like this. The logging is the tricky part but I think the benefit makes it worth investigating.

catch’s picture

subscribing. Does this kill the blacklist/whitelist or are they complementary?

BioALIEN’s picture

Subscribing to the sound of more caching!

chx’s picture

Sure can live besides a white/backlist. All this does is collect some statistics and then grabs the URLs that appear on most loads of that page and preloads them in one query. Our model here is a node page with a user block that has changing usernames in it or the forum block with everchanging recent posts links etc. The node links , the node author link etc etc will be present on every load but those dynamic links will appear only on a tiny fraction of the pages. Edit: not pages, but loads of one single page.

Wim Leers’s picture

blacklist/whitelist was initially pushed back because it would be too hard for users to configure, whereas this works completely autonomously.
Let's first see what the impact of this patch will be, and then consider porting the blacklist/whitelist patch to D7.

Crell’s picture

I can't believe I missed this. subscribing, note to self, need to test.

m3avrck’s picture

subscribing

treksler’s picture

Hey

I think the tables should be prefixed with url_alias_ eg:(url_alias_map, url_alias_map_log)
or something like that, so they show up right near the other path module tables (url_alias, url_alias_extra)

Secondly,
what's the idea behind using variable names and table field names like $path and $alias,
we already have $src, and $dst and we should use them for consistency, right?

OR
we should change 'src' to 'path' and 'dst' to 'alias' and 'url_alias' to 'path_alias' everywhere else as well

PS
'footer' is not really a lookup action, it should be called 'log' or 'record' or something like that

PPS
path_alias_statistics_src??

chx’s picture

Status: Needs work » Needs review
FileSize
11.68 KB

Yay! A patch after so long! Untested.

dami’s picture

subscribe

R.Muilwijk’s picture

Status: Needs review » Needs work
FileSize
19.42 KB

New patch..

- Applies to HEAD
- Fixes schema in install script
- removes the PRACTICAL_DATABASE define which is not needed in my opinion. Change back if you think differently.
- Reworks the .test file to run the current available tests for the different caching types. I get 5 failures in the PATH_CACHE_ALL. Haven't looked into why yet. So for now code needs work.

p.brouwers’s picture

Muilwijk, I'll have a look at those failures somewhere this week.

smk-ka’s picture

FileSize
21.13 KB

Since we're successfully using a backport of this code in-production on D5 (hard-wired to adaptive caching), I've tried to port it forth for HEAD.

Issues identified:
1. We cannot immediately load the cache, since the first call to drupal_get_path() is a 'source' op triggered by drupal_init_path() to determine the initial value of $_GET['q']. However, since $cid relies on $_GET['q'], we need to lazy-load the cache upon the first 'alias' op.
2. On op 'wipe' we also need to clear the cached paths, otherwise we would be loading the old cached entries on the next page request. However, the currently implemented solution to just wipe anything isn't optimal.
3. The cron queries were broken.
4. Inserting mappings into the statistics table one by one pretty much neutralizes the effect that we actually wanted to achieve. Implemented an extended INSERT that pushes all paths through a single query.

To-dos:
1. Wiping the cache_path table is sub-optimal.
2. The extended INSERT clause can exceed MySQL's max_packet_length if there is a huge number of mappings on a page. The code should break down queries to a reasonable length and execute multiple INSERTs if required.

Patch is untested.

jaydub’s picture

subscribe

catch’s picture

Looking at the patch, adding the extra ops to drupal_lookup_path() makes it a very, very long function. Could we de-op it into drupal_lookup_path_alias() drupal_lookup_path_system() etc.?

smk-ka’s picture

Using standardized static caching? Or just factoring out the actions, passing $map around the old-style way (for now)?

catch’s picture

I meant just factoring out the actions, but yeah this would also be a candidate for the static cache facility.

chx’s picture

Status: Needs work » Needs review
FileSize
17.15 KB

Adding extended INSERTs are the task of the DB layer IMNSHO. Yes, DELAYED is also the task of that but I think we can add that (much) later should the DBTNG not get in. I have copied your cron and wipe fixes (thanks!) but implemented a new way to not cache on the first call - namely I pass in an argument which says it's a first call. Easy :) I post the patch despite we have four failures in PATH_CACHE_ALL -- I hope I can fix it later but, release early, release often.

andreiashu’s picture

subscribe

chx’s picture

FileSize
17.17 KB

Yeah, easy fix: we needed to take empty language into consideration as fallback. Path tests now pass.

smk-ka’s picture

Status: Needs review » Needs work

1. The shutdown action has got lost.
2. Clearing the cache is actually not enough... In fact, we also need to get rid of changed or deleted aliases from the database tables, otherwise they will simply be re-read on the next page request. Does path module implement actions?

catch’s picture

Just tested again and I get three fails with "every alias cached" - all 404s.

chx’s picture

Status: Needs work » Needs review
FileSize
18.06 KB

The wipe action definitely needs fine graining in a followup patch. I added back the footer call to the drupal_page_footer -- not shutdown action because this way you can write a script which decides to skip the shutdown actions for perf reasons. With register_shutdown_function you can't.

catch’s picture

Here's a re-roll with the schema and updateimplemented for cache_path. This now passes all the simpletests, and manual testing is fine too.

I think the caching radios should probably be moved to /admin/settings/performance instead of admin/build/path/settings - will probably re-roll with that a bit later on.

catch’s picture

FileSize
21.87 KB

And with the form moved to admin/settings/performance plus a few string tweaks.

Robin Monks’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
21.53 KB

Applied with some offset (fixed in attached patch), tested and worked OK.
Simpletest: Path: 180 passes, 0 fails, 0 exceptions

RTBC?

Robin

chx’s picture

Status: Reviewed & tested by the community » Needs work

DBTNG is in. Let's fix the insert first.

chx’s picture

Status: Needs work » Needs review
FileSize
20.28 KB

Like this.

chx’s picture

Patch description. The patch collects a statistics of what aliases appear on a page. If an alias appears frequently enough (60% but overrideable) then that alias is decided to be constant and these are written into a table and then later cached per $_GET['q'] . Testing the patch is not trivial as merely setting the path strategy will not show or change much as the path_alias_map table will be empty. You can, however, generate content, load a few pages, run cron and see that path_alias_map is populated and once that is true, benchmarking would make sense.

meba’s picture

subscribing

Crell’s picture

chx: Would this patch apply to D6? If so, we could possibly install it experimentally on some production D6 sites and see what happens.

eli’s picture

Subscribing.

Curious to see how this plays out on benchmarks simulating real site use.

hass’s picture

Subscribing

agentrickard’s picture

Status: Needs review » Needs work

When running cron with the patch, I get this:

[03-Oct-2008 09:15:04] PHP Fatal error:  Unsupported operand types in /Applications/MAMP/htdocs/drupal-cvs/includes/database/database.inc on line 325

Something in the path_cron() function -- maybe interaction of the temporary tables with DBTNG?

andypost’s picture

Subscribing

mfer’s picture

Subscribing

chx’s picture

Status: Needs work » Needs review
FileSize
20.14 KB
mfer’s picture

Status: Needs review » Needs work

In the function definition for function drupal_lookup_path() the argument $first_call is missing. This is in path.inc.

  function drupal_lookup_path($action, $path = '', $path_language = '') {

should be something like

  function drupal_lookup_path($action, $path = '', $path_language = '', $first_call = ????????) {
chx’s picture

Status: Needs work » Needs review
FileSize
20.5 KB

Excellent catch!

RobLoach’s picture

FileSize
21.3 KB

Small tweaks and the form fix.

RobLoach’s picture

Status: Needs review » Needs work

Forgot to report "URL contains the path alias." is reporting a failure.

keith.smith’s picture

There are several (three or four) comments in here that do not end with periods.

+ * Alters the System Performance Settings form.

I'm not sure we capitalize things like this.

+    PATH_CACHE_ADAPTIVE => t('Cache aliases that appear to change infrequently'),

Appear to infrequently change? Why not just "...that change infrequently"?

+    '#description' => t('Path alias caching can significantly reduce the number of database queries on each page load. If the page cache is also enabled, performance increases from enabling the path alias cache will mainly benefit authenticated users.'),

This is pretty technical (which is probably OK), but could possibly be simpler. How about:

"Path alias caching significantly reduces the number of database queries on each page. Using the page alias cache in conjunction with the page cache will increase performance for authenticated users."

chx’s picture

Status: Needs work » Needs review
FileSize
20.31 KB

Added a path.install file.

chx’s picture

Status: Needs review » Needs work

Opsie crosspost. Did not want to CNR.

mfer’s picture

This patch fixes path_schema so it now calls the schema 'cache_path' rather than 'cache_page', returns $schema, and doesn't add all of the system schema to this one. Added hook_uninstall.

There is still a test that fails.

chx’s picture

FileSize
25.71 KB

That test fails but various bits and pieces were lost durign the rerolls most improtant most importantly the path schema and the new path tests. Interestingly, still that one single path fails. I also moved some of the code into a new utility function.

chx’s picture

Status: Needs work » Needs review
FileSize
25.73 KB
  if (!$count && $action != 'wipe') {
    return FALSE;
  }

$action != 'wipe') was missing so that adding the first alias was not taken into account. All tests pass.

mfer’s picture

Patch applies, works correctly though drupal install, and passes all tests. When set to 'Cache all aliases on each page (not recommended for sites with a large number of aliases)' I saw it caching all my aliases. When set to 'Cache aliases that appear to change infrequently' I could not get any of my aliases to be cached. The rows were appearing in the cache_path table but the data in the serialized array was always empty. I have not had a chance to look through the code.

Is there something special I need to do to test the adaptive caching part of this patch?

mfer’s picture

Ah, should have read #31. I'll retest the adaptive caching later tonight.

PATH_CACHE_ADAPTIVE requires cron to work. Shouldn't there be a note on the form where you enable about this requirement?

mfer’s picture

In drupal_lookup_path and the footer operation it was calling ->delayed() on an insert query. Should have been ->delay(). This patch fixes it and statistics are new being recorded.

mfer’s picture

Status: Needs review » Needs work

cache_path had stale information in it, even after a crop run. cache_path was keeping the old version. It had an expires of 0. When I truncated the cache_path table it started caching the paths. But, not exactly as expected. If I went to node/1 it cached the path. If I went to test1 which is the alias of node/1 nothing is cached.

catch’s picture

I like this patch but I don't like having two choices for the admin - how much is a 'large number'? 100, 2,000?

If possible we should hide strategy that's used completely if possible - probably by doing a COUNT on path aliases each cron run and switching the strategy based on that. If we can pick a hard number of 500 or whatever over which in-memory caching might cause issues, then hard code it to that, if it's too restrictive make it a variable which can be altered in settings.php or a contrib module. But as it is there's a lot of margin for user error and confusion.

mfer’s picture

@catch - It's really hard to pick a hard number because so much of that depends on your setup. In some hosting environments it might be 500 and in others it might be 3,000. If we did set a limit it would need to be configurable.

In either case we need some good help documentation outlining when/how to use these and that cron is needed for adaptive caching to work.

catch’s picture

mfer: In that case I think we need two variables, naming is random but hopefully gets the point across:

path_alias_cache_strategy_threshold - variable is set by the COUNT on {path_alias} on cron runs, defaults to N (1000?)
path_alias_cache_strategy - off, all, adaptive - based on the combination of the on/off setting on the performance page and the strategy threshold.

We don't offer any UI for path_alias_cache_strategy_threshold, but you can change it in settings.php or contrib if you want.

Users already get warnings if cron isn't running, so I'm not sure we need to note that on the performance page - if cron isn't running your watchdog table will fill up and that's much more of a performance hit than not having path alias caching.

Should the help text for caching be in hook_help() or the handbook? I don't remember seeing all that much for CSS and JS aggregation, or block caching either - but it'd be good to fix that.

We also need a way to document variables that aren't triggered by a settings form - but that's not a job for this patch.

basicmagic.net’s picture

subscribe

swentel’s picture

Chasing HEAD, nothing changed so far.
Note: We should test upgrade from D6 to D7 too, as I'm quite sure if path is enabled, path_alias_map & path_alias_statistics_map won't be created. Anyway, initial tests dropped a lot of db queries, so this has a lot of potential!

catch’s picture

All tests pass too. So it seems like we need to do a bit of dbtng update, resolve the UI issues, and see about that caching bug reported earlier.

lilou’s picture

killes@www.drop.org’s picture

path.inc has been converted to dbtng.

chx’s picture

eaton’s picture

It seems that this would be a good match for http://drupal.org/node/320132 - making path.inc swappable, and providing an alternate "adaptive" implementation in a separate .inc file, would be a reasonable approach with less arguing about where cutoffs should be.

Crell’s picture

Swappable include files break the registry.

Note to self: Just damn well write the handlers patch.

eaton’s picture

Hmmmm. I didn't realize that - has the registry patch broken swappable cache.inc?

Anonymous’s picture

Issue tags: +Performance

adding tag.

Crell’s picture

@Eaton: The registry right now contains a hack to avoid it. The problem is that if the same function/class name appears twice in any file parsed by the registry, we end up with an SQL duplicate key insert error. To avoid that, the registry rebuild process uses a merge query rather than an insert query. That means we are using 1 query per function/class record (which comes out to somewhere north of 1000 merge queries for core alone, which for the degenerate driver is actually 2000 queries because it's a multi-step process) rather than being able to optimize with a single (or a small number of) multi-insert statement. It also means that we end up with essentially garbage data in the registry for those function/class names. Just as api.drupal.org right now has no knowledge of the MySQL version of db_query(), the registry in D7 right now has no knowledge of cache_set() except for whichever version of cache.inc gets parsed second.

For the cache that isn't causing a problem yet, because it initializes before the registry does. For anything that we want to not have to initialize manually (thereby bypassing the registry completely), swappable include files simply do not work. That is why classes exist in the first place. :-)

Flying Drupalist’s picture

Can anyone backport this patch to 6.x? Even without being committed this will make a large difference on many sites!

CorniI’s picture

subscribe

michtoen’s picture

subscribe

Berdir’s picture

As I already mentioned in an another issue.. drupal_static() allows to handle the caching *outside* of that function, without touching it.

So, if we just want to have an alternative caching, and not an alternative implementation, we don't necessarly need to make this file swap-able.

catch’s picture

Status: Needs work » Closed (duplicate)

#456824: drupal_lookup_path() speedup - cache system paths per page. just got in - which stole the basic idea from here but didn't actually cache aliases. So closing out as duplicate.

dgtlmoon’s picture

@catch: so this is in 7.x ?