Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
See http://drupal4hu.com/node/124 for a description. Tables are:
CREATE TABLE `path_alias_statistics_map` (
`page_path` varchar(255) NOT NULL,
`language` varchar(255) NOT NULL,
`path` varchar(255) NOT NULL,
`alias` varchar(255) NOT NULL
);
CREATE TABLE `path_alias_map` (
`page_path` varchar(255) NOT NULL,
`language` varchar(255) NOT NULL,
`path` varchar(255) NOT NULL,
`alias` varchar(255) NOT NULL,
`counter` int(10) unsigned NOT NULL default '0'
) ;
I still need to add a timestamp to the statistics table and zap only entries older than a given timeframe instead of throwing out the whole table. I still need to check whether it works at all and need to write the init part in path.inc... but I tested the queries in cron and they were rather promising. I just dump this here -- release early, release often :)
Comment | File | Size | Author |
---|---|---|---|
#66 | adaptive_cache-223075-66.patch | 27.32 KB | swentel |
#52 | adaptive_cache-223075-52.patch | 26.37 KB | mfer |
#49 | adaptive_cache.patch | 25.73 KB | chx |
#48 | adaptive_cache.patch | 25.71 KB | chx |
#47 | adaptive_cache-223075-47.patch | 21.14 KB | mfer |
Comments
Comment #1
pwolanin CreditAttribution: pwolanin commentedsubscribe
Comment #2
Wim LeersSubscribing.
Sure as hell more advanced than my blacklist/whitelist approach: http://drupal.org/node/106559 :)
Comment #3
neclimdulNice, I've always liked an idea like this. The logging is the tricky part but I think the benefit makes it worth investigating.
Comment #4
catchsubscribing. Does this kill the blacklist/whitelist or are they complementary?
Comment #5
BioALIEN CreditAttribution: BioALIEN commentedSubscribing to the sound of more caching!
Comment #6
chx CreditAttribution: chx commentedSure can live besides a white/backlist. All this does is collect some statistics and then grabs the URLs that appear on most loads of that page and preloads them in one query. Our model here is a node page with a user block that has changing usernames in it or the forum block with everchanging recent posts links etc. The node links , the node author link etc etc will be present on every load but those dynamic links will appear only on a tiny fraction of the pages. Edit: not pages, but loads of one single page.
Comment #7
Wim Leersblacklist/whitelist was initially pushed back because it would be too hard for users to configure, whereas this works completely autonomously.
Let's first see what the impact of this patch will be, and then consider porting the blacklist/whitelist patch to D7.
Comment #8
Crell CreditAttribution: Crell commentedI can't believe I missed this. subscribing, note to self, need to test.
Comment #9
m3avrck CreditAttribution: m3avrck commentedsubscribing
Comment #10
treksler CreditAttribution: treksler commentedHey
I think the tables should be prefixed with url_alias_ eg:(url_alias_map, url_alias_map_log)
or something like that, so they show up right near the other path module tables (url_alias, url_alias_extra)
Secondly,
what's the idea behind using variable names and table field names like $path and $alias,
we already have $src, and $dst and we should use them for consistency, right?
OR
we should change 'src' to 'path' and 'dst' to 'alias' and 'url_alias' to 'path_alias' everywhere else as well
PS
'footer' is not really a lookup action, it should be called 'log' or 'record' or something like that
PPS
path_alias_statistics_src??
Comment #11
chx CreditAttribution: chx commentedYay! A patch after so long! Untested.
Comment #12
dami CreditAttribution: dami commentedsubscribe
Comment #13
R.Muilwijk CreditAttribution: R.Muilwijk commentedNew patch..
- Applies to HEAD
- Fixes schema in install script
- removes the PRACTICAL_DATABASE define which is not needed in my opinion. Change back if you think differently.
- Reworks the .test file to run the current available tests for the different caching types. I get 5 failures in the PATH_CACHE_ALL. Haven't looked into why yet. So for now code needs work.
Comment #14
p.brouwers CreditAttribution: p.brouwers commentedMuilwijk, I'll have a look at those failures somewhere this week.
Comment #15
smk-ka CreditAttribution: smk-ka commentedSince we're successfully using a backport of this code in-production on D5 (hard-wired to adaptive caching), I've tried to port it forth for HEAD.
Issues identified:
1. We cannot immediately load the cache, since the first call to drupal_get_path() is a 'source' op triggered by drupal_init_path() to determine the initial value of $_GET['q']. However, since $cid relies on $_GET['q'], we need to lazy-load the cache upon the first 'alias' op.
2. On op 'wipe' we also need to clear the cached paths, otherwise we would be loading the old cached entries on the next page request. However, the currently implemented solution to just wipe anything isn't optimal.
3. The cron queries were broken.
4. Inserting mappings into the statistics table one by one pretty much neutralizes the effect that we actually wanted to achieve. Implemented an extended INSERT that pushes all paths through a single query.
To-dos:
1. Wiping the cache_path table is sub-optimal.
2. The extended INSERT clause can exceed MySQL's max_packet_length if there is a huge number of mappings on a page. The code should break down queries to a reasonable length and execute multiple INSERTs if required.
Patch is untested.
Comment #16
jaydub CreditAttribution: jaydub commentedsubscribe
Comment #17
catchLooking at the patch, adding the extra ops to drupal_lookup_path() makes it a very, very long function. Could we de-op it into drupal_lookup_path_alias() drupal_lookup_path_system() etc.?
Comment #18
smk-ka CreditAttribution: smk-ka commentedUsing standardized static caching? Or just factoring out the actions, passing $map around the old-style way (for now)?
Comment #19
catchI meant just factoring out the actions, but yeah this would also be a candidate for the static cache facility.
Comment #20
chx CreditAttribution: chx commentedAdding extended INSERTs are the task of the DB layer IMNSHO. Yes, DELAYED is also the task of that but I think we can add that (much) later should the DBTNG not get in. I have copied your cron and wipe fixes (thanks!) but implemented a new way to not cache on the first call - namely I pass in an argument which says it's a first call. Easy :) I post the patch despite we have four failures in PATH_CACHE_ALL -- I hope I can fix it later but, release early, release often.
Comment #21
andreiashu CreditAttribution: andreiashu commentedsubscribe
Comment #22
chx CreditAttribution: chx commentedYeah, easy fix: we needed to take empty language into consideration as fallback. Path tests now pass.
Comment #23
smk-ka CreditAttribution: smk-ka commented1. The shutdown action has got lost.
2. Clearing the cache is actually not enough... In fact, we also need to get rid of changed or deleted aliases from the database tables, otherwise they will simply be re-read on the next page request. Does path module implement actions?
Comment #24
catchJust tested again and I get three fails with "every alias cached" - all 404s.
Comment #25
chx CreditAttribution: chx commentedThe wipe action definitely needs fine graining in a followup patch. I added back the footer call to the drupal_page_footer -- not shutdown action because this way you can write a script which decides to skip the shutdown actions for perf reasons. With register_shutdown_function you can't.
Comment #26
catchHere's a re-roll with the schema and updateimplemented for cache_path. This now passes all the simpletests, and manual testing is fine too.
I think the caching radios should probably be moved to /admin/settings/performance instead of admin/build/path/settings - will probably re-roll with that a bit later on.
Comment #27
catchAnd with the form moved to admin/settings/performance plus a few string tweaks.
Comment #28
Robin Monks CreditAttribution: Robin Monks commentedApplied with some offset (fixed in attached patch), tested and worked OK.
Simpletest: Path: 180 passes, 0 fails, 0 exceptions
RTBC?
Robin
Comment #29
chx CreditAttribution: chx commentedDBTNG is in. Let's fix the insert first.
Comment #30
chx CreditAttribution: chx commentedLike this.
Comment #31
chx CreditAttribution: chx commentedPatch description. The patch collects a statistics of what aliases appear on a page. If an alias appears frequently enough (60% but overrideable) then that alias is decided to be constant and these are written into a table and then later cached per $_GET['q'] . Testing the patch is not trivial as merely setting the path strategy will not show or change much as the path_alias_map table will be empty. You can, however, generate content, load a few pages, run cron and see that path_alias_map is populated and once that is true, benchmarking would make sense.
Comment #32
meba CreditAttribution: meba commentedsubscribing
Comment #33
Crell CreditAttribution: Crell commentedchx: Would this patch apply to D6? If so, we could possibly install it experimentally on some production D6 sites and see what happens.
Comment #34
eli CreditAttribution: eli commentedSubscribing.
Curious to see how this plays out on benchmarks simulating real site use.
Comment #35
hass CreditAttribution: hass commentedSubscribing
Comment #36
agentrickardWhen running cron with the patch, I get this:
Something in the path_cron() function -- maybe interaction of the temporary tables with DBTNG?
Comment #37
andypostSubscribing
Comment #38
mfer CreditAttribution: mfer commentedSubscribing
Comment #39
chx CreditAttribution: chx commented#325895: [DBTNG + simpletest] queryTemporary is broken and need simpletest test case . patch rerolled against HEAD.
Comment #40
mfer CreditAttribution: mfer commentedIn the function definition for function drupal_lookup_path() the argument $first_call is missing. This is in path.inc.
should be something like
Comment #41
chx CreditAttribution: chx commentedExcellent catch!
Comment #42
RobLoachSmall tweaks and the form fix.
Comment #43
RobLoachForgot to report "URL contains the path alias." is reporting a failure.
Comment #44
keith.smith CreditAttribution: keith.smith commentedThere are several (three or four) comments in here that do not end with periods.
I'm not sure we capitalize things like this.
Appear to infrequently change? Why not just "...that change infrequently"?
This is pretty technical (which is probably OK), but could possibly be simpler. How about:
"Path alias caching significantly reduces the number of database queries on each page. Using the page alias cache in conjunction with the page cache will increase performance for authenticated users."
Comment #45
chx CreditAttribution: chx commentedAdded a path.install file.
Comment #46
chx CreditAttribution: chx commentedOpsie crosspost. Did not want to CNR.
Comment #47
mfer CreditAttribution: mfer commentedThis patch fixes path_schema so it now calls the schema 'cache_path' rather than 'cache_page', returns $schema, and doesn't add all of the system schema to this one. Added hook_uninstall.
There is still a test that fails.
Comment #48
chx CreditAttribution: chx commentedThat test fails but various bits and pieces were lost durign the rerolls most improtant most importantly the path schema and the new path tests. Interestingly, still that one single path fails. I also moved some of the code into a new utility function.
Comment #49
chx CreditAttribution: chx commented$action != 'wipe')
was missing so that adding the first alias was not taken into account. All tests pass.Comment #50
mfer CreditAttribution: mfer commentedPatch applies, works correctly though drupal install, and passes all tests. When set to 'Cache all aliases on each page (not recommended for sites with a large number of aliases)' I saw it caching all my aliases. When set to 'Cache aliases that appear to change infrequently' I could not get any of my aliases to be cached. The rows were appearing in the cache_path table but the data in the serialized array was always empty. I have not had a chance to look through the code.
Is there something special I need to do to test the adaptive caching part of this patch?
Comment #51
mfer CreditAttribution: mfer commentedAh, should have read #31. I'll retest the adaptive caching later tonight.
PATH_CACHE_ADAPTIVE requires cron to work. Shouldn't there be a note on the form where you enable about this requirement?
Comment #52
mfer CreditAttribution: mfer commentedIn drupal_lookup_path and the footer operation it was calling ->delayed() on an insert query. Should have been ->delay(). This patch fixes it and statistics are new being recorded.
Comment #53
mfer CreditAttribution: mfer commentedcache_path had stale information in it, even after a crop run. cache_path was keeping the old version. It had an expires of 0. When I truncated the cache_path table it started caching the paths. But, not exactly as expected. If I went to node/1 it cached the path. If I went to test1 which is the alias of node/1 nothing is cached.
Comment #54
catchI like this patch but I don't like having two choices for the admin - how much is a 'large number'? 100, 2,000?
If possible we should hide strategy that's used completely if possible - probably by doing a COUNT on path aliases each cron run and switching the strategy based on that. If we can pick a hard number of 500 or whatever over which in-memory caching might cause issues, then hard code it to that, if it's too restrictive make it a variable which can be altered in settings.php or a contrib module. But as it is there's a lot of margin for user error and confusion.
Comment #55
mfer CreditAttribution: mfer commented@catch - It's really hard to pick a hard number because so much of that depends on your setup. In some hosting environments it might be 500 and in others it might be 3,000. If we did set a limit it would need to be configurable.
In either case we need some good help documentation outlining when/how to use these and that cron is needed for adaptive caching to work.
Comment #56
catchmfer: In that case I think we need two variables, naming is random but hopefully gets the point across:
path_alias_cache_strategy_threshold - variable is set by the COUNT on {path_alias} on cron runs, defaults to N (1000?)
path_alias_cache_strategy - off, all, adaptive - based on the combination of the on/off setting on the performance page and the strategy threshold.
We don't offer any UI for path_alias_cache_strategy_threshold, but you can change it in settings.php or contrib if you want.
Users already get warnings if cron isn't running, so I'm not sure we need to note that on the performance page - if cron isn't running your watchdog table will fill up and that's much more of a performance hit than not having path alias caching.
Should the help text for caching be in hook_help() or the handbook? I don't remember seeing all that much for CSS and JS aggregation, or block caching either - but it'd be good to fix that.
We also need a way to document variables that aren't triggered by a settings form - but that's not a job for this patch.
Comment #65
basicmagic.net CreditAttribution: basicmagic.net commentedsubscribe
Comment #66
swentel CreditAttribution: swentel commentedChasing HEAD, nothing changed so far.
Note: We should test upgrade from D6 to D7 too, as I'm quite sure if path is enabled, path_alias_map & path_alias_statistics_map won't be created. Anyway, initial tests dropped a lot of db queries, so this has a lot of potential!
Comment #67
catchAll tests pass too. So it seems like we need to do a bit of dbtng update, resolve the UI issues, and see about that caching bug reported earlier.
Comment #68
lilou CreditAttribution: lilou commented@swentel : Schema descriptions are no longer translated
Comment #69
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedpath.inc has been converted to dbtng.
Comment #70
chx CreditAttribution: chx commentedComment #71
eaton CreditAttribution: eaton commentedIt seems that this would be a good match for http://drupal.org/node/320132 - making path.inc swappable, and providing an alternate "adaptive" implementation in a separate .inc file, would be a reasonable approach with less arguing about where cutoffs should be.
Comment #72
Crell CreditAttribution: Crell commentedSwappable include files break the registry.
Note to self: Just damn well write the handlers patch.
Comment #73
eaton CreditAttribution: eaton commentedHmmmm. I didn't realize that - has the registry patch broken swappable cache.inc?
Comment #74
Anonymous (not verified) CreditAttribution: Anonymous commentedadding tag.
Comment #75
Crell CreditAttribution: Crell commented@Eaton: The registry right now contains a hack to avoid it. The problem is that if the same function/class name appears twice in any file parsed by the registry, we end up with an SQL duplicate key insert error. To avoid that, the registry rebuild process uses a merge query rather than an insert query. That means we are using 1 query per function/class record (which comes out to somewhere north of 1000 merge queries for core alone, which for the degenerate driver is actually 2000 queries because it's a multi-step process) rather than being able to optimize with a single (or a small number of) multi-insert statement. It also means that we end up with essentially garbage data in the registry for those function/class names. Just as api.drupal.org right now has no knowledge of the MySQL version of db_query(), the registry in D7 right now has no knowledge of cache_set() except for whichever version of cache.inc gets parsed second.
For the cache that isn't causing a problem yet, because it initializes before the registry does. For anything that we want to not have to initialize manually (thereby bypassing the registry completely), swappable include files simply do not work. That is why classes exist in the first place. :-)
Comment #76
Flying Drupalist CreditAttribution: Flying Drupalist commentedCan anyone backport this patch to 6.x? Even without being committed this will make a large difference on many sites!
Comment #77
CorniI CreditAttribution: CorniI commentedsubscribe
Comment #78
michtoen CreditAttribution: michtoen commentedsubscribe
Comment #79
BerdirAs I already mentioned in an another issue.. drupal_static() allows to handle the caching *outside* of that function, without touching it.
So, if we just want to have an alternative caching, and not an alternative implementation, we don't necessarly need to make this file swap-able.
Comment #80
catch#456824: drupal_lookup_path() speedup - cache system paths per page. just got in - which stole the basic idea from here but didn't actually cache aliases. So closing out as duplicate.
Comment #81
dgtlmoon CreditAttribution: dgtlmoon commented@catch: so this is in 7.x ?