Why?
To reduce the number of queries per page that are completely useless, I've written this optimization.

Configuration
You get a new config page at admin/build/path/skiplist to modify the skiplist. The paths listed here can include wildcards as well as '<front>', in the exact same way you can configure on what pages a block should (not) be listed.

What has changed to make this work?
The skiplist variable is stored in the 'variable' table
drupal_lookup_path() (in includes/path.inc) has been modified: it uses a mechanism comparable to the one used in block_list(), the regexp part is almost a 1:1 copy of the code in that function

path.module has been modified, to allow you to configure the skiplist

Files: 
CommentFileSizeAuthor
#235 106559-path_alias_whitelist-D6.patch4.09 KBsmk-ka
#229 106559-path_alias_whitelist-D6.patch4.09 KBsmk-ka
#220 106559-get-the-naming-right.patch1.18 KBJosh Waihi
Failed: Failed to apply patch.
[ View ]
#211 106559-alias-whitelist-automatic-d6.patch2.13 KBkbahey
#209 106559-alias-whitelist-automatic.patch7.24 KBDamien Tournoud
Failed: Failed to apply patch.
[ View ]
#208 106559-alias-whitelist-automatic.patch7.22 KBDamien Tournoud
Failed: Failed to apply patch.
[ View ]
#205 106559-alias-whitelist-automatic.patch7.29 KBDamien Tournoud
Failed: Failed to apply patch.
[ View ]
#203 path.patch7.54 KBcatch
Failed: Failed to apply patch.
[ View ]
#197 106559-alias-whitelist-automatic-alternative.patch9.06 KBDamien Tournoud
Failed: Failed to apply patch.
[ View ]
#196 path.patch6.1 KBcatch
Failed: Failed to apply patch.
[ View ]
#193 path_whitelist.patch5.06 KBchx
Failed: Failed to apply patch.
[ View ]
#190 path_whitelist.patch5.07 KBchx
Failed: 11509 passes, 5 fails, 0 exceptions
[ View ]
#189 106559-alias-whitelist-automatic.patch3.04 KBkbahey
Failed: Failed to apply patch.
[ View ]
#188 path_whitelist.patch3.18 KBchx
Failed: Failed to apply patch.
[ View ]
#181 106559-alias-whitelist-automatic.patch2.6 KBkbahey
Failed: 11508 passes, 5 fails, 10 exceptions
[ View ]
#176 106559-alias-whitelist-automatic.patch1.95 KBkbahey
Failed: Failed to apply patch.
[ View ]
#171 106559-alias-whitelist-automatic.patch2.43 KBkbahey
Failed: Failed to apply patch.
[ View ]
#171 106559-alias-whitelist-automatic-d6.patch2.11 KBkbahey
#168 whitelist.patch4.58 KBcatch
Failed: Failed to apply patch.
[ View ]
#166 whitelist.patch4.7 KBcatch
Failed: Failed to apply patch.
[ View ]
#151 blacklist.patch4.8 KBcatch
Failed: Failed to apply patch.
[ View ]
#150 blacklist.patch3.89 KBcatch
Failed: Failed to apply patch.
[ View ]
#149 blacklist.patch3.52 KBcatch
Failed: Failed to apply patch.
[ View ]
#144 path.inc_.patch4.04 KBbcn
Failed: Failed to apply patch.
[ View ]
#137 lookup_whitelist.patch8.2 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch lookup_whitelist_0.patch.
[ View ]
#136 lookup_whitelist.patch8.01 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch lookup_whitelist.patch.
[ View ]
#131 drupal_lookup_path_optimization_2.patch7.96 KBJaza
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal_lookup_path_optimization_2.patch.
[ View ]
#130 drupal_lookup_path_optimization_106559_2007101401.patch5.18 KBhass
FAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal_lookup_path_optimization_106559_2007101401.patch.
[ View ]
#124 path_inc_opt_2.patch1.38 KBpwolanin
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path_inc_opt_2.patch.
[ View ]
#115 path-whitelist-4.patch4.89 KBkbahey
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path-whitelist-4.patch.
[ View ]
#110 path-whitelist-2.patch2.13 KBkbahey
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path-whitelist-2.patch.
[ View ]
#106 path-whitelist.patch1.84 KBkbahey
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path-whitelist.patch.
[ View ]
#92 optimize_drupal_lookup_no_ui_0.patch5.87 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_no_ui_0.patch.
[ View ]
#90 optimize_drupal_lookup_no_ui.patch5.9 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_no_ui.patch.
[ View ]
#40 optimize_drupal_lookup.patch9.81 KBchx
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup.patch.
[ View ]
#37 optimize_drupal_lookup_patch_4.patch9.37 KBchx
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_4.patch.
[ View ]
#31 optimize_drupal_lookup_patch_3.patch9.77 KBjoshk
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_3.patch.
[ View ]
#30 optimize_drupal_lookup_patch_2.patch8.08 KBchx
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_2.patch.
[ View ]
#28 optimize_drupal_lookup_patch_1.patch9.19 KBchx
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_1.patch.
[ View ]
#26 optimize_drupal_lookup_patch_0.patch10.55 KBjoshk
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_0.patch.
[ View ]
#24 optimize_drupal_lookup_patch.patch8.87 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch.patch.
[ View ]
#8 optimize_drupal_lookup_patch5_0.patch5.3 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch5_0.patch.
[ View ]
#5 optimize_drupal_lookup_patch5.patch5.87 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch5.patch.
[ View ]
#4 optimize_drupal_lookup_patch4.patch5.93 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch4.patch.
[ View ]
#3 optimize_drupal_lookup_patch3_0.patch5.9 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch3_0.patch.
[ View ]
#2 optimize_drupal_lookup_patch3.patch3.72 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch3.patch.
[ View ]
#1 optimize_drupal_lookup_patch2.patch4.42 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch2.patch.
[ View ]
optimize_drupal_lookup_path.patch4.17 KBWim Leers
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_path.patch.
[ View ]

Comments

StatusFileSize
new4.42 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch2.patch.
[ View ]

Patch improved: refactored now (thanks to chx for the tip).

One more remark: the current skiplist is far from complete... please submit the ones that aren't listed here yet:

*/add
*/edit
*/load
*/outline
*/render
*/revisions
*/track
*/votes
admin
admin/*
book
comment/*
devel
devel/*
donate
filter
filter/*
lm_paypal
lm_paypal/*
logout
node
poll
profile
tracker
tracker/*
user

StatusFileSize
new3.72 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch3.patch.
[ View ]

This new version of the patch includes the following enhancements:
*refactored even further, so the list_block() module (see initial post) can use this function as well
*performance improved: instead of performing one regexp for every path in the skiplist, it will not perform just a single, larger one (this implies that the skiplist is now also stored as a string and no longer as an array in the 'variable' table)
*no more default skiplist included (as recommended by chx)

StatusFileSize
new5.9 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch3_0.patch.
[ View ]

Fixed the patch, the changes for path.module weren't included.

Version:5.x-dev» 6.x-dev
StatusFileSize
new5.93 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch4.patch.
[ View ]

Fixed a few lines that were incorrectly indented.

chx marked this patch as looking ok (through IRC), and he also talked to Dries, concluding that this patch will be included in 6.x and not in 5.x, because it's too late.

Feel free to use it in 5.x nevertheless if you need it, it doesn't break anything!

StatusFileSize
new5.87 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch5.patch.
[ View ]

chx just doesn't stop finding even more optimizations... :P

Now he noticed it'd actually be possible to cache all regexps by using a static array. Also some indenting errors fixed.

And now, I step out from the background and praise the patch in public. Yes, I gave you advice but the idea is yours and is executed very nicely. Little code is added, Drupal can be speeded up and as a side benefit, Drupal 6 will have the path matcher available for other modules to use. Nicely done.

Status:Needs review» Needs work

Change $form['skiplist'] to $form['path_skiplist'], and have path_admin_skiplist() return system_settings_form($form). Then you could remove the submit handler.

Status:Needs work» Needs review
StatusFileSize
new5.3 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch5_0.patch.
[ View ]

Thanks for the tip, RobRoy. Very handy function, system_settings_form()! The updated patch is attached.

RTBC?

Major +1 to this patch. Well done Wim Leers for the idea and of course chx and everybody else who contributed. This will bring enormous speed improvements to sites with DB bottlenecks.

Just a clarification while I'm here so this post isn't so "useless". Some of the skiplist paths that you've included in your post - will they be dependent on modules enabled? E.g. if someone doesn't have any paypay module installed, I don't see the need to skip "lm_paypal" and "lm_paypal/*" in this instance.

Thanks. And please check the last line of #2 as an answer to your question ;)

I didn't understand it first time round, but now I do. Thanks for the clarification - its indeed the right way to go as per chx's recommendation :)

+1
I think it's time to RTBC

Status:Needs review» Needs work

C'mon guys, how can this be RTBC when it doesn't even apply to HEAD.

patching file includes/path.inc
Hunk #3 succeeded at 219 (offset 4 lines).
patching file modules/block/block.module
Hunk #1 succeeded at 683 (offset 20 lines).
patching file modules/path/path.module
Hunk #1 FAILED at 60.
Hunk #2 succeeded at 132 (offset 5 lines).

"String containing patterns separated by \n, \r or \r\n." - this needs an extra initial space

I think the function would be better renamed to drupal_match_path() - it makes a bit more sense imo.

"Check if a path matches a path in the skiplist." - this is slightly misleading as the function no longer references skiplist. I suggest "Check if a path matches certain regular expressions" or something of that nature.

Once the patch is appliable it will then need to be fully tested.

This will significantly speed up drupal. please lets fix it up and resubmit.

I will reroll against HEAD first thing in the morning. About twelve hours after this post was made, for those eager to test.

i would like to see some some heavily used patterns get into the skiplist by default. how about

comment/reply*
comment/edit*
comment/delete*
admin*

Sure.

Perhaps add node/add* etc. also? Or am I missing the reason that you did not list it?

+1 on avoiding queries when possible. A benchmark is needed on a large site with lots of aliases to confirm the benefits though.

I don't see defaults at all in the patch though. As others said, we need reasonable defaults (all add and edit).

@kbahey: add these back in then?

*/add
*/edit

Here is my proposed list.

Do not include any contrib modules by default. You can include them in the read me as an example.

admin*
*/add*
*/edit
*/load
*/outline
*/render
*/revisions
*/track
admin
admin/*
book
comment/*
filter
filter/*
logout
poll
profile
tracker*
user
user/register
user/login

We need a benchmark to quantify the effect of this. I am hoping it speeds up things, but we need to confirm that.

Aren't these for devel.module, which is not core either?

*/load
*/render

Yes, you are right. Take them out.

Title:drupal_lookup_path() optimization: don't lookup paths that match a path in the skiplistdrupal_lookup_path() optimization: provide a configurable blacklist and whitelist of Drupal paths
StatusFileSize
new8.87 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch.patch.
[ View ]

@ChrisKennedy: this patch was created in January 2007, which is when Drupal 5 was still HEAD. It was valid back then, but now we're at Drupal 6.x, hence it's not anymore against HEAD.
Implemented all of your suggestions.

@moshe_weitzman: the blacklist/whitelist/no optimization options you suggested in IRC are implemented.

@kbahey: implemented the exact default blacklist you suggested.

Special thanks to chx for helping me out with my newbie problems (and especially 5.x -> 6.x problems). Updated the title.

Can we stick to path_optimization_type:

$type = variable_get('path_optimize_type', 2);

typo in settings:

'#description' => t("Blacklist: Drupal dwill only look up aliases ...

two more:

This will reduce the numer of database queries, and thus less waste of resourced!

Options should start with an uppercase character:

array(t('blacklist of Drupal paths'), t('whitelist of Drupal paths'), t('no optimization'))

Can we find a way to increase readability of the contents of this line and another way to have this code in one place only?

'#default_value' => variable_get('path_optimize_blacklist', "admin*\n*/add*\*/edit\n*/outline\n*/revisions\n*/track\nadmin\nadmin/*\nbook\ncomment/*\nfilter\nfilter/*\nlogout\npoll\nprofile\ntracker*\nuser\nuser/register\nuser/login"),

Seems like t() misses some args here:

t("Enter one page per line as Drupal paths. The '*' character is a wildcard. Example paths are %blog for the blog page and %blog-wildcard for every personal blog. %front is the front page.")

Status:Needs work» Needs review
StatusFileSize
new10.55 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_0.patch.
[ View ]

+1 For this.

After some talk with Wim in IRC, I've made the following changes:

  1. Moved the form into admin/settings/performance which seemed like a more intuative fit.
  2. Added a link on the admin/build/path page to the performance section.
  3. Added a placeholder for documentation in the handbook, ala what happens when the clean_url check fails.
  4. Fixed language issues highlighted by Sun above and made slight changes for readability.

This changes should be reviewed, but this should apply clean to HEAD and IMHO is RTBC.

Josh/Wim

The variable_get() stuff is repeated and that is a maintenance headache (can change on place without changing the other), so it may be better to create a function to return the default and use it in both places, or a define for it.

StatusFileSize
new9.19 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_1.patch.
[ View ]

I do not see variable_get repeated. But I did see drupal_path_match and the query repeated quite a bit. Let's put all those into one line...

Yes, the regexp is repeated.

What I meant was this is repeated twice too.

variable_get('path_optimization_blacklist', "admin*\n*/add*\*/edit\n*/outline\n*/revisions\n*/track\nadmin\nadmin/*\nbook\ncomment/*\nfilter\nfilter/*\nlogout\npoll\nprofile\ntracker*\nuser\nuser/register\nuser/login");

The stuff between the double quotes has to be a define or returned from a function, otherwise it is a maintenance headache.

StatusFileSize
new8.08 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_2.patch.
[ View ]

A lot more simplifications: the form belongs to system module. There is only one variable to store and use.

Status:Needs review» Reviewed & tested by the community
StatusFileSize
new9.77 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_3.patch.
[ View ]

Final tweak: added weights to performance settings fieldsets so that path fieldset added via form_alter is at the bottom, but not below the buttons.

RTBC!

Status:Reviewed & tested by the community» Needs work

Can't we build these strings by optional function args in drupal_path_match() or another function:

+++ includes/path.inc
@@ -45,12 +45,12 @@
+    $regexps[$patterns] = '/^('. preg_replace(array('/(\r\n?|\n)/', '/\\\\\*/', '/(^|\|)\\\\<front\\\\>($|\|)/'), array('|', '.*', '\1'. preg_quote(variable_get('site_frontpage', 'node'), '/') .'\2'), preg_quote($patterns, '/')) .')$/';
+++ modules/path/path.module
@@ -297,6 +297,31 @@
+      '#default_value' => variable_get('path_optimization_list', "admin*\n*/add*\*/edit\n*/outline\n*/revisions\n*/track\nadmin\nadmin/*\nbook\ncomment/*\nfilter\nfilter/*\nlogout\npoll\nprofile\ntracker*\nuser\nuser/register\nuser/login"),

Increasing readability means doing something like:

<?php
array(
 
'admin*',
 
'*/add*',
 
'edit',
  ...,
);
$crumb = '';
foreach(array() as
$crumb) {
 
$crumb .= $crumb ."\n";
}
?>

See? I wasn't able to quickly read the default paths here correctly.

Alternatively we could do:

<?php
$paths
= '"admin* */add* */edit */outline */revisions */track admin ad{...}';
$paths = str_replace(' ', '', $paths);
?>

but IMHO an array fits better here.

We all know Dries hates Regex so lets cut to the chase and go with sun's proposal?
You get +1 from me for going with arrays :)

Status:Needs work» Reviewed & tested by the community

sun's proposal won't remove the regexs, you can't remove them. They were present in Drupal core as they are used in there, the matches are simply abstracted. It does not worth the bothering to do an array of defaults and implode it, I considered it and said 'neh', i'ts just a string and quite rarely used one at that. My editor can wrap long lines....

What about Heredoc? No extra burden, 1:1 functionality, plus readability.

I'm not switching status again.

StatusFileSize
new9.37 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_patch_4.patch.
[ View ]

You hardly need heredoc syntax to use newlines. The patch is essentially the same as josh_k 's but I actually pressed Enter in place of \n . Thanks for the idea.

For the record, we should not add any caching functionality, there's another patch for that kind of functionality.

IMVHO, this is RTBC now.

Status:Reviewed & tested by the community» Needs work

This looks good, however ...

1. Let's use defines (PATH_WHITELIST and PATH_BLACKLIST) instead of 1 and 2.

2. 'No optimizations' should probably be the first choice?

3. Before this can be committed, I'd like to see some benchmarking done. Maybe the regex (with long white/black lists) is slower than a couple of database calls? Some testing is in order ...

Status:Needs work» Reviewed & tested by the community
StatusFileSize
new9.81 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup.patch.
[ View ]

Added defines and reordered the options. No functionality change though, I have choosen the values of defines as described in the lengthy relevant comment. Benchmarking is not feasible, it's way too much dependent on your web frontend and database backend capabilities and the lists themselves.

Q: Doesn't */outline belong to category module?

Whatever it is, it is not core. So it can go away.

*/outline is from book.module - see http://api.drupal.org/api/HEAD/function/book_menu

moshe weitzman talked to me a couple of days ago about another interesting possible addition: the ability to store *all* aliases in a simple variable (in the variable table, that is), but only if there are <50 aliases in total (or something of that order).
I told him I would implement it the next day, but didn't get to it yet... What do the others think of this?

@ Wim Leers, I don't think its necessary. Either do a one size fits all or nothing at all. It makes it easier for future updated and enhancements to this code and not to mention how would you handle the special cases when the <50 eventually become >50?

Introducing more code? I think we should wait for this to be benchmarked and committed then add more to the plate.

@bioalien, again, the patch is not benchmarkable. There are too many variations -- how many aliases, how big iron you have for frontends and backends etc.

In theory it is benchmarkable.

All you need is a site with lots of aliases, and a busy home page (lots of nodes, lots of comments, taxonomy terms, as well as the admin navigation menu, ...etc.), apply the patch and see the effect (or lack thereof).

An example: a client site does some 340 SELECTs from the url_alias table for the front page.

For most of these, MySQL's query cache helps a lot, and they take 0.05 ms each.

Still, this patch helps by avoiding a query to being with. I can't imagine a regexp being slower than a query, even if the result is cached.

It needs a site running HEAD though, which is why I said: "theory".

Hmm, maybe on scratch.drupal.org?

Drupal.org would be the best place to benchmark it if it can be benchmarkable, but my original view remains that I don't think we need to factor anything more to this including the <50 aliases. It's good as it is.

one could argue that the < 50 aliases is the most important optimization because it requires no configuration and will speed up the initial drupal experience for all users. out of the box we avoid 50 or so queries. and many more if you start enabling contrib modules that add links to the page. in any case, this patch has lots of merit even without the < 50 aliases optimization.

Is here anyone who reads what I am typing? Now matter how long you describe one given scenario, there is no one size fits all situation here. Every site will be different depending on the number of the aliases, the aliases themselves, the speed of server used as a webserver and the speed of server used a database server. You can't benchmark all.

Status:Reviewed & tested by the community» Needs work

Sorry, but changes like this need to be benchmarked. What is the impact on a stock Drupal with 0 path aliases? What is the impact on a Drupal site with a handful of path aliases? We don't have to do rigorous benchmarks, but a minimum number of benchmark tests is in order. Just for sanity.

I've got a 5.1 site with 26,988 paths on a VPS, and will test/benchmark if/when there's an upgrade path to HEAD.

tracking.

Status:Needs work» Needs review

I believe that falls into the review category.

nice patch... i'd like to ask one thing...

1. Why not blocking alias lookups for logged in users at all, if path starts for e.g. with "node/*
2. then we lookup only for named aliases that must be non-core or non-module like "example/test"
3. this logic makes some urls not looking nice, ok, but internally for logged in users - it shouldn't be a problem - mostly
4. maybe this can be named "aggressive non-lookup path" feature like the "aggressive cache" feature
5. however the patch should be a big step forward...

re: hass.

There's the path alias patch that chx started, which would do a redirect to /node/ to an alias if one exists, so the "ugliness" would only be there for the mouseover, with it reverting to aliases when the page was loaded (I think, a bit like global redirect does now). Those two together would work beautifully.

Title:drupal_lookup_path() optimization: provide a configurable blacklist and whitelist of Drupal pathsdrupal_lookup_path() optimization: use static map

profiling my own drupal site(s) which have a multitude of url_aliases thanks to pathauto, I also tried to reduce the mindboggling number of db queries issued by drupal_lookup_path.

Instead of pattern matching I just load the entire url_alias table into the static $map hash inside the function. That way I don't have to piecemeal it together with db_query calls for each "cache miss".

Something that's not in the cache doesn't have an alias. Simple.

Works. So what am I missing?

Title:drupal_lookup_path() optimization: use static mapdrupal_lookup_path() optimization: provide a configurable blacklist and whitelist of Drupal paths

Please don't change the title to reflect your opinion. It should reflect the current implementation of the patch.

The memory usage of the solution you are using scales linearly with the amount of path aliases. So it's a non-solution. (I don't want to see the memory usage on a site with 150k path aliases.)

i think so too, i have some boxes with +120K aliases in the pipeline... such a way will kill the machine.

Makes sense about the memory, and sorry for changing the Title. Didn't mean to express my opinion but discuss alternate solutions to this problem. Am I supposed to create a new patch + discussion for that?

A patch, discussion, and benchmarks already exist at http://drupal.org/node/137934. To summarize, your idea of loading the entire table into memory is only useful for small sites with less than 500 (or so) nodes and aliases. Get over 500 nodes/aliases and your memory usage & execution time go through the roof.

This whitelist/blacklist implemention is MUCH better and I look forward to using it in Drupal 6. If only it would get committed... Hint, hint...

@kevinweber: Dries wants benchmarks... if you can benchmark, please do so, so we can get this in D6 before the code freeze (which is in 5 days!)

@jhm: no problem :)

Tried to apply to a 5.1 test site but no go: "2 out of 4 hunks failed". How hard would it be to re-roll this against 5.x so a few people can test it?

OK some initial tests using the patch from #8 and kbahey's list from http://drupal.org/node/106559#comment-224015 - since I didn't get a skiplist by default.

With kbahey's skiplist I got very little difference. However blacklisting node/* forum/* and user/* made a big difference:

/forum
with:
Executed 365 queries in 616.07 milliseconds
without:
Executed 469 queries in 704.04 milliseconds.
/
with:
Executed 206 queries in 636.38 milliseconds
without:
Executed 247 queries in 581.6 milliseconds
/node/123: ;)
with:
Executed 342 queries in 738.22 milliseconds.
without:
Executed 398 queries in 375.88 milliseconds.

Note that [code]without[/code] is on my live site, [code]with[/code] is on a copy of that site from earlier today with the patch applied, which means that things like cache_set are throwing out the generation time I think. It's clear that the number of path lookups is being reduced though.

Also book/export/* should be added to the list.

Just for clarity, he tested the latest Drupal 5 compatible revision of this patch. There's only the blacklist in that version, but the performance should be virtually identical. The added benefit of the new patch is that it also has a whitelist.

@catch: in the new patch, there are different defaults and "skiplist" is now "blacklist". But I agree, "book/export/*" should be added.

Didn't notice this immediately, but your benchmarking setup seems seriously flawed to me. You're testing sites on different hardware in different situations.

Please repeat your test, but test on the same server, in the same situation (i.e. your test site). Thanks!

WimLeers, sorry, I wasn't clear.

My test site in this case is on the same server as my live site, so hardware, server load etc. was the same (I had each open in different tabs when testing the pages with devel). Like I said though, some of the longer queries like cache_set make the actual timings (if not number of queries so much) unreliable. I don't see a way 'round this except maybe clearing cache on both sites just before each page load. If you think that'd help I'll give it a go. Or I could just post up the few longest queries from each result to show where the variance is.

catch, you seem to be the only hope to get this into D6. Could you please test again, but either test only on your dev site, or only on your live site (preferably the former, I guess). That should give some more solid numbers.

Status:Needs review» Reviewed & tested by the community

OK these were on test site only. it's nearly 2am here, so rather than apply + revert patch, I killed skiplist for 'without' which should have the same effect (I hope so anyway). I can try to test again tomorrow properly unpatched if that'll help, but not now.

Some of these I've included two or more page requests to show variance.

without skiplist:
front (which is a panel with several views blocks)
Executed 248 queries in 802.77 milliseconds
Executed 248 queries in 92.38 milliseconds.
/forum
Executed 541 queries in 1325.56 milliseconds.
Executed 470 queries in 636.45 milliseconds
/tracker
Executed 510 queries in 309.84 milliseconds.
Executed 490 queries in 98.93 milliseconds.
xecuted 531 queries in 144.12 milliseconds
with the skiplist:
front
Executed 237 queries in 830.85 milliseconds.
Executed 206 queries in 101.68 milliseconds.
/forum
Executed 364 queries in 615.24 milliseconds.
/tracker
Executed 399 queries in 83.83 milliseconds.
Executed 419 queries in 95.35 milliseconds.
Executed 399 queries in 84.38 milliseconds

Overall some of those pages are a lot faster and have 20% less queries, even on subsequent page requests when query cache is taking over.

Marking RTBC, because like you say, it seems like no-one else can review atm.

sorry, 20% means nothing. It's taking 100 to 150 queries out of those pages. That's a lot.

Dries wants benchmark on a clean install/few nodes/aliases as well - surely someone can do that with 6.x?

last post tonight.

Also some shared hosting appears to have silly limits on how many mysql queries you can run per minute or whatever. So even without the performance improvements it'd help a lot of users on shared hosting to avoid running into those TOS issues.

catch, could you please repeat one more time, but this time using the "ab" command line tool? I didn't notice that you didn't use this the previous time.
Something like this:

ab2 -c10 -n1000 http://localhost/some/path

Thanks!

OK Wim. I dunno what I'm doing with ab, at all. But here goes:

ab -c10 -n1000 http://scratch.mysite.org/forums

without:

Server Software:        Apache/2.2.3
Server Hostname:        scratch.mysite.org
Server Port:            80
Document Path:          /forums
Document Length:        25089 bytes
Concurrency Level:      10
Time taken for tests:   18.160220 seconds
Complete requests:      1000
Failed requests:        4
   (Connect: 0, Length: 4, Exceptions: 0)
Write errors:           0
Non-2xx responses:      4
Total transferred:      25538722 bytes
HTML transferred:       24995688 bytes
Requests per second:    55.07 [#/sec] (mean)
Time per request:       181.602 [ms] (mean)
Time per request:       18.160 [ms] (mean, across all concurrent requests)
Transfer rate:          1373.33 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   10 128.3      0    1538
Processing:    11  168 954.6     35   12704
Waiting:       10  153 909.5     34   12703
Total:         11  179 1057.3     35   14242
Percentage of the requests served within a certain time (ms)
  50%     35
  66%     44
  75%     57
  80%     73
  90%    147
  95%    262
  98%   2512
  99%   3424
100%  14242 (longest request)

with skiplist:

Server Software:        Apache/2.2.3
Server Hostname:        scratch.libcom.org
Server Port:            80
Document Path:          /forums
Document Length:        25237 bytes
Concurrency Level:      10
Time taken for tests:   16.270393 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      25781000 bytes
HTML transferred:       25237000 bytes
Requests per second:    61.46 [#/sec] (mean)
Time per request:       162.704 [ms] (mean)
Time per request:       16.270 [ms] (mean, across all concurrent requests)
Transfer rate:          1547.35 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.8      0      37
Processing:    11  153 810.3     84   14729
Waiting:       10  138 672.0     84   14720
Total:         11  154 810.7     84   14751
Percentage of the requests served within a certain time (ms)
  50%     84
  66%     91
  75%    109
  80%    133
  90%    184
  95%    258
  98%    347
  99%    772
100%  14751 (longest request)

I'll be around next couple of hours, but not much after that, so let me know if you need any more.

just to clarify, just in case, these are the same site again, I just forgot to take my domain name out of the second one.

One more. I'd taken out node/* forum/* and user/*, this is the same request with those added to skiplist.
Big, big difference.

Server Software:        Apache/2.2.3
Server Hostname:        scratch.libcom.org
Server Port:            80
Document Path:          /forums
Document Length:        24739 bytes
Concurrency Level:      10
Time taken for tests:   13.365509 seconds
Complete requests:      1000
Failed requests:        1
   (Connect: 0, Length: 1, Exceptions: 0)
Write errors:           0
Non-2xx responses:      1
Total transferred:      25259770 bytes
HTML transferred:       24716022 bytes
Requests per second:    74.82 [#/sec] (mean)
Time per request:       133.655 [ms] (mean)
Time per request:       13.366 [ms] (mean, across all concurrent requests)
Transfer rate:          1845.57 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.2      0      13
Processing:    11  131 894.3     11   12119
Waiting:       10  118 836.2     10   12118
Total:         11  131 894.3     11   12119
Percentage of the requests served within a certain time (ms)
  50%     11
  66%     25
  75%     54
  80%     82
  90%    109
  95%    136
  98%    392
  99%   2052
100%  12119 (longest request)

Thanks a lot for your time, catch! Now we can get it in before the code freeze.

without:

Requests per second:    55.07 [#/sec] (mean)
Time per request:       18.160 [ms] (mean, across all concurrent requests)

with:

Requests per second:    74.82 [#/sec] (mean)
Time per request:       13.366 [ms] (mean, across all concurrent requests)

If this isn't a performance improvement, then shoot me!

Definitely RTBC!

I think I prefer the solution that is outlined over here:
http://drupal.org/node/100301

If this isn't a performance improvement, then shoot me!

BANG! You are dead!

Seriously,

Note that the patch by default will show less improvement, since it has these paths:

*/add
*/edit
*/outline
*/revisions
*/track
book
comment/*
filter*
logout
poll
profile
tracker*
user
user/register
user/login

These give 61 req/sec, vs 50 without.

While the ones that REALLY speeded it up a lot are:

node/*
forum/*
user/*

With node/* blacklisted, there is no point in even enabling the path module.

However, the patch is of worth, and should go in. +1 from me.

In a second round, we should figure out sensible defaults, perhaps in a "moderate" and "aggressive" configurations. This could simply be documentation below the text area.

I was so excited by the numbers that I didn't think about that :P

61 vs 50 is still 20% faster by the way. Not bad for a simple patch!

Thanks for the review! Now let's get somebody to commit this...

http://drupal.org/node/147143 - does it allow for this to happen?

* blacklist node/*
* but when you visit a node/123 link, drupal 301 redirects you to the active alias, so you end up on a 'nice' url again.

That way, you save your extra 100 or whatever requests per page, and the only non-prettiness is in the status bar/source. Add an extra option later on to only run the blacklist for authenticated users and boom boom.

If so, bullet proof.

No, that's something different. Blacklist = blacklist = no alias at all.

Let meadd here that this does not collide with killes' caching patch much rather adds to it -- when you have a cache miss you will have much less queries and also you need to cache / process smaller structures.

Wim Leers: thanks for clarification, obviously that really would be too good to be true.

Dear Core Committer, where art thou?

Another url alias optimization approach, complementary to this one: http://drupal.org/node/153888

Status:Reviewed & tested by the community» Needs review

Looked at the patch. My comments:

- The abstraction of drupal_path_match is very good! This should be in Drupal even if the whole patch will not get in.
- The comments added to drupal_lookup_path() still contain the number behind the constants. There is no point in this.

I put a deeper thought into this though. Even I don't have a complete picture of what type of source paths I set aliases for on some of my sites. I set them up and expect them to work just fine. Some sites I even have aliases for */edit paths (automated though custom_url_alias()). So there are two problems with this approach:

- The average/beginner user has no idea of what paths he should have there. It takes a considerable research to collect this information, although the system knows it already, and knows it much better then the user.
- This can result in nasty surprises very easily. If some module or I try to add aliases for obvious paths like book, logout and so on it won't work.

I am unsure about *the user experience* of this patch.

I also reviewed the patch Gerhard pointed to (http://drupal.org/node/100301) and although chx pointed out that these two patches can be complementary, I would like to point out that the user experience is much better there (although it is not complete yet).

One possibility is that we have the variable handling part of this patch and the UI will be implemented in a contrib module you intentionally set up and tune yourself. But this should not be the sole performance improvement, Gerhard's automated stuff looks into leveraging what the system already knows, instead of the admin telling it again in more abstract terms, and needing to do the abstraction by a human.

Status:Needs review» Needs work

looks like after a through *review* goba meant it needed work...

I am happy with moving the UI to a Contrib module. Some admins might even just hard code their whitelist/bloacklist arrays in settings.php.

I am not too happy with Gerhards path patch at http://drupal.org/node/100301. Duplicating hundreds of aliases for every unique path on your site is just too inefficient. If there was a way to fix the storage problem (Grugnug's cache patch?), it would be much more attractive.

I agree. The user experience is too difficult for a large part of the user base. Hence the UI should move to a contrib module.

If everybody agrees, I'll reroll. Or somebody else can if he/she wants to (I'm in the middle of my exams).

Thanks for the review, Gábor!

Status:Needs work» Needs review

The comments added to drupal_lookup_path() still contain the number behind the constants. There is no point in this.

Fixed. I have rewritten the comments there completely, although I'm not sure they are really necessary. But I think it won't hurt, it might otherwise be confusing what exactly is happening on line 91.

I also found that on line 71 there was still a "2" in the variable_get(), instead of the corresponding define.

And lastly, I have renamed the $optimize_type variable to $optimization_type, for consistency.

A short review of these changes will be sufficient now to mark it as RTBC.

StatusFileSize
new5.9 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_no_ui.patch.
[ View ]

Forgot the patch...

@Wim: why not setting a default path_optimization_list? Maybe paths without any harm and side effects and we know well (for e.g core). Moving the UI into contrib sounds like a good idea, but not using this performance enhancement by default sounds strange.

StatusFileSize
new5.87 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch optimize_drupal_lookup_no_ui_0.patch.
[ View ]

@hass: of course! Stupid that I didn't think of that! :(
Note that it will not be obvious at all that there is a default blacklist, for people who don't have the UI counterpart installed. I think this should be added to the documentation of the Path module to avoid confusion. That's the downside of not having the UI in core (the upside outweighs the downside here significantly though).

Updated the patch!

Patch looks really good. It has been throughly tested and benchmarked. The code is well documented, especially if you read the patched module instead of the patch. I must say I'm in love with this patch ;)

I have played with it for a couple of hours and not stepped into any problems.

The only, minor, issue I can find is that this feature should be documented somewhere in the ui. It could be somewhere in /help to not add clutter to any other pages. That could however be added after patch hits core.

I dare not to RTBC this patch myself, but it's really, really close. There are lots of people who have posted in this issues, surely someone can step in and do a final review?

Wim: adding -c10 to the ab commandline is useless if the server can't cope with this.

It is evident from the posted results that in this case -c10 was too much for the server in question since the standard deviation for the page generation times was much higher than the average time.

It would have been better to use -c1 (which is the default) since you want to measure the performance of a single Drupal page and not a server.

The patch looks good. Actually I have similar patches on some sites running Drupal 5 on cheap web hostings, because the amount of not needed queries produced by the current path system is *huge*.

So +1 for this. I think the define's could be better commented, so at least there's information somewhere about this 'hidden' feature. But I like it.

You don't really need benchmarking to realize this patch will boost performance by avoiding all these useless queries :-)

(And btw, if you didn't fix existing comments with the same patch, it would be easier to review...)

I forgot: The only important thing I miss is one line of validation code in path.module to tell the administrator when he's defining an alias for a blacklisted path.

Status:Needs review» Reviewed & tested by the community

I am now testing Drupal HEAD on a otherwise unoccupied server using ab -n100 on /node. The site has 1000 nodes and each node has a path alias. No other path aliases are in use.

1) testing as anonymous user without page cache

                 min  mean[+/-sd] median   max
A) The site without the patch with query cache:
Total:        518  527  12.6    524     609
Total:        518  523   3.1    524     533
B) With the patch and query cache:
Total:        523  529   7.7    528     585
C) Without patch, without query cache
Total:        593  609  10.4    605     644
D) With patch without query cache
Total:        594  603  14.8    599     674

There is no significant difference between A and B nor between C and D.

One possible reason that this doesn't look impressive might be that the site hasn't a lot of modules enables and thus only a few links.

2) Testing as logged in uid = 1 on /admin

A) Without path, query cache enabled
Total:        482  494  20.9    489     658
B) With patch and query cache
Total:        436  448  14.2    442     510
C) no patch, no query cache
Total:        570  581  11.0    577     642
D) patch, no query cache
Total:        482  491  16.3    486     569

This looks quite nice Form A to B there is a improvement of about 9% and from C to D by 15%.

What we have tested here are the two extreme cases. 1) is a page where almost all links are aliased, hence no improvement with the patch. 2) is a page with a lot of links that aren't aliased, thus a good improvement, even when using the query cache.

The average Drupal page will be somewhere in between.

I have no objections wrt the code, thus RTBC.

@killes: I'm not very much into this benchmarking stuff, so I took over the line from the Drupal docs ("How to benchmark Drupal"). Also I'm not into statistics (next year I'll get this in university... mehhh :P). So I apologize for the not so representative benchmarks.
And thank you very much for your detailed (and *correct*) benchmarking!

@Jose A Reyero: Dries wanted benchmarks, so that's what we give him! :) And your suggestion is a very nice one. It's so obvious now :) I will fix that in a follow-up patch, but first let's get this committed!

Status:Reviewed & tested by the community» Needs review

Personally, I don't think we should be storing this in a variable. I also don't think this should have a UI. I'd like to see us explore mechanism that are configuration-less. I'm going to try and experiment with this myself a bit the next couple of weeks. Not committing this (yet).

It doesn't have an UI anymore. The UI will be implemented in a contrib module. Or you don't want that either? In that case, could you please clarify how you think this should work?

Bump!

Here is an idea to revive this.

If we go by a dynamically build white list, this will be configuration free.

My idea is to build a list of the top level components of the src field, and use that as the whitelist.

So, we query the url_alias table for the part before the first slash, and then stick this into a variable, and then query the url_alias only if the path we lookup exists in this dynamically built list.

The list can be built via cron once a day or once a week.

To demonstrate this, here is a site with 66,000+ aliases.

When I run this query, it tells me that the part before the slash are only 6, as follows:

mysql> select substring_index(src, '/', 1) as path, count(*) as cnt from url_alias group by path order by cnt desc;
+----------+-------+
| path     | cnt   |
+----------+-------+
| user     | 35969 |
| blog     | 17957 |
| node     | 12092 |
| taxonomy |    50 |
| forum    |     6 |
| rss.xml  |     1 |
+----------+-------+

So, if we are run a query similar to the above once a day and stick the result in variable, we now have an automatic white list that is tuned to the site's content. The drupal_lookup_path() function then checks if the path passed in starts with the any one of the saved top level components, and only if it matches does it go and get it from the database, otherwise, no queries are issued.

No need to manually edit anything, and works for every site regardless.

I will roll a patch if this has a chance of getting in (Dries, Moshe, et. al. can you comment?)

Good idea. But why rebuild the "parts before the first slash" cache every day, when you can do it when path aliases change (just as we do with all other caches)?

I'm a bit on the fence about kbahey's proposal. There will be lots of whispered wisdom's like "don't alias any admin paths or you will massively increase load. don't alias any user's - massively increase load. and so on.". maybe those are not a big deal.

I like the idea of working on cron. We don't have to worry if modules do import or write of paths without passing through our APIs. Also, that query could get expensive and it would suck to do it on every insert. Consider pathauto bulk import in the extreme case. But it isn't a big deal either way.

Hmmmm.

I see your point.

This also means the changes are localized in a single function (only drupal_lookup_path() changes).

So, when $action is 'wipe', we rebuild the whitelist from the following query:

SELECT SUBSTRING_INDEX(src, '/', 1) AS path FROM {url_alias} GROUP BY path

Takes 350 ms for 66K aliases.

When $action is 'alias' or 'source', we lookup the path's first part, then skip the db query if it is not in the white list.

I will roll a patch.

StatusFileSize
new1.84 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path-whitelist.patch.
[ View ]

Here is the patch.

To force a rebuild of the whitelist, go to admin/build/path and add a path. You should see that the variable table now has an entry with the name 'alias_whitelist', containing the top level component of aliases (e.g. user, node, ...etc.)

The patch applies with some fuzz to 5.1. I applied it to a site, and here are the results:

Before: 482 queries.
After: 433 queries.

So, ~50 queries saved.

IMO, that isn't enough saved queries to be very interesting.

That last patch is no good.

It causes 404s on aliased node views.

Please do not use it on a live site.

every query that can be saved is less load on the DB and therefor very good.

StatusFileSize
new2.13 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path-whitelist-2.patch.
[ View ]

The 404 issue has been solved.

Here is the test setup:

The alias white list ('alias_whitelist' in the variable table) include only 'user' and 'node', and nothing else. In effect we are excluding those that start with 'admin' and 'comment' as well as a few minor ones (e.g. 'misc/feed.png', and 'logout').

Using the devel module, I generated a total of 1001 nodes and 1001 users.

10 nodes are visible on the front page.

Before the patch:
Executed 99 queries in 68.83 ms. Page execution time was 193.44 ms.

With the patch:
Executed 85 queries in 42.82 ms. Page execution time was 136.59 ms.

Viewing an individual node in full page view with 5 comments.

Before the patch:
Executed 52 queries in 27.83 milliseconds. Page execution time was 206.13 ms.

With the patch:
Executed 37 queries in 20.69 milliseconds. Page execution time was 74.4 ms.

I would not count too much on the timings above, because this is a test server.

But it does reduce the number of queries.

Am I right in thinking the node and user whitelist means there are still lookups for:

*/add
*/edit
*/outline
*/revisions

Easy way to continue to remove those, but still keep the same whitelist method?

Assigned:Wim Leers» kbahey

Yes, it will still look those up.

Remember that they will only be looked up if there are links to such paths on the current page.

So, if you are viewing node/1, and you are the author of that node, then you will see one query to node/1/edit (because of the tab on top).

This patch does not eliminate this, but attempts to reduce the number of queries for paths that we KNOW FOR SURE do not exist in the alias table, and has a mechanism of automatically determining which paths to not query at all, so it is configuration-less as Dries wants it.

I ported the patch to 5.1, and ran it on a live site.

The white list is the same as the SQL output in comment #102 above.

Here are the results:

Before the patch.
Front page 484 queries
Single node view 101 queries

With the patch:
Front page 437 queries
Single node view 72 queries

The benchmarks are in.

Using devel, generated content and users (used default settings there).

$ ab -c10 -n1000 http://example.com/

Before patch

Concurrency Level:      10
Time taken for tests:   86.784815 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Requests per second:    11.52 [#/sec] (mean)
Time per request:       867.848 [ms] (mean)
Time per request:       86.785 [ms] (mean, across all concurrent requests)

With patch

Concurrency Level:      10
Time taken for tests:   76.791515 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      13918000 bytes
HTML transferred:       13389000 bytes
Requests per second:    13.02 [#/sec] (mean)
Time per request:       767.915 [ms] (mean)
Time per request:       76.792 [ms] (mean, across all concurrent requests)

So there is an 11% improvement.

StatusFileSize
new4.89 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path-whitelist-4.patch.
[ View ]

Updated patch attached.

This one restructures the code a bit for clarity, inlines the whitelist creation to avoid a new function, and has a fix for #162384 as well (one liner, doesn't warrant a patch on its own).

Devel says the following:

Front page
Before the patch: Page execution time was 150.57 ms. Executed 99 queries in 33.14 milliseconds.
With the patch: Page execution time was 131.98 ms. Executed 84 queries in 18.6 milliseconds.

Single node
Before the patch: Page execution time was 101.24 ms. Executed 61 queries in 27.33 milliseconds.
With the patch: Page execution time was 73.58 ms. Executed 38 queries in 10.94 milliseconds.

Those stats and benchmark figures are very impressive indeed!

I would really like one or two people to install this patch and replicate the results before we can say that this is conclusively positive for performance.

Please, anyone who can do a benchmark, go ahead and do so, and post the results here.

Good morning to all. I'm responding as a result of Dries Buytaert's email to the development list "Drupal performance patches: call for action". The patch thus far does indeed look impressive; however, Dries asked for ideas. What if we add the PID column to the {node}, {term_data}, and {vocabulary} tables as a foreign key to the {url_alias} table? As I understand from use of this valuable tool there is a 1:1 relationship between {url_alias} and one of the other three tables. You would then use the primary key to do the lookup which should be slightly faster still.

Results from benchmark testing.

Fresh Drupal Install, no content

*** Without the Patch - http://www.example.com/

Concurrency Level:      1
Time taken for tests:   160.835539 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      5506896 bytes
HTML transferred:       4973000 bytes
Requests per second:    6.22 [#/sec] (mean)
Time per request:       160.836 [ms] (mean)
Time per request:       160.836 [ms] (mean, across all concurrent requests)
Transfer rate:          33.43 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   158  160   0.9    160     165
Waiting:      152  158   3.2    160     165
Total:        158  160   0.9    160     165

*** With the Patch - http://www.example.com/

Concurrency Level:      1
Time taken for tests:   161.176602 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      5506962 bytes
HTML transferred:       4973000 bytes
Requests per second:    6.20 [#/sec] (mean)
Time per request:       161.177 [ms] (mean)
Time per request:       161.177 [ms] (mean, across all concurrent requests)
Transfer rate:          33.36 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   158  160   0.9    160     163
Waiting:      152  158   3.2    160     163
Total:        158  160   0.9    160     163

After generating users and content described in the Benchmarking Howto
2,000 users, 5,000 nodes, 10,000 comments

*** Without the Patch - http://www.example.com/

Concurrency Level:      1
Time taken for tests:   226.879019 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      14713000 bytes
HTML transferred:       14196000 bytes
Requests per second:    4.41 [#/sec] (mean)
Time per request:       226.879 [ms] (mean)
Time per request:       226.879 [ms] (mean, across all concurrent requests)
Transfer rate:          63.33 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   223  226   3.7    227     331
Waiting:      215  218   3.7    218     323
Total:        223  226   3.7    227     331

** With the Patch - http://www.example.com/

Concurrency Level:      1
Time taken for tests:   224.224222 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      14605000 bytes
HTML transferred:       14088000 bytes
Requests per second:    4.46 [#/sec] (mean)
Time per request:       224.224 [ms] (mean)
Time per request:       224.224 [ms] (mean, across all concurrent requests)
Transfer rate:          63.61 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   220  223   3.7    224     325
Waiting:      212  215   3.5    216     317
Total:        220  223   3.7    224     325

Just for fun, I tested it with the administration menu available and expanded for anonymous users.

*** Without the Patch

Concurrency Level:      1
Time taken for tests:   251.769209 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      17931000 bytes
HTML transferred:       17414000 bytes
Requests per second:    3.97 [#/sec] (mean)
Time per request:       251.769 [ms] (mean)
Time per request:       251.769 [ms] (mean, across all concurrent requests)
Transfer rate:          69.55 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   247  251   2.6    251     275
Waiting:      230  242   2.7    243     267
Total:        247  251   2.6    251     275

*** With the Patch

Concurrency Level:      1
Time taken for tests:   238.858186 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      17961000 bytes
HTML transferred:       17444000 bytes
Requests per second:    4.19 [#/sec] (mean)
Time per request:       238.858 [ms] (mean)
Time per request:       238.858 [ms] (mean, across all concurrent requests)
Transfer rate:          73.43 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   235  238   1.8    238     243
Waiting:      227  230   1.6    230     235
Total:        235  238   1.8    238     243

@mikejoconnor

Thanks for benchmarking. I see some improvement for the admin menu test (which has no aliases I assume.

Just to confirm:

Did you go in and add an alias for a node after installing the patch?
The reason this is needed, is that there is a variable called site_whitelist that holds the white list of top level items. It gets regenerated when an alias is added.

To confirm that a white list exists, run this SQL:

SELECT * FROM variable WHERE name = 'alias_whitelist';

Then repeat the tests.

killes installed a 5.x version of the patch on drupal.org, and it has been running there for a few days.

Any update to this patch after it running on drupal.org? It's been over a week since there was any response to this thread.

Perhaps we should already commit the drupal_path_match($path, $patterns) function that was included in my patch? This is a function that would be useful at many places, regardless of how the path lookups will be optimized. At least that function will make it into D6 then.

StatusFileSize
new1.38 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path_inc_opt_2.patch.
[ View ]

I need to look more at this main patch, but here's a small additional code optimization patch that can be combined with the larger patch.

I have been struggling with the expensive drupal_lookup_path() queries for a time myself, and actually ended up disabling the feature by cleaning the alias table. Sad, cause it is is a nice feature, but with many links and large menus, several hundred queries can’t be backed up in any way.

I just have some small inputs on the issue. In a way I can see why this is a hack to the core, when it is so that this is part of path.inc, and not path.module. But I am not sure this is the right spot to put the “white list”. First I see this as a complication of settings in Drupal, and should be an option for the users. For smaller sites this is not an large issue. Second, if this is not committed to the core it will stay as a patch, and not very accessible for the community. And last, there may be other nice things to do with paths above the existing features, and that’s why we need a place to put it.

Therefore I suggest to build a white list module, or a path_filter module, which can contain this very very nice feature. There may have to be a small hack to path.inc, but this could on a later stage be committed to core if the module has a life to it, but it would be a clear option to use it with the modules path and pathauto.

There, I said it. It’s late in Norway, but I will look into this tomorrow, and follow up on the progress. I ended up having 900 queries per page, and has to do something about it. Can’t sleep... 900 queries... aaaaaah.... Actually, counting queries would now be my new bed hobby.

Version:6.x-dev» 7.x-dev

Why are you going to delay this performance bugfix? As i know - it is running well in d.o. since ~2 months... Dries like to commit performance patches until final as i remember... so we can get this in and get a much faster cms. This patch has been approved by many people... i don't know why it hasn't been committed yet.

Version:7.x-dev» 6.x-dev

OK my fault, mis-understanding by me and possibly over-zealous issue queue tidying. Back to 6.x then!

Status:Needs review» Needs work

Seems 115 and 124 need to be combined into a single patch... this can certainly go in, so let's not wait another 8 months for a release with this :-)

Priority:Normal» Critical
Status:Needs work» Needs review
StatusFileSize
new5.18 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal_lookup_path_optimization_106559_2007101401.patch.
[ View ]

Re-roled patch against HEAD that combine 115 and 124 together. Let's get this performance hit in, please.

Title:drupal_lookup_path() optimization: provide a configurable blacklist and whitelist of Drupal pathsdrupal_lookup_path() optimization: maintain a whitelist of Drupal paths
StatusFileSize
new7.96 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal_lookup_path_optimization_2.patch.
[ View ]

Nice patch - this is a much-needed performance boost for the core aliasing system. I didn't benchmark the latest patch, but I did review and test it. Found a few problems, and fixed them in the posted patch:

  1. The query to get the top level substring of each system path uses the INDEX_SUBSTRING function, which is MySQL-specific. I wrote a solution to this problem in a previous patch (menu rewrite for node menu items), which adds a new db abstraction function called 'db_sql_explode' (uses INDEX_SUBSTRING() for MySQL, and SPLIT_PART() for Postgres) — I have added the functions to this patch.
  2. Unless you manually go in and add/edit/delete a path alias, the whitelist is not built after applying this patch, and thus no aliases actually work. I added a new function to the D6 update script, to call drupal_lookup_path('wipe'), and thus to build the initial whitelist.
  3. Fixed a few inconsistent uses of 'white list' vs 'whitelist' (I think 'whitelist' one word is better).

Is this functionality still considered for inclusion in Drupal 6?

I don't see why not.

Fresh patched drupal install. Everything applied cleanly and no errors thrown.
Enabled path module, manually created three nodes with aliases on the front page.
With patch applied, running front page, devel says: Executed 42 queries in 7.32 milliseconds.
Without: Executed 50 queries in 10.47 milliseconds

A 15-20% reduction in queries on a vanilla install.

So it's still doing what it's supposed to. Obviously this will work best for big sites with more cardinality rather than ones with three nodes. Benchmarks earlier in the issue showed how much of a difference this makes, although some changes in the code (which I can't comment on, hence why I'm not going to RTBC this myself), the functionality is substantially the same, so they should still stand in principle.

Version:6.x-dev» 7.x-dev
Priority:Critical» Normal

Moving this to D7. Sorry.

This is wonderful.

Do we have 5.3 version of the patch? I would like to use it on my site.

StatusFileSize
new8.01 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch lookup_whitelist.patch.
[ View ]

This still applied, just. (Untested) re-roll to remove offset and fix update numbering.

StatusFileSize
new8.2 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch lookup_whitelist_0.patch.
[ View ]

Er, here's one with an attempt at the new defgroup and stuff. This is my first run at something after HEAD has opened up, so not sure if they should be collapsed into one issue like this or not!

I have the same problem with drupal_loopup_path and i think this is a very important patch. remember that small sites are using hosts with few resources. currently i'm using this http://drupal.org/node/102311 as workaround.

Hey

- does anyone know what is the status of Drupal urlalias caching?
- has any of this made it into D5 D6 D7?
- where do we go from here? is there an active issue/ patch to test?
- i.e. which one of these possible solutions should i be testing?

References
http://drupal.org/node/102311
http://drupal.org/node/106559
http://drupal.org/node/100301

Status:Needs review» Needs work

irstudio:

nothing's made it in yet.
adaptive path caching is the more active issue afaik.
This one doesn't apply any more.

Shouldn't drupal provide the possibility to override drupal_lookup_path() ? One scenario could be implementing a module that caches the url_alias table in memcached or other caching mechanism. Currently as far as i know i have to patch core to do this.

Yow this would appear to need a lot of re-factoring. Subscribing!

sub

StatusFileSize
new4.04 KB
Failed: Failed to apply patch.
[ View ]

in case anyone is interested, here is a patch against 5.11

could anyone make it for Drupal 5.12 please?
I used the one above for 5.11, it shows

patch: **** unexpected end of file in patch

I agree, the patch doesn't work noahb, I checked it myself, I can't find what's wrong with it.

Also, I'm wondering why they don't just fix it for 5.12 or 5.13 itself?
Any reason?

Also, many of us don't have access to our servers, shared hosting, and I hate using patch, is it possible for you to include the actual updated files for 5.12?

Drupal 5 and Drupal 6 are stable releases, this is enough of a change that it will only go into Drupal 7. Having this policy stops a lot of unexpected bugs getting into Drupal 5 or 6. To get fixed in any release ever, it needs a new re-roll and some reviews.

Issue tags:+Performance

adding performance tag

Category:feature» task
Status:Needs work» Needs review
StatusFileSize
new3.52 KB
Failed: Failed to apply patch.
[ View ]

#456824: drupal_lookup_path() speedup - cache system paths per page. just got in.

Time to revive this.

I went back to Wm Leers' patch in #92 since it covers a larger number of paths - and it's particularly paths like comment/*/reply which this patch benefits.

What this version of the patch does:

In drupal_lookup_path(), we pull a blacklist of system paths from a variable, any drupal_lookup_path() query which matches one of these paths returns FALSE.

This has the following benefits:
1. On page views where the $system_paths cache isn't set, there are less queries
2. Because those paths are never put into $map, the cache entries for $system_paths are smaller - this cache is per page, so it should reduce table size by something like 10-20% depending on how many paths actually get skipped.
3. It should lower memory usage a bit because the $map static will be smaller (especially when there's hundreds of paths on a page like the comment example below).
4. The big IN() query to fetch all paths for a page has a shorter list to look for - when viewing 300 comments, each with edit, reply and delete links, this is 100ms faster according to devel.

The only disadvantage of this approach in terms of performance is that it currently allows drupal_match_path() to run multiple times on the same path if it appears more than once on a page (because we don't populate $map for blacklisted paths) - I don't expect this to outweigh the various other benefits.

There is absolutely no UI in core for this. The blacklist is in a variable so that you can change it from a contrib module. Currently, if you try to set a path alias which would be included by the blacklist, you get a form_set_error() with a short explanation why. I think the number of people who try to do this and get annoyed will be small, but it's better than allowing people to set aliases only for them to magically not work, and gives enough of a clue for the really annoyed to work out what's going on. The benefits in terms of query savings and performance vastly outweigh this I think. Also in terms of the 80% rule, 100% of people want better logged in performance, 0.0000001% of people want to alias comment/reply/2.

Front page, 10 nodes - empty cache path:
HEAD:
Executed 113 queries in 63.28 milliseconds.
Executed 113 queries in 67.4 milliseconds.
Executed 113 queries in 65.09 milliseconds.
Patch:
Executed 100 queries in 55.71 milliseconds.
Executed 100 queries in 62.73 milliseconds.
Executed 100 queries in 62.36 milliseconds.
-----
Front page, 10 nodes, primed cache_path
HEAD:
Executed 58 queries in 47.78 milliseconds.
Executed 58 queries in 41.81 milliseconds.
Patch:
Executed 58 queries in 46.01 milliseconds.
Executed 58 queries in 46.33 milliseconds.
node with 300 comments, empty cache_path:
HEAD (empty cache_path):
Executed 1264 queries in 461.92 milliseconds.
Patch (empty cache_path):
Executed 358 queries in 152.43 milliseconds.
--
node with 300 comments, primed cache_path():
HEAD:
Executed 341 queries in 231.7 milliseconds.
Patch:
Executed 341 queries in 127.37 milliseconds.

So in the more extreme example, we save 300ms when generating the cache, and 115ms with the cache. With just a vanilla front page, results are also good.

The IN() query currently has a list of 900+ paths in it when viewing 300 comments with edit/delete/reply links- those are removed by this patch which between a smaller record to fetch from cache_get() and a smaller number of paths to query in IN() saves over 100ms.

HEAD:
0.84 [#/sec]

Patch:
0.89 [#/sec]

That's a 5% (ish) performance improvement, over and above the 6% we already got from caching system paths.

Title:drupal_lookup_path() optimization: maintain a whitelist of Drupal pathsdrupal_lookup_path() optimization: maintain a blacklist of Drupal paths
StatusFileSize
new3.89 KB
Failed: Failed to apply patch.
[ View ]

chx reviewed in irc. the blacklist is moved to a helper function in path.inc so it doesn't clutter up elsewhere. Also changed issue title since the whitelist has been gone for quite some time now.

StatusFileSize
new4.8 KB
Failed: Failed to apply patch.
[ View ]

And documentation of the variable in settings.php

Is there a way for modules to extend the black list automatically?

Nope. I thought about having a default array, and then hook_path_blacklist_alter() but that would make it impossible to override the defaults just with core (and a bit harder to write a UI for in contrb).

Module should be able to do this. They might create their own paths and know best if these paths should be blacklisted or not. Core cannot know about this custom paths. Explaining the users of a module to hack their settings.php first is not user-friendly and could also be error prone... I'm not sure about the best way, but we should have one.

This is an incomplete patch review because catch just asked me to chime in on the 'alter hook vs. variable' issue and so I have not read the entire thread.

But impressions from a quick glance through the patch:

+function drupal_path_blacklist() {
+  return 'admin*
+  */add
+  */edit
+  */outline
+  */revisions
+  */track
+  book
+  comment/*
+  filter*
+  logout
+  poll
+  profile
+  tracker*
+  user
+  user/register
+  user/login';
+}

a) If this is what our default list looks like, there's no way to do this without an alter hook. Every module on the face of the planet is going to need to define its top-level paths here (coder*, devel*, */ldap, etc.). If modules try and variable_set() with a regex whenever they're enabled/disabled, "hilarity" is going to ensue.

b) My brain is desperately searching for a pattern here that I can apply to my own modules. It's failing to come up with one. Why is profile there but not profile*? Why isn't forum there? etc.

c) Given that I can only think of maybe 3 top level paths where it makes sense to allow aliasing by default (user/*, node/*, taxonomy/*), I'm a little confused why we're taking a blacklist approach and not a whitelist approach.

d) This whole patch in general just screams for confused users not understanding how this works and thousands of support requests that Path module is broken. It seems like the decision on whether a path is aliasable ought to come from the module developer, not the end user.

e) Here's a crazy idea. Why don't we add another property to hook_menu() where these paths are already defined, so everything about them is in one place? And then if people want to change it (or if, for example, Pathauto module wants to supply an enable/disable UI around this) we already have a tool at our disposal: hook_menu_alter().

Also, one other nitpick: "logout" is now "user/logout" ;)

So I like the hook_menu() approach.

We add

'aliasable' => FALSE, (yes, defaulting to FALSE so it moves back to a whitelist approach).

And if you want to make your paths aliasabe, you default to TRUE - so in core this would be node/*, user/*, taxonomy/*.

Haven't yet looked at how we then generate the whitelist from hook_menu() though.

Status:Needs review» Needs work

Please do not think only about hook_menu. url() is often used in modules or page code and is one of the functions that really needs to speed up and should fire less SQL request.

I often thought about running a single SQL statement doing all alias lookups per page once with an IN statement, but this seems to be a big design issue... we would need a URL registry to collect all URLs of the current page... this single request + a white list could boost up things heavily and remove the ~200 single requests per page only for url_aliases we currently have :-(

hass, we already have that - see the issue linked above committed yesterday. The reference to hook_menu() is about building up the whitelist/blacklist only.

Title:drupal_lookup_path() optimization: maintain a blacklist of Drupal pathsdrupal_lookup_path() optimization: add 'allow alias' property to hook_menu()

Current proposal:

Add new property to hook_menu(), and corresponding column in {menu_router} called 'allow alias', defaulting to FALSE.

In core, set this to TRUE for user/% node/% and taxonomy/term/%. This is our default whitelist (we might add other core paths, but you get the idea).

In drupal_path_whitelist() do SELECT path FROM {menu_router} WHERE allow_alias = 1; and put these into a string suitable for drupal_path_match() - pretty much just a (str_replace('%', '*') and implode().

Otherwise same idea as the current patch.

Then - any contrib module wishing to add itself to the the whitelist specifies 'allow alias' => TRUE. And you (or pathauto if it wanted to provide a UI) can use hook_menu_alter() to manipulate the whitelist. This removes the hard-coded variable, means we don't need anything in settings.php etc. Most importantly, it means that contrib modules which generate lots of links on a page that are never going to get aliased (something like quote or forward module for example) don't need to do anything to avoid lookups for those aliases. And it's only a single line of code to allow aliases for your own module's paths.

There's still a bit of potential for user confusion, but as long as we keep the validation when manually adding aliases, and whitelist any paths which pathauto can generate aliases for automatically, I think it'll be pretty rare.

Does it make any sense to add the column to menu_links instead? We should make sure that folks don't start adding router items just to get aliases. menu_router is designed to be small. We should think about the pathauto use case and make sure it can work with just a few router items. for example, lets say that there are just 10 nodes that you need to be aliased (or terms). Perhaps they are your primary nav pages. Do you allow aliases on node/% and suffer lookups for your thousands of nodes or do you add router items for the 10 that need aliasing?

moshe, that's an evil, evil idea.

Using {menu_links} instead seems OK to avoid this situation though.

Btw, before heading too far down this path, we should probably get buy-in from the menu system maintainers about how they feel about what could be somewhat a 'bastardization' of its true intent.

#446346: url_aliases storage is not normalized makes menu links aliased and never reach drupal_lookup_path anyways.

Title:drupal_lookup_path() optimization: add 'allow alias' property to hook_menu()drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path()

Given #446346: url_aliases storage is not normalized and moving back to a whitelist approach, i'll re-roll this (probably tomorrow) with the same logic as the most recent patch except with a whitelist. If both this and the menu_tree_page_data() changes get in, it'll be confusing using menu links for paths which can be aliased and also loading aliases on the same page.

Trying to get a neutral title that doesn't change with every followup.

Status:Needs work» Needs review
StatusFileSize
new4.7 KB
Failed: Failed to apply patch.
[ View ]

Here it is as a whitelist.

So you only need to edit the variable now if you want to specifically add an alias which isn't for a node, user or taxonomy term. Having a whitelist maps more cleanly to people's conceptions of how Drupal works, and there's not really a need for contrib modules to hook into it, other than (probably pathauto) to provide a UI - since admins choose what to add path aliases for, not contrib (unless you're pathauto).

Status:Needs review» Needs work

settings.php comment: drupal_path_blacklist()

Status:Needs work» Needs review
StatusFileSize
new4.58 KB
Failed: Failed to apply patch.
[ View ]

Thanks hass. Fixed.

catch, I don't understand why we are moving back to a whitelist approach. I liked the menu router attribute/property that webchick suggested much better.

OK so I discussed the menu thing with chx, but we went round on it a fair bit, I'll try to summarize:

Moshe in #161 made the excellent (if evil) point that a site like Drupal.org with half a million nodes but probably less than 200 node aliases might want to whitelist only specific node/n paths instead of all. If we add a column to {menu_router}, then we run the risk of bloating that table - say if a module provides a UI for this. I think that's a no-go for this reason.

The other menu table we could use is {menu_links}. We already have user/% node/% and taxonomy/term/% in there as hidden menu items. We could easily add an 'allow alias' column there, same as the original proposal. If a site adds 200 hidden menu links to {menu_links} that's not a big deal. However using {menu_links} for skipping paths ... seems a bit odd - and chx said if doing a whitelist, might as well just have the string.

Two other factors:
node/% user/% and /taxonomy/term/% blog/% and forum/% cover 99% of aliases you'd ever want to add (this is what pathauto supports - on my D6 site with lots of contrib installed, it's the same list - so no obvious way for other modules to add to that list). In that case, a simple whitelist, with a string based UI in contrib (probably pathauto) would probably be easier to maintain than saving menu links per whitelisted path - I'm not sure how a UI to {menu_links}, which would probably need to cover both hidden and non-hidden items would look..

Even if we use {menu_links} I think we still want to fetch that information out into a drupal_match_path()-friendly string - which probably means setting a variable or cache entry periodically, and quite a bit of additional code weight and complexity.

On my Drupal 6 site with the most modules enabled, I can only set up aliases for user/*, node/*, taxonomy/term/*, blog/* and forum/* in pathauto- I'm not aware of any modules which exposes more path options to pathauto (although I might be wrong on that). Given that, it seems like if we use a whitelist approach (whether variable or hook_menu()), very few modules will ever need to use it - as opposed to a blacklist where most will - this is much less of a motivator towards providing an API.

So the choices are:

1. String variable with 5-6 paths whitelisted - whatever pathauto allows for. Document it in settings.php, pathauto could provide a UI (textarea and button) for them.

2. Blacklist paths in {menu_links} - modules need to explicitly disable aliasing of their own paths in hook_menu() or via menu_link_save(). Needs a theoretical UI / alter mechanism.

3. Whitelist paths in {menu_links} - modules need to explicitly allow aliasing of their own paths in hook_menu() or via menu_link_save(). Still needs a theoretical UI / alter mechanism.

4. Try to automate something based on the contents of {url_alias} like kbahey's patch did (but looking at the SQL in http://drupal.org/node/106559#comment-717843 this looks like it could be really hard to do in dbtng).

So given the options. I think a whitelist, well documented by both validation when trying to add a non-functional alias, which can be edited in in settings.php or via a contrib UI is going to cover most use cases without adding lots of complexity. The only situation you'd need to override this is if adding an alias for a 'non-standard' path - at which point you just need to add that path to a list in the same format as block visibility works already. If that's a no go, then we can try {menu_links} or automation, but at the moment it seems like extra work and complexity for not much of a real benefit (you still have to go to a UI if you want to override core/contrib decisions on a site specific basis, but we can't let people just do it in settings.php for a quick fix).

StatusFileSize
new2.11 KB
new2.43 KB
Failed: Failed to apply patch.
[ View ]

I like to revisit my earlier approach in this thread, which is to derive the whitelist automatically from the top level of the paths.

This is a zero configuration patch. Nothing needs to be configured, it derives the aliases automatically!

To demonstrate from a real site that uses the 6.12 patch attached ...

select substring_index(src, '/', 1) as path, count(*) as cnt from url_alias group by path order by cnt desc;
+------------------+-------+
| path             | cnt   |
+------------------+-------+
| user             | 98743 |
| node             | 49676 |
| taxonomy         | 11044 |
| forum            |    11 |
| blog             |     2 |
| rss.xml          |     1 |
| taxonomy_browser |     1 |
| tracker          |     1 |
| aggregator       |     1 |
+------------------+-------+

As you can see, the aliases are mainly for the above 4 paths. Anything else is not considered, such as comment/blah.

The savings in the number of queries is amazing.

For a node with 9 comments

Without patch:
262 queries

With patch:
212 queries

Savings: 50 queries

For a node with 165 comments

Without patch
2915 queries

With patch
2313 queries

Saving: 602 queries

So as you can see, we avoid a lot of queries that would have returned nothing anyway.

I ported the patch to today's HEAD, and did some cursory testing on it. I am also attaching the original 6.12 patch as well in case someone wants that for Drupal 6.

For some reason that version of the patch looks much simpler than the previous ones, nice! It looks like both postgres and sqlite have straight substr support too although it'd be good to verify that.

Smart and simple, Khalid!

1. Given that drupal_rebuild_whitelist is only called once, it probably doesn't have to be a function. Second, it introduces another public API which is bad in this case -- people should always call drupal_lookup_path('wipe'), not? Last, drupal_rebuild_whitelist() is too generic a name anyway. I recommend that we merge it into the function.

2. The part that starts with +    if (count($whitelist) == 0) { is a bit unnecessary. I'd think we should only rebuild the whitelist when we wipe the cache?

3. Let's document this a bit better in the code.

As a reference, on buytaert.net, I get:

mysql> SELECT SUBSTRING_INDEX(src, '/', 1) AS path FROM url_alias GROUP BY path;
+----------+
| path     |
+----------+
| node     |
| rss.xml  |
| taxonomy |
| term     |
+----------+

On mollom.com, I get:

mysql> SELECT SUBSTRING_INDEX(src, '/', 1) AS path FROM url_alias GROUP BY path;
+------+
| path |
+------+
| node |
+------+

On drupal.org, I get:

mysql> SELECT SUBSTRING_INDEX(src, '/', 1) AS path FROM url_alias GROUP BY path;
+-----------------+
| path            |
+-----------------+
| aggregator      |
| cvs-application |
| forum           |
| image           |
| node            |
| project         |
| rss.xml         |
| taxonomy        |
+-----------------+

Can't easily test on acquia.com right now, but based on buytaert.net, mollom.com and drupal.org, this seems like a good deal to me!

Status:Needs review» Needs work

Minor DBTNG thing :)

+  while ($row = db_fetch_object($result)) {

That should be foreach ($result as $row).

I really like the "Smart whitelist" concept. It's a good performance boost but, if I understand it correctly, still allows you to add an alias theoretically anywhere, correct? And the worst-case scenario is right back where we are now. I like.

Berdir is correct regarding the DBTNG fix.

I will, however, disagree with Dries about merging drupal_rebuild_whitelist(). It's an atomic operation, so should be its own function. That we can't think of anywhere else to call it right now doesn't mean someone won't think of one the day after Drupal 7 is released. :-) Having it a "public" API function doesn't really bother me; Drupal has hundreds of functions that you really should never call directly, but you theoretically could, which in the end is generally a good thing. If we really really really want to discourage it we can give it a _ prefix. It should probably be renamed to something that points out that it's path-related, though, say drupal_rebuild_alias_whitelist() or some such.

Status:Needs work» Needs review
StatusFileSize
new1.95 KB
Failed: Failed to apply patch.
[ View ]

@Dries

#1 The query in drupal_rebuild_whitelist() is expensive. It takes 410 ms on a large site. Therefore we want to avoid doing it too often. On a Drupal 6 site, we can't guarantee that a wipe will happen after applying the patch, so we build the white list once if it is empty.

On Drupal 7, this is not necessary since wipe will happen at some point and the code has the patch from the start.

Also removed the un-needed function and inlined the code in wipe.

So the attached patch takes care of that part.

#2 That was a carry over from D6. Taken care of now.

#3 Added more code comments.

@Berdir

Nice catch. Fixed.

@Crell

I don't mind one way or the other whether it is a function or not. I think a function is cleaner. but don't want that impacting whether it gets committed or not. Either way, We can still make it atomic using the locking framework in #251792: Implement a locking framework for long operations when it gets in.

@killes

You may want to apply the D6 patch in #171 on d.o right away. It is non-invasive, and is safe to roll back at any time. No schema changes or anything.

It will save many alias lookup queries (e.g. comment/*, user/*, think of pages with many comments). Emable devel before and after and check a few pages to see the number of queries before and after.

Status:Needs review» Needs work

<?php
+
+   
// For each alias in the database, get the top level (i.e. the portion before the first /).
+    // Using GROUP BY is faster than DISTINCT, at least for MyISAM.
+    $whitelist = array();
+   
$result = db_query("SELECT SUBSTRING_INDEX(src, '/', 1) AS path FROM {url_alias} GROUP BY path");
+    foreach (
$result as $row) {
+     
$whitelist[$row->path] = TRUE;
+    }
+   
variable_set('path_alias_whitelist', $whitelist);
   }
?>

SUBSTRING_INDEX() seems MySQL-specific. I think it would be good idea to support it, but that means that this patch needs work.

<?php
  
elseif ($count > 0 && $path != '') {
+   
// Retrieve the path alias whitelist
+    $whitelist = variable_get('path_alias_whitelist', array());
+   
// And derive the top level component of the path
+    $pos = strpos($path, '/');
+   
$top_level = ($pos) ? substr($path, 0, $pos) : $path;
?>

That's $top_level = strtok($path, '/');

Tagging

Status:Needs work» Needs review
StatusFileSize
new2.6 KB
Failed: 11508 passes, 5 fails, 10 exceptions
[ View ]

Thanks Damien.

I added your strtok suggestion.

For database portability, initially moved the logic of extracting the top level paths from the query itself to PHP code. It did work from a functionality point of view. However, when I measured this on a site that has 159,000 aliases, and it takes 3.55 seconds. Since wipe is called on every path alias update (delete, add, update), this is prohibitive.

So, I opted for a solution that has different SQL for different databases.

It is attached, with SQL for PostgreSQL in it. It needs testing.

We still need sqlite SQL too, but that is really not a priority since no one deploys a high performance site on that.

Please review.

Assigned:kbahey» Damien Tournoud

Ok, my turn. I'll work on the cross-database pieces of this.

Status:Needs review» Needs work

The last submitted patch failed testing.

By the way. This is the totally portable way that I tried.

<?php
function drupal_path_alias_whitelist_rebuild() {
 
// For each alias in the database, get the top level (i.e. the portion before the first /).
 
$whitelist = array();
 
$result = db_query('SELECT src FROM {url_alias} ORDER BY pid');
  while (
$row = db_fetch_object($result)) {
   
// Extract the part of the path before the first /
   
$top_level = strtok($row->src, '/');
    if (!
$whitelist[$top_level]) {
     
// Add it to the white list array
     
$whitelist[$top_level] = TRUE;
    }
  }
  return
$whitelist;
?>

It works 100% from a functionality point of view, but due to poor performance, it is not an option. On a site with 159,000 aliases, it takes between 3.12 and 3.55 seconds, and since we call wipe for every node add, that is very bad. Not an option. If I remove the ORDER BY, it goes down to 1.96 seconds. Still too much.

I think you need to build that variable only once, in an upgrade function and when saving, just add to it if necessary. No need to remove anything. Sub-optimal but given the usage, it will do. To avoid the nonstandard SQL, just SELECT src FROM url_alias LIMIT 1, let's say it'll be node/123. Now you run SELECT src FROM url_alias WHERE src NOT LIKE 'node/%' LIMIT 1. Pick another now run SELECT src FROM url_alias WHERE src NOT LIKE 'node/%' AND src NOT LIKE 'user/%' LIMIT 1. When the query returns empty you are done.

Assigned:kbahey» chx

Let me love this a bit.

Status:Needs work» Needs review
StatusFileSize
new3.18 KB
Failed: Failed to apply patch.
[ View ]

My queries run against an indexed field and though I run more than one, they are practically constant time so I have left in the rebuild-whitelist-constantly for now. Comes with tests.

Status:Needs review» Needs work
StatusFileSize
new3.04 KB
Failed: Failed to apply patch.
[ View ]

Based on a conversation with chx in IRC, this version adds comments to the rebuild function to explain how it all works.

Tested this against 159,000 aliases. The SUBSTRING_INDEX version takes 368 ms, which is half the time that the new patch takes (805 ms).

Setting to needs work, because chx will optimize it further to avoid doing the rebuild on every wipe, and rather check of the top level is in the variable_get.

Status:Needs work» Needs review
StatusFileSize
new5.07 KB
Failed: 11509 passes, 5 fails, 0 exceptions
[ View ]

Now we only rebuild on module enable/disable and on D6->D7 update. On saving, we add if necessary. Given that removing even a single path alias should be rare because of link rot, this will do.

Status:Needs review» Reviewed & tested by the community

I like it now ...

No overhead of rebuilding the whitelist, so node/add does not suffer.

This is as fast as it gets ...

So, I am setting it to RTBC ...

Webchiiiiiik !!!!

Status:Reviewed & tested by the community» Needs work

The last submitted patch failed testing.

Status:Needs work» Needs review
StatusFileSize
new5.06 KB
Failed: Failed to apply patch.
[ View ]

That was stupid... the variable name differed.

Status:Needs review» Reviewed & tested by the community

Reset to RTBC.

Status:Reviewed & tested by the community» Needs review

- We write "the look" instead of "the loop" (twice).

- path_set_alias() needs code comments.

- path_whitelist_rebuild() is a lot uglier now (yuck) but I assume that is the only solution to make it work across database? If so, let's mention in the code comments that we can't use substring_index().

(Update: removed some bad thinking)

StatusFileSize
new6.1 KB
Failed: Failed to apply patch.
[ View ]

FIxed the code comments Dries mentioned - didn't review the entire patch but the module enable/disable and only adding on path_set_alias is exactly what we need. I think we still want to avoid four second queries on module enable/disable so the ugliness is worth it - devel query log will be much prettier.

StatusFileSize
new9.06 KB
Failed: Failed to apply patch.
[ View ]

Here is a patch with a SUBSTR_INDEX() implementation for SQLite and PostgreSQL, and with an alternative lookup implementation.

Could someone benchmark that?

@catch, there are no four second queries. I tried to benchmark on NowPublic, with a 1.6M alias table and the queries were 0.00 seconds... I do not really know how Khalid measured almost a second...

@Damien, we can discuss whether we want to use this SQL function, storing the prefix like that is a waste of effort. This is a perfect usage for variables as you do need this array on every page and it's a very small array.

Status:Needs review» Reviewed & tested by the community

@Damien

I am not sure if we should spend all that effort and complexity with sqlite, while it will never see a large deployment that needs this optimization.

For Postgresql, I like that you are creating SUBSTRING_INDEX() function in it. I don't like the added storage in the table. This is the type of denormalization that causes headaches. I am sure DISTINCT is slower than GROUP BY in this case, from benchmarking last year. So I am -1 on the added field.

I can't benchmark what you did easily though to see if it is faster than 0.8 seconds. But since we are doing this only rarely and only the admin will experience it every now and then, performance is not that important for this part.

@Dries

path_whitelist_rebuild() is a lot uglier now (yuck) but I assume that is the only solution to make it work across database? If so, let's mention in the code comments that we can't use substring_index().

Well, I agree. The one we have in #196 is ugly, and it took a while for chx to make it go through my tiny brain on IRC. I added comments in the code to describe what it does.

It is the fastest of the portable solutions though. It takes 0.8 seconds (on two separate machines with 159K aliases) vs. the much easier to understand, prettier and portable code that I listed here #184, which takes 3.55 seconds.

Note that the MySQL specific SUBSTRING_INDEX is still the fastest at 0.4 seconds or less.

Setting #196 to RTBC.

Status:Reviewed & tested by the community» Needs work

I agree on not taking my column storage suggestion. But:

- now that we have SUBSTR_INDEX(), we should use that
- the whitelist is a cache, it has nothing to do in a variable
- but... we can replace the count check in drupal_lookup_alias() with a cache_get('path_whitelist')!

Good thing the column is out of the way.

Implementing the whitelist as a cache_set/get should be doable. I am cool with either implementation. Damien, Will you merge the two patches then?

The count check can use a cache_get or variable_get. That is a bonus of this whitelist approach, but it can wait till this one gets in core, then the other issue is trivial to close with a simple line of code.

Can someone benchmark the SUBSTR_INDEX on PostgreSQL? Even if it is slow, it should not stop the patch, since it is done rarely, and a PostgreSQL site will be slower anyways.

Status:Needs work» Needs review

OK I merged the patches. This one does the following:

# takes Damien's substr_index and uses that to create the whitelist

# keeps the whitelist in {cache_path} and resets it on drupal_lookup_path('wipe');

# uses the whitelist instead of $count

With 10,000 aliases, the substring_index query takes approx 33ms on my system,

HEAD with the path cache cleared (so all path lookups plus setting the system_path cache):
Executed 116 queries in 81.35 milliseconds.

Patched with the path cache cleared (so rebuilding both system_path and whitelist caches):
Executed 76 queries in 77.64 milliseconds.

So even without the cache primed there's a small gain over HEAD with 10,000 aliases. Would be good to see how we do with 100,000 but if it's even a little bit slower on cache misses I think we're fine.

Also ab on the front page, 10 nodes. Note we're already looking up queries in one go once caches are primed, so this is the difference of having a smaller cache_path record to fetch and unserialize and less paths in the big IN() query. Even with10 nodes there's a small measurable impact, on long comment listings we can expect a lot more.

HEAD:
6.43 [#reqs/sec]

Patch:
6.52 reqs/sec [#reqs/sec]

StatusFileSize
new7.54 KB
Failed: Failed to apply patch.
[ View ]

patch.

I'm surprised we only see a 1% performance improvement in #202 (i.e. from 6.43 to 6.52) -- or did I interpret that wrong?

StatusFileSize
new7.29 KB
Failed: Failed to apply patch.
[ View ]

Two small changes:

- move cache_set() into drupal_path_alias_whitelist_rebuild()
- fix doxygens for system_update_7024()

Dries - don't forget we already committed #456824: drupal_lookup_path() speedup - cache system paths per page.. That meant that on pages with primed caches, we lookup all paths in a single (sometimes very, very big) query - so the actual performance gain here is only going to show up on cache misses and pages with lots and lots of non-whitelisted paths.

The main advantages of this patch now that the system paths cache is in are:

1. When the system_paths cache is empty for a page, we only look up paths in the whitelist instead of all of them.

2. Much smaller cache entries for the system_paths cache because $map only contains whitelisted paths - since that cache is per page that equates to a lot of space saved in the cache tables.

3. The IN() query where we look up paths in one query becomes significantly smaller due to the whitelist, which makes it faster - see example below.

So on the front page, there's not a huge difference. However on the "looking at the drupal.org issue queue logged in as an admin" example it's quite significant.

HEAD:
truncate cache_path;
http://d7.7/node/20051 (300 comments)
Executed 1276 queries in 485.61 milliseconds.

Second request:
Executed 345 queries in 218.01 milliseconds.

The big IN() query still has to lookup 900-odd paths in one go though, this took 85.25ms to do, and this is why:

SELECT src, dst FROM url_alias WHERE src IN(:system_0, :system_1, :system_2, :system_3, :system_4, :system_5, :system_6, :system_7, :system_8, :system_9, :system_10, :system_11, :system_12, :system_13, :system_14, :system_15, :system_16, :system_17, :system_18, :system_19, :system_20, :system_21, :system_22, :system_23, :system_24, :system_25, :system_26, :system_27, :system_28, :system_29, :system_30, :system_31, :system_32, :system_33, :system_34, :system_35, :system_36, :system_37, :system_38, :system_39, :system_40, :system_41, :system_42, :system_43, :system_44, :system_45, :system_46, :system_47, :system_48, :system_49, :system_50, :system_51, :system_52, :system_53, :system_54, :system_55, :system_56, :system_57, :system_58, :system_59, :system_60, :system_61, :system_62, :system_63, :system_64, :system_65, :system_66, :system_67, :system_68, :system_69, :system_70, :system_71, :system_72, :system_73, :system_74, :system_75, :system_76, :system_77, :system_78, :system_79, :system_80, :system_81, :system_82, :system_83, :system_84, :system_85, :system_86, :system_87, :system_88, :system_89, :system_90, :system_91, :system_92, :system_93, :system_94, :system_95, :system_96, :system_97, :system_98, :system_99, :system_100, :system_101, :system_102, :system_103, :system_104, :system_105, :system_106, :system_107, :system_108, :system_109, :system_110, :system_111, :system_112, :system_113, :system_114, :system_115, :system_116, :system_117, :system_118, :system_119, :system_120, :system_121, :system_122, :system_123, :system_124, :system_125, :system_126, :system_127, :system_128, :system_129, :system_130, :system_131, :system_132, :system_133, :system_134, :system_135, :system_136, :system_137, :system_138, :system_139, :system_140, :system_141, :system_142, :system_143, :system_144, :system_145, :system_146, :system_147, :system_148, :system_149, :system_150, :system_151, :system_152, :system_153, :system_154, :system_155, :system_156, :system_157, :system_158, :system_159, :system_160, :system_161, :system_162, :system_163, :system_164, :system_165, :system_166, :system_167, :system_168, :system_169, :system_170, :system_171, :system_172, :system_173, :system_174, :system_175, :system_176, :system_177, :system_178, :system_179, :system_180, :system_181, :system_182, :system_183, :system_184, :system_185, :system_186, :system_187, :system_188, :system_189, :system_190, :system_191, :system_192, :system_193, :system_194, :system_195, :system_196, :system_197, :system_198, :system_199, :system_200, :system_201, :system_202, :system_203, :system_204, :system_205, :system_206, :system_207, :system_208, :system_209, :system_210, :system_211, :system_212, :system_213, :system_214, :system_215, :system_216, :system_217, :system_218, :system_219, :system_220, :system_221, :system_222, :system_223, :system_224, :system_225, :system_226, :system_227, :system_228, :system_229, :system_230, :system_231, :system_232, :system_233, :system_234, :system_235, :system_236, :system_237, :system_238, :system_239, :system_240, :system_241, :system_242, :system_243, :system_244, :system_245, :system_246, :system_247, :system_248, :system_249, :system_250, :system_251, :system_252, :system_253, :system_254, :system_255, :system_256, :system_257, :system_258, :system_259, :system_260, :system_261, :system_262, :system_263, :system_264, :system_265, :system_266, :system_267, :system_268, :system_269, :system_270, :system_271, :system_272, :system_273, :system_274, :system_275, :system_276, :system_277, :system_278, :system_279, :system_280, :system_281, :system_282, :system_283, :system_284, :system_285, :system_286, :system_287, :system_288, :system_289, :system_290, :system_291, :system_292, :system_293, :system_294, :system_295, :system_296, :system_297, :system_298, :system_299, :system_300, :system_301, :system_302, :system_303, :system_304, :system_305, :system_306, :system_307, :system_308, :system_309, :system_310, :system_311, :system_312, :system_313, :system_314, :system_315, :system_316, :system_317, :system_318, :system_319, :system_320, :system_321, :system_322, :system_323, :system_324, :system_325, :system_326, :system_327, :system_328, :system_329, :system_330, :system_331, :system_332, :system_333, :system_334, :system_335, :system_336, :system_337, :system_338, :system_339, :system_340, :system_341, :system_342, :system_343, :system_344, :system_345, :system_346, :system_347, :system_348, :system_349, :system_350, :system_351, :system_352, :system_353, :system_354, :system_355, :system_356, :system_357, :system_358, :system_359, :system_360, :system_361, :system_362, :system_363, :system_364, :system_365, :system_366, :system_367, :system_368, :system_369, :system_370, :system_371, :system_372, :system_373, :system_374, :system_375, :system_376, :system_377, :system_378, :system_379, :system_380, :system_381, :system_382, :system_383, :system_384, :system_385, :system_386, :system_387, :system_388, :system_389, :system_390, :system_391, :system_392, :system_393, :system_394, :system_395, :system_396, :system_397, :system_398, :system_399, :system_400, :system_401, :system_402, :system_403, :system_404, :system_405, :system_406, :system_407, :system_408, :system_409, :system_410, :system_411, :system_412, :system_413, :system_414, :system_415, :system_416, :system_417, :system_418, :system_419, :system_420, :system_421, :system_422, :system_423, :system_424, :system_425, :system_426, :system_427, :system_428, :system_429, :system_430, :system_431, :system_432, :system_433, :system_434, :system_435, :system_436, :system_437, :system_438, :system_439, :system_440, :system_441, :system_442, :system_443, :system_444, :system_445, :system_446, :system_447, :system_448, :system_449, :system_450, :system_451, :system_452, :system_453, :system_454, :system_455, :system_456, :system_457, :system_458, :system_459, :system_460, :system_461, :system_462, :system_463, :system_464, :system_465, :system_466, :system_467, :system_468, :system_469, :system_470, :system_471, :system_472, :system_473, :system_474, :system_475, :system_476, :system_477, :system_478, :system_479, :system_480, :system_481, :system_482, :system_483, :system_484, :system_485, :system_486, :system_487, :system_488, :system_489, :system_490, :system_491, :system_492, :system_493, :system_494, :system_495, :system_496, :system_497, :system_498, :system_499, :system_500, :system_501, :system_502, :system_503, :system_504, :system_505, :system_506, :system_507, :system_508, :system_509, :system_510, :system_511, :system_512, :system_513, :system_514, :system_515, :system_516, :system_517, :system_518, :system_519, :system_520, :system_521, :system_522, :system_523, :system_524, :system_525, :system_526, :system_527, :system_528, :system_529, :system_530, :system_531, :system_532, :system_533, :system_534, :system_535, :system_536, :system_537, :system_538, :system_539, :system_540, :system_541, :system_542, :system_543, :system_544, :system_545, :system_546, :system_547, :system_548, :system_549, :system_550, :system_551, :system_552, :system_553, :system_554, :system_555, :system_556, :system_557, :system_558, :system_559, :system_560, :system_561, :system_562, :system_563, :system_564, :system_565, :system_566, :system_567, :system_568, :system_569, :system_570, :system_571, :system_572, :system_573, :system_574, :system_575, :system_576, :system_577, :system_578, :system_579, :system_580, :system_581, :system_582, :system_583, :system_584, :system_585, :system_586, :system_587, :system_588, :system_589, :system_590, :system_591, :system_592, :system_593, :system_594, :system_595, :system_596, :system_597, :system_598, :system_599, :system_600, :system_601, :system_602, :system_603, :system_604, :system_605, :system_606, :system_607, :system_608, :system_609, :system_610, :system_611, :system_612, :system_613, :system_614, :system_615, :system_616, :system_617, :system_618, :system_619, :system_620, :system_621, :system_622, :system_623, :system_624, :system_625, :system_626, :system_627, :system_628, :system_629, :system_630, :system_631, :system_632, :system_633, :system_634, :system_635, :system_636, :system_637, :system_638, :system_639, :system_640, :system_641, :system_642, :system_643, :system_644, :system_645, :system_646, :system_647, :system_648, :system_649, :system_650, :system_651, :system_652, :system_653, :system_654, :system_655, :system_656, :system_657, :system_658, :system_659, :system_660, :system_661, :system_662, :system_663, :system_664, :system_665, :system_666, :system_667, :system_668, :system_669, :system_670, :system_671, :system_672, :system_673, :system_674, :system_675, :system_676, :system_677, :system_678, :system_679, :system_680, :system_681, :system_682, :system_683, :system_684, :system_685, :system_686, :system_687, :system_688, :system_689, :system_690, :system_691, :system_692, :system_693, :system_694, :system_695, :system_696, :system_697, :system_698, :system_699, :system_700, :system_701, :system_702, :system_703, :system_704, :system_705, :system_706, :system_707, :system_708, :system_709, :system_710, :system_711, :system_712, :system_713, :system_714, :system_715, :system_716, :system_717, :system_718, :system_719, :system_720, :system_721, :system_722, :system_723, :system_724, :system_725, :system_726, :system_727, :system_728, :system_729, :system_730, :system_731, :system_732, :system_733, :system_734, :system_735, :system_736, :system_737, :system_738, :system_739, :system_740, :system_741, :system_742, :system_743, :system_744, :system_745, :system_746, :system_747, :system_748, :system_749, :system_750, :system_751, :system_752, :system_753, :system_754, :system_755, :system_756, :system_757, :system_758, :system_759, :system_760, :system_761, :system_762, :system_763, :system_764, :system_765, :system_766, :system_767, :system_768, :system_769, :system_770, :system_771, :system_772, :system_773, :system_774, :system_775, :system_776, :system_777, :system_778, :system_779, :system_780, :system_781, :system_782, :system_783, :system_784, :system_785, :system_786, :system_787, :system_788, :system_789, :system_790, :system_791, :system_792, :system_793, :system_794, :system_795, :system_796, :system_797, :system_798, :system_799, :system_800, :system_801, :system_802, :system_803, :system_804, :system_805, :system_806, :system_807, :system_808, :system_809, :system_810, :system_811, :system_812, :system_813, :system_814, :system_815, :system_816, :system_817, :system_818, :system_819, :system_820, :system_821, :system_822, :system_823, :system_824, :system_825, :system_826, :system_827, :system_828, :system_829, :system_830, :system_831, :system_832, :system_833, :system_834, :system_835, :system_836, :system_837, :system_838, :system_839, :system_840, :system_841, :system_842, :system_843, :system_844, :system_845, :system_846, :system_847, :system_848, :system_849, :system_850, :system_851, :system_852, :system_853, :system_854, :system_855, :system_856, :system_857, :system_858, :system_859, :system_860, :system_861, :system_862, :system_863, :system_864, :system_865, :system_866, :system_867, :system_868, :system_869, :system_870, :system_871, :system_872, :system_873, :system_874, :system_875, :system_876, :system_877, :system_878, :system_879, :system_880, :system_881, :system_882, :system_883, :system_884, :system_885, :system_886, :system_887, :system_888, :system_889, :system_890, :system_891, :system_892, :system_893, :system_894, :system_895, :system_896, :system_897, :system_898, :system_899, :system_900, :system_901, :system_902, :system_903, :system_904, :system_905, :system_906, :system_907, :system_908, :system_909, :system_910, :system_911, :system_912, :system_913, :system_914, :system_915, :system_916, :system_917, :system_918, :system_919, :system_920, :system_921, :system_922) AND language IN(:language, '') ORDER BY language ASC

With the patch:
truncate cache_path;
first request:
Executed 351 queries in 181.15 milliseconds.
(30.74ms for the whitelist query)

second request:
Executed 344 queries in 131.65 milliseconds. (same number of queries but just over 80ms faster)

SELECT src, dst FROM url_alias WHERE src IN(:system_0, :system_1, :system_2, :system_3, :system_4) AND language IN(:language, '') ORDER BY language ASC

So on pages with lots of whitelisted paths (aliased nodes, users and taxonomy terms on a node listing) we get most benefit from the system path cache. On pages with few whitelisted paths but a lot of system paths, we get most benefit from this patch. The combination of the two gives us a really, really efficient path alias system.

Status:Needs review» Reviewed & tested by the community

#205 is good cleanup, marking that RTBC.

Status:Reviewed & tested by the community» Needs review
StatusFileSize
new7.22 KB
Failed: Failed to apply patch.
[ View ]

After discussion with chx and catch, here is the same as #205, but with the cache entry being a variable.

StatusFileSize
new7.24 KB
Failed: Failed to apply patch.
[ View ]

And here is the same, but with saner logic checking for an empty whitelist.

Status:Needs review» Reviewed & tested by the community

Just to clarify that saves running the rebuild query on every page on sites with no path aliases.

This is nicer than the cache_set() / cache_get() and actually saves us one query on every page as well now.

StatusFileSize
new2.13 KB

I am +1 on that. Raw performance is there as catch proved. Even if it is not, it has the benefit of lowering the number of queries for those on hosts that cap that. One step to countering the "Drupal is database heavy" complaint.

I am attaching an updated Drupal 6 patch, mainly for killes to apply on d.o at some point. The only change is that it now uses the new variable name as HEAD does, so that it gets overwritten on update.php, and not leave junk behind. Otherwise, it is MySQL only. d.o will benefit a lot from this patch by not issuing lots of queries of "user/xxx" for pages that have lots of comments (like this very one).

Version:7.x-dev» 6.x-dev
Status:Reviewed & tested by the community» Fixed

Committed to CVS HEAD. Thanks! Updating the version to Drupal 6 for Gabor to consider.

Status:Fixed» Reviewed & tested by the community

Status:Reviewed & tested by the community» Needs review

Erm. #211 needs review, I suggest we first deploy it on drupal.org, and consider it for Drupal 6 after that.

Status:Needs review» Needs work

Nice. This is sufficiently harmless I think it's worth a serious look for Drupal 6. By itself it's going to give us 5-15% improvements on every page (i.e. without the system path caching in HEAD).

Status:Needs work» Needs review

cross-posted - really this just needs Gabor to say whether he'll consider it at all. I just deployed the d6 patch on one of my sites by the way.

Awesome work everybody! And thanks for the D6 patch, I'll definitely apply it to my sites.

This is actually a kind of special issue for me, because this is the very first core patch I wrote. At the time of submitting it, I had been using Drupal for one, maybe two weeks. The work has been continued by (many) others, but at last, it has made its way into Drupal core :)

Finally, #106559 will be closed soon, almost 2.5 years after it was submitted! Rock on, Drupal :)

Heh, it's also got the first (completely wrong) benchmarks I ever ran.

Shouldn't the Postgres function be named "substring_index" instead of "substr_index"?

StatusFileSize
new1.18 KB
Failed: Failed to apply patch.
[ View ]

Assigned:chx» Unassigned
Category:task» bug
Priority:Normal» Critical
Status:Needs review» Reviewed & tested by the community
Issue tags:+Quick fix

above patch fixes the spelling mistake mentioned in #219, I've tested on PostgreSQL, works a treat.

Version:6.x-dev» 7.x-dev
Assigned:Unassigned» chx
Issue tags:-Quick fix

Assigned:chx» Unassigned
Issue tags:+Quick fix

and this applies to Drupal HEAD not 6.

cross posted but all my status changes were the same except for fixing the version.

D6 patch also needs strtok($path, '/')

Version:7.x-dev» 6.x-dev
Status:Reviewed & tested by the community» Patch (to be ported)

Committed #220. That makes this a "to be ported" patch for 6.x.

Yay thank god.

So do you think this will be available in Drupal 6.13? (Oh I hope so)

Status:Patch (to be ported)» Needs review
StatusFileSize
new4.09 KB

Rerolled backport for D6 from final D7 patch.

Hi smk-ka, is this for dev or 6.13 [2]?

It's based on CVS but also applies to 6.13.

Version:6.x-dev» 7.x-dev

Doesn't apply to HEAD, can you please reroll?

Status:Needs review» Needs work

The last submitted patch failed testing.

Version:7.x-dev» 6.x-dev
Status:Needs work» Needs review

This is already in HEAD, the patch in #229 is for the backport to 6.

StatusFileSize
new4.09 KB

Renumbered update.

Why don't you make a project page?

Why a project page? I thought this was meant to be going in the next D6 release.

Version:6.x-dev» 8.x-dev

Version:8.x-dev» 6.x-dev
Category:bug» task
Priority:Critical» Normal

hass, this is already in Drupal 7, the patch here is a backport to Drupal 6, so there's absolutely no reason to change version to Drupal 8 unless you hadn't bothered to read the issue at all.

Sorry I haven't remembered that Dries committed the patch. I thought it's still in discussion if or if not... :-(

Status:Needs review» Needs work

The patch needs to be re-rolled for 6.14. The update number needs to be bumped to 6054 to come after the ones in 6.14.

Also, would it be possible for someone to create a version of the patch which is compatible with #456824: drupal_lookup_path() speedup - cache system paths per page.?

subscribe

Just a thought on this - the path alias whitelist could be built from the url_alias table itself, couldn't it? I can envisage it being update/checked against whenever a new alias is added.

It already is, see patch/HEAD.

Subscribing:
I've just installed subpath_alias 6.x-1.1 where this issue is mentioned. It appears this patch would be of benifit to me. Does somebody know if it's ready to use on Drupal v 6.8 in conjunction with subpath_alias 6.x-1.1?

subscribinate

Could someone tell me if this is in 6.16? It is not clear to me. Thanks.

It's not. Very unlikely this patch will get into Drupal 6.

I think it's in pressflow, or on it's way into pressflow though.

Thanks catch.

Yes, it is in Pressflow (as a separate module that must be enabled). Not sure how different Pressflow's implementation is from the one in this issue, though.

Anyone got a version of this for 6.16?

@mrfelton: Are you unable to upgrade to Pressflow?

@EvanDonovan: Yeah, probably. But is that really the official recommendation to people these days - upgrade to pressflow..? Shouldn't this get fixed in core?

@mrfelton: I don't think it's considered a bug. Just a performance improvement. But I'm not an official representative of course - you'd have to ask Gabor.

hmm. bit of a blurry line there. So it works, but quite badly. I'd say it's a bug, I guess others may have a different opinion.

I don't see Gabor saying either way. IMO if it improves the performance of D6 without braking anything it should get in especially because Drupal it's heavy (sometimes to heavy) on the DB.

+1 on anything that significantly reduces the # of queries per page load without breaking anything

Subscribing.

Issue tags:+url, +url alias, +aliases, +db queries

+1 for this improvement too!

Did anyone succeed at applying #235 to 6.16?

Issue tags:-url, -url alias, -aliases, -db queries

Please don't over-use tags. That adds nothing but flotsam that makes them harder to use.

subscribe

Issue tags:+Ancient

tag

Version:6.x-dev» 7.x-dev
Status:Needs work» Closed (fixed)
Issue tags:-Investigate for use on drupal.org, -Ancient

This is already maintained in Pressflow. Apparently noone is willing to move forward with a patch, so won't fixing for D6 and reassigning to D7.

Agreed.

Version:7.x-dev» 6.19
Status:Closed (fixed)» Needs review

I had installed from #235 and all was well but... But some pages become wsod - for example: /admin, /admin/settings/xmlsitemap, and some others. Also I had WSOD after clearing cache by devel module..
So I remove it, though I like this idea of optimisation...

Version:6.19» 7.x-dev
Status:Needs review» Closed (fixed)

@asPagurus: The patch reported in comment #235 (or any other patches) will not committed be in Drupal 6.

Issue summary:View changes

fix formatting