Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks
John Morahan - October 12, 2008 - 20:03
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | path.module |
| Category: | task |
| Priority: | normal |
| Assigned: | Dave Reid |
| Status: | active |
| Issue tags: | API clean-up, custom_url_rewrite, DX (Developer Experience), Needs Documentation, Performance |
Description
It occurred to me that #298600: DBTNG: make module_implements work regardless of bootstrap phase could make this possible. Note that the test is currently broken even with that patch applied but I think I know why.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| hook_url_rewrite.patch | 3.65 KB | Idle | Unable to apply patch hook_url_rewrite_0.patch | View details | Re-test |

#1
Nice work. And a test too. I hope to test the patch soon.
#2
The code here looks good. Not sure how to test it, if at all.
#3
If you want to test this now, first apply the latest patch from #298600: DBTNG: make module_implements work regardless of bootstrap phase, then either make the change I suggested and then run the tests, or remove the hidden=TRUE from hook_url_rewrite.info and enable the module to test it manually.
#4
whoa, nice idea. Will try to test this if time permits.
#5
subscribe
#6
Seems to work fine. I also tried modifying $options, e.g. $options['base_url'], and that also works.
The test only covers hook_rewrite_url_rewrite_inbound(), not hook_rewrite_url_rewrite_outbound().
I think the
nameanddescriptiontexts from getInfo() are a bit too generic, considering that there are two other tests in the Path group. Also, the outbound rewriting doesn't happen in path.inc but in common.inc. I suggest something like “Path rewriting” and “Test hook_rewrite_url_rewrite_inbound and hook_rewrite_url_rewrite_outbound”.I noticed that hook_rewrite_url_rewrite_inbound() is invoked several times per request. Every time drupal_is_front_page() is called, it calls drupal_get_normal_path() that invokes the hook. This may be optimized, though this is outside the scope of this issue.
#7
Thanks for the review; the details of the test were the part I was least sure of here.
Regarding how often these are invoked - this is no different to how custom_url_rewrite_{in,out}bound in settings.php are invoked already. If it can be reduced, great - but not here.
I'll post a new patch soon with a better test.
#8
Updated the test. Also included the simpletest change in this version to make it easier to test. So now simply apply #298600 and this.
Naming the patch .txt to avoid automated testing: this should fail because t.d.o does not know to apply the other patch first. (I wonder how the first one passed?)
#9
In light of Gábor's comments on the dev list, benchmarks would also be welcomed.
#10
Haven't tested the patch, but looked it over. Question...
You use these two loops:
<?php
foreach (module_implements('url_rewrite_outbound') as $module) {
// do something
}
foreach (module_implements('url_rewrite_inbound') as $module) {
// do something else
}
?>
Don't we need to reverse the order of one of these loops? I think the inbound and outbound operations need to be performed in opposite order, although I'm not sure which order should be reversed (or whether it matters which).
For example consider the original url is 'node/42', and module X rewrites it to be X/node/42. Then module Y rewrites it to be Y/X/node/42. If in the other direction we pass Y/X/node/42 to module X, it won't make sense of it. So in the other direction module Y must act first, then X.
#11
I agree that this needs a bit more thought. The current API does not really support multiple hook implementations very well; $result is modified by reference based on $path, making multiple implementations of the hook problematic. The parameters and return value (if any) of the hook(s) need to be reconsidered.
#12
I get your point - but wouldn't this be unusual compared to how other hooks work? Currently both "inbound" hooks (hooks that act on form submission) and "outbound" hooks (hooks that change how things are displayed) are invoked in the same order.
#13
Sure it's unusual, but it's better to be unusual and useful, than normal and useless.
If you want to make things more "usual", then we'd make url creation more like form creation. Create a data structure with values similar to the parameters of url(), then call url_alter() or something like that to let modules change it. I think there are some valid reasons to take that sort of approach for outbound URLs.
#14
Here's a version with just one of the hooks called in reverse order.
#15
The last submitted patch failed testing.
#16
Fail is due to #303154-8: Document ini_set() calls in default.settings.php
#17
#18
See also #359965: Replace custom_url_rewrite_outbound() and language_url_rewrite() with new hook_url_alter().
#19
Marked #359965: Replace custom_url_rewrite_outbound() and language_url_rewrite() with new hook_url_alter() as a duplicate of this issue. I'll repost the benchmarking results I did with using drupal_alter() to replace custom_url_rewrite_outbound():
ab -c 5 -n 1000 http://mysql.drupalhead.local with 50 nodes, 5 comments per node, and the recent blog posts block enabled:
w/o patch w/o locale module enabled
Requests per second: 5.20 [#/sec] (mean)
Time per request: 961.254 [ms] (mean)
Time per request: 192.251 [ms] (mean, across all concurrent requests)
Transfer rate: 78.21 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 2.0 0 32
Processing: 539 959 123.4 949 2075
Waiting: 502 883 117.4 875 1916
Total: 539 959 123.5 949 2075
w/o patch w/ locale module enabled
Requests per second: 5.05 [#/sec] (mean)
Time per request: 991.042 [ms] (mean)
Time per request: 198.208 [ms] (mean, across all concurrent requests)
Transfer rate: 75.85 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 2.1 0 33
Processing: 570 989 159.4 967 2180
Waiting: 552 911 152.6 891 2063
Total: 570 989 159.4 967 2180
w/ patch w/o locale module enabled
Requests per second: 5.17 [#/sec] (mean)
Time per request: 967.278 [ms] (mean)
Time per request: 193.456 [ms] (mean, across all concurrent requests)
Transfer rate: 77.72 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.9 0 30
Processing: 525 966 120.7 954 2030
Waiting: 513 890 116.3 883 1899
Total: 525 966 120.8 954 2030
w/ patch w/ locale module enabled
Requests per second: 5.07 [#/sec] (mean)
Time per request: 986.513 [ms] (mean)
Time per request: 197.303 [ms] (mean, across all concurrent requests)
Transfer rate: 76.20 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.8 0 29
Processing: 426 985 122.0 974 2048
Waiting: 413 906 118.2 899 1908
Total: 426 985 122.0 974 2048
About a +/- 1 ms difference with the patch. Maximum benefit is with the patch and locale module enabled.
#20
Well done. My only beef is with the test. testHookUrlRewrite does not need to do http requests to assert that url() is working. just call url("user/12") and assert you get back member/12. http requests are the slowest possible way to test something. they are often useful, but not here IMO.
#21
Here's the progress I made so far for converting both into a hook_url_alter_inbound/outbound. I will need to incorporate some of the tests from this issue and write some simpler ones myself. I don't agree with reversing the module_implements. If modules have weights, those weights should always be respected.
#22
I will consider this "code needs work" until there is a test involving at least two modules which both implement the inbound and outbound hooks.
...or the order of one of the loops is reversed.
#23
I agree with Dave Cohen: we should respect the weights, yes, but in this case, respecting the weights means reversing the order of the one of the loops. Consider modifying an outgoing link node/4:
Module1: node/4 -> article/4
Module2: article/4 -> blog/article/4
Module2 in this case has the lower weight, which is being respected. On the incoming url blog/article/4, however, that means the module with the lower weight needs to be considered first to undo what it did:
Module2: blog/article/4 -> article/4
Module1: article/4 -> node/4
Consider what would have happened if it had been the other way around, assuming Module1 is using the regular expression /^article\//:
Module1: blog/article/4 -> blog/article/4
Module2: blog/article/4 -> article/4
The problem would only get worse with more modules implementing the hook. Each module needs to be presented with a certain url for outcoming links and change it to exactly one thing, which is all they should need to worry about changing back.
Hope all that makes sense.
#24
@cwgordon7: Ok that makes sense.
@DaveCohen: That's why I left the issue as "code needs work" :)
#25
subscribing
#26
Subscribing.
#27
Would be great to revive this and push it in.
#28
Marking this for my own little patch sprint this weekend.
#29
I agree with cwgordon7: multiple modules implementing this hook will result in a nightmare. I wonder if it wouldn't be better to stick with the "special function" concept instead of a hook to keep our sanity?
It probably can be made to work with multiple hook implementations, but it will always remain tricky or at the very least, rather difficult to grok.
#30
Here's an updated patch that has what I believe is the correct implementation. The test case it comes with quite clearly describes the expected behavior and makes the hook easier to understand. The hook is also well documented in system.api.php.
#31
also check out url_alter
#32
I have not tested the patch, but I've noticed this: language_url_rewrite() takes the $options array by reference, and was responsible for initializing $options['language'] if it was unset. The language is then used in a subsequent call to drupal_get_path_alias(). However, that happens BEFORE hook_url_alter_outbound() (including the successor to language_url_rewrite()) is called, so I expect the new code to initialize the language too late. Or, in other words: there seemed to be a reason why language_url_rewrite() was called earlier than custom_url_rewrite().
#33
The last submitted patch failed testing.
#34
Bot was broken
#35
The last submitted patch failed testing.
#36
restart bot
#37
The last submitted patch failed testing.
#38
I support this, and I think we should proceed with the hook-approach.
I agree with Wim and cwgordon7 that the problem gets complicated when many modules start rewriting URLs. These cases should be rare -- modules shouldn't rewrite URLs because they can. In some cases, having multiple modules rewrite hooks is a valid use case. So, yes, it can get messy but it is not our task to babysit.
#39
By the way, do we really have to have modules like hook_url_alter_1.module and hook_url_alter_2.module? Those are pretty terrible. We should try to find more elegant ways to these those things, IMO.
#40
Exactly. I didn't look at its code, nor did I use it in any way (yet) - but the description of http://drupal.org/project/purl sounds like a more stable, reliable, and elegant approach to solve this in a scalable way. Language negotiation in core could be treated as first consumer already. Most of the other consumers are modules that similarly try to provide and determine context. The exception to this are modules that really want to advance on URL aliases only (such as http://drupal.org/project/subpath_alias).
#41
Please, I am new here, but has anyone test this patch together with a site that runs both domain access and url alter modules at the same time.
Pls let me know thanks.
#42
Tagging and reminding myself to work on this sooner rather than later so we can get this into D7 before code freeze.
#43
subscribe
#44
@Dave Red - any chance you can revive this patch and the role API one? I think the blocker on role API has been cleared.
#45
I have been testing out the url_alter module (which uses very similar code to this patch) and have run into an issue with how the inbound hook works.
Currently if a module makes a change to a path and sets the result, the following module isn't aware of that change, so it can only act upon the original unmodified path. If the second module changes the path, then the first modules change is completely lost. Sometimes this may work well, but in cases where you want to act upon the changes of another module this causes problems.
For my use case the following adjustment works:
Original:
foreach (array_reverse(module_implements('url_alter_inbound')) as $module) {$function = $module . '_url_alter_inbound';
$function($result, $path, $path_language);
}
Changed:
foreach (array_reverse(module_implements('url_alter_inbound')) as $module) {$function = $module . '_url_alter_inbound';
$function($result, $path, $path_language);
$path = !empty($result) ? $result : $path;
}
This carries the $result forward to the next module, which in my use case (using subdomain) solved the problem.
#46
you mean chaining the url rewrites, right?
that makes a lot of sense
would be helpful the other way around? don't think so
#47
yes thats right.
For my situation I didn't need to change the outbound links too, but it would probably make sense to keep it consistent.
Chaining both inbound & outbound would allow for the situation mentioned in #23 to work as expected.
One other thought I had on the matter is it might be worth keeping track of the original unaltered path somewhere as well as the chained one.
#48
Intending to get back and get this finished today if possible. Would this be considered for the code slush phase since we're changing the APIs of an existing feature?
#49
I don't think this can happen anymore without significant performance implications now that the registry is out.
#50
I would like to do actual performance tests to back that up on a normal site with and without the locale module enabled. We've shown that there are several use cases for this feature already with Url alter. Asking users to add a custom function to their settings.php in order to make a module work is just sad.
#51
@cwgordon7: hook_file_url_alter() is also a hook and was recommended/requested by moshe et al. For consistency, it would make sense to also use this approach for these functions. I *think* Dries/webchick would let this one in, because it affects so few people.
I'm too tired to think about the performance consequences at the moment.
#52
.
#53
Alrighty, I'm going to focus on re-rolling this because this custom_url_rewrite is a broken API that needs to be fixed in D7.
#54
#55
Moving path patches to path.module so they're easier to track.
#56
So crazy thought, this could and *should* also repalce hook_term_path as well.
#57
Initial stab at replacing hook_term_path and hand-tested it with forum module and it worked pretty darn good. Finally figured out that I needed to add url_alter_inbound to the list in bootstrap_hooks() in order to get it to fire since drupal_get_normal_path is called before all modules are loaded *duh!*. I didn't run -N to include the tests because I know it doesn't work yet and there are some things we need to decide how exactly this should work.
Currenty we run the url_outbound_alter() before we lookup path aliases in url(). So that means for any forums that want to have path aliases, they need to create the path alias on 'forum/tid' instead of 'taxonomy/term/tid'. That doesn't seem ideal, but maybe it makes sense.
#58
I think it is fine to expect aliases to happen on the path that admins are seeing (forum/n). They have no clue that this path was taxonomy/term/n for a brief moment. Perhaps we should write an update function for these aliases?
#59
The problem that I'm coming into is how we should handle these two scenarios when forum.module is enabled and rewriting forum paths using url():
Case 1: We have an alias for source path 'forum/3' -> 'f3'.
url: taxonomy/term/3
forum_url_alter_outbound: forum/3
drupal_get_path_alias: f3
result of url: f3
drupal_get_normal_path: f3
drupal_lookup_path: forum/3
forum_url_inbound_alter: taxonomy/term/3
reusult of drupal_get_normal_path: taxonomy/term/3
----------------------
Case 2: We have an alias for source path 'taxonomy/term/3' -> 'f3'.
url: taxonomy/term/3
forum_url_outbound_alter: forum/3
drupal_get_path_alias: forum/3
result of url(): forum/3
drupal_get_normal_path: f3
drupal_lookup_path: taxonomy/term/3
forum_url_inbound_alter: taxonomy/term/3
reusult of drupal_get_normal_path: taxonomy/term/3
---------------------
So one way around this is to make sure when url aliases are added is to double check that they are run through drupal_get_normal_path(). So the case 1 the source url for the alias would be 'taxonomy/term/3' instead of 'forum/3'.
But that adds another situation for looking up the path alias when rewriting in url() like in case 2. Becuase we looking the alias before we rewrite paths, we should probably lookup the path alias based on the 'original' path given to url().
#60
Re-rolled for head.
#61
The last submitted patch failed testing.
#62
Testing to see if I went crazy and this was all caused by a module_list(TRUE, TRUE), which is WTF because the same thing is used to call hook_boot() before hook_url_inbound_alter().
#63
In IRC, Dave and I determined the source of the badness is that calling module_list(TRUE, TRUE) to refresh the module list from inside a reduced bootstrap level (we're still just at DRUPAL_BOOTSTRAP_PATH when drupal_path_initialize() runs) results in sheer badness. entity_get_info() is confused, we have an empty list of modules (or something), and we end up caching an empty array in {cache}.cid == 'entity_info', resulting in a completely broken site until you manually clear that cache entry. :(
So:
A) WTF, module_list()? Why can't you be refreshed twice during the bootstrap process?
B) WTF, entity_get_info()? Why are you willing to cache an empty array into the DB? ;)
For now, this patch is going to work-around it by calling module_list(FALSE, TRUE) to only get bootstrap modules, but *not* refresh the list.
#64
BTW, forgot to mention: big +1 on the direction of this patch. ;) Making these things hooks will clean up a bunch of stupid in core and remove a lot of WTF. I guess it's implicit that I support this patch if I spent time to help debug it, but I wanted to make it explicit in case that helps anyone decide to commit. ;)
#65
The last submitted patch failed testing.
#66
If we're going to do this, we should really should run this during a full bootstrap. And looking into _drupal_bootstrap, I don't see any reason why we *shouldn't* just load all modules first, then run drupal_path_initialize(). It's not like we're going to somehow stop execution in DRUPAL_BOOTSTRAP_PATH. It's always going to continue on to DRUPAL_BOOTSTRAP_FULL.
#67
Revised patch moving drupal_path_initialize to DRUPAL_BOOTSTRAP_FULL and including the missing path alter test modules.
#68
The last submitted patch failed testing.
#69
Ok this one is almost ready. Still trying to figure out what's going wrong with the wonky forum tests...
#70
Adding official D7 api cleanup tag
#71
The last submitted patch failed testing.
#72
Forum.module is very broken. Very upset and frustrated.
#73
The last submitted patch failed testing.
#74
Think I've got it finally. Should not have had forum_url_inbound_alter to turn forum/x into taxonomy/term/x because they needed to use the 'forum/x' path for display.
#75
Try again without contrib patch noise.
#76
Un-did changes to forum paths.
#77
The last submitted patch failed testing.
#78
Now with 100% random 1 failure from the bot. Also 100% less whitespace changes from my IDE. Now working on profiling.
#79
#80
Removed the crappy, crappy node title rewriting examples, because that was just silly. Node titles are not unique. Now onto profiling.
#81
Basic benchmarks since devel_generate is massively broken with all the recent patches:
Before without custom_url_rewrite_outbound() in settings.php:
Concurrency Level: 1
Time taken for tests: 177.262 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 6908500 bytes
HTML transferred: 6664500 bytes
Requests per second: 2.82 [#/sec] (mean)
Time per request: 354.523 [ms] (mean)
Time per request: 354.523 [ms] (mean, across all concurrent requests)
Transfer rate: 38.06 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 285 354 55.6 349 484
Waiting: 269 336 52.9 331 467
Total: 285 354 55.6 349 484
Percentage of the requests served within a certain time (ms)
50% 349
66% 389
75% 412
80% 419
90% 431
95% 435
98% 439
99% 442
100% 484 (longest request)
----------------------------
Before with custom_url_rewrite_outbound in settings.php:
<?phpfunction custom_url_rewrite_outbound(&$path, $options, $original_path) {
if ($path == 'blah') {
$path = 'foobar';
}
}
?>
Concurrency Level: 1
Time taken for tests: 180.408 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 6908500 bytes
HTML transferred: 6664500 bytes
Requests per second: 2.77 [#/sec] (mean)
Time per request: 360.815 [ms] (mean)
Time per request: 360.815 [ms] (mean, across all concurrent requests)
Transfer rate: 37.40 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 285 361 51.7 368 455
Waiting: 270 342 49.2 349 432
Total: 285 361 51.7 368 455
Percentage of the requests served within a certain time (ms)
50% 368
66% 391
75% 407
80% 420
90% 428
95% 436
98% 441
99% 447
100% 455 (longest request)
----------------------------
POST_PATCH (forum_url_alter_outbound() is implemented):
Concurrency Level: 1
Time taken for tests: 178.556 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 6908500 bytes
HTML transferred: 6664500 bytes
Requests per second: 2.80 [#/sec] (mean)
Time per request: 357.113 [ms] (mean)
Time per request: 357.113 [ms] (mean, across all concurrent requests)
Transfer rate: 37.78 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 286 357 54.0 353 475
Waiting: 270 338 51.5 333 443
Total: 286 357 54.0 353 475
Percentage of the requests served within a certain time (ms)
50% 353
66% 389
75% 408
80% 421
90% 433
95% 436
98% 441
99% 449
100% 475 (longest request)
#82
Generally, I agree that initializing the path can wait until the full bootstrap. However, it turns out that the new authorize.php needs $_GET['q'] initialized (so that BatchAPI works) but definitely does not want a full bootstrap, or it becomes really nasty to try to upgrade existing modules while they're being loaded. :(
Luckily, we don't actually care about a true Drupal Path(tm) inside authorize.php. The only times $_GET['q'] matter are when it's already set. It just turns out there are a lot of places in core that assume it's always set. So, we can just initialize it ourselves inside authorize.php instead of calling drupal_initialize_path(). This is actually an improvement for authorize.php, regardless of this patch (although we need the change if this patch is committed, so I'm just including it here).
We could potentially reconsider if we want to always assume $_GET['q'] is defined in our code base, but that seems like a totally separate issue. Basically, that'd just be a series of PHP notice bug fixes in a followup issue, if we care.
#83
Note to webchick:
cvs add modules/simpletest/tests/url_alter_test.infocvs add modules/simpletest/tests/url_alter_test.install
cvs add modules/simpletest/tests/url_alter_test.module
#84
The last submitted patch failed testing.
#85
Sorry, forgot hunks from statistics.module.
#86
Oh, and Dave pointed out !isset() would probably be more appropriate in authorize.php than empty() for this...
#87
I think between dww, moshe and myself, we have pretty much got all of this covered.
Today's patch is brought to you by the phrase "Yay for removing hacks!"
Executive summary:
<?phpfunction forum_url_outbound_alter(&$path, &$options) {
if (preg_match('!^taxonomy/term/(\d+)!', $path, $matches)) {
$term = taxonomy_term_load($matches[1]);
if ($term && $term->vocabulary_machine_name == 'forums') {
$path = 'forum/' . $matches[1];
}
}
}
?>
And reminder again for committing this patch that the following CVS commands must be run after patching, but before committing:
cvs add modules/simpletest/tests/url_alter_test.infocvs add modules/simpletest/tests/url_alter_test.install
cvs add modules/simpletest/tests/url_alter_test.module
#88
7. Comes with a boatload of tests. Nice work, Dave ... Pleas wait for green before commit.
#89
I'd like to know the number of calls to url() on the page which was benchmarked - this number can vary wildly depending on what sort of page is viewed. ideally in the form of webgrind screenshots and/or microbenchmarks similar to those done for #523284: Optimize url() so we can see the difference internal to url() and drupal_lookup_path().
#90
Ok I used a node page with lots of comments. Kcachegrind shows url() being called 207 times. I can attach the screenshot if needed. Run with ab -c 1 -n 500 three times for each condition and took the middle run:
BEFORE PATCH (without custom_url_rewrite_outbound() in settings.php):
Requests per second: 2.91 [#/sec] (mean)
Time per request: 343.446 [ms] (mean)
Time per request: 343.446 [ms] (mean, across all concurrent requests)
Transfer rate: 106.78 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 326 343 15.5 339 420
Waiting: 312 329 14.9 324 406
Total: 326 343 15.5 339 420
BEFORE PATCH (wtih custom_url_rewrite_outbound() in settings.php):
Requests per second: 2.89 [#/sec] (mean)
Time per request: 345.625 [ms] (mean)
Time per request: 345.625 [ms] (mean, across all concurrent requests)
Transfer rate: 106.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 328 346 15.1 342 426
Waiting: 314 331 14.6 328 407
Total: 328 346 15.1 342 426
AFTER PATCH:
Requests per second: 2.90 [#/sec] (mean)
Time per request: 345.387 [ms] (mean)
Time per request: 345.387 [ms] (mean, across all concurrent requests)
Transfer rate: 106.18 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 329 345 16.6 340 456
Waiting: 315 331 15.8 326 441
Total: 329 345 16.6 340 456
#91
OK that looks pretty good, nothing disastrous anyway.
#92
Fixed a very minor spelling error in the test assertion. No need to reset status.
#93
Removing 'needs tests' tag.
#94
This really is a nice patch.
I'm not sure in which horrible hook-weighting disaster we'll run into with this, but I guess there's no other way to just find out. Especially, because it makes the installation of a series of contributed modules a lot easier.
So, +1 for getting this last-minute "clean-up" in. Doesn't really break anything, especially not in core - but rather improves the situation for contributed modules. Aside from that, it was already RTBC around 10/15.
#95
oops, cross-posted.
#96
This *must* get in if we want to provide a decent developer experience, i.e. if we want to provide a consistent API. The API to alter file URLs (hook_file_url_alter()) specifically uses a hook instead of a "special/magical" function because this was going in as well, because URL rewriting was going to use hooks as well.
Performance impact is extremely minor, if any. This patch has had extensive reviews and solid tests. So I definitely agree that it's time to commit this one. Excellent work, Dave Reid et al. :)
#97
+1 to RTBC
This hooks could be good as solution for #308263: Allow privileged users to bypass the validation of menu items
to make outbound URL protected with some #anchor for example and inbound to check #anchor for existence then pass to original menu item - ability to protect URL like #606608: Use proper menu router paths for comment/* from csrf and leave them without confirmation
#98
The last submitted patch failed testing.
#99
Re-rolled only.
#100
The last submitted patch failed testing.
#101
Arg. Reference parameter introduced in path_save(). Tested and passes tests locally.
#102
+++ includes/common.inc 20 Oct 2009 17:31:24 -0000@@ -2387,10 +2388,12 @@ function url($path = NULL, array $option
+ // Preserve the original path before altering or aliasing.
+ $original_path = $path;
+
+ // Allow other modules to alter the outbound URL and options.
+ drupal_alter('url_outbound', $path, $options);
+++ modules/system/system.api.php 20 Oct 2009 17:31:26 -0000
@@ -2652,5 +2652,59 @@ function hook_page_delivery_callback_alt
+ * @param $original_path
+ * The original path, before being checked for path aliases or altered by the
+ * modules.
...
+function hook_url_inbound_alter(&$path, $original_path, $path_language) {
...
+function hook_url_outbound_alter(&$path, &$options) {
I'd like to see a third $original_path argument for hook_url_outbound_alter() as well.
This review is powered by Dreditor.
#103
Revised patch addressing sun's concerns.
#104
webchick wanted some benchmarks, so I enabled forum and statistics, put 15 comments on a forum topic, enabled the 'administer comments' permission for anonymous user (lots of links on the page), and added French language, so that all paths had to be re-written with the /fr/ prefix.
ab -c 1 -n 500 http://mysql.drupal7dev.local/fr/node/1Before patch:
Requests per second: 2.08 [#/sec] (mean)
Time per request: 480.172 [ms] (mean)
Time per request: 480.172 [ms] (mean, across all concurrent requests)
Transfer rate: 64.46 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 459 480 16.8 476 626
Waiting: 442 463 16.6 459 609
Total: 459 480 16.8 476 626
Requests per second: 2.08 [#/sec] (mean)
Time per request: 481.345 [ms] (mean)
Time per request: 481.345 [ms] (mean, across all concurrent requests)
Transfer rate: 64.31 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 457 481 20.9 476 700
Waiting: 440 464 20.5 459 678
Total: 457 481 20.9 476 700
After patch:
Requests per second: 2.08 [#/sec] (mean)
Time per request: 481.528 [ms] (mean)
Time per request: 481.528 [ms] (mean, across all concurrent requests)
Transfer rate: 64.29 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 462 481 16.8 479 686
Waiting: 446 465 16.2 462 647
Total: 462 481 16.8 479 686
Requests per second: 2.07 [#/sec] (mean)
Time per request: 482.825 [ms] (mean)
Time per request: 482.825 [ms] (mean, across all concurrent requests)
Transfer rate: 64.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 464 483 14.7 479 561
Waiting: 448 466 14.5 462 545
Total: 464 483 14.7 479 561
About a +1-2 ms difference with the patch. At this point I know it's a teeny-tiny increase but at least we're doing things the right way now.
#105
Ok, cool. My main concern with this patch was removing DRUPAL_BOOTSTRAP_PATH in favour of a full bootstrap. It looks like in the worst-case scenario where you are hitting locale, forum, and statistics-related changes in this patch, you're getting a negligible slow-down.
I also had some questions about that weird $_GET['q'] stuff in authorize.php. But basically, it's only required because authenticate.php is trying to do the least amount of bootstrapping possible. So the only other places that might need to use this are like file_layer_lite.php or something.
I also wondered if we'd hit any problems now that the language URL re-writing is part of locale.module (an optional core module) rather than includes/language.inc. I guess worst-case, a module implementing locale-like features will just need to copy/paste this code.
Otherwise, I agree this is a really solid clean-up, reviewed by lots of people big into both performance and i18n. Although I weep at no longer being one of like 30 people who know about custom_url_rewrite functions (hee hee).
And technically, this did make code freeze, albeit at the last possible second. :P~ Since Dries said he wanted to see this earlier, I feel comfortable committing this, although it is an API change, without checking with him first. (Dries, if that was the wrong call, sorry!)
Therefore, committed to HEAD. Please ensure this API addition (and the removal of hook_term_path()/language_rewrite_url()) is documented in the upgrade guide.
#106
Working on the update handbook now, but D'OH! Noticed I forgot to remove custom_url_rewrite_crap from system.api.php. On the bright side, didn't even need to undocument hook_term_path since it wasn't even documented to begin with. Hah!
#107
D'oh! Good catch. Committed that to HEAD too.
Back to needs work for docs.
#108
YAAAAAAAAAAAAAAAAY!
Updating docs completed ready for review at http://drupal.org/update/modules/6/7#hook_url_outbound_alter and http://drupal.org/update/modules/6/7#hook_url_inbound_alter
#109
This must be rolled back. The original solution worked just fine , we (NowPublic) have a module where custom_url_rewrite sits and it works (i presume it needs to be a bootstrap module) and doing drupal_alter just for the sake of it IS slower than a direct function call and the benchmarks are wrong because a) they were not made on a heavily linked page , node/1 on a def install is a joke b) there was no implementation. It's highly arguable that you want more than one implementation, this is a site-specific performance optiomization, having a hook is pointless. I will make a real page with 300 links and then add an implementation and then we shall see.
#110
<?phpfunction linktest_node_view() {
for ($i = 0; $i < 500; $i++) l($i, "node/$i");
}
function linktest_node_boot() {}
function linktest_url_outbound_alter() {}
function custom_url_rewrite_outbound() {}
?>
Rolled back aka before patch:
<code>
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: Apache/2.2.12
Server Hostname: localhost
Server Port: 80
Document Path: /drupal/node/3
Document Length: 6173 bytes
Concurrency Level: 1
Time taken for tests: 17.414 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 664200 bytes
HTML transferred: 617300 bytes
Requests per second: 5.74 [#/sec] (mean)
Time per request: 174.136 [ms] (mean)
Time per request: 174.136 [ms] (mean, across all concurrent requests)
Transfer rate: 37.25 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 161 174 16.0 170 293
Waiting: 161 174 16.0 170 292
Total: 161 174 16.0 170 293
Percentage of the requests served within a certain time (ms)
50% 170
66% 176
75% 179
80% 181
90% 188
95% 203
98% 205
99% 293
100% 293 (longest request)
Current HEAD:
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: Apache/2.2.12
Server Hostname: localhost
Server Port: 80
Document Path: /drupal/node/3
Document Length: 6173 bytes
Concurrency Level: 1
Time taken for tests: 18.902 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 664200 bytes
HTML transferred: 617300 bytes
Requests per second: 5.29 [#/sec] (mean)
Time per request: 189.024 [ms] (mean)
Time per request: 189.024 [ms] (mean, across all concurrent requests)
Transfer rate: 34.31 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 3
Processing: 172 189 12.2 187 240
Waiting: 172 189 12.2 186 240
Total: 172 189 12.1 187 240
Percentage of the requests served within a certain time (ms)
50% 187
66% 192
75% 195
80% 197
90% 203
95% 219
98% 222
99% 240
100% 240 (longest request)
#111
Sry for
function linktest_node_boot() {}i made the module bootstrap by hand anywyas -- not that it matters for outbound.#112
Oh well. I bow before the developer experience and webchick says most links will be cached anyways. At worst, I will run a hacked Drupal 7 -- that was always case, now won't be any other way.
#113
Er. The question I asked on IRC was whether the benchmarks you ran painted a realistic picture of a Drupal site, since calls to l() within hook_node_view would become part of the content array which afaik is cached. If this is still a realistic scenario regardless of that fact, then fine. I certainly don't want to slow Drupal 7 down any more than it actually is.
#114
Have my doubts that it was a realistic test, but I do understand the concern. Also see #523284: Optimize url().
#115
And to properly compare old vs new, make sure we are using worst-case scenario. Locale.module enabled and altering language prefixes (a non-default language path), and also using taxonomy_term_path() with forum.module.
#116
Also consider there *are* other modules out there that want/need to alter outgoing or incoming paths and using custom_url_rewrite inside a module is a complete WTF. See the list at http://drupal.org/project/url_alter
#117
Yes, there is that -- the test did not include the existing pieces indeed. There is more to be done here but now is not the time for me. I laid the groundwork, I let others continue.
#118
@chx and others concerned about performance of url(): please weigh in on #619566: Don't invoke hook_url_outbound_alter() if there's nothing to invoke (comment 15 on down).
#119
linno requested that failed test be re-tested.
#120
Please don't do that. I hate this retest functionality in PIFR2.
#121
The benchmarks at #104 look completely wrong to me. Previously I could get a cached 304 response (as recorded in the timer column of {accesslog} by statistics.module) in about 9 or 10 ms [on a well-managed shared server]. Now it is 350- 400 ms. (An uncached response is around 450 - 700ms.) This does not surprise me - a full bootstrap is patently an order of magnitude heavier than booting up to the old "DRUPAL_BOOTSTRAP_PATH" level.
What this means is that at the moment the page cache doesn't do a lot for sites running statistics.module. I suspect that #104 had the page cache turned off, so in other words you get a 1 - 2 ms increase in page generation time (about 0.2% - 0.4%) for non-cached responses that are already doing a full bootstrap.