I sometimes need a URL rewrite hook, so here is an implementation with ample comments. I looked into arg() to see whether it needs a reset parameter; it does not, but it contained a minor bug, which this patch also fixes.


Comments

chx’s picture

FileSize
1.73 KB

Performance boost.

Jose Reyero’s picture

+1

This is really needed by i18n module, and maybe other modules could use it to add some extra information in the query string.

killes@www.drop.org’s picture

this seems just an evil plot to cater for the evil hack that i18n.module is. --

chx’s picture

killes, while I know you do not like the current implementation of i18n, that module exists and even works. If you do not like the current approach, you can always write a better one. And also, please note we try to introduce non-i18n specific solutions this time.

killes@www.drop.org’s picture

"it works" has never been a consideration in Drupal development, don't let us start to use it. I don't need an i18n module. Had I had the urge to write one, it would have been based on walkah's excellent start to be seen here:

http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/walkah/transl...

I don't see anything non-i18n specific here. let us not pollute low level functions such as url() with custom hacks.

Gábor Hojtsy’s picture

Why extend url() and not the alias retrieval functions? BTW, at that time it was decided that a single function (conf_url_rewrite()) should be used and not a hook, for performance reasons. If you provide the functionality this way, then conf_url_rewrite() gets confusing, and even meaningless if you do it in the alias retrieval functions themselves.

chx’s picture

FileSize
1.95 KB

Goba is so right, that I was already working on it.

Jose Reyero’s picture

This second version is more limited and won't be that useful, because if an alias exists, the rewriting is skipped.

IMHO, the aim for this patch should be allowing modules to add information in the path or in the query string, for *all* the outgoing urls. I was thinking of i18n module and language information, but this can be useful for other modules too.

So please, let's stick to the previous version of the patch

Gábor Hojtsy’s picture

Jose, so you advocate only mangling URLs on output? Really?

Jose Reyero’s picture

Goba, yes, my idea was to provide a hook, so any module can mangle the path/query string. What happens with those incoming paths is a different story, I mean any module can access the query string any time later... And I'd like this to be separated from path aliasing, which can be done in a different step...

Ideally, of course, this would be done for outgoing and incoming urls, but there's a number of issues, like incoming path processing being done before module loading, in common.inc... Another issue, IMHO, is that current url handling is a bit messy and could be better streamlined.

But the first patch was simple enough and quite straightforward; at least it works for *all* outgoing URLs and allows rewriting query strings, while I see much more limited use for the second one....

Gábor Hojtsy’s picture

Jose, AFAIS if you mangle the outgoing URLs (e.g. put 'en/' before all URLs), and there is nothing in the incoming processing to strip that, this will directly lead to 'page not found' replies. But I probably fail to see the exact use case you would like to propose this for.

Jose Reyero’s picture

Goba, yes, basically you are right... But these are my use cases:

For i18n
----------
- Add language prefix at the beginning of the outgoing urls
- Remove it in the module init hook.

Yes, I know... this is probably some of what killes calls evil hacks :-(
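
A minimal sketch of the two steps listed above, not the i18n module's actual code. The function names, the language list and the return format are assumptions made for illustration only:

```php
<?php
// Hypothetical helper: prefix an outgoing path with a language code.
function example_add_language_prefix($path, $lang) {
  return $lang . '/' . ltrim($path, '/');
}

// Hypothetical helper: split a known language prefix off an incoming path.
// Returns the detected language (or NULL) and the remaining path.
function example_strip_language_prefix($path, array $languages) {
  $parts = explode('/', $path, 2);
  if (in_array($parts[0], $languages, TRUE)) {
    return array('lang' => $parts[0], 'path' => isset($parts[1]) ? $parts[1] : '');
  }
  return array('lang' => NULL, 'path' => $path);
}

$out = example_add_language_prefix('node/18', 'en');
$in = example_strip_language_prefix($out, array('en', 'es'));
```

The point of the sketch is only that the outgoing and incoming steps have to mirror each other; where the incoming step runs (init hook or earlier) is the open question in this thread.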

But with this new hook, I could also think of adding language in the query string. And maybe some other modules could want to add some info in the query string. There are a number of reasons for language to be in the URL -search engines, links...- and also the ability to create path aliases with or without language...

Hackish? Yes. But right now we are facing the following dilemma: no specific calls for non-core modules -which I don't disagree with- but then any implementation of this, in order not to require patching, will have to be in a module, and incoming paths are processed before module loading, and on top of that we have the cache system... so it's quite a complex thing...

I'd be happy with any idea to implement this more cleanly, or maybe we should aim higher, like reworking the whole init thing and path pre-processing... What we are trying for the moment is to introduce only some general use hooks, like this small patch... otherwise it is a too big all-or-nothing question to get this working in Drupal....

Thanks for your comments and I'd appreciate any suggestion.

Jose Reyero’s picture

FileSize
713 bytes

Updated simplified patch.

As other patches are already in, we only need this to run i18n module with Drupal 4.7 without patching!!

dfg’s picture

It would be more useful to have a similar hook in drupal_get_path_alias() and drupal_get_internal_path().

fago’s picture

patch applies (with offset) and doesn't break anything, as far as I could see.

a lot of people are interested in i18n, so please include this last one.
furthermore, the possibility to inject a query string might be useful in other cases.

+1

Jose Reyero’s picture

Status: Needs review » Reviewed & tested by the community

mgifford’s picture

This is a worthwhile hook to add to the core code. It will ease the implementation of more multilingual sites in Drupal and provide better support for a broader community of users/developers.

+1

Souvent22’s picture

+1. This patch is needed. We don't live in a vacuum, you know, there are more languages than English. :) Hope this gets in; I just made a site for someone in Italy, and this could help out when making modules.

Dries’s picture

Status: Reviewed & tested by the community » Needs work

If this patch gets committed, there will be a third mechanism to rewrite URLs. Also, it is a well-known fact that url() and l() are a performance bottleneck. I think we need to take a step back, see how we can overcome the limitations of the current system and come to a simple yet fast URL rewrite mechanism. There ought to be a better way.

chx’s picture

FileSize
1.87 KB

This version implements a hook in drupal_get_path_alias and in drupal_get_normal_path. The performance hit should be negligible in most cases: a foreach on an empty array.
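
A rough sketch of the idea, not the attached patch: the rewrite callbacks are passed in as a plain array here, whereas in Drupal the list would come from the modules implementing the hook. All names below are hypothetical.

```php
<?php
// Hypothetical rewrite callback, standing in for a module's hook implementation.
function example_lang_rewrite($result, $original) {
  return 'en/' . $result;
}

// Run every registered callback over the path. With no callbacks registered,
// the loop body never executes and the path is returned unchanged.
function example_rewrite_path($path, array $callbacks) {
  $result = $path;
  foreach ($callbacks as $callback) {
    $result = call_user_func($callback, $result, $path);
  }
  return $result;
}

$unchanged = example_rewrite_path('node/18', array());
$rewritten = example_rewrite_path('node/18', array('example_lang_rewrite'));
```

With several callbacks, the final result depends on the order in which they run.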

chx’s picture

FileSize
1.89 KB

Dries’s picture

Moving the code around doesn't change a thing; it's still a third/new mechanism to rewrite URLs. I'll take a closer look at this as time permits.

chx’s picture

FileSize
1.89 KB

No, it's a second mechanism only as it removes conf_url_rewrite -- conf_url_rewrite routines can be moved to a module and thus shared as a module. And you can have more than one rewrite, this way.

Jose Reyero’s picture

FileSize
1.61 KB

+1 for chx (plus alternative patch)

I like chx's patch, and I think also that getting rid of 'conf_url_rewrite' and replacing it with a hook is a good thing.

However, if the main concern is performance, we could also keep using 'conf_url_rewrite', if only all paths were run through it unconditionally.

So, making clear I'd prefer chx's solution, here's an alternative one.

Gábor Hojtsy’s picture

Status: Needs work » Needs review

I like chx's generic version (url_rewrite_5.patch) best, as it replaces an awkward URL rewrite mechanism (introduced by myself, pressed by performance reasons) with a much cleaner, albeit slightly less performant, solution. BTW there is a spelling mistake: chx wrote 'inccoming' in the patch. Plus, I see no reason to check for empty($arguments) in arg() at all, since having $q set properly (after this patch) would also ensure that $arguments is properly set. Note that this patch also fixes a small performance problem in arg(): at present it always tries to do an explode if $_GET['q'] is empty, that is, on the homepage.

eldarin’s picture

Another approach which also works well:
in menu.module:menu_execute_active_handler() right after setting $path to the q variable,
I call a module which I called "urlpatterns" which resolves incoming URLs.
There I match against a set of URL regexp patterns configured by the module admin.
That way the URL rewrite happens very early in the useragent request to the server.

The reason for this is further down in menu.module:menu_execute_active_handler(), where I have a AAA function which decides if access should be given on a configurable URL basis - configured with another module doing AAA.

That way security settings mimic .htaccess in some way, while having the power of regexp flexibility as well as very good extendibility.

The get_normal_path() and get_path_alias() functions then are routed to a check and lookup in the "urlpatterns" module as well as the AAA module. The immediate benefits of this is that I don't link to anywhere on the server, where access is denied to the user.

This just improves security somewhat, as well as performance when I only let one "subscriber" hang on to the hook given in menu_execute_active_handler(). In my specialized case, I see no reason to have more than one URL rewriting module, since it would possibly become a large spaghetti mess with unpredictable results if there is no central way of assuring URLs. I take care of "sub-URL aliasing" with regexp rules, so there should be no reason to do so either. It keeps security a bit tidier.

Gábor Hojtsy’s picture

Well, true the hook version of the patch does not ensure any order of the modules being called (currently alphabetical), so it needs careful programmers to implement the hooks.

eldarin’s picture

In my opinion, performance and security are key to successful URL aliasing/rewrites. Having it moved to a module is much better than the current non-perfect scheme. I can't see the need for multi-module direct access to rewriting URLs though. That's why I use weighted regexp patterns, which flexibly enough also work as sub-URL rewriting for any module that would register such a rewrite rule - in the same manner as the menu-building with callbacks.

Does this make any sense ? Should give much better performance and security than the current (and suggested) schemes, no ?

eldarin’s picture

Goba, yes.
The effect of the multi-module rewrite policy would be probable loss of URL control for the site-admin and total chaos. It would also require massive efforts from module contributors to ensure their module rewriting would behave.

My outlined solution - which works well in practical terms for my needs - handles this by allowing the siteadmin to modify the weights of rules suggested by modules, and even to disable them.

That should make life a lot easier for anyone.

eldarin’s picture

I meant to say "multi-module direct rewrite policy", where modules have direct control, even though siteadmin might have perceived control via multiple admin configuration pages scattered between all the modules who would implement a URL rewrite.
;-)

Gábor Hojtsy’s picture

OK, instead of talking, let us see, how your weighted regexps solution works (in the form of a patch or at least a code example). I have an idea, but it might be far away from what you actually do.

eldarin’s picture

The way I do reverse lookup for outgoing matching is far from perfect right now, and is something I haven't had time to completely figure out. I was thinking of just using a similarly weighted reverse table - using the same patterns for now, actually - but it would be better to have separate, although possibly overlapping, rulesets for incoming and outgoing. They are exclusive in matching since each applies only to its own domain, incoming or outgoing.

An idea would be to have the modules suggest the rules as one of three: for incoming, for outgoing, for both.
My solution involves a lot of security configurations, that's why it was setup in the very early menu URL handling, and only using cached lookup in the other functions.

eldarin’s picture

The weighting of the ruleset is really straightforward - a treegraph really. The intricate bit is in optimizing performance for this tree-graph with regards to the security rules as well. Yes, the proof is in the pudding, as they say ..
;-)

eldarin’s picture

I forgot to mention that this approach I used, broke normal core URL aliasing support. I solved that by hooking the nodeapi for insert of aliases just like the core.
Also - have the issues in http://drupal.org/node/21938 and http://drupal.org/node/22035 since been overlooked ?

A further possibility would be to use t() for word matching, but I guess the best is to implement that directly on the rules and then further expand the tree-graph if multilanguage aliasing was needed for one site (think something along the lines of pathauto). But I had no need for translating on my project.

Jose Reyero’s picture

Well, summarizing -and adding a little bit :-)

- eldarin's solution looks like an interesting path to explore, but it still has some side effects - I'll leave this for the future, maybe another thread?

- chx's patch looks like a good one, but has some performance concerns.

- then there's my patch, which is like a performance-safe, middle-way alternative.

So I propose we focus on the performance side, and do some benchmarking with chx's patch.

I'd like to know whether we have some data about how a generic new hook -not calling actually any module, only the hook execution itself-, and being called like a hundred times per page, actually impacts performance. Does anybody know?

And... in case performance is too bad.... I'm thinking of a variable, that could be set by modules and hidden to the user, to enable/disable this hook..... how does this sound like?

chx’s picture

Re. perf. I'll test these solutions with ab later, but I am absolutely not convinced that a foreach on an empty array is so much slower than a function_exists against a non-existing function. I expect both to be negligible. Stay tuned.
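
For a quick first impression before the ab run, the two operations can be timed in isolation. This is only a micro-benchmark sketch with hypothetical function names; it says nothing about whole-page performance, and the numbers will vary by PHP version and machine.

```php
<?php
// Time $iterations passes of a foreach over an empty array.
function example_time_foreach_empty($iterations) {
  $empty = array();
  $start = microtime(TRUE);
  for ($i = 0; $i < $iterations; $i++) {
    foreach ($empty as $item) {
      // Never reached: the array is empty.
    }
  }
  return microtime(TRUE) - $start;
}

// Time $iterations calls of function_exists() for an undefined function.
function example_time_function_exists($iterations) {
  $start = microtime(TRUE);
  for ($i = 0; $i < $iterations; $i++) {
    function_exists('example_function_that_is_not_defined');
  }
  return microtime(TRUE) - $start;
}

$foreach_seconds = example_time_foreach_empty(100000);
$exists_seconds = example_time_function_exists(100000);
printf("foreach on empty array: %.6f s\n", $foreach_seconds);
printf("function_exists():      %.6f s\n", $exists_seconds);
```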

Dries’s picture

Goba: do you have time to follow up on this, and to make the final decision with regard to this problem? I'm asking because it takes some digging to understand the problem and to evaluate the different approaches. If we want to commit this before the freeze, I'll have to delegate this job. (Also make sure left-overs and old mechanisms get removed.) If so, just give me a 'go' and I'll commit it after minimal testing.

Gábor Hojtsy’s picture

I have just arrived home, and I am going to go sleeping, as tomorrow I am going to have lessons at the university. Sadly I won't be able to look at the patches and ab results of chx before tomorrow afternoon. Since I think i18n is one of the most important aspects of the upcoming Drupal release, I will try to be around tomorrow afternoon to look at results coming up until then.

Dries’s picture

If you can look at this patch before Sunday, I'll let it slip in (given you think it is ready).

chx’s picture

FileSize
1.87 KB

Removed array_reverse. Goba pointed out that URL rewrite implementations should work in any order. So it's not en/node/18 but lang/en/node/18.

Gábor Hojtsy’s picture

Chx, making the URL look uglier just to make it work in any order is not an option IMHO. The solution needs to be better. Here is another alternate version. What it does very closely resembles Jose's last patch, keeping conf_url_rewrite (albeit with a different name custom_url_rewrite) and with different parameterization. I made it to be in line with the drupal_lookup_path() function in parameter order and meaning, giving a third parameter to signal if the path is already aliased (this might be badly needed in solutions where an already aliased path should not be further aliased, we had enough discussion about this in the past).

The pluses of this patch are that the architecture stays nearly the same and the performance decrease should be negligible (a function_exists() is always evaluated even if there is no such function), but we now have an option to alias already aliased URLs (but we don't need to). The custom_url_rewrite() function can actually do a foreach on hooks, if someone wants to do that.

Preparing this patch, it struck me that we don't actually need drupal_get_normal_path() or drupal_get_path_alias(), as they are just very thin wrappers around drupal_lookup_path(). In fact, by incorporating their contents into drupal_lookup_path(), we can (page-level) cache the results in the $map array. That is, given that the results of custom_url_rewrite() are time insensitive (there is no parameter to determine the return value other than those passed).

Opinions?

Jose Reyero’s picture

Goba, I like your latest patch

Just another small twist, which is passing the actual original path, more 'inexpensive' info..

> The custom_url_rewrite() function can actually do a foreach on hooks, if someone wants to do that.

This one, I think, is a great idea :-)

However I'd still like to know how 'expensive' one more hook for each outgoing link is...

Gábor Hojtsy’s picture

What I also "proposed", is that we get rid of the ...path_alias() and ...normal_path() wrapper functions, and make the final aliased URL cached. Jose, would there be any problem with (page level) caching of the resulting final aliased URL in the use case of the i18n module? I can hardly think of a custom_url_alias() function which does not return the same alias for the same parameters in different times.

chx’s picture

$_GET['q'] can change during the code flow thus arg() cache reset is necessary. It was in my patch, and actually it's a bit unrelated issue.
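
A sketch of what such a reset could look like, assuming the cache is simply rebuilt whenever $_GET['q'] differs from the value it was built from. The function is named example_arg() to make clear it is an illustration and not the code from the patch.

```php
<?php
// Return one component of the current path, re-splitting the path
// whenever $_GET['q'] has changed since the last call.
function example_arg($index) {
  static $arguments, $q;
  $current = isset($_GET['q']) ? $_GET['q'] : '';
  if (!isset($arguments) || $q !== $current) {
    $arguments = explode('/', $current);
    $q = $current;
  }
  return isset($arguments[$index]) ? $arguments[$index] : NULL;
}

$_GET['q'] = 'en/node/18';
$first = example_arg(0);
// Something (e.g. a language-stripping step) rewrites the path mid-request.
$_GET['q'] = 'node/18';
$second = example_arg(0);
```

Without the comparison against the remembered $q, the second call would still return the component from the old path.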

Gábor Hojtsy’s picture

Ah, anyway, I am busy implementing code which does custom alias caching too.

Gábor Hojtsy’s picture

FileSize
10.11 KB

OK, here is an update, which fixes Chx's noted stuff, and integrates custom alias handling into the page level caching system, removing drupal_get_path_alias and drupal_get_normal_path on the way. I know I also swapped the named actions of drupal_lookup_path(), but since the verb is lookup, the $op should be "what to look up", not "what I have passed".

Performance was taken care of as much as possible, so there will be no aliasing if there are no aliases in the database AND no custom alias function. If either is present, aliasing will be tried, and the resulting aliases will be cached (on the page level). The only conceptual difference between this and the non-cached version is that the custom function should work on a one-to-one relationship level, so that the cache array does not change over time on the page. If this is not the case, the custom alias results should not be cached.

Patch is untested yet, waiting for opinions from the i18n guys. Do the above constraints fit with the i18n plans?

Jose Reyero’s picture

Goba,

Well, about caching, I think it's no problem as long as it's separate for 'incoming' and 'outgoing' paths. I mean 'some_strange_path' -> 'node/21', doesn't imply 'node/21' -> 'some_strange_path'
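
To illustrate why the two directions need separate caches, here is a small sketch with made-up data in which two aliases point at the same system path. The lookup function is a stand-in for a database query, and every name in it is hypothetical.

```php
<?php
// Stand-in for a database lookup. Two aliases resolve to node/21,
// but node/21 itself is only ever output as 'another_path'.
function example_slow_lookup($action, $path) {
  $source_of = array('some_strange_path' => 'node/21', 'another_path' => 'node/21');
  $alias_of = array('node/21' => 'another_path');
  $table = ($action == 'source') ? $source_of : $alias_of;
  return isset($table[$path]) ? $table[$path] : $path;
}

// Page-level cache keyed by direction first, then by path, so an
// incoming result is never reused as an outgoing one.
function example_cached_lookup($action, $path) {
  static $cache = array('source' => array(), 'alias' => array());
  if (!isset($cache[$action][$path])) {
    $cache[$action][$path] = example_slow_lookup($action, $path);
  }
  return $cache[$action][$path];
}

$incoming = example_cached_lookup('source', 'some_strange_path');
$outgoing = example_cached_lookup('alias', 'node/21');
```

A single cache that stored each pair in both directions would wrongly output 'some_strange_path' for node/21 after the first incoming request.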

But about the wrappers, I actually like them and I'm thinking of some uses for them. In our case the 'custom_url_rewrite' function may call drupal_lookup_path again, to check whether the 'language+path' has some alias. So I'd like to have separate 'drupal_get_path_alias' and 'drupal_lookup_path', with the latter restricted to system defined paths.
Also, on incoming paths, we search again for the path without language on module_init, this way supporting aliases with and without language...

I mean, ok to some 'two level caching' (saving that hook in the wrappers), if it makes sense, but not that convinced about 'all in one function/single caching'.

Mmm, maybe I'm messing it too much... does it make sense for you?

And I was thinking too, what if we moved some ugly logic in common.inc into drupal_get_normal_path() ? I'm talking about this:

  // Initialize $_GET['q'] prior to loading modules and invoking hook_init().
  if (!empty($_GET['q'])) {
    $_GET['q'] = drupal_get_normal_path(trim($_GET['q'], '/'));
  }
  else {
    $_GET['q'] = drupal_get_normal_path(variable_get('site_frontpage', 'node'));
  }

However, I don't want this thread to become still more complex; I'm just pointing at some possible uses of those wrappers.

moshe weitzman’s picture

The default database schema inserts an alias for node/feed to rss.xml. So 99.%% of sites will not experience your optimization. If this causes us to miss some performance optimization, I suggest removing that default alias.

Gábor Hojtsy’s picture

Moshe, sure, if you only have node/feed aliased to rss.xml, you will have the whole $map array built up and quite a few queries on pages. But this is also true if you have two or three aliases... Question is if a significant portion of our user base go without path aliasing at all, or it is the other way round.

Jose, with or without the wrappers, you will not be able to only retrieve a system alias, since the wrappers would always call the custom url rewrite function (this is common in all proposed patches so far). That said, we can exclude the custom generated path values from being cached, but having those wrappers is still just nice code with performance being hurt.

Dries’s picture

I thought we were going to nuke the custom rewrite?

If possible, do some performance tests. :)

Gábor Hojtsy’s picture

Dries, i18n will not work without the custom rewrite. We are trying to find a solution which works for the i18n scenario at least, so that Drupal can run unpatched with i18n features.

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
3.02 KB

Here is an updated patch, which was approved by Jose and Chx (we prepared it as part of a private chat session this past hour). I have tested it functionally, and it seems to work fine.

What do we do?

  1. Always call the custom URL function (this is needed for i18n).
  2. Fix up the lookup actions to make sense in English
  3. Fix up arg() as $_GET['q'] can change over time (due to i18n magic).
  4. We pass the system resolved and original path values to the custom function, so it can decide whether further aliasing is needed. (The old behaviour is still possible with a properly written custom function.)
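
The four points above can be sketched roughly as follows. This is an illustration and not the committed patch: example_lookup_path() is a stand-in for drupal_lookup_path() using a fixed table instead of the database, the 'alias'/'source' action names and the parameter order of custom_url_rewrite() are assumptions based on the description in this thread, and the 'en' prefix is a made-up example of what an i18n-style function might do.

```php
<?php
// Stand-in for drupal_lookup_path(): returns the match or FALSE.
function example_lookup_path($action, $path) {
  $aliases = array('node/feed' => 'rss.xml');
  if ($action == 'alias') {
    return isset($aliases[$path]) ? $aliases[$path] : FALSE;
  }
  return array_search($path, $aliases, TRUE);
}

// Example custom function: gets the system-resolved and the original path.
function custom_url_rewrite($action, $result, $original) {
  $prefix = 'en/';
  if ($action == 'alias') {
    return $prefix . $result;
  }
  if (strpos($result, $prefix) === 0) {
    // Strip the language, then look the remainder up again.
    $stripped = substr($result, strlen($prefix));
    $source = example_lookup_path('source', $stripped);
    return $source ? $source : $stripped;
  }
  return $result;
}

// Outgoing: system alias first, then always the custom function if defined.
function example_get_path_alias($path) {
  $result = $path;
  if ($alias = example_lookup_path('alias', $path)) {
    $result = $alias;
  }
  if (function_exists('custom_url_rewrite')) {
    $result = custom_url_rewrite('alias', $result, $path);
  }
  return $result;
}

// Incoming: same ordering in the other direction.
function example_get_normal_path($path) {
  $result = $path;
  if ($source = example_lookup_path('source', $path)) {
    $result = $source;
  }
  if (function_exists('custom_url_rewrite')) {
    $result = custom_url_rewrite('source', $result, $path);
  }
  return $result;
}
```

Because the custom function receives both the resolved and the original path, it can compare them and return the resolved value untouched when it does not want to rewrite an already aliased path.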

Why were the other ways abandoned?

  1. We cannot cache the results, since custom aliasing is not a one-to-one mapping.
  2. We cannot introduce a hook, since serious ordering problems would surface. We used to have, and with this patch, we still have fixed ordering, and the custom function controls the further actions.

As for performance, Moshe's pointer is quite valid, but it is unrelated to this issue: we are introducing a crucial feature for i18n. If it is said that majority of Drupal users are not using path aliases at all, then the RSS alias might be a good target to be removed. But it does not affect this solution.

Jose Reyero’s picture

+1 tested and works fine

This is the one :-)

Dries’s picture

- Did we do some performance testing, or isn't this necessary?

- Code looks clean to me. :-)

- Is there some "how to rewrite" URLs documentation, or any documentation that needs to be updated?

Gábor Hojtsy’s picture

The only performance difference for those not having a custom rewrite function is that now we always do a function_exists() check for every path_alias/normal_path function call. For those having a custom rewrite function, the most important change is that the function is always called, but from the passed parameters, you can still decide not to do any aliasing if you already work on an alias.

Documentation: http://drupal.org/node/23708 needs to be updated. I don't know how the documentation team goes about versioning now, how should pre 4.7 information be kept. The upgrade guide should also be updated, if this patch is accepted.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD. Please update the documentation.

eldarin’s picture

It seems further rewrites of the URL-rewrite functionality will be coming as further features are added to the URL-aliasing logic. One thing is rewriting on privileges, both incoming and outgoing (embedded in xhtml output); another one could be more flexibility on language.

Finding an even more generic approach and then retrofitting legacy features with a light layer is perhaps a good area for study. I had to tweak some things to get my approach working, and it still omits core features - although they can be reimplemented by hooking into the newer aliasing scheme.

When generic aliasing for stuff like multi-language support is involved, caching becomes a more important issue. Solving it perfectly when there are no easy persistent variables between sessions is a challenge - without resorting to platform dependent solutions.

Getting a solution which performs well on large autopath'ed sites will also be something to look into.
;-)

ax’s picture

Status: Fixed » Active

documentation (in path.module and http://drupal.org/handbook/modules/path) needs to be updated. setting status to ACTIVE.

Gábor Hojtsy’s picture

The Handbook is updated now (http://drupal.org/node/23708). Although I asked and received no assistance in how should versioning be done, I did come up with my own solution, which can be edited now by anyone :)

A patch is also attached, which removes the majority of the mass URL aliasing docs from Drupal core. Since this is a programmers' topic, it does not belong in the help text, there is no need to translate it, etc. It is quite fine in the handbook. I would have set a URL alias for the book page as /handbook/modules/path/massalias but, since I have no rights to set aliases, I was unable to. This way the path in the patch looks quite odd, so if someone adds a correct alias, the path can be fixed.

Uwe Hermann’s picture

Status: Active » Needs review

It's a patch.

Dries’s picture

Status: Needs review » Fixed

Committed to HEAD.

tostinni’s picture

Category: feature » bug
Status: Fixed » Reviewed & tested by the community
FileSize
2.54 KB

Sorry, but the patch was a little buggy for the help text, here is a fix.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Anonymous’s picture

Status: Fixed » Closed (fixed)