pathauto automatically generates path aliases and stores them in the url_alias table. These aliases are loaded on each and every page load (even for cached pages), which is not very efficient. It is unclear whether Drupal core can easily be changed to provide a more efficient mechanism.

I therefore suggest investigating the conf_url_rewrite function, as explained in the path module's help, to create aliases on the fly instead of generating stored aliases.
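To make the proposal concrete, here is a minimal Python sketch of the outgoing half of on-the-fly aliasing: an internal path such as node/42 is turned into an alias by substituting node fields into a pattern like [cat]/[title]. It is an illustration under stated assumptions, not Drupal's conf_url_rewrite API; the names `clean_string` and `rewrite_outgoing`, the node data layout and the cleaning rule are all hypothetical.

```python
import re

def clean_string(text: str) -> str:
    # Hypothetical stand-in for pathauto's string cleaning: lowercase,
    # collapse runs of non-alphanumerics to "_", trim leading/trailing "_".
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

def rewrite_outgoing(path: str, nodes: dict, pattern: str = "[cat]/[title]") -> str:
    """Compute an alias for 'node/<nid>' from the pattern at request time.
    Non-node paths and unknown nodes are returned unchanged."""
    m = re.fullmatch(r"node/(\d+)", path)
    if not m or int(m.group(1)) not in nodes:
        return path
    alias = pattern
    for token, value in nodes[int(m.group(1))].items():
        alias = alias.replace(f"[{token}]", clean_string(value))
    return alias

nodes = {42: {"cat": "Sports", "title": "New York Yankees at Boston Red Sox"}}
print(rewrite_outgoing("node/42", nodes))  # sports/new_york_yankees_at_boston_red_sox
print(rewrite_outgoing("user/1", nodes))   # user/1
```

Nothing is stored here, which is the appeal of the idea; the replies below discuss what this costs in the incoming direction.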

Comments

mikeryan

Assigned: Unassigned » mikeryan

I had missed that the entire url_alias table is loaded; obviously that presents a problem for sites that want to alias everything. However, I have looked at conf_url_rewrite, and I don't think it's a practical solution:

1. Functionally, how can it handle multiple nodes that share the same title (or, more precisely, the same data used in the alias)? For example, on my Fenway Views site I have many event nodes titled "New York Yankees at Boston Red Sox" that differ only in their date and time. How would conf_url_rewrite handle an incoming URI of sports/new_york_yankees_at_boston_red_sox? With the static pathauto implementation, distinct aliases are generated for each node by appending a serial number.
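The serial-number approach of the static implementation can be sketched roughly as follows. This is a hypothetical illustration, not pathauto's actual code: the `_N` suffix format and the `unique_alias` helper are assumptions, and the `taken` set stands in for the stored alias table.

```python
def unique_alias(base: str, taken: set[str]) -> str:
    """Return base if it is unused, otherwise the first free base_N
    (N = 0, 1, 2, ...). The result is recorded in taken, standing in for
    the row that would be written to the stored alias table."""
    alias, n = base, 0
    while alias in taken:
        alias = f"{base}_{n}"
        n += 1
    taken.add(alias)
    return alias

taken: set[str] = set()
base = "sports/new_york_yankees_at_boston_red_sox"
for _ in range(3):
    print(unique_alias(base, taken))
# sports/new_york_yankees_at_boston_red_sox
# sports/new_york_yankees_at_boston_red_sox_0
# sports/new_york_yankees_at_boston_red_sox_1
```

Which node received `_1` depends on the order in which aliases were created, so the mapping only exists because it was stored; an on-the-fly rewrite that recomputes aliases from titles has no way to reconstruct it.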

2. Speaking of handling incoming URIs: while plugging values into a template to generate an alias is simple enough, matching an incoming URI against the various patterns is going to be very slow. Suppose three patterns are defined (a default node pattern of [cat]/[title], a book node pattern of [book]/[title], and a category pattern of [vocab]/[cat]) and conf_url_rewrite receives the URI music/bso_concert. How does conf_url_rewrite know whether "music" is a category name, book name, or vocabulary name, short of the brute-force method of generating all of the possibilities and comparing them? Then, assuming we've eliminated the category pattern, we need to compare "bso_concert" to the cleanstring version of every node title in the database. And that's not even considering the possibility of ambiguous results (a category and a book with the same name, for example).
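The brute-force matching described in point 2 might look like the following Python sketch. The three patterns are the ones from the example; the data layout, the helper names and the cleaning rule are hypothetical. Because there is no index from alias back to object, every candidate alias has to be regenerated and compared on every incoming request.

```python
import re
from typing import Iterator

def clean_string(text: str) -> str:
    # Hypothetical stand-in for pathauto's cleanstring.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

def candidate_aliases(nodes: dict, terms: dict) -> Iterator[tuple[str, str]]:
    """Yield (internal path, alias) for every object under its pattern:
    [book]/[title] for book nodes, [cat]/[title] for other nodes,
    [vocab]/[cat] for category terms."""
    for nid, node in nodes.items():
        prefix = node["book"] if node.get("book") else node["cat"]
        yield f"node/{nid}", f"{clean_string(prefix)}/{clean_string(node['title'])}"
    for tid, term in terms.items():
        yield f"taxonomy/term/{tid}", f"{clean_string(term['vocab'])}/{clean_string(term['name'])}"

def resolve_incoming(uri: str, nodes: dict, terms: dict) -> list[str]:
    # Linear in the number of nodes and terms, on every request.
    return [path for path, alias in candidate_aliases(nodes, terms) if alias == uri]

nodes = {
    1: {"cat": "Music", "title": "BSO Concert"},   # category named "Music"
    2: {"book": "Music", "title": "BSO concert"},  # book also named "Music"
}
terms = {5: {"vocab": "Genres", "name": "Jazz"}}
print(resolve_incoming("music/bso_concert", nodes, terms))  # ['node/1', 'node/2']
print(resolve_incoming("genres/jazz", nodes, terms))        # ['taxonomy/term/5']
```

The first lookup shows both problems at once: it had to scan every object, and it still returned two equally valid answers.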

3. In the outgoing case, conf_url_rewrite can be called many times per page (how many internal links are there on this drupal.org page alone?). While applying patterns to a given node isn't nearly as hairy as the matching case, this would still entail significant overhead.

I think it's important that drupal_get_normal_path be a very quick function; the thought of spending more time figuring out where to get the page content than actually generating and delivering it makes me quite queasy :-). And since drupal_get_path_alias can be called many times per page, it should also be optimized. As I explained above, I don't see how a dynamic method of generating and recognizing aliases can meet those goals, certainly not while supporting a rich set of patterns. On the other hand, the current core implementation, which reads the entire url_alias table for every page delivered (even cached ones), can't meet those goals for sites with a significant number of aliases either.

The current core implementation inherently does not scale to many aliases. pathauto may make it (much) easier to run into the performance problem, but it could just as well happen to a busy site that religiously adds aliases to new content by hand. I believe core is the place to address the performance problem, and I'll open a new issue for that.
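The scaling difference can be illustrated with a small Python/sqlite3 sketch contrasting "read every row on every request" with "look up only the paths a page needs, caching each answer for the rest of the request". This is one possible approach offered as an assumption, not necessarily what the core patches do; the `src`/`dst` column names and the `AliasLookup` class are assumptions made for the example.

```python
import sqlite3

def load_whole_map(db: sqlite3.Connection) -> dict[str, str]:
    """The behavior described above: every row is read on every request,
    however few of those aliases the page actually uses."""
    return dict(db.execute("SELECT src, dst FROM url_alias"))

class AliasLookup:
    """Hypothetical alternative: query one path at a time and remember each
    answer (including misses) until the request ends."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.cache: dict[str, str] = {}
        self.queries = 0

    def get_path_alias(self, src: str) -> str:
        if src not in self.cache:
            self.queries += 1
            row = self.db.execute(
                "SELECT dst FROM url_alias WHERE src = ?", (src,)
            ).fetchone()
            self.cache[src] = row[0] if row else src  # no alias: keep the internal path
        return self.cache[src]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE url_alias (src TEXT PRIMARY KEY, dst TEXT)")
db.executemany(
    "INSERT INTO url_alias VALUES (?, ?)",
    ((f"node/{i}", f"content/page_{i}") for i in range(10_000)),
)

print(len(load_whole_map(db)))          # 10000
lookup = AliasLookup(db)
print(lookup.get_path_alias("node/7"))  # content/page_7
print(lookup.get_path_alias("node/7"))  # content/page_7 (served from the cache)
print(lookup.get_path_alias("user/1"))  # user/1
print(lookup.queries)                   # 2
```

The whole-table load grows with the size of the site, while the per-path version grows only with the number of distinct links on the page being served.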

That all being said, one possibility that has occurred to me would be for pathauto to maintain its own alias table, separate from url_alias, and use conf_url_rewrite to retrieve aliases/paths for anything the core path map can't match. That would let me do a couple of things that would otherwise require changing core.
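The fallback arrangement in the last paragraph could be sketched like this: core's map is consulted first, and a separate pathauto table only when core has no match, in both the incoming and outgoing directions. The `AliasResolver` class and the `rewrite(path, mode)` shape are hypothetical Python stand-ins, not conf_url_rewrite's real signature.

```python
def _first_match(key: str, *tables: dict[str, str]) -> str:
    """Return the value from the first table that contains key, else key itself."""
    for table in tables:
        if key in table:
            return table[key]
    return key

class AliasResolver:
    """Hypothetical layout: core's url_alias map wins, and a separate
    pathauto table is consulted only for paths core cannot match."""

    def __init__(self, core: dict[str, str], pathauto: dict[str, str]) -> None:
        self.core, self.pathauto = dict(core), dict(pathauto)
        # Reverse maps keep incoming lookups as dictionary hits rather than scans.
        self.core_rev = {a: s for s, a in self.core.items()}
        self.pathauto_rev = {a: s for s, a in self.pathauto.items()}

    def rewrite(self, path: str, mode: str) -> str:
        if mode == "outgoing":  # internal path -> alias
            return _first_match(path, self.core, self.pathauto)
        if mode == "incoming":  # alias -> internal path
            return _first_match(path, self.core_rev, self.pathauto_rev)
        raise ValueError(f"unknown mode: {mode}")

resolver = AliasResolver(
    core={"node/1": "about"},
    pathauto={"node/1": "page/about_us", "node/2": "sports/new_york_yankees_at_boston_red_sox"},
)
print(resolver.rewrite("node/1", "outgoing"))  # about (the core alias wins)
print(resolver.rewrite("sports/new_york_yankees_at_boston_red_sox", "incoming"))  # node/2
print(resolver.rewrite("user/3", "outgoing"))  # user/3
```

Keeping both directions as keyed lookups avoids the pattern-matching cost from point 2, because the generated aliases are still stored, just outside url_alias.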

mikeryan

See node/22035 for a set of core patches to address the performance issue.

mikeryan

Closing this out - the performance issue is being addressed in core.