Drupal's path aliasing system is a serious performance hotspot, and there's currently no way to change the path retrieval algorithm that it uses. Making it swappable in the same way that cache.inc is swappable would make it vastly easier to switch high-traffic sites to memcached.

Haven't had time to roll the patch yet, but I think it could resolve the long-standing dispute between "load all paths at once" and "load paths as needed" sites.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

kbahey’s picture

Status: Active » Needs review
FileSize
7.02 KB

I support this direction.

For this to happen though, there should be some juggling around.

cache.inc provides only 3 functions, path.inc on the other hand has more function, which an alternate path.inc should provide.

So, these are the ones that a new path.inc would need to provide.

drupal_init_path()
drupal_lookup_path($action, $path = '', $path_language = '')
drupal_get_path_alias($path, $path_language = '')
drupal_get_normal_path($path, $path_language = '')
drupal_match_path($path, $patterns)
drupal_is_front_page()

The following functions though need to a new home, so I moved them to common.inc.

drupal_get_title()
drupal_set_title($title = NULL, $output = CHECK_PLAIN)
arg($index = NULL, $path = NULL) {

Patch attached.

kbahey’s picture

Actually, drupal_match_path() can be moved as is to common.inc too.

Attaching new patch.

Also, attached is a path-null.inc for testing that does not do anything other than provide the function stubs.

moshe weitzman’s picture

Moving arg() that late is gonna cause some problems, at least for custom code. I'd prefer to move that to bootstrap.inc. arg() is quite brief.

This patch is a good idea.

kbahey’s picture

Moved arg() to bootstrap.inc.

kbahey’s picture

Who will be the first to write a path-memcache.inc when this gets in?

meba’s picture

subscribing

Anonymous’s picture

Status: Needs review » Needs work

I can see some idiots setting 'path_inc' to empty. Better check to see if the value of path_inc exists and if not use the default with a status message in admin/logs/status.

kbahey’s picture

Status: Needs work » Needs review

It is not a bad idea, but we don't do this for cache_inc. We will have similarly missing functions and things will stop working badly if they do that..

Let us leave this for a future patch, and focus on this one for now.

Anonymous’s picture

I don't think we should base the merit of your current patch based on the usability issues of a previous patch. I'll leave it up to Webchick to decide.

kbahey’s picture

@earnie

How long does it take someone to realize that the error of their ways if they do that? A message will say function blah is missing.

This is overengineering, and extra code that will execute on every page load, yet will help with only 0.01% of the cases.

Let us have some real reviews now ...

Anyway, here is an update patch after the most recent commits.

chx’s picture

Status: Needs review » Needs work

arg() must be in path.inc because it can not be used without drupal_init_path -- if you put in bootstrap then some will surely use it from hook_boot and get back bad results. While we do not babysit broken code for sure, this is a great way to introduce bugs into everyone's code -- if you do not explicitly test your page with a path alias then it is going to break. Even core tests do not test that kind of bugs... I would much rather see most of path.inc staying put, and only drupal_lookup_path moved into something pluggable.

RobLoach’s picture

Would this be better accomplished with an interface? A variable_get describing a file path seems pretty old school......... http://www.garfieldtech.com/blog/drupal-handler-rfc

eaton’s picture

Would this be better accomplished with an interface? A variable_get describing a file path seems pretty old school.........

If handlers go into core, this is a perfect place to use them. Until we do have that mechanism, though, I think it's probably best to stick with the same approach used for cache.inc -- the consistency will make conversion to any new approach simpler

Damien Tournoud’s picture

Handlers is more a pattern than an actual patch: it's all about using a strategy pattern for swappable interfaces with a plain factory function based (or not) on a key specific to the sub-system. There is really little code to share between each implementations of handlers.

Given that there are several other patches to convert other sub-systems to that approach, I see very little value in that patch not following the mainstream.

andreiashu’s picture

great idea, subscribing

mfer’s picture

There are other proposed patches for swappable systems. They include, but are not limited to:
#352951: Make JS & CSS Preprocessing Pluggable
#259103: make pluggable password hashing framework more generic and use class auto-loading.
#331180: fix pluggable smtp/mail framework

Drupal 7 may be the pluggable system version.

eaton’s picture

Bumping this issue. It appears to have stalled due to the handlers discussion, but Handlers have been torpedoed for the time being -- solving things with the 'cache.inc style swapping' is certailny better than not solving it at all.

catch’s picture

eaton’s picture

that patch only means that the default is more efficient in common scenarios. We've demonstrated in the past that isolated chunks of functionality (path lookup, caching, session management) can benefit considerably from tuned implementations that only work in specific scenarios (with memcached, on certain platforms, etc).

An example would be a memcached backed path store: rather than caching a big chunk of paths for each page, just use memcache's hash to store the paths and their aliases directly. Adding more and more complexity to our core implementation just makes it more difficult to swap out optimized mechanisms.

catch’s picture

There's also #464164: Move URL alias functionality into a Path API module (still separate from the Path UI) which would just move it into path.module which is optional anyway.

eaton’s picture

Wouldn't that mean, though, that modules that use aliasing would all list path.module as a dependency, and we'd be left trying to figure out how to modify path.module's lookup mechanism instead of a central drupal .inc?

catch’s picture

Well maybe, but worst case could do something like http://drupal.org/project/update_notifications_disable

I still like #362747: Create a call routing system (fix broken dynamic includes) for this - it's drupal_lookup_path() which is the issue, not the whole .inc, half of which shouldn't be there anyway (drupal_set_title()??).

Dave Reid’s picture

Marked #553398: variable_get('path_inc', 'includes/path.inc') as a duplicate of this issue.

Dave Reid’s picture

Component: base system » path.module
Gerhard Killesreiter’s picture

Issue tags: +Performance

It would be good if this could be in D7, path.inc is nearly always a bottleneck for high-performance sites.

sun’s picture

Category: feature » task

Coming from

- #464164: Move URL alias functionality into a Path API module (still separate from the Path UI), which equally identified that some strange functions live in path.inc, which were already tried to be moved in this patch, and

- #332333: Add a real API to path.module, which added proper Path/URL alias API functions to path.module (i.e. path_load(), path_save(), which I still believe should really live in path.inc),

I support the moving of functions in this patch (not necessarily arg(), as chx already pointed out) and making path.inc swappable by using a simple variable approach like we already do elsewhere.

moshe weitzman’s picture

I have no objection to this, but for me the performance issue is 99% solved by new whitelist in drupal_lookup_path().

eaton’s picture

AFAIK, the whitelist reduces the fundamental inefficiency but doesn't eliminate the problem of hitting the DB whenever paths do need to be retrieved.

catch’s picture

The current situation in HEAD is this:
# whitelist - reduces number of paths to look up

# system path cache - stores all the system/normal paths which have aliases found per page in an array (but not the aliases themselves), so that drupal_lookup_path() can do a single query to lookup known paths next time.

Any 'new' paths on the page not contained in the system path cache are also looked up individually, or on cache misses (although only within the whitelist).

So what's missing is a way to put individual aliases into memcache - then you could either skip the system path cache entirely and hit memcache for each individual path, or it'd probably also work with cache_get_multiple() + individual lookups for the leftovers - with memcache just standing in for the db in both instances. memcache's ->get() allows for grabbing multiple cache keys but I don't know how much more efficient that is in practice.

As far as I can see this requires only a couple of changes to what's in HEAD:

1. Adding a cache_get_multiple() and/or cache_get() to drupal_lookup_path() in between the static lookups and the db hits. Haven't used it, but presumably that's what http://drupal.org/project/pathcache does in D6

2. Adding aliases to memcache in the first place - probably on path_save(), but maybe also on lookup since that'd allow memcache's LRU to work properly on sites with lots of aliases.

Neither of those make sense for a site using database caching, - since you'd just end up with the same number of select queries plus a lot of useless inserts. However the only thing that's needed to allow that to happen cleanly in contrib is to make drupal_lookup_path() itself swappable. The way we've been doing that with other systems where you might not want to override the whole thing is converting it to OOP, but we're too late for that here since freeze started.

So we have a couple of choices - try to factor as much as possible out of path.inc so that's all that's left pretty much is drupal_lookup_path() and the functions that call it. Or we put 2-3 lines of code into path.inc itself along the lines of $function = variable_get('path_lookup_function', FALSE); or an abuse of the hooks system, to let you populate the path from wherever before the db gets hit without having to duplicate drupal_set_title() and other oddities in the same file.

I have to say though, this is by far not the worst performance issue in HEAD at the moment - we have lots of new ones which are being more or less ignored in other issues, and even more in the critical path, while like Moshe said this is more or less solved for the 99% case (and the 1% case can apply 1k patches to core if this doesn't get in). Making path.inc swappable is at most going to save you between 2-10 very efficient database queries on most pages, whereas we just introduced about 20 queries on every page over at #607244: Decrease performance impact of contextual links some with filesorts, while people were discussing things here.

eaton’s picture

A fair point, catch. I still maintain that we are on a troubling path of introducing core complexity to solve a problem that contrib has already taken care of (via memcache and other swap-in-an-alternative-inc-file solutions), and are still making it very difficult for contrib to push in new and better solutions as they emerge.

I agree, though, that right now there are bigger fish to fry thanks to the multitude of performance problems in HEAD.

catch’s picture

Well the total amount of code for both the system path caching and the whitelist must be less than 50 lines, and it's really not very complicated. Additionally the whitelist is a good thing regardless of memcache, and the system path cache may be if memcache.module in D7 properly supports multiple cache get - you could get 90% of aliases for a page in a single memcache hit, instead of say 50, at least saving some network time if nothing else, and you can still store individual path / alias lookups to avoid too much duplication across requests

Core shouldn't make it impossible for the 1% (if that) of sites which run memcache to use that to the greatest degree possible, but it also shouldn't ship with implementations which punish the large numbers of sites which need to not have horrible performance bottlenecks but are still on single server / VPS setups without the resources to hire sysadmins or performance consultants. in many cases we're still doing both - crap default implementations with no easy way to swap it out.

catch’s picture

Component: path.module » base system
Status: Needs work » Needs review
FileSize
2.88 KB

Patch.

Deliberately not moving any functions anywhere, let's make it swappable, then bikeshed in a different issue about where the functions should live, and open a Drupal 8 issue about converting it to a factory.

The one case we don't solve well in core, is drupal_get_normal_path() - it'd be nice to be able to cache that in contrib, and it'd reduce the number of queries required to serve a page at all down to zero along with other pending patches in the queue. If this doesn't get in, it's technically possible to do the same from contrib via evil mis-use of drupal_static() alongside hook_boot(), hook_init() and hook_url_outbound_alter() but I started down that road then chx reminded me of this issue.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Let's do this quick. We can do better in D8.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Agreed. I think this is inline with our swappable caching system. While the whitelisting stuff has helped, eaton's correct that it still means you need to trawl through the database, and it'd be nice to let memcached eat those hits.

Committed to HEAD.

catch’s picture

Thanks!

Opened D7 port issue against pathcache #652892: Port Path Cache to Drupal 7.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

fgm’s picture

Out of sheer curiosity, did the idea of a Memcached-backed implementation of path.inc ever materialize ? I could not find it in the memcache project, and the cache project, which includes one, never had a 7.x branch.

Fabianx’s picture

Issue summary: View changes

@fgm: See https://drupal.org/node/652892 for a patch in progress.

fgm’s picture

@fabianx, nice, thanks.