Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Drupal's path aliasing system is a serious performance hotspot, and there's currently no way to change the path retrieval algorithm that it uses. Making it swappable in the same way that cache.inc is swappable would make it vastly easier to switch high-traffic sites to memcached.
Haven't had time to roll the patch yet, but I think it could resolve the long-standing dispute between "load all paths at once" and "load paths as needed" sites.
Comment | File | Size | Author |
---|---|---|---|
#32 | path_inc.patch | 2.88 KB | catch |
#10 | 320132-10-swappable-path-inc.patch | 8.3 KB | kbahey |
#4 | 320132-4-swappable-path-inc.patch | 8.67 KB | kbahey |
#2 | 320132-2-swappable-path-inc.patch | 8.56 KB | kbahey |
#2 | path-null.inc_.txt | 605 bytes | kbahey |
Comments
Comment #1
kbahey CreditAttribution: kbahey commentedI support this direction.
For this to happen though, there should be some juggling around.
cache.inc provides only 3 functions, path.inc on the other hand has more function, which an alternate path.inc should provide.
So, these are the ones that a new path.inc would need to provide.
The following functions though need to a new home, so I moved them to common.inc.
Patch attached.
Comment #2
kbahey CreditAttribution: kbahey commentedActually, drupal_match_path() can be moved as is to common.inc too.
Attaching new patch.
Also, attached is a path-null.inc for testing that does not do anything other than provide the function stubs.
Comment #3
moshe weitzman CreditAttribution: moshe weitzman commentedMoving arg() that late is gonna cause some problems, at least for custom code. I'd prefer to move that to bootstrap.inc. arg() is quite brief.
This patch is a good idea.
Comment #4
kbahey CreditAttribution: kbahey commentedMoved arg() to bootstrap.inc.
Comment #5
kbahey CreditAttribution: kbahey commentedWho will be the first to write a path-memcache.inc when this gets in?
Comment #6
meba CreditAttribution: meba commentedsubscribing
Comment #7
Anonymous (not verified) CreditAttribution: Anonymous commentedI can see some idiots setting 'path_inc' to empty. Better check to see if the value of path_inc exists and if not use the default with a status message in admin/logs/status.
Comment #8
kbahey CreditAttribution: kbahey commentedIt is not a bad idea, but we don't do this for cache_inc. We will have similarly missing functions and things will stop working badly if they do that..
Let us leave this for a future patch, and focus on this one for now.
Comment #9
Anonymous (not verified) CreditAttribution: Anonymous commentedI don't think we should base the merit of your current patch
basedon the usability issues of a previous patch. I'll leave it up to Webchick to decide.Comment #10
kbahey CreditAttribution: kbahey commented@earnie
How long does it take someone to realize that the error of their ways if they do that? A message will say function blah is missing.
This is overengineering, and extra code that will execute on every page load, yet will help with only 0.01% of the cases.
Let us have some real reviews now ...
Anyway, here is an update patch after the most recent commits.
Comment #11
chx CreditAttribution: chx commentedarg() must be in path.inc because it can not be used without drupal_init_path -- if you put in bootstrap then some will surely use it from hook_boot and get back bad results. While we do not babysit broken code for sure, this is a great way to introduce bugs into everyone's code -- if you do not explicitly test your page with a path alias then it is going to break. Even core tests do not test that kind of bugs... I would much rather see most of path.inc staying put, and only drupal_lookup_path moved into something pluggable.
Comment #12
RobLoachWould this be better accomplished with an interface? A variable_get describing a file path seems pretty old school......... http://www.garfieldtech.com/blog/drupal-handler-rfc
Comment #13
eaton CreditAttribution: eaton commentedIf handlers go into core, this is a perfect place to use them. Until we do have that mechanism, though, I think it's probably best to stick with the same approach used for cache.inc -- the consistency will make conversion to any new approach simpler
Comment #14
Damien Tournoud CreditAttribution: Damien Tournoud commentedHandlers is more a pattern than an actual patch: it's all about using a strategy pattern for swappable interfaces with a plain factory function based (or not) on a key specific to the sub-system. There is really little code to share between each implementations of handlers.
Given that there are several other patches to convert other sub-systems to that approach, I see very little value in that patch not following the mainstream.
Comment #15
andreiashu CreditAttribution: andreiashu commentedgreat idea, subscribing
Comment #16
mfer CreditAttribution: mfer commentedThere are other proposed patches for swappable systems. They include, but are not limited to:
#352951: Make JS & CSS Preprocessing Pluggable
#259103: make pluggable password hashing framework more generic and use class auto-loading.
#331180: fix pluggable smtp/mail framework
Drupal 7 may be the pluggable system version.
Comment #17
eaton CreditAttribution: eaton commentedBumping this issue. It appears to have stalled due to the handlers discussion, but Handlers have been torpedoed for the time being -- solving things with the 'cache.inc style swapping' is certailny better than not solving it at all.
Comment #18
catcheaton, note #456824: drupal_lookup_path() speedup - cache system paths per page. got in. If we could fix up #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() too that might make this less necessary.
Comment #19
eaton CreditAttribution: eaton commentedthat patch only means that the default is more efficient in common scenarios. We've demonstrated in the past that isolated chunks of functionality (path lookup, caching, session management) can benefit considerably from tuned implementations that only work in specific scenarios (with memcached, on certain platforms, etc).
An example would be a memcached backed path store: rather than caching a big chunk of paths for each page, just use memcache's hash to store the paths and their aliases directly. Adding more and more complexity to our core implementation just makes it more difficult to swap out optimized mechanisms.
Comment #20
catchThere's also #464164: Move URL alias functionality into a Path API module (still separate from the Path UI) which would just move it into path.module which is optional anyway.
Comment #21
eaton CreditAttribution: eaton commentedWouldn't that mean, though, that modules that use aliasing would all list path.module as a dependency, and we'd be left trying to figure out how to modify path.module's lookup mechanism instead of a central drupal .inc?
Comment #22
catchWell maybe, but worst case could do something like http://drupal.org/project/update_notifications_disable
I still like #362747: Create a call routing system (fix broken dynamic includes) for this - it's drupal_lookup_path() which is the issue, not the whole .inc, half of which shouldn't be there anyway (drupal_set_title()??).
Comment #23
Dave ReidMarked #553398: variable_get('path_inc', 'includes/path.inc') as a duplicate of this issue.
Comment #24
Dave ReidComment #25
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedIt would be good if this could be in D7, path.inc is nearly always a bottleneck for high-performance sites.
Comment #26
sunComing from
- #464164: Move URL alias functionality into a Path API module (still separate from the Path UI), which equally identified that some strange functions live in path.inc, which were already tried to be moved in this patch, and
- #332333: Add a real API to path.module, which added proper Path/URL alias API functions to path.module (i.e. path_load(), path_save(), which I still believe should really live in path.inc),
I support the moving of functions in this patch (not necessarily arg(), as chx already pointed out) and making path.inc swappable by using a simple variable approach like we already do elsewhere.
Comment #27
moshe weitzman CreditAttribution: moshe weitzman commentedI have no objection to this, but for me the performance issue is 99% solved by new whitelist in drupal_lookup_path().
Comment #28
eaton CreditAttribution: eaton commentedAFAIK, the whitelist reduces the fundamental inefficiency but doesn't eliminate the problem of hitting the DB whenever paths do need to be retrieved.
Comment #29
catchThe current situation in HEAD is this:
# whitelist - reduces number of paths to look up
# system path cache - stores all the system/normal paths which have aliases found per page in an array (but not the aliases themselves), so that drupal_lookup_path() can do a single query to lookup known paths next time.
Any 'new' paths on the page not contained in the system path cache are also looked up individually, or on cache misses (although only within the whitelist).
So what's missing is a way to put individual aliases into memcache - then you could either skip the system path cache entirely and hit memcache for each individual path, or it'd probably also work with cache_get_multiple() + individual lookups for the leftovers - with memcache just standing in for the db in both instances. memcache's ->get() allows for grabbing multiple cache keys but I don't know how much more efficient that is in practice.
As far as I can see this requires only a couple of changes to what's in HEAD:
1. Adding a cache_get_multiple() and/or cache_get() to drupal_lookup_path() in between the static lookups and the db hits. Haven't used it, but presumably that's what http://drupal.org/project/pathcache does in D6
2. Adding aliases to memcache in the first place - probably on path_save(), but maybe also on lookup since that'd allow memcache's LRU to work properly on sites with lots of aliases.
Neither of those make sense for a site using database caching, - since you'd just end up with the same number of select queries plus a lot of useless inserts. However the only thing that's needed to allow that to happen cleanly in contrib is to make drupal_lookup_path() itself swappable. The way we've been doing that with other systems where you might not want to override the whole thing is converting it to OOP, but we're too late for that here since freeze started.
So we have a couple of choices - try to factor as much as possible out of path.inc so that's all that's left pretty much is drupal_lookup_path() and the functions that call it. Or we put 2-3 lines of code into path.inc itself along the lines of $function = variable_get('path_lookup_function', FALSE); or an abuse of the hooks system, to let you populate the path from wherever before the db gets hit without having to duplicate drupal_set_title() and other oddities in the same file.
I have to say though, this is by far not the worst performance issue in HEAD at the moment - we have lots of new ones which are being more or less ignored in other issues, and even more in the critical path, while like Moshe said this is more or less solved for the 99% case (and the 1% case can apply 1k patches to core if this doesn't get in). Making path.inc swappable is at most going to save you between 2-10 very efficient database queries on most pages, whereas we just introduced about 20 queries on every page over at #607244: Decrease performance impact of contextual links some with filesorts, while people were discussing things here.
Comment #30
eaton CreditAttribution: eaton commentedA fair point, catch. I still maintain that we are on a troubling path of introducing core complexity to solve a problem that contrib has already taken care of (via memcache and other swap-in-an-alternative-inc-file solutions), and are still making it very difficult for contrib to push in new and better solutions as they emerge.
I agree, though, that right now there are bigger fish to fry thanks to the multitude of performance problems in HEAD.
Comment #31
catchWell the total amount of code for both the system path caching and the whitelist must be less than 50 lines, and it's really not very complicated. Additionally the whitelist is a good thing regardless of memcache, and the system path cache may be if memcache.module in D7 properly supports multiple cache get - you could get 90% of aliases for a page in a single memcache hit, instead of say 50, at least saving some network time if nothing else, and you can still store individual path / alias lookups to avoid too much duplication across requests
Core shouldn't make it impossible for the 1% (if that) of sites which run memcache to use that to the greatest degree possible, but it also shouldn't ship with implementations which punish the large numbers of sites which need to not have horrible performance bottlenecks but are still on single server / VPS setups without the resources to hire sysadmins or performance consultants. in many cases we're still doing both - crap default implementations with no easy way to swap it out.
Comment #32
catchPatch.
Deliberately not moving any functions anywhere, let's make it swappable, then bikeshed in a different issue about where the functions should live, and open a Drupal 8 issue about converting it to a factory.
The one case we don't solve well in core, is drupal_get_normal_path() - it'd be nice to be able to cache that in contrib, and it'd reduce the number of queries required to serve a page at all down to zero along with other pending patches in the queue. If this doesn't get in, it's technically possible to do the same from contrib via evil mis-use of drupal_static() alongside hook_boot(), hook_init() and hook_url_outbound_alter() but I started down that road then chx reminded me of this issue.
Comment #33
chx CreditAttribution: chx commentedLet's do this quick. We can do better in D8.
Comment #34
webchickAgreed. I think this is inline with our swappable caching system. While the whitelisting stuff has helped, eaton's correct that it still means you need to trawl through the database, and it'd be nice to let memcached eat those hits.
Committed to HEAD.
Comment #35
catchThanks!
Opened D7 port issue against pathcache #652892: Port Path Cache to Drupal 7.
Comment #37
fgmOut of sheer curiosity, did the idea of a Memcached-backed implementation of path.inc ever materialize ? I could not find it in the memcache project, and the cache project, which includes one, never had a 7.x branch.
Comment #38
Fabianx CreditAttribution: Fabianx commented@fgm: See https://drupal.org/node/652892 for a patch in progress.
Comment #39
fgm@fabianx, nice, thanks.