Executive Summary as of May 24, 2013
This issue has been patched and committed in D8. There has been a patch created in D7 and the remaining task is to create a unit test in D7.
The problem
The query we added in #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() is not well indexed - on a Drupal 6/Pressflow site with 4.5m aliases it takes between 5 and 19 seconds (on their staging server). We knew this was a nasty query when we put it in, part of the reason it's using the variable system instead of cache_get()/cache_set() so it doesn't get blown away on cache clears.
Here's the EXPLAIN:
mysql> EXPLAIN SELECT SUBSTRING_INDEX(src, '/', 1) AS path FROM url_alias GROUP BY path;
+----+-------------+-----------+-------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | url_alias | index | NULL | src | 386 | NULL | 4586690 | Using index; Using temporary; Using filesort |
+----+-------------+-----------+-------+---------------+------+---------+------+---------+----------------------------------------------+
This same client (with a lot of contrib modules installed) has 138 unique system path prefixes on their site, i.e. the same query run on menu_router:
SELECT SUBSTRING_INDEX(path, '/', 1) AS path2 FROM menu_router GROUP BY path2;
138 rows in set.
Pressflow currently stores the whitelist in the cache system, and rebuilds it every time an alias is added/update/deleted - see https://code.launchpad.net/~catch-drupal/pressflow/path_slow_query for changes that bring that back to where D7 currently is, this led to the query showing up near the top of the slow query log since it was running a lot more often than necessary. However while looking at that I think there may be away around running this query at all.
So here's a suggestion:
Instead of array($path => TRUE, $path1 => TRUE); for the whitelist, we could store array ($path => TRUE, $path1 => TRUE, $path2 => FALSE);
paths set to FALSE are blacklisted rather than whitelisted.
In drupal_lookup_path(), if strtok($path) isn't found in the whitelist at all (neither blacklisted nor whitelist), run this query:
SELECT dst FROM url_alias WHERE src LIKE 'user%' LIMIT 0, 1; - that's the same logic as we have currently, except for one path root at a time.
EXPLAIN looks like this:
mysql> EXPLAIN SELECT SQL_NO_CACHE dst FROM url_alias WHERE src LIKE 'node%' LIMIT 0, 1;
+----+-------------+-----------+-------+----------------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+----------------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | url_alias | range | src_language_pid,src | src | 386 | NULL | 154112 | Using where |
+----+-------------+-----------+-------+----------------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
A lot of rows examined, but not 4m+ and various different iterations of this query all completed in milliseconds via MySQL cli.
Same query for D8:
MariaDB [d8]> SELECT alias FROM url_alias WHERE source LIKE 'admin%' LIMIT 0, 1;
Empty set (0.00 sec)
MariaDB [d8]> EXPLAIN SELECT alias FROM url_alias WHERE source LIKE 'admin%' LIMIT 0, 1;
+----+-------------+-----------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------------+---------------------+---------+------+------+-------------+
| 1 | SIMPLE | url_alias | range | source_language_pid | source_language_pid | 767 | NULL | 1 | Using where |
+----+-------------+-----------+-------+---------------------+---------------------+---------+------+------+-------------+
1 row in set (0.03 sec)
A default Drupal 8 install shows the following in menu_router:
MariaDB [d8]> SELECT SUBSTRING_INDEX(path, '/', 1) AS path2 FROM menu_router GROUP BY path2;
+--------------+
| path2 |
+--------------+
| admin |
| batch |
| comment |
| file |
| filter |
| node |
| overlay |
| overlay-ajax |
| rss.xml |
| search |
| sites |
| system |
| taxonomy |
| toolbar |
| user |
+--------------+
15 rows in set (0.03 sec)
So in the best case, we'd be replacing the SUBSTR_INDEX query + set, for 15 well indexed queries + set. In the worst case, it would replace that same query for 130+ well indexed queries + set (although likely the more cruft there is in {menu_router} the less likely sites will be linking to every single one of those paths - I doubt we ever call url() on overlay-ajax for example.
The most unique paths with a substring of 5 characters I could find in {menu_router} for any site was 130 by the way.
As well as those additional queries, we'd be increasing the size of the whitelist array from say 5-10 items to between 20 and 130, so there'd be a bit more memory cost as well.
The advantage is that with a lot of path aliases and a cold start, instead of a 5+ second gap building this whitelist where you can't serve any pages (and currently there's no locking so it could also be a stampede), assuming the LIKE 'foo%' query always runs in around 1ms or so, you'd have roughly flat performance - a very slight hump from the extra sets building the cache, but no single massive rebuild blocking everything.
This is another case where we'd be cumulatively building a cache entry as an array, so it is likely a use case for #402896: Introduce DrupalCacheArray and use it for drupal_get_schema() again (whether we stick with variable_set()/variable_get() or switched to the cache system).
Proposed solution
The current patch completely removes the slow query from the whitelist, replacing it with an indexed LIKE 'foo%' query (although one that will run more often).
Unfortunately url() does not have an option for "this path has already been altered, don't try to look up an alias", so this means it's valid to pass arbitrary paths to url(), not only things which match paths in {menu_router}. We need to fix that for 8.x, see #1297714: Allow both path processing and aliases to be by passed when using Url. Since you could end up with an unlimited number of paths from the strtok($path, '/') check and each of those could create an array key in the new cached item, we need to work around that.
So to deal with that, the patch extend DrupalCacheArray to instead of using strtok(), use a substring of the path. Since this could still be arbitrarily high on many sites, it starts at 5, but each time the size of the array reaches over 100, reduces the substring length by one character.
The result of this is that if you only call url() with paths in {menu_router} (which lots of sites will do), you save the slow query and still have an effective alias whitelist.
If you are calling url() with arbitrary paths (either broken code or on altered paths due to the API limitation), then the whitelist will gradually get less effective, but you still avoid the slow query.
Comment | File | Size | Author |
---|---|---|---|
#145 | path-alias-whitelist-1209226-145.patch | 8.73 KB | arnested |
#143 | path-alias-whitelist-1209226-143.patch | 8.64 KB | arnested |
#140 | path-alias-whitelist-1209226-140.patch | 19.83 KB | Berdir |
#140 | path-alias-whitelist-1209226-140-interdiff.txt | 1.73 KB | Berdir |
#136 | path-alias-whitelist-1209226-136.patch | 19.69 KB | Berdir |
Comments
Comment #1
catchComment #2
catchHere's an excerpt from the slow query log of a D6 site, then EXPLAIN and query times before and after:
(query takes 15 seconds on the dev hardware, averages at 5 seconds on production).
After:
Query completes in < 5ms on the dev hardware. Still has to scan quite a lot of rows but it's significantly better.
Also a diffstat of the patch without the DrupalArrayCache hunk which should be added from #402896: Introduce DrupalCacheArray and use it for drupal_get_schema(). Ends up a net reduction of 6 lines of code.
The simpletest changes are because we are currently calling drupal_static_reset() after the host database has been moved out of the way, but before the child database has been installed. This means if the destructor of an object gets called (which will happen if it's held in drupal_static() storage) and the destructor queries the db, you get a lovely PDO exception telling you the table doesn't exist. Just changing the order of operations fixes this. This is related to #1188316: DrupalUnitTestCase incompatible with any classes with destructors that access the database but tests won't pass without it.
Also attaching a screenshot from xhprof of PathAliasWhitelist::resolveCacheMiss() on a default D8 install - this shows four queries for path aliases, 'node', 'user', 'admin' and 'toolbar'. Two of those are only exposed to admins, so an anonymous user on a cache miss would only end up with one extra query compared to now.
Note this also takes the whitelist out of the variable system and puts it back into cache - it's been in variables previously to avoid this query running just because there was a cache clear.
Comment #4
catchThemeLinksUnitTest is calling all kinds of functions that rely on global scope etc. and only narrowly avoids querying the db, so this patch just makes it use DrupalWebTestCase.
Comment #5
sunComment #6
catchExactly the same patch minus DrupalCacheArray now that's been committed. Also 5-20 second queries in core are a bug report.
Comment #7
webchickFixing tag.
Comment #8
catchAdded some PHPdoc for the new class.
Comment #9
chx CreditAttribution: chx commented+ $exists = (bool) db_query_range('SELECT 1 FROM {url_alias} WHERE source LIKE :path', 0, 1, array(':path' => $offset . '%'))->fetchField();
missesdb_like
Was this benchmarked? That's a lot of getOffset magic calls.
Finally, while I know $offset is proper on an interface level, for a PathAliasWhitelist I would've expected something like path_root, first_path_part, path_arg_zero etc
Comment #10
Anonymous (not verified) CreditAttribution: Anonymous commenteddiscussed making the path lookup code into a pluggable class with catch in IRC, very rough interface idea being:
EDIT: created a new issue here: #1269742: Make path lookup code into a pluggable class.
Comment #11
catchI added db_like() and changed the $offset argument to $path_root.
Have not benchmarked this. I doubt benchmarks will be meaningful however I'll post profiling data from a page with a lot of links assuming this patch passes tests.
Comment #13
catchFixed the patch.
Also profiling for chx. 69 calls to DrupalCacheArray::offsetExists() is 369 microseconds (wall time in xhprof) - equates to 5.3 microseconds per call. 200ms out of the 369 is spent in array_key_exists() in offsetGet() - we might want to make that an isset + array_key_exists() instead of straight array_key_exists() - but I'd want some more subsystems converted to this to see the actual distribution (that pattern is slower if isset() returns false a lot).
Comment #15
catchGrr. Fixed the patch but uploaded the wrong one :(
Comment #16
catchAlright so the current approach here can lead to cache pollution if you have url() called with lots and lots of URLs that have different content before the first /. This wasn't a problem with the original single query approach since it'd only ever store what it found, not what it didn't find as we need to do building this over time.
So... here's a new version. Instead of strtok($path, '/') it uses an arbitrary substring of the path. This starts at 5 characters, but if $this->storage gets too big, it cuts it down by one and starts again, until it gets down to a single character. Once it's at a single character we can't cut it down any longer, but in any remotely normal usage there should not be too many unique single characters.
This same logic is in the latest releases of memcache (for dealing with wildcard logic) and has been working well there. I'd considered trying to put it into a dedicated class by itself but that's for another patch if at all.
Comment #18
catchComment #20
catchComment #22
catchThis one should pass tests, or at least more tests.
Comment #23
catchAdded explicit tests for the whitelist class to check the key shrinking behaviour and that the cache item gets written etc.
Comment #25
catchThat's going to fatal due to .htaccess cruft, sorry.
Comment #26
catchRemoving debug.
Comment #28
catch:(
Comment #30
catchThis should actually pass, missing one line.
Comment #32
catchHmm, this passes locally. Trying one more time.
Comment #33
catchDiscussed with irc in chx and he reckons supporting non-system paths being passed to url() is 'babysitting broken code'. I'm ambivalent on this - since while url() has an 'alias' option it does not have an 'altered' option so if you have a site that alters something-foo-bar to something/foo/bar that feels valid to pass to url() without the 'alias' option - in fact the alias option would be wrong since it's not one. We could look at changing the docs for this for D8 though. #15 is still valid for that original approach though.
Comment #34
chx CreditAttribution: chx commentedWe discussed this a lot and for now we do not have a better choice because anything better would require a change to url() signature -- namely changing alias to something like 'final' to stop not just aliasing but altering as well.
Comment #34.0
chx CreditAttribution: chx commentedComment #34.1
catchUpdated issue summary.
Comment #35
Anonymous (not verified) CreditAttribution: Anonymous commentedthe patch at #32 didn't apply for me, here's a re-rolled one that does. please RTBC if i got the re-roll right.
Comment #36
catchAnother possible way to do this, no time to write a patch though and may not get to it this week.
That gets all the possible 'directories' for router items in the system, but on a much smaller table so it doesn't matter that the query isn't indexed. Or could even do it in PHP.
Use this as as the basis for the whitelist, then record whether a path has an alias or not using the same logic as now.
Since the list would be finite, if something's not in the list at all, we just don't check for it. Would need to clear that cache on menu_rebuild().
Comment #37
rjbrown99 CreditAttribution: rjbrown99 commentedSubscribing. My new policy is to provide value on 'subscribe' posts, so here goes.
A SQL Query walks into a bar. In one corner of the bar are two tables. The Query walks up to the tables and asks: Mind if I join you?
I have nothing relevant to offer here, just quite interested in following the discussion.
Comment #38
chx CreditAttribution: chx commentedI love #36. Given that we have this information during menu building we could directly store it and never run that query (aside from an hook)update_N function). You want
$item['_parts']);
inside_menu_router_build
once it's completed (just before// Apply inheritance rules.
)Comment #39
catchYeah let's do #36. Figuring it out from the router sounds good although we need to be able to rebuild it on a cache miss as well - but could possibly just invoke hook_menu() and hook_menu_alter() ourselves even?
This same problem of needing to distinguish between nothing and nothing came up with #1011614: Theme registry can grow too large for MySQL max_allowed_packet and memcache default slab size as well, and a similar approach worked there (store the keys of the theme registry only then build out from there).
Comment #40
catchHere's a start on #36, not passing tests yet though.
Comment #41
catchAnd this should pass.
Comment #43
catchWell maybe not. Thought of one other logic issue, will look at the rest later.
Comment #45
Anonymous (not verified) CreditAttribution: Anonymous commentedone character fix to make path.inc compile.
Comment #47
chx CreditAttribution: chx commentedI think your patch will rebuild the menu router on every page load.
Comment #48
catchIt calls parent::__construct() before checking if (empty($this->storage)) so that should only ever happen on cache misses.
Comment #49
catchLooked at the test failures.
update.php calls drupal_bootstrap_full(), this calls path initialize, which is hitting field_ui_menu(), which is calling entity_get_info(), which is undefined because the entity system moved to a module and isn't enabled yet.
Comment #51
catchAdded a comment to clarify what's happening in __construct()
Comment #52
chx CreditAttribution: chx commentedWhat's missing from the patch is that the parent constructor will fill in the storage if there is something to fill. But you are likely to rebuild the menu unnecessarily. I suggested pushing the parts right where they are available readily parsed into parts during router rebuild ... somewhere :)
Comment #53
catchDo you mean in addition to the logic here or instead? If we did it instead it won't catch cache clears. I agree with trying to do this from a performance angle but it could get a bit messy.
Comment #54
chx CreditAttribution: chx commentedwe store menu masks in a variable. cant we store this in a variable too?
Comment #55
catchHere's a version with the variable. I couldn't think of a better name than 'menu_path_roots' but there ought to be a better one.
Also did some manual testing and profiling with this, all looks to be working fine and tests passed locally for me.
Comment #56
catchForgot to mention, this means we'll need an update when this is backported to D7 - the update will need to run the original query except against the {menu_router} table and store the variable - that way we're not invoking any hooks in the update and we don't get left with an empty variable.
Comment #58
catchRight patch would help.
Comment #59
chx CreditAttribution: chx commentedQuestion.
$query->condition('u.source', db_like($root) . '%', 'LIKE') ->range(0, 1);
does this need/%
?Comment #60
catchI think so yeah, if we don't have a path whitelisted, someone calls url('node/1');
That gives us a path root of 'node'.
We then want to know if there's any alias in the system at all for any path starting with 'node', so LIKE 'node%' will find one of those.
Comment #61
catchSpoke to chx about the new variable, we can save a bit of memory by using array_values() when we set the variable.
Comment #62
BerdirImplementation looks good to me, just one question...
"that paths that do exist in the router". Not sure I understand this. Everything in $this->storage *does* exist in the router, no? Looking at resolveCacheMiss(), what it actually checks is if there is a path *with an alias* with that prefix.
Maybe that can be mentioned more explicitly here?
-24 days to next Drupal core point release.
Comment #63
catchHmm yeah the comment isn't great, basically we store three states:
NULL - exists in the router but not whitelisted or blacklisted yet.
FALSE - blacklisted.
TRUE - whitelisted.
By having the array with those three possible values, it means that if you do url('12345'); and that's not in the router, it's not going to run the query at all - this was mainly just about explaining why we initialize the entire array to NULL rather than just querying and storing TRUE/FALSE (which is what earlier patches did but is dangerous given what url() allows for).
Comment #64
catchRe-rolled for /core. Also bumping priority to major - the query can take multiple seconds to complete, and until it does so, blocks any request to the site that calls url() which could be all of them.
Comment #65
catchDiscussed this with chx in irc and he pointed out adding an extra array key to be parsed with list() is not safe for backwards compatibility. So instead menu_router_build() and _menu_router_build() get an optional $save parameter.
Comment #66
chx CreditAttribution: chx commentedI like it. Very, very much like it finally.
Comment #67
catchSince this is for backport to D7 and my patch, assigning to Angie.
Comment #68
webchickI had started reviewing this the other night with catch on IRC and I pointed out:
What happens when we're in maintenance mode?
And he said "Oh, you found a bug." :) So marking needs work for now.
6 days to next Drupal core point release.
Comment #69
catchAlright, fix the no longer needed MAINTENANCE_MODE check and updated the comment. This was previously in there because a cache miss could trigger a menu_rebuild(), but that is no longer the case.
Comment #70
Anonymous (not verified) CreditAttribution: Anonymous commentedthis is ugly. i don't mean catch's work, i just mean the whole path lookup system :-(
looks like we're missing some of the implications of coupling with menu rebuilds here. AFAICS unless new router item additions clear the path whitelist cache, we can refuse to look for aliases for valid system paths, so i think we need something in _menu_router_build() to wipe or check-and-wipe-if-new-roots or something.
also, drupal_path_alias_whitelist_rebuild() doesn't look right when called from path_* CRUD functions.
1. add a path alias for 'admin/people' --> 'mypeeps'. we write out 'admin' => TRUE to the whitelist cache, all good, number of rows in url_alias == 1.
2. delete the 'mypeeps' alias, number of rows == 0. in drupal_path_alias_whitelist_rebuild(), we don't clear path_alias_whitelist cache, because we get a match for 'admin' in the whitelist.
also, i think we need to wrap a single instance of the PathAliasWhitelist class, or use drupal_static to grab the instance we build in drupal_lookup_path(). otherwise we load another one in drupal_path_alias_whitelist_rebuild() (another hit on persistent cache, *without* anything we've learned from this request :-(), then depending on the flow, we might just go ahead and fetch another one. if we just add a PathAliasWhitelist::reset() or similar we can fix that.
i'll try to catch catch and chx in IRC to hash this out.
Comment #71
catchUnassigning webchick.
Comment #72
rjbrown99 CreditAttribution: rjbrown99 commentedcatch this is still problematic for me on a D6+Pressflow site, with a similar situation to your original issue description and comment #2. Might you have a workable start to a D6 fix for this? I checked your github tree and it doesn't look like you committed anything there. The original launchpad code is here but that just backs it out of cache and into a variable. Just trying to ease a bit of work in hacking at it myself. Thanks!
Comment #73
catchThe master branch of that tree is just the same as Pressflow, https://github.com/tag1consulting/pressflow6/tree/path_alias_slow_query should be the right branch for D6 work.
Comment #74
rjbrown99 CreditAttribution: rjbrown99 commentedExcellent, thank you for the pointer and for contributing the fix to the community! This will definitely be helpful.
Comment #75
catchComing back to this.
Added this check (untested).
I think this is fine - it just means we'll get some pointless path alias requests until the cache is built from scratch again, for one very specific case.
Went for drupal_static().
Also did PSR-0-ification after the DrupalCacheArray -> CacheArray conversion.
Comment #76
catchArgh forgot to add some files before rolling the final patch, proper roll this time.
Comment #78
catchFixing trailing whitespace and the syntax error.
Comment #80
catch#78: path_alias_whitelist_1209226.patch queued for re-testing.
Comment #82
Anonymous (not verified) CreditAttribution: Anonymous commentedfixed old function signature of PathAliasWhitelist::set(), lets see if this will get somewhere with the bot.
Comment #84
BerdirNot 100% sure this is correct but it does fix the test.
Comment #86
dagmarRe-rolled #82 including change by @Berdir.
I tried to install Drupal with this patch applied, and seems this change #1477446: Move lock backend to PSR-0 code was not included in this patch. Change lock_adquire by lock()->adquire() fix the issue.
Comment #88
BerdirOk, this should pass the tests.
- Had to prevent drupalGet() from converting the source path to an alias, which makes sense.
- The forum tests had to wrong calls (drupalGet('/')) so offsetGet($offset) was called with FALSE. Fixed the calls, we might also want to add a check for this, not sure...
Comment #89
BerdirNot true, nothing is returned?
Not sure if the documentation is wrong or the code.
Comment #90
catchhmm it doesn't return it, it adds a new whitelist instance to the drupal_static() cache. It probably makes sense to return in the 7.x backport for bc, but I don't see a reason to in 8.x.
Comment #91
BerdirOk, here is a re-roll that removes the inconsistent return, uses state() for the root parts and also fixes the rebuild function, which was IMHO completely broken. It created a new instance and assigned it to $whitelist, which was only defined if a $source was passed in and it was *not* by reference.
I think now it should actually work. I believe it only "worked" previously because it assigned the empty return value to $cache['whitelist'] in case of a wipe call which completely disabled the whitelist handling. At least that's how I understood it.
Comment #92
miro_dietikerBackreference to issue porting variables => state, postponed:
#1798884: Convert path_alias_whitelist to state system
This is still current:
#1269742: Make path lookup code into a pluggable class
Comment #93
miro_dietikerRerolled, clashed with #1798884: Convert path_alias_whitelist to state system
Some things described from Berdir seems not to be fixed in his latest patch. Discussed, fixed.
Added and fixed some doc, added missing public decorator.
Reviewed in depth, looks good. Found many inconsistencies with DrupalCacheArray that was renamed to CacheArray, followup:
#1831646: Document rename DrupalCacheArray to CacheArray
Added clear() function to CacheArray to cleanly abstract the clearing process. This was ugly.
Providing new version.
Comment #95
miro_dietikerNext try. Fixed clear() that missed some reset parts.
Comment #97
miro_dietikerDebugged, fixed. :-)
Prepopulation of the cache was missing!
Comment #99
miro_dietiker#97: 1209226_path_alias_whitelist_cache_97.patch queued for re-testing.
Comment #101
miro_dietiker#97: 1209226_path_alias_whitelist_cache_97.patch queued for re-testing.
Comment #102
miro_dietikerYessssss! Passed!
Comment #103
catchShouldn't this use the new clear method?
Otherwise the updates to this look great.
Comment #104
miro_dietikerHmm you catched the point.. :-)
The separation was still not perfect.
I dropped the cache() call as it should be completely wrapped by the class .. and path.inc provides functional wrappers.
Switched to drupal_clear_path_cache() that doesn't do too much more.
Additionally found a typo LoadMenuPathRoots -> loadMenuPathRoots().
Let's see what testbot thinks about it.
Comment #105
BerdirWhat I'm worried about is that this introduces a call from menu.inc to path.inc, and I'm not sure that it's good to have this dependency.
Also, do we need this for the new router system as well?
Comment #106
miro_dietikerWe can only solve this by introducing a hook in the menu system. (e.g. hook_menu_router_save)
The path system is dependent on the menu building.
Generally, extracting the roots is here done in menu.inc. But the only reason why we're doing this is the whitelisting of the path.inc.
I consider this a followup issue as this dependency was here already before. It currently just became visible.
EDIT: So if we want to reduce the dependency, we also need to pass extraction of the menu_roots to path.inc and even add a second hook. Possibly we can cover it with one, if we pass $menu...
Comment #107
catchPosting from phone but the generator patch depends on the pluggable path inc patch which should be easier with this patch in. It'd be better if menu.inc provided a hook to react to so the cache could clear there.
Comment #108
miro_dietiker:-) crosspost!
Followup or in this issue?
Comment #109
sunI don't quite see how a hook would be helpful here, since path.inc is not a module, and the hook implementation would have to be invoked regardless of whether path.module is enabled or not.
For now, I think it would be better to move along with this hard-coded call from menu.inc to path.inc. Both operate at full bootstrap. Both will be converted into services either way, so the only real requirement we have are good tests to prevent this code from being mistakenly deleted without modern replacement. ;)
Comment #110
catchPath aliasing is likely to move to a listener as part of the new router changes - in effect it'll be implementing the replacement of hook_url_outbound_alter(), if that happens it ought to be possible to provide it from a module. Since there's multiple patches being worked on to refactor both these systems I think the hard-coded call is probably OK too - it's that or taking another performance hit from building and throwing away menu routers in path.inc.
Comment #111
BerdirLooks like #973436: Overzealous locking in variable_initialize() also introduced a clear method but called it remoteStorage() and only added it to the specific cache array implementation.
@sun: There are no tests for this functionality yet, I think. Sounds like we need some specific tests that create/remove aliases for something like system/... or user/.. and then check if this was correctly updated. Adding tag.
Comment #112
Berdir#104: 1209226_path_alias_whitelist_cache_104.patch queued for re-testing.
Comment #114
mgiffordDoesn't apply locally either:
$ git apply 1209226_path_alias_whitelist_cache_104.patch
error: patch failed: core/includes/path.inc:9
error: core/includes/path.inc: patch does not apply
Comment #115
miro_dietikerYes the path system is now completely rewritten so this needs complete overhaul.. (but is still relevant!)
(The previous patch still is the best start for a later D7 backport.)
Comment #116
BerdirHere is a re-roll. @katbailey probably isn't going to like this much yet as it doesn't inject stuff yet, but I *really* like this, the code is way easier now that it's all encapsulated in a single class. Also moved the whitelist class into the new Drupal\Core\Path namespace.
Had to trick a bit in the Path unit tests by manually setting the menu roots variable.
Still would like to add better tests for the actual whitelist handling, so that we have an explicit test that makes sure the whitelist is updated correctly as aliases are added and removed. Also not yet sure how to integrate it with the new router system (is that necessary to get it in in atm?)
Comment #118
catchA lot of things don't work with the new router system yet, so that's not a requirement no.
Comment #119
miro_dietikerA debug statement :-)
Another debug :-)
Rest looks really awesome! I'm also dreaming of tests for these cases. I tried for a short time without too much success of a simple / clean approach.
Comment #120
BerdirYes, and removing those also seems to fix the tests.
Comment #121
katbailey CreditAttribution: katbailey commentedThis looks like a great improvement - but yes, you guessed it, I am curious as to why the AliasWhitelist isn't being injected into the AliasManager :-)
Comment #122
BerdirInject all the things :)
- Defining AliasWhitelist as a service now, and inject cid, bin, state and database (not sure about having to inject cid/bin)
- Injecting that to AliasManager
- Fixing some comments
- Added tests for the whitelist
Let's see if this works. A bit worried about the __destruct() in a service as that's breaking my lock/form_state patches. But if that would be a problem here, the above patch should already fail as well I think.
Comment #123
Lars Toomre CreditAttribution: Lars Toomre commentedAttached are some coding/documentation standard thoughts from reading through the patch in #122:
My understanding is that '(Optional)' should be '(optional)' [lower case].
I believe this needs a doclock with @param directive for each parameter. Same with other constructor in this patch.
I am not sure, but @sun, @xjm and @tim.plunkett have all said that there should be a blank line before the of a class declaration. I am not sure, but they say that it is part of our coding standards.
Comment #125
Berdir#122: path-alias-whitelist-1209226-122.patch queued for re-testing.
Comment #127
BerdirRe-roll, guzzle had to add it's stuff right where I made my change as well :p
Comment #128
BerdirUpdated to fix the documentation issues outlined in #128.
I think this is ready.. any final reviews? :)
Comment #129
BerdirAnother one to fix two copied comments in AliasTest, did not fix the original that file needs a cleanup follow-up to avoid code duplication and fix those wrong and missing comments. Not a problem introduced here.
Comment #130
Berdir#129: path-alias-whitelist-1209226-129.patch queued for re-testing.
Comment #132
BerdirArgh, here we go. Segmentation faults.
And the same views Filter tests as e.g. in the form state issue.
Which means we are blocked on #512026: Move $form_state storage from cache to new state/key-value system and possibly #1858616: Extract generic CacheCollector implementation and interface from CacheArray too although we can avoid that by overriding __destruct() if we have to.
Comment #133
BerdirRe-roll that includes the latest service destructor patch and uses it, including an override of __destruct() to avoid the segfaults. Ugly but the cache collector will remove the need for that.
Comment #134
mgiffordPatch applies nicely. What do we need to do to get this marked RTBC?
Comment #135
Berdir#1891980: Add Service termination API to reliable terminate/shut down services in the container needs to get in, which should hopefully happen soon as it's now RTBC. Then I will re-roll this issue and it should be ready for final reviews/RTBC.
Comment #136
BerdirOk, the service destructor is in, here's a re-roll.
Comment #137
danielnolde CreditAttribution: danielnolde commentedPatch from #136 applies, doesn't break anything (install) so far and works.
Comment #138
BerdirMaybe someone can do some extended manual testing and confirm that it is working. Basically, what you have to do is add and remove aliases for different root paths (node, user, admin, ...) and make sure that they are found and correctly resolved without having to do manual cache clears and report back with what you did exactly.
There is test coverage for the same thing, so removing the Needs tests tag. I think this is ready.
Comment #139
amateescu CreditAttribution: amateescu commentedJust a small nitpick here, I'd use
randomName()
for generating an invalid path. Other than that, it looks great to me.Comment #140
BerdirUpdated the patch to use randomName() for invalid rooth paths and alias names. There is no conflict potential there because user/admin don't have a length of 8.
Note that the array syntax will be replaced with a CacheCollector from #1786490: Add caching to the state system once it is in. It doesn't matter which way this gets in, I can re-roll/do a follow-up to update it either way round. Both issues need to go in anyway.
Comment #141
amateescu CreditAttribution: amateescu commentedI think we're good to go here.
Comment #142
catchI worked on this a fair bit but enough people reviewed/updated the patch I'm OK committing it.
Committed/pushed to 8.x, moving to 7.x for backport.
Comment #143
arnested CreditAttribution: arnested commentedI backported the patch to Drupal 7 (actually I started off with last D7-version of the patch and carefully implemented all further changes that made sense in D7).
Comment #145
arnested CreditAttribution: arnested at Reload commentedWoops. Left out the $save = TRUE flag on menu_router_build(TRUE) in menu_rebuild()
Comment #146
BerdirLooks good, but I think those tests need to be backported as well. We'll have to make it a normal webtest though, as 7.x doesn't have DUBT :(
Comment #146.0
BerdirUpdated issue summary.
Comment #146.1
mchampsee CreditAttribution: mchampsee commentedadded executive summary so that people know where things stand.
Comment #147
mgiffordThe path in #145 still applies nicely in D7. Looks like this would be a good fix.
Still waiting on simple tests to get into Core.
Comment #148
kopeboy CreditAttribution: kopeboy commentedAny update?
Comment #149
fgmRelated issue: there is currently a bug regarding the whitelist: #2540154: Path alias whitelist not rebuilt on alias deletion.
Validating the patch for the current issue should probably also test the bug scenario in that other issue.
Comment #150
fgmComment #151
fgmComment #152
arnested CreditAttribution: arnested at Reload commentedI'm on a tight schedule currently and probably won't have time to implement the test cases any time soon.
Comment #155
attiks CreditAttribution: attiks at Attiks commentedFYI, current code does not even work on postgres #1791366: Hardcoded MySQL function in path.incnvm, bad database restore
Comment #156
HarryAscent CreditAttribution: HarryAscent commentedHello, Immediately apologize for my English, I use Google translator. :)
I have a problem on the site arose as follows:
SELECT source, alias FROM url_alias WHERE source IN ('node/4389', 'user/%/cancel/confirm/%/%', 'node/%/revisions/%/view', 'node/%/revisions/%/revert', 'node/%/revisions/%/delete', 'node/add/selection-recipes', 'user/%/cancel', 'node/add/page-text', 'node/add/video-post', 'node/add/blog-post', 'node/add/cook', 'node/add/journey', 'node/add/book', 'node/add/news', 'node/add/about', 'node/add/advice', 'node/add/recipie', 'node/add/header', 'node/add/page', 'file/add', 'file/%', 'taxonomy/term/%', 'node/%', 'user/%', 'node/add', 'user/logout', 'user', 'taxonomy/term/2872', 'taxonomy/term/2866', 'taxonomy/term/1096', 'taxonomy/term/1095', 'taxonomy/term/1093', 'taxonomy/term/1091', 'taxonomy/term/1090', 'taxonomy/term/1089', 'taxonomy/term/1088', 'taxonomy/term/1087', 'taxonomy/term/1084', 'taxonomy/term/1081', 'taxonomy/term/1080', 'taxonomy/term/2880', 'taxonomy/term/2855', 'taxonomy/term/1079', 'taxonomy/term/2884', 'taxonomy/term/1097', 'taxonomy/term/2861', 'taxonomy/term/1083', 'taxonomy/term/1085', 'taxonomy/term/2952', 'taxonomy/term/2888', 'taxonomy/term/1086', 'taxonomy/term/1092', 'taxonomy/term/2876', 'taxonomy/term/2868', 'taxonomy/term/1082', 'taxonomy/term/2859', 'taxonomy/term/1094', 'taxonomy/term/17', 'taxonomy/term/2867', 'taxonomy/term/2874', 'taxonomy/term/14', 'taxonomy/term/13', 'taxonomy/term/18', 'taxonomy/term/2886', 'taxonomy/term/2891', 'taxonomy/term/2860', 'taxonomy/term/22', 'taxonomy/term/15', 'taxonomy/term/2869', 'taxonomy/term/2879', 'taxonomy/term/16', 'taxonomy/term/21', 'taxonomy/term/20', 'taxonomy/term/2953', 'taxonomy/term/2858', 'taxonomy/term/2881', 'taxonomy/term/131', 'taxonomy/term/2854', 'taxonomy/term/19', 'taxonomy/term/12', 'taxonomy/term/998', 'taxonomy/term/994', 'taxonomy/term/997', 'taxonomy/term/993', 'taxonomy/term/988', 'taxonomy/term/1000', 'taxonomy/term/992', 'taxonomy/term/996', 'taxonomy/term/991', 'taxonomy/term/1005', 'taxonomy/term/2887', 'taxonomy/term/2862', 'taxonomy/term/2875', 'taxonomy/term/2853', 'taxonomy/term/995', 'taxonomy/term/1002', 'taxonomy/term/990', 'taxonomy/term/2954', 'taxonomy/term/2882', 'taxonomy/term/999', 'taxonomy/term/2857', 'taxonomy/term/2889', 'taxonomy/term/1006', 'taxonomy/term/989', 'taxonomy/term/2870', 'taxonomy/term/2878', 'taxonomy/term/1001', 'taxonomy/term/2865', 'taxonomy/term/1003', 'taxonomy/term/1004', 'taxonomy/term/8', 'taxonomy/term/3', 'taxonomy/term/646', 'taxonomy/term/778', 'taxonomy/term/779', 'taxonomy/term/2', 'taxonomy/term/5', 'taxonomy/term/782', 'taxonomy/term/7', 'taxonomy/term/2885', 'taxonomy/term/2863', 'taxonomy/term/781', 'taxonomy/term/2873', 'taxonomy/term/2852', 'taxonomy/term/6', 'taxonomy/term/10', 'taxonomy/term/4', 'taxonomy/term/2849', 'taxonomy/term/2883', 'taxonomy/term/2856', 'taxonomy/term/647', 'taxonomy/term/2890', 'taxonomy/term/1', 'taxonomy/term/138', 'taxonomy/term/11', 'taxonomy/term/2877', 'taxonomy/term/2871', 'taxonomy/term/9', 'taxonomy/term/2864', 'taxonomy/term/780', 'node/32', 'taxonomy/term/2972', 'taxonomy/term/3012', 'taxonomy/term/2995', 'taxonomy/term/2985', 'taxonomy/term/3033', 'taxonomy/term/3017', 'taxonomy/term/2850', 'taxonomy/term/3028', 'taxonomy/term/2987', 'taxonomy/term/3025', 'taxonomy/term/3063', 'taxonomy/term/3057', 'taxonomy/term/3060', 'taxonomy/term/3073', 'taxonomy/term/3003', 'taxonomy/term/3036', 'taxonomy/term/2993', 'taxonomy/term/3008', 'taxonomy/term/2967', 'taxonomy/term/3049', 'taxonomy/term/3069', 'taxonomy/term/2977', 'taxonomy/term/2921', 'taxonomy/term/2965', 'taxonomy/term/2894', 'taxonomy/term/3302', 'taxonomy/term/2935', 'taxonomy/term/3001', 'taxonomy/term/3075', 'taxonomy/term/3282', 'taxonomy/term/3266', 'taxonomy/term/2927', 'taxonomy/term/3020', 'taxonomy/term/2906', 'taxonomy/term/3344', 'taxonomy/term/3298', 'taxonomy/term/3246', 'taxonomy/term/2938', 'taxonomy/term/2979', 'taxonomy/term/2898', 'taxonomy/term/3338', 'taxonomy/term/3286', 'taxonomy/term/3250', 'taxonomy/term/2942', 'taxonomy/term/3044', 'taxonomy/term/2911', 'taxonomy/term/3330', 'taxonomy/term/3314', 'taxonomy/term/3278', 'taxonomy/term/3258', 'taxonomy/term/2951', 'taxonomy/term/3051', 'taxonomy/term/2919', 'taxonomy/term/3335', 'taxonomy/term/3320', 'taxonomy/term/3310', 'taxonomy/term/3294', 'taxonomy/term/3270', 'taxonomy/term/3254', 'taxonomy/term/2930', 'taxonomy/term/3040', 'taxonomy/term/2903', 'taxonomy/term/3326', 'taxonomy/term/3322', 'taxonomy/term/3307', 'taxonomy/term/3290', 'taxonomy/term/3274', 'taxonomy/term/3262', 'taxonomy/term/2946', 'taxonomy/term/2959', 'taxonomy/term/2913', 'taxonomy/term/3236', 'taxonomy/term/3241', 'node/4509', 'node/4508', 'node/4504', 'node/4495', 'node/4493', 'node/4488', 'node/4499', 'node/4458', 'node/4445', 'node/4417', 'node/4406', 'node/2511', 'node/2002', 'node/879', 'node/648', 'node/3478', 'node/891', 'taxonomy/term/140', 'taxonomy/term/185', 'taxonomy/term/51', 'taxonomy/term/304', 'taxonomy/term/634', 'taxonomy/term/157', 'taxonomy/term/265', 'taxonomy/term/119', 'node/4455', 'node/4452', 'node/4444') AND language IN ('ru', 'und') ORDER BY language ASC, pid ASC
as I understand the discussion here on the part of my problem
also I attach a screenshot of the debugger devel:
I would be grateful for any help to solve this problem. Thank you
Comment #157
HarryAscent CreditAttribution: HarryAscent commentedYes, and my server crashes because of these database queries, MySQL uses all available memory.
originally the site was on the 6th Drupal.
Comment #158
HarryAscent CreditAttribution: HarryAscent commentedsomebody something can prompt?