Over in #1606794: Implement new routing system, we are preparing to commit a new "Nested Matcher" to Drupal 8. The basic concept, as stated in that issue summary, is:
The current implementation defines a multi-stage matching system, NestedMatcher. A NestedMatcher has 3 parts:
- One InitialMatcher, which matches a Request object against whatever logic it wants.
- Zero or more PartialMatchers, which are always assigned a RouteCollection from the previous stage and return a RouteCollection.
- One FinalMatcher, which is always assigned a RouteCollection from the previous stage and returns an array of attributes to add to the Request, like any other Matcher.
Every sub-matcher is required to throw an appropriate exception if it comes up with no matching routes.
On the plus side, this is very flexible. PartialMatchers can be any object, which means they can be spawned from and use any service in the DIC, which means they can filter routes based on whatever the hell they want.
To be clear: The current model has already been public for a while and has been signed off on by both Dries and catch, so we're not changing it right now. This discussion is for post-commit refactoring.
sdboyer has indicate a possible problem with this approach. Specifically, PartialMatchers can do "anything" and leave "anything" behind for the next step. When we get to the Final Matcher, which is responsible for narrowing down the list of possible routes to one and returning the appropriate attributes, we could have all kinds of routes left, potentially. The current default logic of "meh, whatever the first one is" may not be sufficiently deterministic. Once we start adding additional filters, like "this route only matches if /node/{node} is a node of type Article", this process could get highly confusing to debug.
Additionally, all partial matchers run in all cases, even if they're not relevant. For instance, the aforementioned "is this node an Article" filter will try to run even if we're on /user/{user} or /admin/structure/hello/mom. It's then up to the partial matcher to know that it should just return the RouteCollection it's passed and be done with it, even though we have to iterate over the collection and call it. Once we have a decent number of partial matchers in place, that could get uncomfortably slow. (And remember we still have to do mime-type matching!)
The mid-term goal is to finish up this architecture and then push it upstream to Symfony CMF's Routing component, where it can be shared by Drupal, Symfony CMF, ezComponents, and others that have expressed interest in complex request-based routing. We should probably determine before that happens what if anything we should change here to make it more clearly deterministic and scalable.
This issue is mainly for discussion purposes. Anyone who understands routing is welcome. :-)
Comments
Comment #1
dbu commentedthanks for this writeup. and as the lead of the cmf routing component i want to confirm that we would love to get this improvement into the DynamicRouter (or rename it NestedRouter, whatever).
i think we have to handle the risk of partial matchers doing bad things with documentation and good templates, maybe a base class. while we could add a method to the PartialMatcherInterface to ask a matcher if it cares about this request, but its one more call to do. special PartialMatcher that should not filter in some circumstances should check that at the start of there match method and just flip back the RouteCollection if they do not care about the request.
for building a partial matcher for basic http things, there is an open issue on symfony to make their matcher flexible enough to do partial matching: https://github.com/symfony/symfony/issues/5014
i think these operations do not need to be further split up - its also relevant for performance as the combined filter will be faster than the 4-step filtering in the current nested router draft in drupal.
another thing we should decide is for adding additional information to the matched route information (the $defaults array). currently there are some rules hardcoded into DynamicRouter, which is definitely not generic: https://github.com/symfony-cmf/Routing/blob/master/DynamicRouter.php#L199
we should either abstract this into a list of RouteEnhancerInterface or say this should be done in an event listener for an appropriate symfony event. i would prefer to add a list of RouteEnhancerInterface to the DynamicRouter to make it more explicit what is going on.
Comment #2
effulgentsia commentedYeah. What would be ideal here is to avoid PHP's function/method call overhead entirely for partial matchers that are irrelevant for the path. Would it make sense to let partial matchers somehow identify route ids that they are relevant to, and then have that information compiled/cached somewhere?
Comment #3
Crell commentedPossibly. But if we do some sort of regex at runtime, would that be a net win? All it would do is eliminate a few function calls, unless we figure out how to lazy-instantiate the partial matchers, too.
Comment #4
sdboyer commentedi'm a bit fearful this could split discussion in an unhelpful way, but i've already started, so i'll keep with it. i posted a quick rundown of what one complex path might look like on g.d.o, and i want to use that as the basis for further laying out my concerns about the determinism of the Nested approach here. also, i'm not suggesting that the approach is ACTUALLY non-deterministic; just that it has the potential to become so complex as to appear as such. essentially, that we could be inviting mandelbugs by using it. and Drupal already has enough of those, kthx.
first, lemme make clear that i'm not opposed to the Nested Router itself. it is an elegant formulation, combining a first pass that should be driven by set intersection to trim out irrelevant routes (roughly speaking,
PartialMatcherCriteria ∩ RouteCollection = NewRouteCollection), then pairs it with a series-approach for final, more controlled selection. the part that concerns me is the way we've been throwing around statements like "which means they can filter routes based on whatever the hell they want." i believe the Partial vs. Final matchers can do really great things if they respect some rules, but poor implementations could wreak havoc.my chief concern is that the
PartialMatchers in the nested approach work best when they operate in a very constrained fashion on particular, known sets. anHttpMethodPartialMatcheris probably the easiest example: given the method specified on theRequest, it is trivial for said matcher to iterate over theRouteCollectionand determine whether or not a route is compatible with the method (assuming that we either ensure routes always have method compatibility specified via default injection, or by equipping the matcher with logic to handle empty sets).however, when it comes to providing a choice to an end-user (or developer, for that matter) about selectively differentiating a particular page/type of page, no single
PartialMatcherprovides a complete picture. moreover, i suspect that most of the time, when people choose to make a routing decision using this system, they will be aiming to differentiate "this path" from "something else," and we have a fair bit of footwork to do in order to figure out what the "something else" is and translate it back into well-formed logic in aPartialMatcher, if it even makes sense to represent the differentiation with aPartialMatcherat all.the breakdown of
PartialMatchers is easy enough to explain. in my example on g.d.o, one of the routes is differentiated by being used only for "nodes that are less than a week old." attempting to implement that in aNodeCreatedInLastWeekPartialMatchersimply makes no sense; for it to perform any route-stripping operations, it would have to operate on the implicit set of{($node->created >= time() - 604800) , ($node->created = time() - 604800)}. if the only other route living atnode/{node}was the node catchall, then that would be OK, but if you've also got a route with a node type dimension, then you've got a problem. and it's a different problem than if, say, theHttpMethodPartialMatcherencounters the empty set, because theNodeCreatedInLastWeekPartialMatcheris checking a basic part of the node datastructure that will ALWAYS be there. it will never encounter an empty set, and thus cannot safely determine whether it is *really* intended to strip the route.in short: i hope that
PartialMatchers are written very infrequently, and only for very well-defined domains, and only by people who really know what they're doing. 80% or more ofPartialMatchers should be highly reusable libraries. the rule should be that aPartialMatcheris only used if it operates on a self-contained, conceptually complete set, AND it comes with sane fallback/empty set behaviors. externally defined stuff like HTTP context is great, but node types or user roles could also work, at least sometimes.given these restrictions on the
PartialMatcher, i have big hopes for theFinalMatcher. i've got like a bajillion other things to do today though, so i'll follow up with that in a separate comment a little later.Comment #5
sdboyer commentedoh, one advantage to making
PartialMatchers operate on highly defined sets. for sets with a knowable set of entries, we could explicitly (or even predictively during container compilation) translate them to a bitmask, then create a subinterface for the correspondingPartialMatcherand simply have theNestedRouterperform the check using binary math without ever having to call a userspace function. might save an object instantiation or two, as well.microoptimizations++ :P
Comment #6
dbu commentedi totally agree with the point of documenting the PartialMatchers in a way to say people they do not usually need to write a new matcher.
for the bitsets: what would be a set of entries that needs a specific PartialMatcher? i do not think we need PartialMatcher just for nodes or users or something like that. at least not in the NestedRouter/DynamicRouter. the PartialMatchers just filter the list of matches. would you say routing for /node/37 is different whether node 37 exists? i would not - if any, the route enhancement step trying to get the content for this would explode and say 404. then the 404 handling can kick in and do what it has to do.
the check for concern could apply to the ChainRouter though. but i would expect you chain 2, at max 3 routers and for such a low number the pre-check sounds like overkill, don't you think? unless you would want different routers for each type of drupal entity, but i think you do not need that.
one more point: routing comes before security. so if you would manage to have several potential matches, but one of them is prohibited by access control, its too late to select the alternative. but imho it would be extremly bad url design if the same url has different routing based on access control. (there is of course the handlers for access denied exceptions that can use a controller to handle the situation - its just not the job of the routing to do the checks)
i have problems understanding the use case of
NodeCreatedInLastWeekPartialMatcher. we talk about matching routes here, and in the end of the process, we need one single match. if i try to match /posts/recent or something like this, its a simple match that points me to the controller that renders recent posts by selecting them with a query and repeatedly call generate() to build each of those routes.and as i understand it, a
PartialMatchershould never generate new routes, this is the job of InitialMatcher.so the only application of your use case would be to say: i only want to ever show content in my system that is newer than a week, other things disappear. which imho would be a weird case.
lets say we try to match /node/37 or /my-web-page which would be an alias to /node/37
the last step right now is hardcoded in the cmf DynamicRouter. i think we could elaborate that one to attach enhancers. one could be a DrupalContentEnhancer that handles user, node, ... and loads the content (the cmf routes link their content so its "just there"). if the content asks for a specific controller, the enhancer could also replace the _controller, or as in the case of the cmf, add it if the route does not specify explicitly.
Comment #7
dbu commentedah, and we should try to keep the number of PartialMatchers low. the whole Url matching could be done in one go, i don't see why we would change the order or put steps in between - they can always come before or after. note the open issue on symfony to make the core UrlMatcher usable as a filter: https://github.com/symfony/symfony/issues/5014
Comment #8
dbu commentedi just read through your "anatomy of a complex path". at least the case of recent node would be handled by a RouteEnhancer rather than by having a special route variant. probably the same for the Product use case - or you could store the controller into the route, though that would make changing the controller very hard as you have to make sure all routes get updated.
in general i would not have 3 routes matching /node/{id} and filter on other things, but one route and then in the RouteEnhancer decide on whatever criteria you need which controller should handle this. in the cmf we came up with:
now if you define content type as any drupal object including user and so on, there is just a few RouteEnhancers that add information to the matched route.
Comment #9
Crell commentedFor comparison, the current routing patch has the following implemented:
- Initial matcher: Path-only (includes the regexe borrowed from UrlMatcher)
- Partial matcher: HTTP Method
- Final matcher: Get the first
We want to add a mime-type partial matcher as soon as #1505080: [META] Content Negotiation for Accept Headers happens, but I think that's it for the default configuration. Individual sites are free to inject more in there through DIC configuration but I doubt most will.
The exception is "object type"; routing to different layouts based on node type, for instance. "This week" is a hypothetical that's a good thought exercise but I don't know anyone whose done it. Another real-world use case would be a different layout et al based on the metadata of the object; such as what Organic Group a node is associated to. (Yes, I have had this use case. Like, in the last week. :-) )
The Drupal contrib answer in Panels (our model system for much of this) is variants, which are a form of second-stage routing. Basically, the one route that /node/$nid matches to locates a controller that does a second level of routing based on "Other stuff", and sub-calls to some other controller, called a variant. People have implemented all sorts of weird "other stuff" logic, too. The problem is that somewhere around 1.37% of the Drupal population can wrap their head around variants, and I think they're all currently working on either WSCCI, SCOTCH, or VoDCa (Views in Drupal Core). (Yes, Variants drove us all to the bottle...)
One of the goals for the new routing system is to retain that level of flexibility without making people's brains bleed. To date we've been thinking of that as allowing multiple routes, that is, pushing the "other stuff" logic up a level since it's still, in a sense, routing. It sounds like dbu is saying CMF is doing... what Panels does, with 2 distinct stages. I find that interesting. Do people actually understand it? Is that something we could make not-suck with just a better UI?
Arguably, with the switch to a common default _controller after Munich and the assumption that SCOTCH's default controller will be pulling most of its information from CMI, it's possible that the need to use _controller for variants goes away. Sam, what if that just turned into checking for a layout.$nodetype CMI object? (OK, the logic would need to be more complex than that, but you get the idea.) Basically, variants are dead, long live variants.
We are already using a "route enhancer", sort of, to set the default _controller if only _content is specified. Perhaps that's a better place to implement neo-variants? I also still need to implement the object loading, which I was going to base off of Symfony full stack. David, you're saying CMF has a different mechanism for that?
Comment #10
dbu commentedhi crell,
indeed, looks light we invented some sort of "variants light" for the cmf. a route always can have a content (PHPCR reference) and if it has, the current DynamicRouter implementation accepts a map of content class to controller and content class to template with a default controller (the later would be default behaviour in drupal 7, with a generic template in case there is none specified).
in symfony we expect the devs to want to code rather than configure, so we expect the standard case for less-than-trivial customization will be to map a content type to a custom controller. for now we do this in the post-processing the matched route in DynamicRouter. we always thought about needing this post-processing only if the DynamicRouter was used. but for drupal you might need to do such things even if some fast-match router did the matching. so maybe we should listen to the controller-event [1] and do something there. the other reason we did not do this in the cmf is that at that point the actual route object is lost, and we use PHPCR references to get the content of a route...
pushing the decision into the controller is too late, it would boil down to two-step routing or pulling information instead of pushing.
so all in all, maybe we want to do a RouteEnhancement step in DynamicRouter but would need "something" to also do that step when in other routers. if you do another totally custom router, you could still do the enhancement step - though such a concept would prevent generic apache rewrite routing, only if you rewrite exact matches.
[1] http://symfony.com/doc/current/book/internals.html#kernel-controller-event
Comment #11
sdboyer commentedlotsa things to reply to. i'll break it up for clarity.
agreed. i don't want to push routing[-like] behavior into any later stage.
==========
with respect to that suggestion, i was imagining something simple. e.g., for HTTP method:
...
since the set of HTTP methods is immutable, it'd be perfectly safe to write them statically. then, routes define the bitset they can handle ($accepts = GET | POST), and the matcher (maybe even the NestedMatcher) can perform a simple bit check to see if a route can handle the Request.
==========
entirely agreed - access is different from routing, and the two should not be conflated. route selection is performed first, then access checks are run on selected route.
==========
i've been pondering the mechanism we use to pair route with conf. something like that is one possible approach, though it precludes the potentially-useful possibility of reusing the same conf on multiple routes. but maybe that's not a real use case. gotta do the pairing somehow, though.
==========
Kris and i were basically convinced in Munich that if we work off of the Storables stuff that's being done, loaders could be elegantly simple and full of reused code. i think we should use that as the baseline.
==========
that's not quite the scenario i was going for - i was trying to describe route selection based on an internal data feature. it's predicated on having multiple routes and selecting between them. as Larry pointed out, though:
so, i originally gave the 'This week' example with the intention of using it to illustrate something that people may TRY to do in the routing layer that could create challenges. it's far from the craziest business logic thing that i've heard about. we would be doing Drupaldom a disservice, however, if we don't at least give some thought to ways in which people might try to bend this system - i see business logic like that created all the time, and if it's possible, people will try to do it. and the thing that gets people into unmanageable situations in Panels is when they DO try to stack in the complex logic. So, with respect to this statement:
i disagree. it's not that they can't wrap their heads around it - they understand the essence of it just fine. the problem is, the tool isn't very sophisticated, and does not scale well into sophisticated cases. by defining some rules around what type of filtering/matching is done when, we can help people make better decisions when implementing their crazy business logic.
but, i diverted from addressing dbu's point about generating new routes. Crell's example of routing based on associated OG, which is a more conventional case. really, the key idea is the multiple routes on a given path thing. sooo....
==========
if there's a summary to my initial point, it's that we need discrete layers for solving different classes of routing problems. otherwise the system will tend to produce massive snarling snakepits of routing logic that are incomprehensible, undebuggable, and also maybe nonperformant. the first layer should deal with those very well-defined sets: HTTP method, Accept headers, user agent, etc.. the big criteria here are, again, well-defined ranges and clear fallback logic. subsequent layers are where we should get off into the weeds, relatively speaking, of dealing with less-well-defined criteria.
with that in mind, i believe we're taking a similar approach to this problem. certainly the RouteEnhancer, as you've described it (and linked me to its potential forerunner logic) constitutes a subsequent layer. however, it seems to me that even the RouteEnhancer might be a bit too far in. two reasons for that:
what we need is for the routing system to read all the routes which have been declared and attached to a particular path (and cannot be fully resolved by PartialMatchers alone), and then create a strategy for selecting between them. maybe that could be put into the FinalMatcher, but i'm not even necessarily sure about that. my current thinking is that we further differentiate the PartialMatcher into StrictSetMatcher and LooseSetMatcher (or something like that) and have all matchers implementing the former run first, and the latter run second. that would make it perfectly fine to allow the route to specify the RouteEnhancer that it should use, which would be an easy inroad for doing things like IPE selection.
one note, though - to take this approach, it means that loading the data objects would need to happen well before the RouteEnhancer, as they'd be potentially needed to resolve some of the LooseSetMatchers. chicken, egg.
really...it's time for me to code this, or at least pseudocode it. i'm gettin on it.
Comment #12
dbu commentedi would see a system wide set of route enhancers that first check if they have anything to enhance on the supplied route and enhance what they need to enhance. this is probably not something strictly tied to the route matching (though doing it before final selection could provide ways to add information that helps selecting the right final route)
so you would enhance the route after the LooseSetMatcher but before the StrictSetMatcher? this can make sense, but i am afraid this adds another overhead when we have to run every matcher twice. i would separate that into the FinalMatcher and let that one do the final decision what route to take. i think a multi-step final selection makes it harder to decide where to put the logic, and as it needs to boil down to exactly 1 route, i think we need one single logic that does the selection.
Comment #13
sdboyer commentedif (this aspect of) route enhancement is the process during which we, e.g., transform
{node}into a node object, then my preference would be that that be orthogonal to the phases of route matching/selection, and is done as-needed with lazy loading.system-wide route enhancement is interesting. my tendency is to want to say that route enhancement should be done at compile-time, not runtime, but there are advantages to runtime.
Comment #14
Crell commentedRelated: #1798214: Upcast request arguments/attributes to full objects
Comment #15
dbu commentedsdboyer: what do you mean exactly with orthogonal? would you insert a proxy object that loads the actual node when requested? for the cmf use case i think it is enough to just load the content document at the end of the routing process - if you handle the route for that content you are supposed to want to use it. otherwise you can just use a route without associated content. and i don't think the routing decision should depend on the content object.
for compile-time enhancement i don't know how that would look. loading a node object costs about the same whether its compiled into the route or done outside of the route - and dumping the data into the route sounds not realistic. but maybe i misunderstand what you mean by doing route enhancement at compile time.
Comment #16
sdboyer commentedthanks for that link, Crell - definitely intimately related. i'll give it a review.
by orthogonal, i mean not part of the same progression of 'phases' we're discussing with respect to doing different forms of matching.
sure, that makes perfect sense. if the route says it wants that data, it should use it.
yeah, i'm referring to something imagined & different with respect to doing compile-time enhancement, i think.
i'm pretty sure that at this point, i'm gonna stop making any useful sense without writing code. so i'll shut up until i have a patch :)
Comment #17
Crell commentedCompile-time enhancement would likely be some sort of flagging of a route that "this route requires the node converter and user converter, but not the Views object converter", so that we don't run all three converters on every frickin route we process. I'm discussing exactly that sort of logic for the access system with catch over in #1793520: Add access control mechanism for new router system.
Comment #18
sdboyer commentedok i've thought and coded more...lots more: #1812720: Implement the new panels-ish controller [it's a good purple]
now that i've worked through that, i have a clearer sense of these separation of concerns here. i'll follow up soon with, hopefully, some clearer thoughts.
Comment #19
Crell commentedWith the new CMF Routing work, I think this issue has been sufficiently resolved.