Currently Drupal supports the arbitrary creation of URL aliases, stored in the url_alias table. This is not really appropriate for the configuration system. It is not uncommon for there to be thousands of aliases in this table, and the config system doesn't offer the queryability that would be necessary to make this efficient. However there is some argument to be made that this is not actually configuration at all. In core, these aliases represent a 1:1 relationship between a URL and a piece of content. ('about' -> 'node/123'). In this case, shouldn't this information be stored with the node? I think there is a strong case to be made that aliasing should be moved into the entity system, and the arbitrary alias creation at admin/config/search/path should be removed entirely.

There are other types of aliasing that are appropriate for the config system (like pathauto url patterns) but until pathauto is in core that's not really under discussion for this purpose.

Thoughts?

Comments

daften’s picture

What about translatable content? (just a question, not sure how it is handled exactly now)

stevector’s picture

I agree (mostly). Pathauto declaring that new articles should alias in the pattern of article/[node:title] is an issue of configuration. The actual aliases of the 10,000 nodes are content.

Another problem here is that there's currently no audit trail for changes to the url alias for a given node. If a node has multiple revisions, those revisions may have changed the url alias. However, because the only storage mechanism is the aliases table, old aliases are thrown out.

Instead, historical alias data for an entity could live as a field on the entity. When a new revision is saved, the new value of that field is moved to the alias table.
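A minimal sketch of that idea, written in Python purely for illustration (this is not Drupal code). The `Node` class, `alias_history` and `alias_table` are hypothetical names: the entity keeps one alias value per revision, and only the latest value is published to a lookup table standing in for {url_alias}.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical entity that keeps one alias value per revision."""
    nid: int
    alias_history: list[str] = field(default_factory=list)


# Stand-in for the {url_alias} lookup table: alias -> system path.
alias_table: dict[str, str] = {}


def save_revision(node: Node, new_alias: str) -> None:
    """Record the alias on the new revision, then sync the lookup table."""
    system_path = f"node/{node.nid}"
    # Drop the alias published by the previous revision, if any.
    if node.alias_history:
        alias_table.pop(node.alias_history[-1], None)
    # The history on the entity is the audit trail; the table only
    # holds the alias that is currently live.
    node.alias_history.append(new_alias)
    alias_table[new_alias] = system_path


about = Node(nid=123)
save_revision(about, "about")
save_revision(about, "about-us")
```

After the two saves, the entity still remembers both aliases while the lookup table only resolves the current one. Whether old aliases should keep resolving (for example as redirects) is left open here.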

I think admin/config/search/path still has usefulness. A View may take arguments in the form of a tid or nid and there may be a need to alias these. I find admin/config/search/path handy for such situations.

Where is the argument being made that CMI should store each alias?

Dave Reid’s picture

Ack, this has me pulling out my hair.

Part of me thinks we underestimate just how much people use the URL alias functionality for arbitrary paths that are not linked to entities. And how often people use multiple aliases for the same path (commonly not a 1:1 relationship).

Part of me wants to make URL aliases fields so that they're revision-able and translatable.

Part of me wants to curl into a ball and die inside so I don't have to deal with them anymore.

Overall I don't think URL aliases are going to be appropriate to store in CMI. I do think path.module needs to have a better API and support adding its aliases in hook_entity_load(), saving them in hook_entity_insert/update() and deleting them in hook_entity_delete(). I also do want the {url_alias} table to be able to track an associated entity type/id if it's appropriate (#686214: Add entity information to {url_alias}), but I don't think it should be mandatory. It would also be nice to be able to store additional data about a URL alias (aka if it was automatically generated): #937580: Add a {url_alias}.data column to allow modules to save extra information about URL aliases.

tl;dr: I don't know what we should do. And I want to pull out my hair.

chx’s picture

Let's make url aliases entities. You think I am kidding? I am not. We are changing the problem of relating url_aliases to entities to relating two entities to each other which we already need to solve (anything-taxonomy, node-user for authorship, node-comment, taxonomy-taxonomy for related and parent). Arbitrary data storage, be my guest. Translation? Don't we have a full blown initiative to have multilanguage entities? Meanwhile the base table won't change so the performance won't die.

Dave Reid’s picture

Yeah making them entities just for the data/storage API makes sense. They already have a PID auto-increment column, they just wouldn't be fieldable.

chx’s picture

Why not? I have no problems with making them fieldable...

However, this would be a really really strong use case for relation in core... cos this would be a symmetric relation between two entities. And then you do not need to make em fieldable cos relation is not a field.

eaton’s picture

I tend to agree with #3 on this one. I just had a longish conversation with heyrocker in IRC about this, and there's a multipart problem that ultimately needs to be addressed. Whether or not CMI is an appropriate place for that problem is another ball of worms entirely. There are basically four major ways I've seen user-facing paths treated across a number of systems that use frontside controllers, drupal-style.

  1. Code based path-routers that use human-friendly slugs instead of raw IDs. (node/about instead of node/1)
  2. Code-based path-routers that use multiple pieces of data to find the correct entity to load. (This is probably just a superset of #1, but it's worth separating)
  3. Full-text aliases that are generated by patterns including multiple pieces of data.
  4. Full-text aliases that are manually entered by content creators or editors.

Eliminating #4 is a nonstarter for most real world web sites. The ability to enter a totally customized URL for the odd piece of content is just not negotiable for large sites, or sites that need to do one-off promotional URLs and so on. The real question, IMO at least, is whether we can make the third scenario more like #2 or #1.

Today, core provides #4 and pathauto provides #3. Tools like Page Manager and Views, via multiple-argument paths, provide parts of #2 but not the whole deal. And individual entity implementations sometimes provide their own variations of #1, but it's currently very uncommon in the Drupal universe -- path aliases are just too easy and ubiquitous, and devs are trained to use unambiguous serial IDs and leave the rest up to path aliasing.

On many of the largest sites that I've worked on, #3 is just being used to simulate #2. They use a custom field with a name like 'short-url' or 'slug' to let users enter a URL-safe string that will be used as a segment of the final URL path. That approach -- hiding the manual path entry field but giving users explicit control over a 'url-safe string' -- means that they also have something other than the raw nid to use when building argument-driven paths in Panels and Views.

Things I will write that don't have an easy transition from the previous paragraph:

  1. A sufficiently motivated tinkerer could approximate option #1 in the current menu system by creating a node_by_slug_load() function, and using %node_by_slug in paths instead of %node. Specifying that kind of rule is totally configuration, as it's about a given router path and the existence of a given property or field on an entity.
  2. #2 is harder, as flexible wildcard-based path routers (such as the ones I've worked with in django and rails) allow you to look up a single entity based on multiple wildcards, rather than just one. The route: /news/%topic_by_name/%year/%month-day/%item_by_slug is not separately loading a topic, a year, a month-day, and an item. It's loading a single item that matches the topic, year, month-day, and slug that came in via the URL. Page Manager allows you to load multiple entities based on multiple arguments, and Views lets you filter down big lists with multiple arguments until you have just one match, but nothing really approaches this full functionality.
  3. If our menu system supported a 'top level' wildcard -- i.e., a path with a wildcard that matches any path not handled by an explicit router -- we'd be able to approximate full-path aliases as a special case of #2: having /news/%topic/%date/%slug load most articles while /%promotional_url_alias loads articles with a field whose value matches JUST that incoming string. This is nontrivial in terms of changes to the menu-routing system, but it could get us to a point where we don't have to use full-path aliases for ABSOLUTELY EVERYTHING.
  4. In all of these situations, I think it's reasonable to say that the routes and patterns discussed in items #1-#3 are configuration. The actual slug values, field values, and full-text aliases are all data, and can't be captured in configuration efficiently.
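To make items 2 and 3 a little more concrete, here is a rough sketch in Python (chosen only for brevity; it is not Drupal's menu system). Everything in it is an assumption made for illustration: the `ITEMS` data, the field names, and the `resolve()` helper. The point is that a route with several wildcards loads ONE item matching all of them, and that a catch-all top-level wildcard can act as a full-path alias.

```python
import re
from typing import Optional

# Hypothetical content store; the field names are illustrative only.
ITEMS = [
    {"id": 1, "topic": "politics", "date": "2012-03-04",
     "slug": "budget-vote", "promo": None},
    {"id": 2, "topic": "sports", "date": "2012-03-04",
     "slug": "budget-vote", "promo": "big-game"},
]

# Routes are tried in order; the top-level wildcard comes last so it only
# sees paths that no explicit route has claimed.
ROUTES = [
    (re.compile(r"^news/(?P<topic>[^/]+)/(?P<date>[^/]+)/(?P<slug>[^/]+)$"),
     ("topic", "date", "slug")),
    (re.compile(r"^(?P<promo>[^/]+)$"), ("promo",)),
]


def resolve(path: str) -> Optional[dict]:
    """Return the single item whose fields match every wildcard value."""
    for pattern, fields in ROUTES:
        match = pattern.match(path)
        if not match:
            continue
        wanted = {name: match.group(name) for name in fields}
        for item in ITEMS:
            if all(item[name] == value for name, value in wanted.items()):
                return item
    return None
```

Note that the two items share a slug; only the combination of topic, date and slug identifies one of them. The route patterns would be configuration, while the slug and promo values are data on the items.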

My take on it is that really improving the situation for paths can't be done in the CMI without tackling the menu/path routing issue. Anything else is just rearranging the chairs, IMO.

chx’s picture

I see only a very small problem with doing news/%topic/%date/%slug in D6/D7 and it's fixable in D8 for sure. So topic_load would basically check whether the $arg is a valid topic and if it is, then just return it. Same for date. Then slug would get the topic, the date and the slug as load arguments and just do the loading. Easy, I think. (The small problem is that all loaders would get all arguments because load arguments is not behaving nicely for multiple loaders.)
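Roughly, the chained-loader idea could look like the following sketch. It is Python rather than PHP and only mirrors the shape of the approach: the `topic_load`, `date_load` and `slug_load` functions here are hypothetical stand-ins, not real menu loader callbacks, and the data is made up.

```python
from typing import Optional

# Hypothetical data for the sketch.
KNOWN_TOPICS = {"politics", "sports"}
ITEMS = {("sports", "2012-03-04", "big-game"): {"id": 2}}


def topic_load(arg: str) -> Optional[str]:
    """Validate the topic segment; return it unchanged when it is known."""
    return arg if arg in KNOWN_TOPICS else None


def date_load(arg: str) -> Optional[str]:
    """Accept only YYYY-MM-DD shaped segments (a shape check, nothing more)."""
    parts = arg.split("-")
    ok = len(parts) == 3 and all(part.isdigit() for part in parts)
    return arg if ok else None


def slug_load(slug: str, topic: str, date: str) -> Optional[dict]:
    """The last loader receives the earlier values and does the real load."""
    return ITEMS.get((topic, date, slug))


def route(path: str) -> Optional[dict]:
    """Run the loaders for news/%topic/%date/%slug in order."""
    segments = path.split("/")
    if len(segments) != 4 or segments[0] != "news":
        return None
    topic = topic_load(segments[1])
    date = date_load(segments[2])
    if topic is None or date is None:
        return None
    return slug_load(segments[3], topic, date)
```

The first two loaders only validate and pass their argument through; the actual lookup happens once, in the final loader.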

eaton’s picture

Addendum: In systems #1 and #2, there also needs to be an explicit way of printing out a link to a given entity USING a particular route pattern. For individual entity paths, that can mean designating one particular route pattern as the 'canonical' pattern for it, much like the native path /node/%node is today. For non-entities, that means changing l() to something more expressive.

For non-entity paths, there's probably not any way to avoid the concept of full-path-aliasing. Solving the issue for entity paths and standardizing on per-entity sub-url-strings (slugs, whatever) would certainly reduce the NUMBER of full-path aliases being dealt with.

chx: I see only a very small problem with doing news/%topic/%date/%slug in D6/D7 and it's fixable in D8 for sure. So topic_load would basically check whether the $arg is a valid topic and if it is, then just return it. Same for date. Then slug would get the topic, the date and the slug as load arguments and just do the loading. Easy, I think. (The small problem is that all loaders would get all arguments because load arguments is not behaving nicely for multiple loaders.)

Yeah, that's one way it could be done in the short term. Long term, that won't be as efficient as a system in which one loader function receives all of the arguments and does its own loading in one fell swoop. Using something like EntityFieldQuery to construct a single load from all of those incoming arguments would be closer to the rails/python approach.

sun’s picture

Component: configuration system » path.module
Issue tags: +WSCCI, +Configuration system

I never considered URL aliases to be configuration in the sense of the new configuration system.

At some point, someone started to say "It's either configuration or it's an entity." and the whole world kept on repeating. I object to this statement in its pure form. There is a giant space between configuration and entities (as in Entity system entities).

  1. URL alias resolution must work early in bootstrap, within or even before the new router/kernel.
  2. A dependency on higher-level systems should be avoided.
  3. Sub-path-aliases are definitely configuration, and thus can be resolved very early.
  4. Random and entity-specific URL aliases, including context (e.g., langcode), are not (too many). They only need a database connection to resolve. Currently ongoing framework efforts will ultimately allow that to happen early.
  5. Those database records can be managed through high-level systems; e.g., entity system, property/field system, etc (multiple possible; compare: files and file fields, taxonomy terms and term reference fields) -- as long as that does not enforce the same dependencies on the lookup and resolution.
  6. The idea of custom router paths per entity (instead of URL aliases) sounds like a non-starter to me, as that would make the router significantly larger and more complex.
  7. This issue partially duplicates

    #1269742: Make path lookup code into a pluggable class
    #464164: Move URL alias functionality into a Path API module (still separate from the Path UI)

the arbitrary alias creation at admin/config/search/path should be removed entirely.

-1

eaton’s picture

sun: The idea of custom router paths per entity (instead of URL aliases) sounds like a non-starter to me, as that would make the router significantly larger and more complex.

I may have misspoken, or I may not be understanding what you mean. When I discussed multiple router paths being able to resolve to the same loaded entity, I didn't mean to imply anything more complex than the system we have today -- /my-path/%slug can display node 1 just as easily as /node/%node.

gdd’s picture

Hey everyone, I'm just trying to solve the problem of 'What do we do with the things held in the url_aliases table?', not solve all the aliasing problems in core. Nobody (I don't think) is going to argue that pathauto patterns are not configuration, that is pretty simple.

@sun The reason I see this as belonging to entities, is that when you move content between servers, you will want this information to travel with it. If I export my about page, wouldn't I want my aliases to it to come along? Making this into a separate subsystem with its own rules seems like a step backwards to me, and it breaks a lot of the assumptions people will be making about how their content should behave. If an alias points to a specific node, then it is content as much as the title is. However it is implemented, this is true.

Regardless, I am not seeing anyone saying this belongs in CMI, which mirrors my feeling.

eaton’s picture

gdd: Hey everyone, I'm just trying to solve the problem of 'What do we do with the things held in the url_aliases table?', not solve all the aliasing problems in core. Nobody (I don't think) is going to argue that pathauto patterns are not configuration, that is pretty simple.

Errr. Sorry for that derail, then. ;-)

gdd: @sun The reason I see this as belonging to entities, is that when you move content between servers, you will want this information to travel with it. If I export my about page, wouldn't I want my aliases to it to come along? Making this into a separate subsystem with its own rules seems like a step backwards to me, and it breaks a lot of the assumptions people will be making about how their content should behave. If an alias points to a specific node, then it is content as much as the title is. However it is implemented, this is true.

I would agree. I think this is where it's important to distinguish between the following things:

  1. Full-path aliases that point to other paths - this is 'dumb content' that MIGHT become entity-content.
  2. Properties of entities used when a router finds them for loading - this is entity-content.
  3. The routes that define what properties are used to find and load the entities - this is either code or configuration, never entity-content
  4. The patterns and rules used to generate full-path aliases for existing paths - This is configuration

Does that sound about right?

chx’s picture

#13 sounds promising but the devil is in the details (and this is not a cute little devil like the BSD Beastie, no, it's something really big and bad). #13.1 and #13.2: what are your entity types? My suggestion was "url_alias" as an entity type but from the distinction between 13.1 and 13.2 (and from the deployment requirements in 12) it sounds as if #13.2 url_aliases would become a field and not an entity. Please clarify.

sun’s picture

If you want to stage content including their aliases between servers, then you do what you're already doing today for any other data: reference it. And add UUIDs.

{url_alias}.uuid can be added anytime.

I'd fully support to enhance Path module to provide a path reference field, which then can be tacked onto any kind of bundle.

Crell’s picture

sun tagged this for WSCCI, which is accurate since the issue here isn't path aliases per se; it's paths in general, and our path handling *is* changing as part of WSCCI, so the concept of aliases is likely to (or can if we so choose) change as well.

In the current implementation, I think we all agree that the values in the url_alias table are not CMI. What they are is debatable, but they're not something that belongs in CMI. Similarly, I agree that there is certainly value to some of those aliases having their canonical storage bound to an entity, since in many cases they are a friendly URL for an entity. However, as others noted above, that is not the case 100% of the time, and there are enough other cases that we cannot eliminate them.

Taking a step back, though, url aliases exist because Drupal hard codes its paths everywhere. Every node physically lives at node/$nid, and we then hard-code node/$nid as the path to every single node all over the code base. We have sort of in D7 started to fix that with the entity_url() method, but it's still haphazardly used at best, in part because it requires a loaded entity object. Given that, a first-second and last-second aliasing system is a quite logical way to deal with that problem.

However, one of the things that WSCCI is planning to change is that exact hard coding. Per the Denver Routing writeup, we intend to end up generating URLs by specifying a route machine name, and then you link to the machine name with arguments, rather than to a URL. The URL pattern used by a route can change, and your code will still work.

Now, Symfony full stack and Symfony CMF, as I understand it, use that to basically integrate what we call router items and URL aliases into a single system. Witness for instance Fabien's blog:

http://fabien.potencier.org/article/50/create-your-own-framework-on-top-...

The route there is something along the lines of /article/{article}/{title}, and the title is required, but then only the article is used by the system that generates the content of the page (as you learn if you try to hack the URL. :-) ). But that fact is not hard-coded anywhere; the Generator (equivalent of url()) handles that dynamically, and all articles could be moved to /stories/{article}/{title} with a config change, no code change. (Although it would certainly break bookmarks.)
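As a rough illustration of linking to a route name rather than to a URL, here is a small Python sketch. It is not Symfony's Generator; the `ROUTES` registry, the `article.view` name, and the `generate()` helper are all hypothetical.

```python
# Hypothetical route registry: machine name -> URL pattern (configuration).
ROUTES = {"article.view": "/article/{article}/{title}"}


def generate(route_name: str, **params: object) -> str:
    """Build a URL from a route's machine name and its arguments."""
    return ROUTES[route_name].format(**params)


# Calling code only ever names the route and supplies arguments.
before = generate("article.view", article=50, title="hello")

# A configuration change moves every article; calling code is untouched.
ROUTES["article.view"] = "/stories/{article}/{title}"
after = generate("article.view", article=50, title="hello")
```

The same call produces a different URL once the pattern changes, which is exactly why existing bookmarks would break unless the old pattern is redirected.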

Currently, Drupal's path handling is rather wonky, as we do not route based on the path. We route based on a complex derivative of the path, that includes de-aliasing, language prefixing, potentially spaces.module, and other insanity. I'm trying to figure out right now, actually, how we actually deal with that in the kernel.

One possibility is that we may not actually have a denormalized aliases table of doom. It is not out of the realm of possibility that we have multiple request listeners that can extract the necessary information from the path, or manipulate it in a cleaner fashion than we do now. One such listener could look for the alias in the entity system directly (say querying a particular field?) while another looks in a manual aliases table, while another looks directly at pathauto configuration and works backwards. I don't know that is a good approach, but it is a possible approach.

Another option for things like language (I know, not aliases directly but relevant to this discussion) is to change the path route entirely. So if you're on a multi-lingual site, the node.view router item's path changes from node/{node} to {language}/node/{node}. Or something like that.

reads back

So I'm probably a bit off topic, it looks like. :-) It's late on a Sunday, cut me some slack. In any event, what I'm trying to get at is that we don't necessarily have to think in terms of retaining the giant url_alias table at all. There may be good reason to, but architecturally it's not a hard requirement. We can think further outside the box than that.

eaton’s picture

chx: #13 sounds promising but the devil is in the details (and this is not a cute little devil like the BSD Beastie, no, it's something really big and bad). #13.1 and #13.2: what are your entity types? My suggestion was "url_alias" as an entity type but from the distinction between 13.1 and 13.2 (and from the deployment requirements in 12) it sounds as if #13.2 url_aliases would become a field and not an entity. Please clarify.

Oh, there are devils in the overview, devils in the details, devils in the bullet points... Plenty of devils to go around. I was mostly just trying to clarify different kinds of data and mechanisms being discussed, and note where each one would fall RE the CMI work, even if we aren't currently using them.

In #13, the path-segments I was talking about are not what we currently consider path aliases. Using the current pathauto style alias generation mechanism, it would be closer to "A property or field that is generally used when constructing a path alias." The idea is to distinguish between 'small url-friendly strings that live on an entity and are used as PART of a path" from "the full-path alias that happens to point to a real router-path."

I... think I need a whiteboard.

Crell: However, one of the things that WSCCI is planning to change is that exact hard coding. Per the Denver Routing writeup, we intend to end up generating URLs by specifying a route machine name, and then you link to the machine name with arguments, rather than to a URL. The URL pattern used by a route can change, and your code will still work.

Yeah, that's a common pattern that works reasonably well in django, the system I have the most non-Drupal real world experience with. It's not perfect but it's a lot better than 1:1 string:string aliasing, like our current system.

I'm a fan of multiple request listeners, with the first one supplying an 'answer' winning the race. That would allow things like a fallback 404-handler style listener to handle 'full-path' aliases.
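Something like this, sketched in Python just to show the control flow (the listener names and data are made up for the example, and this is not any real kernel API):

```python
from typing import Callable, Optional

# A listener takes the incoming path and returns a system path, or None
# when it has no answer.
Listener = Callable[[str], Optional[str]]

# Hypothetical data sources.
ENTITY_SLUGS = {"about": "node/123"}          # e.g. a slug field on entities
FULL_PATH_ALIASES = {"spring-sale": "node/456"}  # manually entered aliases


def entity_listener(path: str) -> Optional[str]:
    """Look for the path among per-entity slugs."""
    return ENTITY_SLUGS.get(path)


def full_path_alias_listener(path: str) -> Optional[str]:
    """Fallback, 404-handler style: consult the full-path alias table."""
    return FULL_PATH_ALIASES.get(path)


# Order matters: the first listener to supply an answer wins.
LISTENERS: list[Listener] = [entity_listener, full_path_alias_listener]


def resolve(path: str) -> Optional[str]:
    for listener in LISTENERS:
        answer = listener(path)
        if answer is not None:
            return answer
    return None  # nobody answered: a real 404
```

Because the fallback only runs when nothing else has claimed the path, the full-path alias table only needs to hold the odd one-off URL rather than every alias on the site.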

Of course, this is all secondary to what heyrocker was originally asking, which probably means that his suspicion is right: path aliases themselves aren't CMI at all.

gdd’s picture

So my question is, are we looking to solve all 4 of the use cases in #13 in Drupal 8? If so, then we should change this issue to 'Rearchitect path handling and move pathauto into core' because that is essentially what we are proposing. If not, then we should address 13.1 and 13.2 (what is currently in core) and move the rest into separate issues and/or punt it.

I think it makes a lot of sense to have the 13.2 use case be a field on entities, even though it complicates some of the needs as described by sun in #10. This is where the data belongs and where it makes the most sense for it to be located, so if we need to for instance maintain a denormalized table of aliases, then maybe that's what should happen. Then the only question is how/if we support aliases to non-entities in core.

Basically, let's figure out a way to focus this into something actionable.

catch’s picture

hook_url_inbound_alter() hasn't been mentioned yet. That's currently the closest thing we have to 'multiple listeners' since you can intercept any arbitrary path and translate it to a router item in there.

There was some discussion when trying to move path aliases out to a module that it should just implement hook_url_inbound_alter() and hook_url_outbound_alter() itself, which doesn't seem that far off to making it the last fallback if we moved to multiple listeners.

Crell’s picture

I think I have inadvertently ported hook_url_inbound_alter() to kernel listeners anyway as part of the kernel patch.

As it is now, the pathInfo value from the request (viz., the raw incoming path from HTTP) is virtually never used. Instead, it is mutated through 4 different listeners (exact structure subject to change) into $request->attributes->get('system_path'): Front page handling, de-aliasing, urldecode()ing, and language prefix extraction. That is the value that is then used for routing.
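For readers who want to picture that pipeline, here is a loose Python sketch. It is not the kernel patch: the listener functions, the `attrs` dictionary, the sample data, and in particular the ORDER in which the four steps run are assumptions made for the example.

```python
from urllib.parse import unquote

# Hypothetical site data.
LANGUAGES = {"en", "fr"}
FRONT_PAGE = "node/1"
ALIASES = {"about": "node/123"}


def decode(attrs: dict) -> None:
    """urldecode() the path."""
    attrs["system_path"] = unquote(attrs["system_path"])


def extract_language(attrs: dict) -> None:
    """Strip a leading language prefix and remember it."""
    head, _, rest = attrs["system_path"].partition("/")
    if head in LANGUAGES:
        attrs["langcode"] = head
        attrs["system_path"] = rest


def front_page(attrs: dict) -> None:
    """Map the empty path to the configured front page."""
    if attrs["system_path"] == "":
        attrs["system_path"] = FRONT_PAGE


def dealias(attrs: dict) -> None:
    """Swap an alias for its system path when one exists."""
    path = attrs["system_path"]
    attrs["system_path"] = ALIASES.get(path, path)


LISTENERS = [decode, extract_language, front_page, dealias]


def handle(path_info: str) -> dict:
    """Run every listener; routing would then use attrs['system_path']."""
    attrs = {"path_info": path_info, "system_path": path_info.strip("/")}
    for listener in LISTENERS:
        listener(attrs)
    return attrs
```

The raw `path_info` is kept but never consulted again; every later step reads and rewrites `system_path`, which is the value routing would use.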

Once we figure out how to expose kernel listener events to modules (something we aren't doing yet but know full well that we have to) any module will be able to attach a KernelEvent::REQUEST listener and muck around with $request->attributes->get('system_path') before we get to routing. I've even implemented a simple utility base class exactly for that purpose.

hook_url_outbound_alter() would end up working in the Generator, which doesn't exist yet. That's something I expect to tackle as part of the new router (or maybe the immediate next step, not sure); if it doesn't have events built into it, there's no reason I can think of right now that we couldn't include them in our own Generator implementation since we're going to have our own anyway.

So, perhaps this is already half-done? Or rather, will end up done as a result of later refactoring we can do in a few months? (I suppose, once we allow modules to attach events, that would mean path aliases COULD, in fact, move entirely to a module. That would be both cool and disturbing. :-) )

gdd’s picture

There is some discussion of similar issues going on at #1479466: Convert IP address blocking to config system

dopry’s picture

To get back to heyrocker's original post, instead of the hey how do we fix the world ideas and make all things the same....

It sounds like you're trying to address the performance issues related to url_alias lookup in D6, D7. You seem to feel this is typically an issue when there are thousands of aliases....

In my experience, those thousands of aliases are generated by path_auto. I've seen very few sites with massive numbers of aliases that were manually input.

I would look more to path_auto's implementation, rather than the core path alias system. The core path aliases do their job rather well for the use case they were designed for: setting up a few simple aliases to content for quick brochure sites and personal blogs.

Path auto leverages this system in a somewhat abusive manner in my opinion. A nice re-implementation of the path_auto functionality as an advanced router might be a way to approach solving this problem and proofing a few new concepts without having to work directly on core.

If Path Auto were re-implemented to act as more of a router by intercepting requests which match its patterns, then converting them to canonical paths based on those patterns and recursing back into the menu system, you can bypass having to do the long string lookup on the url_alias table, not just for path_auto paths, but all paths. Just keep in mind that you start losing legacy paths. That could be addressed by keeping track of old revisions to path auto patterns and redirecting them to the currently active path. Similar to the global redirect module.

It may be a new Dynamic Request Router module that competes with Path Auto, rather than disruptive changes to path_auto itself, that is appropriate for exploring such ideas.

gdd’s picture

I just submitted #1751210: Convert URL alias form element into a field and field widget to begin addressing this issue.

Dave Cohen’s picture

I found this thread searching for an unrelated bug. I'm not super well versed in the alias system - just enough to know it's kinda broken in D7.

Anyway, I agree with dopry #22. A lot of this thread assumes only entities have aliases. Which isn't true. It's probably a great idea to have something like pathauto, only without the thousands of aliases. But that doesn't necessarily mean get rid of the current alias system.

Sun said, "URL alias resolution must work early in bootstrap, within or even before the new router/kernel. A dependency on higher-level systems should be avoided." And I just want to point out that url rewriting does happen, and when the aliasing should happen after the rewriting, that implies that higher-level modules have been loaded. Right?

This is a tangent, but I suspect the order of rewriting and aliasing should be reversed, which I mentioned in #1801044: hook_url_outbound_alter has no effect on aliases.

gdd’s picture

Issue tags: +feature freeze

If we're going to do anything major here, it will have to be done before feature freeze, which means we need to get things in order.

gdd’s picture

Discussion around one major aspect of this is happening at #229568: Pathauto in Core

catch’s picture

Priority: Normal » Major

Bumping this to major, since #1269742: Make path lookup code into a pluggable class caused an unintentional API regression here.

gdd’s picture

Trying to figure out how to resolve this issue, since it still needs to have something done to it by feature freeze (or alternatively a consensus around not doing anything.) Here is some summary of the current discussion as I see it

  1. #1751210: Convert URL alias form element into a field and field widget would solve some of these problems by removing aliases from the table and moving them into entity storage. Pathauto could then store its aliases there, custom aliases could go there, aliases brought forward in migration, etc. However the issue also seems somewhat stuck to me. There is a patch but it doesn't appear to have consensus yet, with concerns around the UI. It also doesn't require anyone use it, someone could still just dump 10,000 node aliases into {url_alias} if they want.
  2. chx had an interesting idea in #4 with making aliases into entities that you reference from other entities using Entity Reference (now in core.) If we did this we could (I think) do away with the table, and convert the current admin UI to create entities. I don't know how this affects performance at all, which is a concern for something used so frequently, on every request.
  3. We can do something like I suggested in #1479466-66: Convert IP address blocking to config system: Maintain the aliases in CMI, but also expand them to a lookup table for performance.
  4. We can just punt it and say we aren't converting it to config or entities or anything else. We just leave the table and functionality as it is. I personally think that this is the wrong approach, because this is something people are going to expect to be able to deploy, regardless of whether it is config or content. You will want this information to follow along with your site's development. I consider saying "You can deploy everything in Drupal except your aliases" to be a failure of CMI, so I don't really consider this an option. I list it only in the interest of completeness.

I think that no matter what else happens, we are going to end up needing to do #2 or #3 in this list. I prefer #3 because I think it will be easier and maintains absolute feature/performance parity with what we've already got. However I definitely think this issue warrants more attention and discussion than it's gotten. More of that please.

sun’s picture

Err, NACK on 1)

As the issue title suggests, #1751210: Convert URL alias form element into a field and field widget only converts the URL alias form element into a field widget provided by Path module. The form element is injected manually via hooks currently. It does not change the storage of URL aliases. The field stores the original/actual user input for historical reference - and that is, primarily in order to make it possible for PathAuto to change the URL pattern for a particular entity, which in turn might need to trigger alias updates for existing entities; since we have the original/actual user-entered value at hand, we're able to re-construct aliases regardless of how the pattern has changed.

In short, that issue does not change how aliases are stored. It only converts the form element that is manually injected (for nodes only) into a field widget, which leads to a massive simplification, and also allows you to attach it to other entity types/bundles. In general, that issue is rather high-level, and doesn't really care or change the low-level path/alias handling. Lastly, not sure why you think it's stuck — the issue has a concrete, working patch, and merely needs review.

With regard to the other bullet points/options:

#3 is still based on the assumption of PathAuto-alike URL alias patterns, not individual/custom aliases per entity. I think there's wide-spread agreement that, if there'd be PathAuto patterns in core, then those would totally be config. However, overall progress on pathauto is kinda stuck - the conversion to a path field widget is only marginally related.

chx’s picture

Number three, while deplorable
will make it CMI deployable.

Finding alias
in all areas
can release
with relief.

gdd’s picture

#3 is still based on the assumption of PathAuto-alike URL alias patterns, not individual/custom aliases per entity. I think there's widespread agreement that, if there'd be PathAuto patterns in core, then those would totally be config. However, overall progress on pathauto is kinda stuck - the conversion to a path field widget is only marginally related.

I don't get this comment. It only talks about the aliases in {url_alias} (whether generated or not, and whether directed at entities or not.) Of course the patterns should be config.

In the meantime, writing such a patch as #3 is blocked on #1886478: Bring back hook_config_import_CRUD() hooks unless there is thought that they should be Config Entities, which I won't rule out but I view this more as a list of strings rather than each alias having its own identity as an individual "thing".

catch’s picture

tbh the entity + reference suggestion sounds most realistic at the moment. I don't think CMI should need to worry about scaling to millions of records, at least not the core implementations. Also we're talking about caching config objects per-route, that's going to be tricky if path aliases end up in there in terms of cache pollution.

xjm’s picture

I don't get this comment. It only talks about the aliases in {url_alias} (whether generated or not, and whether directed at entities or not.) Of course the patterns should be config.

As far as I know, any alias for (e.g.) node/13 also goes into that table, and so (contrary to what I said on IRC earlier today) I think they aren't ready to become deployable yet, even leaving aside the scaling question.

Edit: Sorry, I should read the whole issue before commenting. sun mentioned UUIDs already, etc.

amateescu’s picture

tbh the entity + reference suggestion sounds most realistic at the moment.

Agreed :)

gdd’s picture

After some thought and discussion with various people I also think we need to revoke #28.3 from contention. I agree there's just no way to scale it reasonably. As an exceptional example, marcingy ran a query on Examiner's database and they have over 14 *million* aliases. So I guess we need to move in the direction of entities and references. I don't have a good picture in my head of how that would look architecturally.

Also remember that while the majority of aliases are referenced from entities (over 99% of Examiner's aliases are this kind of reference), not all aliases fit this pattern. There are aliases to things like user/login, views, etc. We could just say "That isn't Drupal's problem, use htaccess" and I'd be OK with that. Or we could kick it to contrib. We need to come up with what to do with these non-entity-related aliases though.

Crell’s picture

Back at the Paris Entity API sprint, we came to a recommendation for extrinsic entities with entity reference-like behavior (#1801304: Add Entity reference field) to handle things like sticky and promoted, noting that such an implementation effectively put "flag engine in core". While URLs are certainly different than a boolean, I don't think they're THAT different conceptually and could be implemented the same way: A separate "url alias" entity that has a source path, destination path, and optionally entity reference. When editing an entity, you look up a possible corresponding alias entity by backreference and use it as needed. As long as the ER field is optional it should still handle arbitrary paths just fine; as long as there's a nicely denormalized lookup mechanism then the path alias system should be able to handle it. (And if entity-based storage can happen, that table doesn't even have to be a separate denormalization.)
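
To make the shape of that proposal concrete, here is a minimal sketch in Python (chosen only to keep it language-neutral; it is not Drupal code). The names UrlAlias, AliasIndex, source, alias and entity_ref are hypothetical, and the in-memory dictionaries stand in for whatever denormalized lookup storage would actually be used.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class UrlAlias:
    """Hypothetical alias entity: two paths plus an optional back-reference."""
    source: str                                    # system path, e.g. "node/123"
    alias: str                                     # public path, e.g. "about"
    entity_ref: Optional[Tuple[str, int]] = None   # (entity type, id), or None


class AliasIndex:
    """Denormalized lookups so neither direction needs a full scan."""

    def __init__(self) -> None:
        self._by_alias: Dict[str, UrlAlias] = {}
        self._by_entity: Dict[Tuple[str, int], UrlAlias] = {}

    def save(self, item: UrlAlias) -> None:
        self._by_alias[item.alias] = item
        # Only aliases that point at an entity get a back-reference entry.
        if item.entity_ref is not None:
            self._by_entity[item.entity_ref] = item

    def resolve(self, alias: str) -> Optional[str]:
        """Inbound lookup: public path -> system path."""
        found = self._by_alias.get(alias)
        return found.source if found else None

    def for_entity(self, entity_type: str, entity_id: int) -> Optional[UrlAlias]:
        """Back-reference lookup, as used when editing an entity."""
        return self._by_entity.get((entity_type, entity_id))


index = AliasIndex()
index.save(UrlAlias("node/123", "about", ("node", 123)))
index.save(UrlAlias("user/login", "login"))  # arbitrary path, no entity
```

Because entity_ref is optional, the same structure covers both the entity-backed alias and the arbitrary user/login style alias.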

catch’s picture

One thing with arbitrary aliases like user/login - those are reasonable to keep in the config system, so it wouldn't be impossible to have two implementations - the entity aliases, plus a global arbitrary alias system (which could be a single CMI file probably).

The other thing with those global aliases is if we actually use the generator for generating links at some point, say you have to use hook_router_alter() (or whatever it is by then) to move them in the router itself.

plach’s picture

+1 to #37, I was thinking the same. We may want to define a consistent API to "mask" the different storages.

gdd’s picture

The problem I see with having two implementations is that the one stored in config will now be open to abuse (whether intentional or not) the same way the current one is. I mean, we could then say "Your own fault" which is fine, but I'd rather have another solution.

I am perfectly happy saying that arbitrary aliases have to go some other route (be it an internal API or 'just put them in your htaccess' or whatever.)

Crell’s picture

I spoke with catch a bit in IRC about #37. While "just edit the route" will work for route-defined paths (which is exactly the advantage of using the route and generator), that won't help for placeholder-using paths, say Views arguments.

However, also keep in mind #1888424: Make Drupal's URL generation logic available to HttpKernel, and minimize code repetition/divergence.

Once that happens, we'll have two path-messing-with points:

1) kernel.request event, which we're already using for alias resolution and for the other futzing that we do.
2) A new generator event, which replaces hook_url_outbound_alter(), and gets cached.

And we can have multiple equally-privileged systems listening on both. That means, it should be possible to allow for a 4-part setup:

1) One listener pair (request and generate) that looks up based on entity alias fields.
2) One listener pair that does arbitrary runtime rules (most of pathauto). These rules are configuration.
3) One listener pair that uses some arbitrary list, the scaling of which is your problem. (Config or content, whatever, doesn't matter to me.)
4) One listener pair that does site-specific black magic that the alter hooks were for in the past. (custom code only)

So path aliases as they exist today would only be number 3, and would be a few orders of magnitude smaller than they are now. The others are new, but take over most of what the path alias table is used for now. The details of those can be hashed out in separate issues in parallel.
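
A rough sketch of how several listener pairs could coexist, written in Python purely for illustration. ListenerPair, dict_pair, resolve_inbound and generate_outbound are hypothetical names, not existing Drupal or Symfony APIs, and "first listener to return a result wins" is an assumption about ordering that would still need to be decided.

```python
from typing import Callable, Dict, List, Optional

# A resolver maps a path to a rewritten path, or returns None to pass.
Resolver = Callable[[str], Optional[str]]


class ListenerPair:
    """One inbound (request) resolver and one outbound (generate) resolver."""

    def __init__(self, name: str, inbound: Resolver, outbound: Resolver) -> None:
        self.name = name
        self.inbound = inbound
        self.outbound = outbound


def dict_pair(name: str, alias_to_source: Dict[str, str]) -> ListenerPair:
    """Build a pair backed by a plain alias -> system path mapping."""
    source_to_alias = {source: alias for alias, source in alias_to_source.items()}
    return ListenerPair(name, alias_to_source.get, source_to_alias.get)


def first_result(resolvers: List[Resolver], path: str) -> str:
    """Return the first non-None rewrite, or the path unchanged."""
    for resolver in resolvers:
        result = resolver(path)
        if result is not None:
            return result
    return path


def resolve_inbound(pairs: List[ListenerPair], path: str) -> str:
    return first_result([pair.inbound for pair in pairs], path)


def generate_outbound(pairs: List[ListenerPair], path: str) -> str:
    return first_result([pair.outbound for pair in pairs], path)


pairs = [
    dict_pair("entity_aliases", {"about": "node/123"}),    # setup 1
    dict_pair("arbitrary_list", {"login": "user/login"}),  # setup 3
]
```

Here resolve_inbound(pairs, "about") yields "node/123" and generate_outbound(pairs, "user/login") yields "login"; a path no pair recognizes is passed through unchanged.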

We can probably tweak the inbound logic somewhat to make it more performant, but we don't need to deal with that implementation detail here.

Would that satisfy all present needs?

gdd’s picture

Spent some time talking with crell, davereid, berdir, and linclark about this on IRC today.

Basically Crell's proposal above is saying "Leave the current functionality in core, but offer enough alternate solutions that nobody really has to use it except very rarely." Unfortunately we all agreed that his proposal for making Pathauto-style aliases work without actually storing the aliases themselves (just evaluating the patterns on the fly as in #40.2) isn't really workable.

So I think we're pretty well set on aliases being entities. This would mean that #1751210: Convert URL alias form element into a field and field widget would become an Entity Reference field, and the alias entity would be created when the node is saved. We should probably change the admin UI to also create alias entities, to keep all the aliases in one place. This has the downside of there being stuff in /admin that is not deployable through CMI which is kind of a bummer, but we're also about to have that problem with menu links as well (#916388: Convert menu links into entities) so it's something we'll just have to deal with.

Any objections?

Dave Reid’s picture

Nope, that sounds reasonable. As long as we can ensure that we're standardizing URL aliases to work like other things (menu links), at least we're not using another one-off approach.

andypost’s picture

tim.plunkett’s picture

Issue summary: View changes

Is this still on the table for D8?

pwolanin’s picture

Version: 8.0.x-dev » 8.1.x-dev

Looks at this point like it would be a backward-compatible addition in 8.1.x at the earliest.

sammarks15’s picture

I'm very interested in the idea of implementing a dynamic routing system that would have various "route patterns" configured for entity bundles, and then stick something in front of the request (much like the Path Alias matching system already does) to match a specific URL to an entity based on its "route pattern." For example, you might have a route pattern configured for a node of the news article bundle that would be something like this: "news/{date}/{month}/{year}/{title}." When a request comes in with the following URL: "news/02/02/2014/test-article," it separates the arguments, runs an EntityFieldQuery to get the node that corresponds to the page, and then renders it. Since this matching is obviously not very performant, the URL -> Entity relationship is stored in the cache.

In the reverse case, something like entity_url would take a node and a "default route" and fill in the arguments with the URL-sanitized values (running it through something similar to pathauto_cleanstring).
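
Both directions can be sketched roughly as follows, in Python just to illustrate the idea. PatternRouter, compile_pattern and slugify are hypothetical names; find_entity stands in for the entity query, slugify is only a crude stand-in for something like pathauto_cleanstring, and the plain dictionary stands in for a real cache.

```python
import re
from typing import Callable, Dict, Optional


def slugify(value: str) -> str:
    """Crude URL cleaning: lowercase, runs of other characters become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Turn 'news/{year}/{title}' style patterns into a named-group regex."""
    parts = []
    for segment in pattern.split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            parts.append("(?P<%s>[^/]+)" % segment[1:-1])
        else:
            parts.append(re.escape(segment))
    return re.compile("^" + "/".join(parts) + "$")


class PatternRouter:
    def __init__(self, pattern: str,
                 find_entity: Callable[[Dict[str, str]], Optional[int]]) -> None:
        self._pattern = pattern
        self._regex = compile_pattern(pattern)
        self._find_entity = find_entity
        self._cache: Dict[str, int] = {}
        self.lookups = 0  # number of expensive entity lookups performed

    def match(self, path: str) -> Optional[int]:
        """Inbound: URL -> entity id, caching successful matches."""
        if path in self._cache:
            return self._cache[path]
        found = self._regex.match(path)
        if found is None:
            return None
        self.lookups += 1
        entity_id = self._find_entity(found.groupdict())
        if entity_id is not None:
            self._cache[path] = entity_id
        return entity_id

    def generate(self, values: Dict[str, str]) -> str:
        """Outbound: fill the pattern with cleaned values."""
        return self._pattern.format(**{k: slugify(v) for k, v in values.items()})


def find_entity(args: Dict[str, str]) -> Optional[int]:
    # Stand-in for the entity query; 42 is a made-up node id.
    wanted = {"date": "02", "month": "02", "year": "2014", "title": "test-article"}
    return 42 if args == wanted else None


router = PatternRouter("news/{date}/{month}/{year}/{title}", find_entity)
```

Only the first request for a given URL pays for the entity lookup; later requests for the same URL are answered from the cache.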

Implementing a system like this (as well as attaching path aliases to entities, which I believe is already implemented), would still preserve the ability to sync path aliases with their respective entities, as the path alias is not stored with the entity - it's generated.

Given the current state of pathauto's Drupal 8 support, I would like to take the opportunity to either:

  • Improve pathauto to implement this new advanced routing (not sure this is the best idea because it radically changes the functionality of the module, but it's all done in the backend so it won't really be noticeable from a user's standpoint).
  • Create a new module with this functionality implemented, or
  • Implement this functionality in core (definitely too late in the feature freeze to do it for the 8.0 release, but we might want to consider it for 8.1).

And given that, I have a couple of questions:

  1. Is there something blatantly wrong with this approach? I read through the thread above and assume that's what heyrocker and Crell were going for in a couple of their posts.
  2. Would this be better suited as a separate module to be integrated into core later if it's successful, or should we try to implement this in core before an 8.1 release?

Edit: Also, in my tinkering with Drupal 8 Beta 12, I've noticed there's emphasis placed on using the route name + arguments to refer to URLs instead of using the system path. Shouldn't the URL alias system be updated to use this scheme as well? While the user may be able to input the system path, shouldn't it store the reference to the route + arguments in the background so that the system path of a route can change without breaking things? Shouldn't we also update the path matching at the beginning of the request to map to a route + arguments instead of the system path? After a quick Google search, I found #2346189: Denormalizing paths into route names/parameters is brittle / broken which talks about this functionality in the Link module. Or maybe it already does this and I'm just confused with the mention of source URIs, in which case I apologize for my ignorance :)

mbaynton’s picture

This work might be a good time to pave the way for a resolution to #121362: Do not allow existing or reserved paths as aliases. See my comment #49.

pwolanin’s picture

@sammarks15 - so the trick is that different routes might serve the same path, but you'd want the same alias to be in effect. Basically, while using route names is the direction we have gone, it's really not suitable for everything.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

phenaproxima’s picture

I've been thinking about this issue, and after some discussion with @catch on IRC, I'd like to take a stab at it.

This is the approach I want to take:

  • Create a completely new module called path_alias. The ultimate idea is for this module to supersede, or merge with, the core Path module.
  • Path Alias defines a new content entity type, path_alias. It is translatable, but not bundle-able or version-able. It consists of two base fields: internal_path and external_path, both strings.
  • Entities that currently receive the path base field also receive a new entity reference field, called path_alias, that references path_alias entities. This is a standard entity reference field, but it uses a custom widget/item class that creates the alias on the fly as needed. (Since entity reference fields already offer that functionality to some extent, it shouldn't be too much of a stretch.)
  • The core path_processor_alias service is swapped out with a new implementation that determines the appropriate alias by querying for path_alias entities, rather than reading the url_alias table.
  • Eventually, a migration path is created that converts all existing entries in the url_alias table to path_alias entities.
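
As a very rough illustration of the last two points (not the actual proof of concept), here is a Python sketch. It assumes url_alias rows carry source, alias and langcode columns and that "und" marks a language-neutral alias; PathAlias, migrate_rows and AliasProcessor are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

NOT_SPECIFIED = "und"  # assumed code for language-neutral aliases


@dataclass(frozen=True)
class PathAlias:
    """Hypothetical path_alias entity with its two base fields."""
    internal_path: str
    external_path: str
    langcode: str = NOT_SPECIFIED


def migrate_rows(rows: Iterable[Dict[str, str]]) -> List[PathAlias]:
    """Convert url_alias-style rows into path_alias entities."""
    return [
        PathAlias(row["source"], row["alias"], row.get("langcode", NOT_SPECIFIED))
        for row in rows
    ]


class AliasProcessor:
    """Resolves inbound paths from entities instead of the url_alias table."""

    def __init__(self, aliases: Iterable[PathAlias]) -> None:
        self._lookup: Dict[Tuple[str, str], str] = {
            (a.external_path, a.langcode): a.internal_path for a in aliases
        }

    def process_inbound(self, path: str, langcode: str) -> str:
        # Try the requested language first, then the language-neutral alias.
        for code in (langcode, NOT_SPECIFIED):
            internal = self._lookup.get((path, code))
            if internal is not None:
                return internal
        return path  # no alias found: treat the path as a system path


rows = [
    {"source": "node/123", "alias": "about", "langcode": "und"},
    {"source": "node/123", "alias": "a-propos", "langcode": "fr"},
]
processor = AliasProcessor(migrate_rows(rows))
```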

I'll write a proof-of-concept version of this module and post it in here.

Fabianx’s picture

timmillwood’s picture

  • I think path_alias should be revisionable, unless you have a really good argument why it shouldn't.
  • Doubt you'll be able to add a path_alias reference field and have the module uninstallable, you'll have to do it as a computed field like content_moderation does.
  • Will this allow us to have two path_alias entities with the same path alias? If so, can we have a plugin or tagged service so other modules can hook in and determine which alias is used?

moshe weitzman’s picture

Status: Active » Closed (duplicate)

Most everyone agrees with transitioning to content entities, so let's work out details at #2336597: Convert path aliases to full featured entities.

phenaproxima’s picture

I concur. Unfollowing this issue and focusing my efforts on that one.

Chris Matthews’s picture