If a node has two aliases in the url_alias table, Global Redirect redirects the system path (node/25) to the first alias, but doesn't redirect the second alias to the first. There are therefore two URLs that provide access to the same node: the two aliases in the url_alias table. This is not ideal for SEO, which is the very problem Global Redirect exists to solve. This feature is required to fulfill 301 redirect needs for many large sites where one URL per content item is important and multiple aliases per node are required.

Ideally it would be possible to choose which alias is the primary URL for nodes / system paths. This could involve another settings/pathauto option:

How is the primary alias selected?

  1. most recently added alias
  2. oldest alias
  3. [others?]

[checkbox] can an author manually select the primary URL (and over-ride the auto-selection above)?

(maybe this one should be a permission?)

Comments

nicholasthompson’s picture

Assigned: Unassigned » nicholasthompson

Interesting idea - I really like it...

A node could have a primary alias and "n" secondary ones, which could be an alternative way of redirecting old URLs to new content (eg, when migrating sites into Drupal from old CMSs or static sites).

This should be fairly easy - my only concern would be overhead. Currently Global Redirect causes up to 3 extra hits per page load. Hits for multiple URLs could/will cause even more. At least these hits are small though...

Thanks for the suggestion! I'll try to include it in a release soon. Although this is tagged as 5.x - I'll get this done for 4.7 and 5.x.

Bevan’s picture

Glad you like it! A simple early implementation could exclude the UI / permissions part and focus on handling multiple aliases by always redirecting to the most recently added one.

Why does it cause THREE extra hits? There should only be one extra: on the first hit to the system path or a secondary URL, Drupal returns a 301 permanent redirect to the primary alias, and then comes the real hit to the primary alias.

nicholasthompson’s picture

Sorry - I don't think I explained that entirely clearly. Currently Global Redirect causes up to 3 "extra" hits per page load, in the sense that none of these hits would happen if the module was disabled...

Hit 1 (COMPULSORY): Check the current request against system paths for an alias.
Hit 2 (OPTIONAL): Check if the current request ends in a slash - if so, check system paths for an alias against the current request WITHOUT its slash.
Hit 3 (IF NO MATCH ON 1 OR 2): Check if the current request matches the system's frontpage path - if so, redirect to the frontpage... This is currently causing issues with logins as reported in another issue.
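
A minimal sketch of the three checks above, written in Python rather than the module's PHP. The names `ALIASES`, `FRONTPAGE` and `find_redirect` are hypothetical, and a dictionary stands in for the alias table; this illustrates the order of the checks, not the module's actual code.

```python
# Hypothetical stand-ins for the alias table and the site's front page setting.
ALIASES = {"node/25": "about-us"}  # system path -> alias
FRONTPAGE = "node/1"


def find_redirect(request_path, aliases=ALIASES, frontpage=FRONTPAGE):
    """Return (target, lookups): the redirect target (or None) and the lookups used."""
    lookups = 1
    # Hit 1 (compulsory): is the request a system path that has an alias?
    if request_path in aliases:
        return aliases[request_path], lookups
    # Hit 2 (optional): only when the request ends in a slash, retry without it.
    if request_path.endswith("/"):
        lookups += 1
        trimmed = request_path.rstrip("/")
        if trimmed in aliases:
            return aliases[trimmed], lookups
    # Hit 3 (only if 1 and 2 found nothing): is this the front page's system path?
    lookups += 1
    if request_path == frontpage:
        return "", lookups  # "" stands for the site root
    return None, lookups
```

A request for `node/25/` costs two lookups here, and a request that matches nothing and ends in a slash costs all three.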

If this feature request were implemented, there would be further hits... My current idea would be to scan all destination URLs for the current request, then - upon match - pull up the system path for that alias. Then look up all aliases for THAT system path, ordering by alias ID so the first one is used. This would be an initial release.

I think the best way to implement this feature would be to form_alter the node's fieldset for path aliases and have a set of radio buttons below the textfield to show extra URLs. If there is only 1 alias then the radio button is disabled as there will be no need for a primary. If there is more than 1, then you can select which is primary using a radio button.

This would then go into a table generated by Global Redirect. This table would simply be a 1 column table with Node ID as the field.

The section of the init() function of this module which deals with secondary aliases can do something like a left join of this new table onto a query similar to the alias lookup query, and then do an ORDER BY so that the row which exists as primary goes to the top (not sure if NULL goes to the bottom with ASC or DESC - trial and error will tell). You'd also need a secondary order on alias ID so that if no primary is set, it reverts to its default rule of the lowest alias ID becoming primary.
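
The join and ordering described above can be sketched with SQLite from Python. This is an illustration under assumptions, not the module's query: the `url_alias` columns (`pid`, `src`, `dst`) are assumed from the Drupal 5 schema, while the `globalredirect_primary` table (holding the pid of the chosen alias) and the `primary_alias` function are hypothetical. Sorting on `(p.pid IS NULL)` puts the marked row first without depending on how the database orders NULLs.

```python
import sqlite3


def primary_alias(conn, alias):
    """Return the alias that `alias` should redirect to (possibly itself)."""
    row = conn.execute(
        """
        SELECT a.dst
        FROM url_alias AS req
        JOIN url_alias AS a ON a.src = req.src
        LEFT JOIN globalredirect_primary AS p ON p.pid = a.pid
        WHERE req.dst = ?
        ORDER BY (p.pid IS NULL), a.pid  -- marked primary first, then lowest pid
        LIMIT 1
        """,
        (alias,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT);
    CREATE TABLE globalredirect_primary (pid INTEGER PRIMARY KEY);
    INSERT INTO url_alias (pid, src, dst) VALUES
        (1, 'node/25', 'old-page'),
        (2, 'node/25', 'new-page'),
        (3, 'node/30', 'first'),
        (4, 'node/30', 'second');
    INSERT INTO globalredirect_primary (pid) VALUES (2);
    """
)
```

With this data, node/25 has an explicit primary (pid 2), while node/30 has none and falls back to its lowest pid.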

This is just a quick mental sketch being "dumped" here - thoughts would be greatly appreciated.

I hope this makes sense!

RayZ’s picture

If I understand what you want to accomplish, it sounds like Path Redirect may provide a manually administered workaround.

nicholasthompson’s picture

That looks interesting - although it might be more convenient if all that was node-based rather than administered separately.

Plus this feature would help you control what happens if Pathauto decides to make multiple URLs for the same node (easily possible if someone creates a term, it gets indexed by Google for 2 months and then they rename the term - you don't want to lose the page that has been indexed for 2 months).

Bevan’s picture

Ah! Now I see where we misunderstood. I thought by 'hit' you meant a client request to the server.

I don't have much programming experience, but what you said sounds like it would work. My only concern is that you'd use nid on the globalredirect table. Remember that user profiles, terms, views, and other non-node objects can also have aliases, and multiple aliases. In fact pathauto can quite easily generate multiple aliases for terms if you change the term name, for example. Although nodes are the primary concern here, other pages are also important and get indexed.

RayZ: Path Redirect would do the trick with a lot of manual work, however an automated solution is necessary for large sites that run pathauto. In our case, our content writers don't understand URL management enough to do all that manually, and I don't have time for laborious manual tasks like that.

nicholasthompson’s picture

Good point - nid is not the right identifier. It will have to be the system or source path... But this involves duplicating rows out of the alias table... Ideally I don't want to be modifying the url_alias table.

hass’s picture

Take "path_redirect" module...

Aside - if you have different url aliases pointing to the same node - you have "duplicate content". Don't do this!!!

nicholasthompson’s picture

Correct - you do, but what if you could set a primary alias for a node and all the others redirected to them?

How useful would it be if you were implementing a new site based on an old one and you wanted to preserve URLs but redirect them to new, neater, nicer ones? By adding the old and the new URL and setting the new one as the primary, not only would node/123 be directed to the right place, but so would the old URL.

I will look into the path_redirect module - however I think I've seen it before and it's kind of a halfway house between what we're talking about and nothing at all. I think it just allows you to specify a URL to redirect to another URL. I'm not sure it gives you control on a per-node basis... But I'll look into it.

Tobias Maier’s picture

I know, setting an alias as the default alias is a missing feature in drupal.
But I don't think global redirect is the place to fix this.
Global redirect should be easy and do only its job: prevent duplicate content for SEO

If you move over from an old site to a new site path redirect is your module. I'm using it happily on all of my sites.
So please don't "over enhance" your module - keep it simple!

If you ask me, I would set the status to won't fix.

Think about one thing: the more queries you have on every page request, and the more complex these queries get, the more time it will take to generate a page.

A true solution would fix it at the root: url(), or better drupal_get_path_alias(), is where a patch should start.

hass’s picture

Yes, Tobi, that's the case. Keep it simple and keep it to what it is made for, with as few SQL requests as possible.

An interim redirection task from old URL aliases to new ones can really be solved with path_redirect. I'm using path_redirect and it works well for such URL transitions.

Bevan’s picture

I disagree that Global Redirect is the wrong module for this feature. Global redirect's 'reason' and purpose, as taken from http://drupal.org/project/globalredirect, is:

Once enabled, an alias provides a nice URL for a node. However it doesn't remove the old URL (eg node/1234). This is a problem as you now have two URL's representing precisely the same content. You're getting into dangerous territory for duplicate pages which can get you sandboxed by the likes of Google!

This feature request fits perfectly in with this purpose, and in fact completes it. Globalredirect currently does not achieve the goal of one-URL-per-node for large sites requiring multiple redirects or aliases per node.

In the end it's up to the module owner if it gets included in this module or not.

Other rebuttals:

Aside - if you have different url aliases pointing to the same node - you have "duplicate content". Don't do this!!!

pathauto (with certain configurations) does this already (automatically), as do certain business models (manually) for websites with specific marketing and SEO needs.

If you move over from an old site to a new site path redirect is your module. I'm using it happily on all of my sites.

path_redirect does that well for site 'imports' etc. -- but that's not the purpose of this feature. This feature would, in addition to its main purpose, provide an automated way to do this, possibly a faster and easier way.

Think about one thing: the more queries you have on every page request, and the more complex these queries get, the more time it will take to generate a page.

Good point. What's a better way? This feature need not be 'on' by default.

I'm not a programmer or sysadmin, so maybe this is a stupid idea: what about adding lines to .htaccess when/as aliases are created/edited? This would offload the work to Apache, which will handle it much better than Drupal, and not require ANY extra SQL queries. Permissions to write to .htaccess, and security, could be major issues with this method. Could they be overcome?
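
A rough sketch of what generating those .htaccess lines could look like, in Python for illustration. It assumes clean URLs are enabled, Drupal sits in the web root, and Apache's mod_alias is available; `htaccess_redirects` is a hypothetical helper, and the rule that the most recently added alias (highest pid) is primary is the simple rule suggested earlier in this thread. `RedirectMatch` with an anchored pattern is used rather than `Redirect` so that a rule for node/25 does not also catch node/25/edit.

```python
import re


def htaccess_redirects(alias_rows):
    """Build RedirectMatch lines from (pid, src, dst) rows.

    Every system path and every secondary alias is sent to the most
    recently added alias (highest pid) of the same system path.
    """
    primary = {}
    for pid, src, dst in sorted(alias_rows):
        primary[src] = dst  # rows with a higher pid overwrite earlier ones
    lines = []
    # Secondary aliases redirect to the primary alias.
    for pid, src, dst in sorted(alias_rows):
        target = primary[src]
        if dst != target:
            lines.append(f"RedirectMatch 301 ^/{re.escape(dst)}$ /{target}")
    # System paths redirect to the primary alias as well.
    for src, target in sorted(primary.items()):
        lines.append(f"RedirectMatch 301 ^/{re.escape(src)}$ /{target}")
    return lines


rows = [(1, "node/25", "about"), (2, "node/25", "company")]
lines = htaccess_redirects(rows)
```

The sketch only builds the lines. Writing them into a live .htaccess file raises exactly the permission and security questions asked above.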

Good point - nid is not the right identifier. It will have to be the system or source path... But this involves duplicating rows out of the alias table... Ideally I don't want to be modifying the url_alias table.

Duplicating rows is not ideal -- neither is changing core tables. You could look at it as row duplication, or you could view it as a relationship in which path/to/a/page is the unique ID. After all, the path is just part of a URI (URL), and a URI is an ID: Uniform Resource IDentifier. Additionally, all aliases in the url_alias table should be unique. Alternatively, the .htaccess method above would mean that no database changes are required -- although they could be useful to restore broken .htaccess files.

Possible solutions I can see at this point:

  • the path is considered an ID
  • add an ID column to the url_alias table
  • add the relationship to the url_alias table itself (no new table)
  • use the .htaccess method

Bevan’s picture

I just noticed that the url_alias table has an ID column: pid -- it has auto_increment and is the primary index. Therefore repeating rows, or 'pretending' the path is an ID, is not an issue. A relationship table, specific to globalredirect, that doesn't repeat rows or change core tables could be as simple as two columns:

  1. one for the pid, the primary index, that references the row in the url_alias table, and hence any url alias
  2. the pid for the default url_alias for the pid in the first column

Such a small table would probably make more sense as one extra column on the url_alias table, but that would best be done in the path module, which would have to wait for a major release version of that module.

I think I'm really getting out of my depth here. Probably someone with more drupal and programming experience needs to give me some feedback and tell me why I should learn how to do this properly...

Tobias Maier’s picture

I don't have much time yet, but here comes my suggestion for you, hass:

write a _separate_ module, which runs by cron.
This module looks at the url_alias table and moves the duplicate ones with the lowest pid (=oldest entries) to the path_redirect module.
But please let the admins define exceptions, which should not be moved and stay as the default path.
Then talk with the pathauto maintainer to include an option, which moves old paths directly to the path_redirect.module
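
The cron job suggested above could be sketched roughly as follows, in Python with SQLite for illustration. The `url_alias` columns (`pid`, `src`, `dst`) are assumed from the Drupal 5 schema. The `redirects` table is a hypothetical stand-in and is not path_redirect's real schema; `move_old_aliases` and its `keep` parameter (the admin-defined exceptions) are likewise made up for this sketch.

```python
import sqlite3


def move_old_aliases(conn, keep=frozenset()):
    """Move all but the default alias of each system path into `redirects`.

    The newest alias (highest pid) is the default, unless an alias listed in
    `keep` exists for that path, in which case that one stays the default.
    Returns the number of aliases moved.
    """
    rows = conn.execute("SELECT pid, src, dst FROM url_alias ORDER BY pid").fetchall()
    default = {}
    for pid, src, dst in rows:
        # A newer alias replaces the current default unless the default is pinned.
        if default.get(src) not in keep:
            default[src] = dst
    moved = 0
    with conn:  # one transaction: commit on success, roll back on error
        for pid, src, dst in rows:
            if dst == default[src]:
                continue
            conn.execute(
                "INSERT INTO redirects (path, target) VALUES (?, ?)",
                (dst, default[src]),
            )
            conn.execute("DELETE FROM url_alias WHERE pid = ?", (pid,))
            moved += 1
    return moved


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT);
    CREATE TABLE redirects (path TEXT PRIMARY KEY, target TEXT);
    INSERT INTO url_alias (pid, src, dst) VALUES
        (1, 'node/25', 'old'),
        (2, 'node/25', 'new'),
        (3, 'taxonomy/term/4', 'cats'),
        (4, 'taxonomy/term/4', 'felines');
    """
)
moved = move_old_aliases(conn, keep={"cats"})
```

Here 'old' is moved because 'new' is more recent, while 'cats' is pinned as an exception, so the newer 'felines' is the one that gets moved.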

Bevan’s picture

That sounds like a very tidy solution indeed. There would be full utilization of existing code that way. I need to check out path redirect to see how it works.

Can someone tell me why writing to .htaccess would or wouldn't work? I understand it's duplicating the purpose of path_redirect. But the server-load advantages might justify that duplication. Does anyone agree or disagree?

Bevan’s picture

I've started a new topic as this one is changing direction and form: http://drupal.org/node/118575

nicholasthompson’s picture

As much as using .htaccess seems like a more efficient way of doing things (ie, Drupal is slow), bear in mind that the .htaccess file will be used (AFAIK) for EVERY hit, including images and static files - even CSS. That's quite an overhead. It'd be fine if you have, say, less than 100 aliases... But for a large site (eg www.sportbusiness.com) with something short of 27,000 aliases - it's not really feasible.

Bevan’s picture

Status: Active » Closed (won't fix)

chx is working on a feature for Drupal 6 that will do this. http://drupal.org/node/106094

There is no point in developing this unless it is for, and only for, Drupal 5. Any objections to closing it?

grendzy’s picture

Version: 5.x-1.0 » 6.x-1.x-dev

It looks like this never made it into Drupal 6. I think this would be an awesome feature, especially for those sites that use pathauto's setting for "Create a new alias. Leave the existing alias functioning."

Edit: never mind, it looks like pathauto already does this when path_redirect is enabled.