First, let me say that I'm not sure if this is just bug report or feature request. But since the code doesn't behave in what I think is a correct manner I'm calling it bug report.

So we have canonical urls, that's great, that means that if the client actually wants duplicate content for business reasons we can have it with out killing SEO. Right???

Not quite.

Take any drupal 7 site with global redirect, and with the canonical url setting turned on.

Create a node for argument sake, let's say node/13, though this is completely repeatable no matter what node nid number you use on your site.

Give your node an url path setting "fashion/products/gloves"

Now via menu: ( Configuration ) -> _SEARCH AND METADATA_ -> Url Aliases -> + Add Alias

enter our node again "node/13" and a new alias: "fashion/products" and Save.

Everything works, in the basic sense that if a user hops on /fashion/products or on /fashion/products/gloves in both cases they get to see the content of node/13 but the problem is that global redirect is re-writing the original alias "fashion/products/gloves" with a drupal_goto() to become "/fashion/products", because the particular alias that the user used "fashion/products/gloves" has a lower path->pid than the other one (added second) "fashion/products".

Lovely, -- or not quite :P

The client's idea being that the products "landing" page [fashion/products] would show the "gloves" content by default [node/13]. Via an unrelated menu, the user could browse to the "pants", "blouses", or other products sub pages. But should the user click on "gloves" from that menu they should see the url fashion/products/gloves.

In my opinion, the error lies in the globalredirect's hook_init. That function relies on the include/path.inc function: drupal_get_path_alias() around line 193 of the file: sites/all/modules/globalredirect/globalredirect.module. Which, as the comment says, simply populates $alias with *any* alias. Then, this *any* alias, which is always the latest alias by current core code, is compared to the request around line 220 of the same file. -- no match, so the drupal_goto() on line 223 is invoked.

This usage specifically prevents the case the client's looking for.

Sure, I could hack the site, make a second node (node/14) that uses php for input type and that loads the content of the first, (node/13) and then I could give that new node the second url path???... I don't like it -- feels TOO hacky to me. :-/ And even if I do go that method, then I really will have duplicate content and I'd have to also hack the canonical code to get both node/13 and node/14 pointing at "/fashion/products/gloves".

What I suggest is, I think, simple. You just need another if wrapper around the drupal_goto() ... something like


// Since canonical urls might permit us to have multiple aliases for a single path
// we should allow a slightly less stringent testing when the canonical option is set
//
if ((!$settings['canonical']) || (!path_load(array ( 'source' => $current_path, 'alias' => $request_path ))))
{
  drupal_goto($alias, $options, 301);
}

To be honest I'm not 100% sure this will work in all cases, but I've tried it. And it solves the base problem for the client's site, that is with this extra wrapper, a user can load "/fashion/products/gloves" and see the content from node/13 without a redirect.

-- However ... :( this code I wrote also breaks the canonical meta tag -- as it produces:

for "/fashion/products" -> <link rel="canonical" href="http://mysite.com/fashion/products" />

and for "/fashion/products/gloves" -> <link rel="canonical" href="http://mysite.com/fashion/products/gloves" />

Once again ... duplicate content 0.o

Unfortunately I know that just using the $alias value, (calculated earlier in the globalredirect_init() function) is incorrect in this case.

However I note that no matter how many url aliases I add via the admin url alias interface, when I edit the real node, I always see the first alias in the form. I investigated this and found that the nodes' url path alias is set by the path module, which doesn't use drupal_get_path_alias() to get the alias. Instead it uses this code:

from: modules/path/path.module lines 98 through 114

  $path = array();
  if (!empty($form['#node']->nid)) {
    $conditions = array('source' => 'node/' . $form['#node']->nid);
    if ($form['#node']->language != LANGUAGE_NONE) {
      $conditions['language'] = $form['#node']->language;
    }
    $path = path_load($conditions);
    if ($path === FALSE) {
      $path = array();
    }
  }
  $path += array(
    'pid' => NULL,
    'source' => isset($form['#node']->nid) ? 'node/' . $form['#node']->nid :     NULL,
    'alias' => '',
    'language' => isset($form['#node']->language) ? $form['#node']->language     : LANGUAGE_NONE,
  );

I can add code for canonical that mimics this by changing the code that looks like this:

sites/all/modules/globalredirect/globalredirect.module lines 241 - 246

  if ($settings['canonical']) {
    drupal_add_html_head_link(array(
     'rel' => 'canonical',
      'href' => url(drupal_is_front_page() ? '<front>' : $request_path, array('absolute' => TRUE, 'query' => $query_string)),
    ));
  }

so change that to:

  if ($settings['canonical']) {
    // default canonical -- in case we're not looking at a node path.
    $canonical = url(drupal_is_front_page() ? '<front>' : $request_path, array('absolute' => TRUE, 'query' => $query_string));

    // Try to get a $nid value from the source url
    $parts = explode('/', $current_path);

    if (('node' == $parts[0]) && (is_numeric($parts[1])) &&
        (2 == count($parts)))
    {
      $nid  = $parts[1];
      $node = node_load($nid); 
      $path = array ( );
      $lang = isset($node->language) ? $node->language : LANGUAGE_NONE;

      if ($nid)
      {
        $conditions = array('source' => $current_path);

        if ($lang != LANGUAGE_NONE)
        {
          $conditions['language'] = $lang;
        }
        $path = path_load($conditions);

        if ($path === FALSE)
        {
          $path = array ( );
        }
      }
       $canonical = isset($path['alias']) ? $path['alias'] : $canonical;
    }
    drupal_add_html_head_link(array ( 'rel' => 'canonical'
                                  ,   'href' => $canonical
                                  ,  ));
  }

So, there you have it. I really think that all of this code must be worth considering -- maybe a new option for the module? Probably others can make it even better still. But I hope something comes of this because I've currently detached globalredirect from updates on the client's site to prevent loosing the changes that at least work for them. And that makes me a bit uncomfortable.

But at the very least, anyone else struggling with canonical urls in the same manner has *a* fix here available.

Comments

davebrenner’s picture

Excellent fix. Thank you this was exactly what I needed.
Dave

michaek’s picture

The newest alias may not be the best alias - for instance, if you're adding an overlooked legacy URL to redirect to the "current" alias.

I think a better pattern is to provide a weight field for URL aliases, and allow users with the proper permissions to manage the weight of an alias to set it as the default (and if it's removed, what the next default is, and so on).

michaek’s picture

Last week, I spent a little time working through the use case I described in the last comment, and my version of the globalredirect module respected weights very nicely - always redirecting to the lightest-weighted alias - but it wasn't a complete solution, as outbound links (from the Path and path.inc) won't respect weight as they're ordered by pid (and language, unless language is LANGUAGE_NONE).

That means that the outbound links will still be to the newest (not necessarily lightest) URL alias, unless we get involved with some outbound rewriting (http://api.drupal.org/api/drupal/modules%21system%21system.api.php/funct...) which seems like a recipe for re-implementing all the performance optimization in path.inc. Yucky!

It seems like we're left with two options:

  1. Patch core to include weight in the Path module and path.inc queries.
  2. Reset pids for aliases reflecting their weights upon save.

The first option is cleaner, but it seems very unlikely to make it into core. The second option doesn't have a good smell, but it'll allow Path to continue to operate without changes. In light of that, I'm going to take a look at rewriting aliases to keep their pids in weight order.

I'll be pushing my changes here: https://github.com/michaek/drupal_globalredirect (Currently, it's just the 7.x-1.x HEAD.)

michaek’s picture

I've pushed the changes I mentioned. I'm pretty happy with how it's turned out, though I'm sure there are cases I haven't considered. YMMV!

trante’s picture

If we have two aliases for same node, what Global redirect does?
It gives http code 200 for both of the aliases.
Am i wrong?

Or old alias gives 301 to new alias?
(Which is i prefer)

Thank you

michaek’s picture

@trante, Global Redirect responds with a 302 to the alias with the highest pid (which is probably the most recently created). However, that question is a little off topic on this issue - probably a good idea to create a new issue with such a question.

Dave Reid’s picture

Category: bug » feature
Status: Active » Closed (won't fix)

We can't have Global redirect have a different behavior from what drupal_get_path_alias() does - which always select by the 'newest' alias for a source. If you want to change this behavior that you need to file an issue against core.

wheelercreek’s picture

This module fork is almost exactly what I needed... I need to control the weight of the aliases, because I want to force my main menu links to go to one set of aliases.. but I have another set of footer links that I need to have go to another set of aliases (same content - but different theme colors & elements on the page, based on URL path).

Problem is the global redirect mod is changing the URL to the newly adjusted alias - no longer the "newest" one, but I don't want the URL to change at all.. I want to use the alias I created.

The need doesn't happen too often, but sometimes its the best solution when you don't want to manage multiple versions of the same content that needs to be accessed through different paths.

I was able to use the fork to adjust the URL alias weights (pids), but then I have to disable the module to get the normal drupal behavior of not changing my URL.

abaumer’s picture

@michaek - awesome. This was exactly what I needed. Thanks. Working great!

jefkin’s picture

Title: Multiple url aliases for same node, globalredirect picks newest one, others redirected to that one. » Invalid Cannonical Urls: Multiple url aliases for same node, globalredirect picks newest one, others redirected there.
Category: feature » bug

I updated the title for clarity, and I put the status back to bug report because the canonical url functionality absolutely fails for it's express purpose, allowing two urls to be accessed as their exact two urls (not redirected) but refer to a single content and actual canonical url.

@Dave Reid

While I appreciate that this might need to be refiled on core as some sort of magic exception. I'm not convinced. For two big reasons.

  1. What would I say? drupal_get_path_alias() is wrong? ... I don't beleive that! drupal_get_path_alias() gets an alias, the latest alias. Ok fare enough, that's perfectly valid, and useful if some one is jumping on /node/33. But even this *does* handle and even allow multiple aliases, it just ignores them all in favour of the latest one.
  2. Even if I could stomach the idea that drupal_get_path_alias() needed some sort of patch, what functionality could I possibly give it that wouldn't conflict with the hundreds of uses of the function all over Drupal? So it would be non-starter.

But why *must* Global Redirect be limited by that? The code snipits above for Global Redirect don't use invalid aliases. Because the truth is there is nothing that stops a user from dropping in 17 aliases to a single node. And that's from core!

I'm really not convinced that the issue here is a core issue. I think it's the fact that Global Redirect *must* do more in order to truly implement canonical urls. If Global Redirect won't change to offer real canonical urls, then it should stop pretending to do them in the current /partial/ and in fact flawed way. (Flawed not because of what it's doing, but because the code is needlessly trapping itself within the artificial assumptions and constraints of that single function in core)

I can't speak to the forked version that michaek created. But if Global Redirect doesn't want to properly implement canonical urls, then I must continue to use my forked version of this project. And that doesn't seem very Drupalish. :(

I'm not out to *compete* with Global Redirect here, which is why I'm not creating a new Canonical Url Redirect, or Global Redirect Two project. And I really do think that for everything else Global Redirect is an awesome module.

I'm willing to roll up a real patch for this if you want Dave.

tanc’s picture

@Dave Reid, is your answer in #7 your final answer on the matter or are you interested in a solution within Global Redirect?

tanc’s picture

Issue summary: View changes

minor update -- removed unneeded code and clarified a sentence.