Hello,

to start, thanks for maintaining such a great module for drupal, but I've found, what i think to be an issue...

lets say you made a node with the title "Hello World", when using the path auto you get: "mydrupalwebsite.com/hello-world" but the canonical url uses the default clean drupal path as:
<link rel="canonical" href="/node/2616" />

and i think that the most path auto users block the /node/** path in there robots.txt to avoid double content.

Also i think its best to use the default url, and not the relative part, so it becomes:
<link rel="canonical" href="http://mydrupalwebsite.com/hello-world" />

this because not everybody has there base url defined..

i hope it's clear enough :)

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

apaderno’s picture

FileSize
1.06 KB

It does look for the path alias, but it does it in the wrong place.

The attached patch changes the development snapshot code to fix this problem.

apaderno’s picture

Title: canonical uses drupal default clean url path » Canonical URL doesn't use the path alias
Version: 6.x-1.4 » 6.x-1.x-dev
Status: Active » Fixed

The code has been changed, and committed in CVS for the development snapshot.

ao2’s picture

Title: Canonical URL doesn't use the path alias » Canonical URL doesn't use the path alias (and URL is relative when base_url is empty)
Status: Fixed » Needs review
FileSize
1.11 KB

Hi,

I think that the fix is not complete with regard to the Original Poster requests (the absolute/relative URL issue).

If a base_url is not defined we should still use the global one to get an absolute URL; currently, passing an empty string in $options['base_url'] to url() makes it ignore the 'absolute' setting.

See this patch (also attached), it shows the problem, and the difference between a NULL value and an empty string on a check based on isset() (see url() implementation).

diff -u -r1.1.2.47 nodewords_basic.module
--- nodewords_basic/nodewords_basic.module	7 Dec 2009 22:35:02 -0000	1.1.2.47
+++ nodewords_basic/nodewords_basic.module	12 Dec 2009 21:04:28 -0000
@@ -242,7 +242,7 @@
   $base_url = rtrim(variable_get('nodewords_base_url', ''), '/');
   $options = array(
     'absolute' => TRUE,
-    'base_url' => $base_url,
+    'base_url' => empty($base_url) ? NULL : $base_url,
   );
 
   $tags['canonical'] = !empty($content['value']) ? check_url(url(drupal_get_path_alias($content['value']), $options)) : '';

An alternative could be to use a snippet like the following, functionally equivalent to the one-liner.

  $options = array(
  'absolute' => TRUE,
  );

  $base_url = rtrim(variable_get('nodewords_base_url', ''), '/');
  if (!empty($base_url)) {
    $options['base_url'] = $base_url;
  }

Regards,
Antonio

apaderno’s picture

Status: Needs review » Fixed

The code has been changed, and committed in CVS.

Thanks for the help, and the patch.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Sputnik’s picture

Status: Closed (fixed) » Active
Sputnik’s picture

Hello,

i reopend the issue because in the actual developer snapshot (2010-Jan-27) and also in the 6.x-1.12-beta2 the canonical url doesnt work for me.

thats what it produces:
<link rel="canonical" href="http://www.example.com/en/node/15864" />

thats what i expected :
<link rel="canonical" href="http://www.example.com/en/very-nice-content-here" />

i think this how it should work, or I am wrong ?

[Edited by kiamlaluno] to show the tags that were not show by the current input format

apaderno’s picture

Status: Active » Closed (works as designed)

I explained in #658630: Generated canonical URL is not correct why the code is not using the path alias as canonical URL.

Sputnik’s picture

@kiamlaluno so works as designed, do you think it will be helpfull to have a feature request for this ? and thanks for the advise

apaderno’s picture

@Sputnik: I would reply in the same way I replied here.

The canonical URL is a URL that doesn't change with the time, as it needs to uniquely identify a resource. This means that in Drupal a canonical URL is http//example.com/node/<nid> for nodes, http//example.com/user/<uid> for user profiles, etc.

Sputnik’s picture

Status: Active » Closed (works as designed)

@kiamlaluno From the technical point off view I agree with you.
On the other hand , from the SEO point off view, I prefer a URL like http//example.com/<titel> the alias of the Node URL as canonical URL, because the CTR will be significant higher than with the "Node" Version.
That's one of the reasons because we use modules like Pathauto , Global Redirect ... . And we try to suppress the access to duplicate content, like the author of this issue demon326 explained.

I think to have an option like "use alias URL instead of node URL for canonical URL" will be a great advantage for this module.
Or is there an error in my reasoning ?
TIA

pavlos’s picture

Status: Closed (works as designed) » Active

@kiamlaluno fully agree with Sputnik on this. The canonical tag should be controlled by the webmaster and not by drupal core's default url structure, as this is a basic SEO principle and this tag is supported by search engines for this exact purpose.

apaderno’s picture

Status: Closed (works as designed) » Fixed

The canonical tag should be controlled by the webmaster

In fact, it is possible to change the Canonical URL to the one you prefer. As there is nothing wrong using the Drupal canonical URL as canonical URL (considering also the purpose of that meta tag), I am changing back the status of this issue.

ao2’s picture

Status: Fixed » Active

+1

I also think that the webmaster should be able to decide about the Canonical URL.

apaderno’s picture

Status: Active » Fixed

See my previous comment.

apaderno’s picture

I have marked #707876: Canonical URLs get not Pathauto as duplicate of this report.

pavlos’s picture

@kiamlaluno my problem is that I have generated vocabularies using taxonomy_csv module and I need it to automatically enter the canonical url using the path for the term and not with paths like taxonomy/term/xx

Need the SEO-friendly one which is /category-name/sub-category-name/

eule’s picture

hello

i switch today from 6.x-1.12-beta2 to Nodewords 6.x-1.x-dev (2010-Feb-09)
the problem with the Canonical URL seems not fixed for me...all my canonicals are ./node/** or / and ./taxonomy/**
cache cleaned!

apaderno’s picture

@eule: The status of this report is fixed because of comment #3; the code has been changed to solve what reported about the canonical URL not being an absolute URL when the base URL is an empty string.

As already explained, in Drupal the canonical URL has the form node/<nid> for nodes; if you want to use a path alias as canonical URL (which I would not suggest), then you set the canonical URL to the one you prefer. Branch 6.x-3 will allow you to use a token to globally set the canonical URL to the path alias, if you really want to do so.

locomo’s picture

subscribing

+1 comment #11

@kiamlaluno - yes it's true that we can manually control the canonical url. but now this means i have to go though and touch hundreds of nodes.. i'm sure for many this is thousands of nodes.

eule’s picture

subscribing

+1 comment #11

DamienMcKenna’s picture

How about a simple option (e.g. variable_get('nodewords_canonical_alias', FALSE)) that decides if nodewords_basic_canonical_prepare() runs the path via drupal_get_path_alias() first?

DamienMcKenna’s picture

Status: Fixed » Needs review
FileSize
1.66 KB

Here's a patch that adds a new option "Canonical URL use alias" to the main settings form that will control whether internal paths or url aliases are used. By default the variable is FALSE so it will use the internal path until the variable is set.

apaderno’s picture

Status: Needs review » Postponed

The proposed patch doesn't actually change the way the canonical URL is generated. Using the path alias will be possible once that the code is changed as per #508496: Support for tokens, and #706630: Implement the hooks necessary to add more tokens.

+      if (variable_get('nodewords_canonical_alias', FALSE)) {
+        $content['value'] = $path;
+      }
+      else {
+        $content['value'] = $path;
+      }
DamienMcKenna’s picture

Status: Postponed » Needs review
FileSize
1.68 KB

Apologies, I hadn't fully merged the full change. Please check this new attachment.

While I laud the concern to provide a proper fix for this, Drupal via the core Path module (with PathAuto, Path_Redirect and GlobalRedirect) already has an existing structure to cleanly handle paths so it doesn't IMHO make sense to provide a redundant method of configuring the same thing.

ao2’s picture

@DamienMcKenna: thanks.

@kiamlalauno: tokens are indeed more flexible, but maybe you can accept this change for the time being. It is fairly trivial and most users will be happy NOW :)

In the meantime we can discuss (in another issue if you like) if using tokens for paths _directly_ in nodewords is sane, or if relying on tokens already used in Pathauto (which this patch should cover already) is better.

Just my 2 cents.

Regards,
Antonio

Duplika’s picture

Another issue that came to my mind is that if I enable canonical URL's and I only specify one on a certain node, I see that all the other nodes that doesn't have a custom canonical URL's are nevertheless, all nodes have it's canonical URL pointed to themselves.

Is there any way to output the canonical URL's only on the nodes that indeed have a canonical URL and not on the other ones?

Dave Reid’s picture

Status: Needs review » Reviewed & tested by the community

This is simple and gets the job done. +1 for being off by default, but configurable. I tested this locally and its working great.

DamienMcKenna’s picture

For anyone who wants to try, I've made a version of v6.x-1.11 available with the patch applied, please let us know here how it works for you.

Anonymous’s picture

Status: Reviewed & tested by the community » Fixed

As Nodewords now supports tokens, and it implements two tokens though to be used for the canonical URL, I am marking this report as fixed.

As usual, read what reported in the project page, before to update the installed modules.

jcisio’s picture

Status: Fixed » Active
FileSize
1.4 KB

Thanks, looking into the code, I see the token is [metatags-url-alias], right?

However it is not listed on q=admin/content/nodewords/default. The parameter is not saved as reported in another issue (1.13beta4). So I reroll the patch for a temporary fix (sorry, I don't know about the token thing, so I can't help).

Anonymous’s picture

Status: Active » Fixed

The issue hs been already fixed. The hook to list the tokens were not returning the result as expected from token.module, and that caused the custom tokens to not appear.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

firebus’s picture

afaict (and maybe i'm missing something)

1. the patch in #25 no longer applies to 6.x-1.x-dev, nor does it apply to 6.x-1.11

it seems to affect a non-existent nodewords_basic/includes/nodewords_basic.nodewords.tags.inc file.

maybe it's meant to apply to nodewords_basic.module, or i'm doing something really dumb?

2. the token feature referred to by kiam is only in the 6.x-2.x branch.

i'm not sure if i should update to 6.x-2.x or not - would it be worth rerolling #25 to apply to the current 6.x-1.x-dev?

3dloco’s picture

@firebus, I applied #25 to nodewords_basic.module for 6.x-1.11 and it is working for me.

DamienMcKenna’s picture

Status: Closed (fixed) » Needs review
FileSize
1.56 KB

I've re-rolled & slightly simplified the patch from #25 to work on 6.x-1.x-dev.

DamienMcKenna’s picture

Status: Needs review » Fixed

I've committed the patch from #36 to 6.x-1.x-dev, it'll be available in the next release.

DamienMcKenna’s picture

Issue tags: +v6.x-1.12 blocker

This should have been tagged.

DamienMcKenna’s picture

FileSize
728 bytes

Per feedback from jwilson3, I added an extra comment to help explain what appears as backwards logic in nodewords_basic_canonical_prepare() for deciding whether to use an alias or not.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Z2222’s picture

The canonical URL is a URL that doesn't change with the time, as it needs to uniquely identify a resource. This means that in Drupal a canonical URL is http//example.com/node/ for nodes, http//example.com/user/ for user profiles, etc.

Sorry to post in an old thread, but I just noticed that a couple of my Drupal sites have spontaneously switched from having the node alias as the canonical URL to having the node/123 internal paths as the canonical URLs. I wanted to respond to the comment above:

The node/123 URLs are internal paths and are not the canonical URLs if there are URL aliases present. The canonical URL meta tag is a feature that was created by Google to help them index pages correctly. Its purpose is to tell Google what the correct URL is, especially in cases where a site is doing things like using Google Analytics campaign tracking as parameters in the URLs like http://example.com/my-webpage?utm_source=drupal-issue&utm_medium=issue-post&utm_campaign=drupalorg.

If you have a path alias and then set the canonical URL as the node/123 URL, it is very bad for SEO, because Google will start displaying the wrong URLs in the search engine results pages.

URLs like node/123 are just internal paths and should not be exposed to search engines or users when there is a URL alias present.

The ability to change this is in the module, but I just want to post a caution about changing the default settings of the module. The checkbox that sets the module to use URL aliases as the canonical URLs should be checked by default, otherwise a lot of websites will get messed up.

DamienMcKenna’s picture

The ability to change this is in the module, but I just want to post a caution about changing the default settings of the module.

The latest version does, in fact, use the alias by default.

WeRockYourWeb.com’s picture

J Cohen explains how this should work perfectly. Relieved that this is the default setting now, we had one of our sites completely screwed (pages dropped from Google index) by having the canonical URL set to /node/x for each page during a former update.

Anonymous’s picture

My team was also burned by this issue. We updated to a new version at some point and it set the internal URL to be canonical and of course it broke our SEO practices.

I see no reason why we should have been reverting already live sites that have aliases to use node/[nid]. It goes against the very purpose of this module.

I realize that being the technical lead on my project puts the blame on me for rolling this out, however, we are in the days of Ægir and other drush-based automated deployment tools where catching a little detail like this is difficult to catch until it is too late. Giving the developer control is good, but our defaults should be a bit more in line with what the intended goal of a module is.

DamienMcKenna’s picture

@Ryan: I am sorry you went through that. I had a disagreement with the module's last maintainer back in 2009 about what the default should be, he argued for using the internal URL while I (and some SEO analysts I worked with at the time) argued for the alias. Since v1.14-beta1 the default has been the alias.

Anonymous’s picture

@DamienMcKenna I think what happened was that we upgraded at some point, and then later (before the revision) someone on our team had made an update to that settings form, thus preserving the bad setting, so it persisted after the subsequent updates. I really appreciate the work people have been doing on this module, it certainly beats doing this by hand for each site!