1. Set up a site with two languages, e.g. German and English, with German as the default.
2. Create a language-neutral node.
3. View the node in German: the canonical URL is de/content/foo.
4. Switch to English: the canonical URL is en/content/foo. BUG: this is duplicate content. The canonical URL must be [default site language]/content/foo, e.g. de/content/foo.
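A minimal PHP sketch of the behavior the report asks for. It assumes a hypothetical helper, `example_expected_canonical()`, which is not part of Drupal core or Metatag. Language-neutral content (Drupal 7 stores its language as `und`, the value of `LANGUAGE_NONE`) always gets the default language's prefix. Other content keeps the current language's prefix, as core does today. Paths are relative, as in the steps above.

```php
<?php
// Hypothetical helper, not part of Drupal core or Metatag.
// 'und' mirrors Drupal 7's LANGUAGE_NONE value for language-neutral content.
const EXAMPLE_LANGUAGE_NONE = 'und';

/**
 * Returns the canonical path the reporter expects for a node.
 *
 * Language-neutral nodes always point at the default-language prefix, so
 * de/content/foo and en/content/foo share one canonical. Other nodes keep
 * the current language's prefix (core's existing behavior).
 */
function example_expected_canonical(string $node_langcode, string $current_langcode, string $default_langcode, string $path): string {
  $prefix = ($node_langcode === EXAMPLE_LANGUAGE_NONE) ? $default_langcode : $current_langcode;
  return $prefix . '/' . ltrim($path, '/');
}
```

With German as the default, `example_expected_canonical('und', 'en', 'de', 'content/foo')` returns `de/content/foo`.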

Comments

drnugent

Is this specific to Meta tags? This is the way canonical URLs are generated by Drupal core.

http://api.drupal.org/api/drupal/modules%21node%21node.module/function/n...

You can override the behavior:

http://drupal.org/node/1068562

hass

I don't know. I only know that it is completely wrong, as it does not solve the duplicate content issue it was made for.

drnugent

True, but that doesn't have much to do with this module. The canonical tag is generated the same way whether this module is enabled, or not.

colan

Status: Active » Closed (works as designed)
hass

Status: Closed (works as designed) » Active

The bug is not fixed.

willieseabrook

I'm currently launching a multi-country site, and the issue is more complicated than just the canonical tag.

Google actually understands a whole bunch of different things for different types of multilingual and multiregional websites.

See: http://googlewebmastercentral.blogspot.fr/2011/12/new-markup-for-multili...

Also:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
http://googlewebmastercentral.blogspot.fr/2010/09/unifying-content-under...
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182192 (the bottom of this page says canonical isn't for multilingual sites)

DamienMcKenna

Version: 7.x-1.0-alpha5 » 7.x-1.x-dev
DamienMcKenna

Status: Active » Postponed (maintainer needs more info)

Metatag now works with Entity Translation; please review the current functionality and let me know if the problems persist.

Kristen Pol

This is still a problem with:

Entity Translation - 7.x-1.0-beta2+17-dev (2013-01-27)
Meta Tags - 7.x-1.0-beta4+17-dev (2012-12-04)

DamienMcKenna

Please try the latest -dev release. Thanks.

q0rban

According to Google, we shouldn't be using canonical on multilingual sites. Instead, the attribute should be rel="alternate" with the hreflang set to the language. So, if you have 5 languages, you'd have 5 links, one for each language. I've done that on my site by using the following code:

/**
 * Implements hook_html_head_alter().
 */
function example_html_head_alter(&$elements) {
  // Unset the Metatag canonical URL if it exists. See lb.cm/mcQ.
  unset($elements['metatag_canonical']);

  // Create a list of alternate URLs, one for each enabled language.
  foreach (language_list() as $langcode => $language) {
    // language_list() also returns disabled languages; skip them.
    if (empty($language->enabled)) {
      continue;
    }
    // Make sure the path is absolute and the language is set.
    $options = array('absolute' => TRUE, 'language' => $language);
    // Generate the URL from the current internal path (same as $_GET['q']).
    $href = url(current_path(), $options);
    // Create a key in the elements array for this language.
    $key = "example_rel_link_$langcode";
    // Add the link using theme_html_tag().
    $elements[$key] = array(
      '#type' => 'html_tag',
      '#tag' => 'link',
      '#attributes' => array(
        'rel' => 'alternate',
        'hreflang' => $langcode,
        'href' => $href,
      ),
    );
  }
}

More conversation:

q0rban

Issue summary updated.

Carlos Miranda Levy

Issue summary updated.

This issue persists.

DamienMcKenna

Status: Postponed (maintainer needs more info) » Active
Parent issue: » #2175021: META: Plan for Metatag 7.x-1.0-rc1 release

Let's try to fix this for 1.0-rc1.

DamienMcKenna

So...

This needs some custom handling as the usual combination of tokens would not suffice.

Is it worth adding a custom submodule for this, or should the code be copied into metatag_html_head_alter() and loaded only if the Locale module is enabled?

DamienMcKenna

Crazy idea - should this be added by Entity Translation?

DamienMcKenna

Question: if there is no translation of a node for the site's default language, should the canonical tag still link to [defaultlanguage]/the/node/alias?

If the canonical tag should be excluded completely, this seems like it should just be handled by hook_metatag_metatags_view_alter(), which would add the new language tags and remove the canonical tag.

Thoughts?

BTW I'm removing this from the 1.0-rc1 release.

DamienMcKenna

I'm taking this off the list for 1.0. Yes, it's really important for multilingual sites, but I need some feedback on comment #16, or a patch that fixes All The Things, before proceeding.

Charles Belov

I have a concern about this. The two pages would not quite be the same.

de/content/foo would contain <html lang="de">
en/content/foo would contain <html lang="en">

Additionally, any translated strings on the page would be in English, not German, on en/content/foo.

Since the page is language-neutral, one of these language markings would be wrong, presumably the one for en/content/foo for a default German site. But there is no guarantee that a search engine would index the correct language marking.

That is, if Google (or whoever, but at the present time Google gives us the majority of visits) last visited en/content/foo, then de/content/foo would be marked as being in English, so that de/content/foo would not show up as a result in searches restricted to German (wrong) and would show up in searches restricted to English (also wrong).

The solution I implemented [yesterday] was to leave the canonical URL as it is and, on language-neutral pages, to mark the non-default-language versions with a robots noindex meta tag.

That is, in the context of the current issue:
de/content/foo has canonical de/content/foo
en/content/foo has canonical en/content/foo AND has robots noindex
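A minimal PHP sketch of the scheme Charles Belov describes, assuming a hypothetical function `example_neutral_page_tags()` that returns a plain map of tag name to value. This is not Metatag's data structure or API. Every language version keeps the self-referencing canonical core already produces. Only language-neutral pages viewed in a non-default language get `robots` set to `noindex`.

```php
<?php
/**
 * Hypothetical sketch (not Metatag's API): returns tag name => value for
 * one language version of a page.
 */
function example_neutral_page_tags(string $path, string $current_langcode, string $default_langcode, bool $is_language_neutral): array {
  // Canonical stays as core generates it: the current language's path.
  $tags = ['canonical' => $current_langcode . '/' . ltrim($path, '/')];
  if ($is_language_neutral && $current_langcode !== $default_langcode) {
    // Keep search engines from indexing the copy whose <html lang> is wrong.
    $tags['robots'] = 'noindex';
  }
  return $tags;
}
```

For a German-default site, the English view of a language-neutral node yields `['canonical' => 'en/content/foo', 'robots' => 'noindex']`, matching the two lines above.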

(Eventually, I'll change that to noindex, nofollow, but I have to wait until at least Google no longer has any of the pages that were indexed under the wrong languages.)

Additionally, if the user is authenticated (staff), we do the equivalent of redirecting en/content/foo to de/content/foo. That of course would have to be by role if the site allowed the public to have user accounts.

While outside the scope of the current issue (but related), #1518224 would not be appropriate for us.

If #1518224 is implemented, it needs to be a setting, not a given. That is, I see it as potentially problematic if:

de/content/foo has canonical de/content/foo
en/content/foo has canonical de/content/foo AND has robots noindex

in that Google (and other search engines) might inadvertently remove de/content/foo from the index due to the robots noindex tag.

DamienMcKenna

I think we should promote usage of the Alternative hreflang module instead. FYI, I've submitted a patch to make its language selection use LANGUAGE_TYPE_CONTENT instead of LANGUAGE_TYPE_INTERFACE.

hass

That is not the same.

DamienMcKenna

@hass: What's not the same?

hass

This issue is about the canonical URL, not hreflang. These are completely different things.

DamienMcKenna

I'm looking at this for a project, and one idea we're considering is removing the canonical tag if the current page has multiple hreflang values assigned, though we haven't dug into the details yet.
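A minimal PHP sketch of the idea just described, assuming each head link is represented as a plain array of HTML attributes. The function name `example_drop_canonical_if_multilingual()` and the input shape are hypothetical, not Metatag's. When two or more `rel="alternate"` links carry an `hreflang`, the `rel="canonical"` link is dropped. Otherwise the links are returned unchanged.

```php
<?php
/**
 * Hypothetical sketch: drop rel="canonical" when the page already declares
 * more than one rel="alternate" hreflang link.
 */
function example_drop_canonical_if_multilingual(array $links): array {
  $hreflang_count = 0;
  foreach ($links as $link) {
    if (($link['rel'] ?? '') === 'alternate' && isset($link['hreflang'])) {
      $hreflang_count++;
    }
  }
  if ($hreflang_count < 2) {
    // Zero or one language version: leave the links untouched.
    return $links;
  }
  // Filter out the canonical link; array_values() re-indexes the result.
  return array_values(array_filter($links, function (array $link): bool {
    return ($link['rel'] ?? '') !== 'canonical';
  }));
}
```

The threshold of two is a design choice: a single hreflang link describes no alternatives, so there is nothing to replace the canonical with.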

hass

Again, what does hreflang have to do with the canonical URL???

The canonical URL is about the REAL URL of the content when, e.g., node/123 is shown. The canonical URL then needs to point to the alias with the proper hostname. The problem here is that the hostname is incorrect in many cases. The root cause is the attempt to reuse core functions that do not generate the correct hostname in some situations. We cannot use the core functions, or we end up with incorrect hostnames and duplicate content.

This still has nothing to do with content language.
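A minimal PHP sketch of the point hass makes, under stated assumptions: the canonical URL is built from the path alias plus a hostname configured per language, not from whatever host served the request. The function `example_canonical_url()` is hypothetical. The `$base_urls` and `$aliases` arrays stand in for site configuration and the path alias table. The example domains and the `content/foo-en` alias are illustrative only.

```php
<?php
/**
 * Hypothetical sketch: absolute canonical URL from the language's configured
 * base URL and the path alias, falling back to the system path.
 */
function example_canonical_url(string $internal_path, string $langcode, array $base_urls, array $aliases): string {
  if (!isset($base_urls[$langcode])) {
    throw new InvalidArgumentException("No base URL configured for '$langcode'.");
  }
  // Prefer the alias (e.g. content/foo) over the system path (node/123).
  $path = $aliases[$langcode][$internal_path] ?? $internal_path;
  return rtrim($base_urls[$langcode], '/') . '/' . ltrim($path, '/');
}
```

Throwing on a missing base URL is deliberate: silently falling back to the request host is exactly the failure mode described above.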

DamienMcKenna

Metatag uses tokens to generate the URLs, not custom code, so if there's a problem with how the page paths are being generated please bring it up in core.

@hass: Please read the Google docs listed above; they recommend not using the canonical meta tag if the content is available in multiple languages, and using the hreflang meta tag instead.

Charles Belov

@DamienMcKenna: So how then do we implement Google's "Unifying content under multilingual templates" article using the Metatag module?

It's not as simple as just globally changing the meta tags. Most pages on our website are not translated, although the template sort of is, which corresponds to Step 2 in the article. However, the content of a small percentage of pages (>1%, but high profile) is translated into two or more of the other website languages.

So the question then becomes: how do we get Drupal to not put out the <link rel="alternate" hreflang="fr" href="http://fr.example.com/javier-lopez" /> tag unless Javier Lopez's Spanish page has actually also been translated into French?
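A minimal PHP sketch of one answer to that question, assuming the list of existing translations is already known as a map of langcode to absolute URL. The function `example_hreflang_links()` is hypothetical. Unlike a global default, it emits `rel="alternate"` links only for languages the content has actually been translated into, and none at all for untranslated content. The `es.example.com` URL is an illustrative assumption.

```php
<?php
/**
 * Hypothetical sketch: hreflang links only for existing translations.
 * $translations maps langcode => absolute URL of that translation.
 */
function example_hreflang_links(array $translations): string {
  if (count($translations) < 2) {
    // Untranslated content: no alternates to announce.
    return '';
  }
  $tags = [];
  foreach ($translations as $langcode => $url) {
    $tags[] = sprintf('<link rel="alternate" hreflang="%s" href="%s" />',
      htmlspecialchars($langcode, ENT_QUOTES), htmlspecialchars($url, ENT_QUOTES));
  }
  return implode("\n", $tags);
}
```

For Javier Lopez's page, passing `['es' => 'http://es.example.com/javier-lopez', 'fr' => 'http://fr.example.com/javier-lopez']` yields two links. Passing only the Spanish URL yields an empty string, so no French alternate is advertised.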

DamienMcKenna

@Charles Belov: Google's article from 2011 says not to worry about the canonical term: http://googlewebmastercentral.blogspot.fr/2011/12/new-markup-for-multili...

There's the hreflang module for entities and a sandbox for doing Panels, and I've opened up issues for both to (optionally) hide the canonical meta tag when they're outputting hreflang tags. Is there any need to handle this in Metatag?

DamienMcKenna

Let's refocus this.

Can someone please tell me if there are problems with this outside of the canonical-vs-hreflang issue, which will be handled in the two issues I just linked to? I'll be available all week on IRC in #drupal-seo and #drupal-i18n and want to fix outstanding i18n issues for 1.8.

kepford

After working on this issue for the hreflang module, I believe there is some confusion regarding canonical tags. Google's article makes it sound like you should not use canonical tags, but according to several other articles, using canonical tags is fine as long as the canonical tag points to the current page.

See http://www.rebelytics.com/hreflang-canonical/ as well as https://yoast.com/rel-canonical/#comment-285948.

DamienMcKenna

Right now we're just using standard tokens to output the meta tags; very little logic is provided to customize them for different scenarios, because there are so many possible use cases.

This issue is going to go nowhere unless we can discern exact rules for how this should work. Until then, there are always Metatag hooks you can use to customize the meta tags on a per-site basis.

DamienMcKenna

Title: Canonical URL is invalid on multilingual site » Change canonical URL handling on multilingual site
Category: Bug report » Feature request
Priority: Major » Normal

This isn't a 'major' issue right now, given that the module effectively works the same way as core.

If someone can identify what Metatag could do better here, please let me know; otherwise I'm going to close this as "works as designed".

matthewv789

Hopefully I can clarify about canonicals and what they are for.

Each complete translation of a page is effectively its own canonical. It is NOT duplicate content just because it means roughly the same thing in a different language!

Only in the situation where the MAIN CONTENT of the page is NOT translated, but other parts of the interface might or might not be and the duplicate page exists on a different URL (meaning the alternate URLs would all duplicate the same exact text for the most important bulk of the page, if not the whole page), should you point to one of them as canonical. This should be the version where the text of the page matches the claimed language for the page (which would generally be the originally-created source page). Often that might be the only version where all interface elements also match the language the content is written in. (If a visitor can read the content in that language, they can probably also read other interface elements in that same language.)

Again, this only applies to content that is NOT translated in the different language versions of the site. So yes, if a multilingual site automatically provides a URL in all languages for all content whether translated or not, then any content NOT translated (meaning there are multiple pages with the same exact text in the same language) should have a canonical at the other language URLs pointing to the preferred version, to help Google point search results to that page and not the others. However, as soon as an actual translation is provided, that canonical should go away!
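A minimal PHP sketch of the rule matthewv789 lays out, using a hypothetical function `example_canonical_for_version()` and plain arrays for the inputs. A language version whose main content is untranslated points its canonical at the source-language URL. A real translation gets no canonical, and neither does the source page itself, following the comment's view that a self-referencing canonical adds nothing. The URLs in the usage note are illustrative.

```php
<?php
/**
 * Hypothetical sketch: returns the canonical URL to output for one language
 * version, or null when no canonical tag should be emitted.
 *
 * $translated_langcodes lists languages with an actual translation;
 * $urls maps langcode => absolute URL of that language version.
 */
function example_canonical_for_version(string $langcode, string $source_langcode, array $translated_langcodes, array $urls): ?string {
  if ($langcode === $source_langcode) {
    // The original page: no self-referencing canonical.
    return null;
  }
  if (in_array($langcode, $translated_langcodes, true)) {
    // Actual translation: different text, so not duplicate content.
    return null;
  }
  // Same text as the source under another URL: point at the source version.
  return $urls[$source_langcode];
}
```

With a Spanish source translated into English only, the untranslated German version returns the Spanish URL, while the Spanish and English versions return `null`.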

The purpose of canonical tags is to point Google to one source where all or nearly all of the text on two or more pages is identical from a textual standpoint (but without using an actual redirect on the page itself). A translation of that text has 0% duplication of the actual text, so there is no need to use canonical to point to the original language. In fact, using it on a translation would be improper and confusing, as it implies that the canonical version is the preferred version for any visitor, and should be shown in search results instead of the alternates. Canonical is NOT for some kind of attribution as to where the intellectual ideas contained in the page originated from. It is to manage duplicate pages containing the exact same text, and thus show one of them preferentially in search results.

The simplest way to envision canonical and what it is for is to imagine it as a "soft" redirect that is not actually taken, but could be. Would you want this visitor to be redirected to the "canonical" version? If yes, then canonical is fine. If no, then don't use it to point to another page! This also implies that there is never any reason to have a canonical pointing to itself, which means there is no benefit to using it EXCEPT when there are at least two pages with the exact same content, and then use it on n-1 of the pages, to point to the preferred version. Most of the time there is no reason to have a canonical on a page at all!

Here's an illustration of the parallel between a redirect and a canonical tag:

Redirect: As a visitor who speaks English, would I want to be redirected to the Spanish page because that was the original language and is the "preferred" source for the ideas contained on that page? NO, that would be wrong! I don't speak Spanish, so that would do me no good - it is not the same content.

Canonical: Would I want Google to show the result for the (canonical) Spanish page in search results instead of the English translation when I typed in an English query to find the English version? No! So there should NEVER be a canonical pointing to the original language source for a translation of a page!

And of course, since the English and Spanish pages share no text, there is no duplication and no SEO penalty from Google, ergo no need for a canonical tag in the first place (if being penalized is your concern).