Hi,
A little background first:
I have a multilingual site and I use these rules to differentiate between the interface language and the content language:
- Language prefixes in URLs change the interface language.
- Two translations of the same content also have translated titles, so titles change the content language.
With these rules, I need the language prefix to be ignored in canonical URLs because, for example,
- ao2.it/about-ao2
- ao2.it/en/about-ao2
- ao2.it/it/about-ao2
all refer to the same content, and I want to use the one without a prefix as the canonical URL (a real user will see the interface in the language negotiated by the browser, or in the default language).
The following change works in my case: it makes url() ignore the language prefix. Maybe you would want to make it a more general setting?
```diff
diff -u -r1.1.2.47 nodewords_basic.module
--- nodewords_basic/nodewords_basic.module 7 Dec 2009 22:35:02 -0000 1.1.2.47
+++ nodewords_basic/nodewords_basic.module 12 Dec 2009 21:47:23 -0000
@@ -243,6 +243,7 @@
 $options = array(
   'absolute' => TRUE,
   'base_url' => $base_url,
+  'language' => '',
 );
 $tags['canonical'] = !empty($content['value']) ? check_url(url(drupal_get_path_alias($content['value']), $options)) : '';
```
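To make the intended behavior concrete outside Drupal, here is a minimal standalone sketch of dropping the interface-language prefix before building the canonical URL. It is not how the patch works (the patch passes an empty 'language' option to url()), and it uses no Drupal or nodewords API; strip_language_prefix(), canonical_url() and the prefix list are hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal or nodewords): removes a leading
 * language prefix such as "en/" or "it/" from an internal path.
 */
function strip_language_prefix(string $path, array $prefixes): string
{
    $path = ltrim($path, '/');
    $parts = explode('/', $path, 2);
    // Strip the first segment only when it matches a known prefix exactly.
    if (in_array($parts[0], $prefixes, true)) {
        return $parts[1] ?? '';
    }
    return $path;
}

/**
 * Hypothetical helper: builds an absolute canonical URL from a base URL and
 * a path, ignoring any interface-language prefix in the path.
 */
function canonical_url(string $base_url, string $path, array $prefixes): string
{
    return rtrim($base_url, '/') . '/' . strip_language_prefix($path, $prefixes);
}

// The three example paths from this issue (prefix list assumed).
$prefixes = ['en', 'it'];
foreach (['about-ao2', 'en/about-ao2', 'it/about-ao2'] as $path) {
    echo canonical_url('http://ao2.it', $path, $prefixes), "\n";
}
```

Each of the three paths prints http://ao2.it/about-ao2. Comparing the whole first segment, rather than a string prefix, keeps a path such as english/page intact.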
Thanks,
Antonio
Comments
Comment #1
avpaderno commented
What you report depends on the page you are viewing.
In the general case, the content of http://example.com/it/admin/content/nodewords differs from that of http://example.com/es/admin/content/nodewords because the active language is different; therefore, the title and the other meta tags should be different too.
Comment #2
avpaderno commented
Actually, that is also true for nodes.
When you visit http://example.com/it/node/347, the language is set to Italian, and all the strings used by Drupal in the user interface are in Italian; when you visit http://example.com/es/node/347, the language is set to Spanish, and all the strings used by Drupal in the user interface are in Spanish. The node content is only part of the page content.
Setting the canonical URL of such pages to the same value is probably not something you should do, considering that in Drupal the canonical URL should be used to tell search engines that http://example.com/node/347 and http://example.com/node-347-alias are two URLs for the same page.
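For comparison, the use case described here (one page reachable at both an internal path and its alias) can be sketched as follows. The alias table and canonical_link() are hypothetical stand-ins, not Drupal's path API; preferring the alias mirrors the drupal_get_path_alias() call in the patch above.

```php
<?php
// Hypothetical alias table: internal path => alias (illustrative values only).
$aliases = ['node/347' => 'node-347-alias'];

/**
 * Hypothetical helper: returns the same canonical <link> element whether the
 * page was reached through its internal path or through its alias.
 */
function canonical_link(string $base_url, string $path, array $aliases): string
{
    // If $path is an alias, map it back to its internal path.
    $internal = array_search($path, $aliases, true);
    if ($internal === false) {
        $internal = $path;
    }
    // Prefer the alias when one exists, otherwise keep the internal path.
    $preferred = $aliases[$internal] ?? $internal;
    $href = rtrim($base_url, '/') . '/' . $preferred;
    return '<link rel="canonical" href="' . htmlspecialchars($href, ENT_QUOTES) . '" />';
}

echo canonical_link('http://example.com', 'node/347', $aliases), "\n";
echo canonical_link('http://example.com', 'node-347-alias', $aliases), "\n";
```

Both calls print the same element pointing at http://example.com/node-347-alias, which is the "two URLs, one page" situation the canonical URL is meant to report.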
Comment #3
ilo commented
I don't want to disrupt the discussion; I'm just moving the branch.
Feature requests should go to the 6.x-3.x branch of the module, so I'm moving this issue there.
Comment #4
ao2 commented
Take the case of nodes under my own rules for content and language handling.
I decided to use the language prefix in the URL to switch only the interface language, so in my case two pages with the same title but with different language prefixes (and therefore different interfaces) really represent the same content, if we consider the node body, rather than the whole page, to be the content that matters.
So my question about the canonical URL spec is: is it meant to report duplicate URLs for the same page, or, more generally, duplicate content?
I am reading more about it and will come back when I have a clearer picture.
Thanks,
Antonio
Comment #5
avpaderno commented
What you describe is exactly what Drupal does when URL prefixes are used to set the language.
When you visit, e.g., a node written in Chinese with a URL like http://example.com/it/node/345, the node will be in Chinese and the user interface in Italian (strictly speaking, the interface will be entirely in Italian only if the Italian translation contains all the strings used on that page).
The code must be as generic as possible, and must also take into account the case where each language is associated with a different domain.
The purpose of the canonical URL is described by Google: since Google Webmaster Tools reported the same page reached through different URLs as duplicate pages, Google introduced the canonical tag to avoid the problem.
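A generic option of the kind discussed in this thread would have to recognize the interface language from either a path prefix or a per-language domain. The sketch below is hypothetical: generic_canonical_url() and the $languages array shape (a prefix and a bare host per language) are assumptions, not the nodewords or Drupal API, and it always points the canonical URL at a single default base URL.

```php
<?php
/**
 * Hypothetical sketch of a generic "ignore interface language" setting for
 * canonical URLs. $languages maps a language code to
 * ['prefix' => string, 'domain' => string]; an empty string means unused.
 */
function generic_canonical_url(string $url, array $languages, string $default_base): string
{
    $parts = parse_url($url);
    $host = $parts['host'] ?? '';
    $path = ltrim($parts['path'] ?? '', '/');
    $base = rtrim($default_base, '/');

    foreach ($languages as $lang) {
        // Per-language domain: the host alone identifies the language,
        // so the path is kept and only the base URL is replaced.
        if ($lang['domain'] !== '' && $host === $lang['domain']) {
            return $base . '/' . $path;
        }
        // Path prefix: strip the first segment when it matches exactly.
        $prefix = $lang['prefix'];
        if ($prefix !== '' && ($path === $prefix || strpos($path, $prefix . '/') === 0)) {
            return $base . '/' . (string) substr($path, strlen($prefix) + 1);
        }
    }
    // No language marker found: the path is already canonical.
    return $base . '/' . $path;
}

// Illustrative configuration: Italian uses a prefix, Spanish a domain.
$languages = [
    'it' => ['prefix' => 'it', 'domain' => ''],
    'es' => ['prefix' => '', 'domain' => 'es.example.com'],
];
echo generic_canonical_url('http://example.com/it/node/345', $languages, 'http://example.com'), "\n";
echo generic_canonical_url('http://es.example.com/node/345', $languages, 'http://example.com'), "\n";
```

Both calls print http://example.com/node/345. Whether such a mapping is appropriate at all is exactly what this thread debates, so in a real module it would presumably sit behind an explicit setting.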
Comment #6
avpaderno commented
I am changing the status as per my previous comment.