This is a feature request; I first asked a question about it in the i18n group: http://groups.drupal.org/node/11590#comment-39714.
I also found a post with a patch, #190414: I did some modification to deal with the duplicate url problem, which addresses a similar problem (duplicate URLs), but I now see that my original idea from the i18n group conflicts with the duplicate-URL solution that was patched for D5. I suppose the critical question is whether 1) a requested node's language takes priority over the given domain/language prefix, or 2) the requested domain/language prefix takes priority over the specific node being requested.
Option 1: "Redirect specified node to proper domain/lang-prefix" is what was patched for D5. This redirects the specified node to the path that matches the language of the node's content. In the example given there this is the language prefix, but I imagine it works (or should work) with domains as well (depending on which language negotiation is chosen).
Option 2 is the idea I had in mind at first: if you request a URL whose domain is associated with English, but the specified node is in Spanish, you are redirected to the English frontpage. To clarify things, let's see an example.
englishdomain.org/node/1 - Welcome
spanishdomain.es/node/2 - Bienvenidos
Node 2 is a translation of node 1, and any links you find on the website will point you to the node on the right domain. But a user, or a search engine, might look for englishdomain.org/node/2. The patch that was made for D5 would treat node/2 as Spanish and say: let's go to spanishdomain.es/node/2. This also makes the most sense if we have aliased URLs: if I request somedomain.com/welcome I expect English text, and if I go to somedomain.com/bienvenidos it should be Spanish.
But what if I have untranslated content?
englishdomain.org/node/3 - Projects
If this node has no translation and a user clicks Spanish in the language switcher block, it will take him to:
spanishdomain.es/node/3 - Projects
That is a page which should not exist, and Option 1's patch would then say: spanishdomain.es/node/3 is an English node, let's go to englishdomain.org/node/3. That means going back, even though the user just said 'let's go to the Spanish website'. If you have untranslated content, this means Option 1 is no good. However, without the patch Drupal would show an English node on a domain associated with Spanish. That is duplicate content, and also strange to the user.
Thinking it over again, giving the URL path priority over node language makes the most sense to me. So my feature request for i18n is:
- when a URL is requested, check that the node's language matches the language set for that URL (whether by domain name or path prefix).
- if it does not match, go to the frontpage of the requested domain/path.
- ideally, if the node's language does not match the path's language, check for a translation of the node in the language associated with the domain; if one is found, go there, and if not, go to the frontpage. (But if this is too complex, I would settle for just going to the frontpage.)
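The three steps above could be sketched roughly as follows. This is a language-neutral illustration in Python, not Drupal or i18n code: the lookup tables and the `resolve` function are hypothetical and only reuse the example nodes and domains from this post.

```python
# Hypothetical data taken from the example in this post.
DOMAIN_LANGUAGE = {"englishdomain.org": "en", "spanishdomain.es": "es"}
NODE_LANGUAGE = {1: "en", 2: "es", 3: "en"}
# Translation sets: node id -> {language code: node id}. Node 3 has none.
TRANSLATIONS = {1: {"en": 1, "es": 2}, 2: {"en": 1, "es": 2}}


def resolve(domain: str, nid: int) -> str:
    """Return the URL that should be served for a request to domain/node/nid."""
    url_lang = DOMAIN_LANGUAGE[domain]

    # Step 1: the node's language matches the URL's language, so serve it.
    if NODE_LANGUAGE[nid] == url_lang:
        return f"{domain}/node/{nid}"

    # Step 3 (the "ideal" case): look for a translation in the URL's language.
    translated = TRANSLATIONS.get(nid, {}).get(url_lang)
    if translated is not None:
        return f"{domain}/node/{translated}"

    # Step 2: no match and no translation, so fall back to the frontpage.
    return f"{domain}/"
```

With this data, a request for englishdomain.org/node/2 resolves to englishdomain.org/node/1, and spanishdomain.es/node/3 resolves to the Spanish frontpage.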
I hope this explanation makes sense and other people share my thoughts on this. I think it would be a big improvement regarding both duplicate content and user experience.
Regards, Arjan
Comments
Comment #1
ar-jan commented
Comment #2
bforchhammer commented
I use the following code in a custom module to display an appropriate message and show a 404 error page if a node is not translated yet...
This could easily be changed to redirect to the homepage instead of displaying an error page... the following should do it:
Redirecting to a translated node when available shouldn't be too hard either... I think you can look up the node translation using translation_node_get_translations($tnid).
Hope this helps.. :)
Comment #3
bforchhammer commented
I quite like the idea of looking up translations and redirecting where possible, so I quickly implemented it... works great, especially as it also solved another issue I had with frontpages not showing up correctly :)
Here's the code...
Comment #4
ar-jan commented
Cool! I tried your code, and it works for me too. For me it's not yet a full solution, though:
1) I use PathAuto. When I visit englishdomain.org/bienvenidos (sticking with my example from the first post) I get the standard 404 page. When I go to englishdomain.org/node/2 it redirects properly to englishdomain.org/welcome (i.e. node/1, aliased).
Do you have one more idea up your sleeve to deal with PathAuto url aliases?
2) The other thing is how to implement this for urls other than nodes. In my case they are Views pages (that also filter by language). Maybe taxonomy terms (listings) too, although I will probably recreate taxonomy listings with Views.
I also realise now that my assumptions in my initial post were not exactly right. Since both node/1 and node/2 were shown on one domain / with the same current language, I thought that by specifying the path after the domain part, any content would always show. It now seems that using PathAuto (URL aliases are language aware!) changed the behavior to displaying a 404 not found error when a URL alias is requested that does not match the current language (so part of the duplicate content problem is already taken care of by PathAuto).
I also did not realize that Views pages, let's say /projects and /proyectos, might react differently than nodes. So I can now still have englishdomain.org/proyectos.
Well, thank you big time for sharing this code. I hope you have some more ideas for my issue. Chao
Comment #5
ar-jan commented
Oh, the second snippet in #2 (go to frontpage) also works. (That is with node/123; the aliased path gets a 404.)
Comment #6
bforchhammer commented
Yes, I'm using a "frontpage" version now as well...
I don't know if something like this has a chance to go into the i18n module... I guess it would at least need some configurable options; any comments from someone responsible? :)
Comment #7
jose reyero commented
Really no plans to put that into i18n atm.
Plus I think if you follow a link to some content, you are supposed to get that content, whatever language it is in.
Such links should never be produced in the first place unless you do it on purpose. Besides, if the language is decided from browser settings, the same link can take two different users to different pages...
One use case to illustrate that: you find a result in a search engine, follow the link... oh, no such content, just the frontpage.... that's not good IMHO, and breaks SEO. And there are a lot of people out there who feel comfortable browsing sites in mixed languages (better than not getting the content at all).
So I think if you guys want to address the issue, the right solution is never producing such a link, thus it is the language switcher which should be fixed, not the landing page logic (This causes other issues, like not being able to have a proper language switcher for a page, or the switcher taking people to the front page, which is kind of breaking the navigation context for most users)
Comment #8
bforchhammer commented
I agree, a good solution would probably be to fix the translation block and only display translation links to pages where the translation actually exists... possibly by just replacing the link tag (<a href..>) with a span tag (class="no-translation") for the respective translation links...
Comment #9
fletchgqc commented
Very interesting. I got here after an experience similar to the original poster's. Executive summary of this post: eliminate duplicate content by returning 404s, or at least by working together with globalredirect. Remove links to untranslated content from the language switcher block. It would be great if Jose or others could share their thoughts on this, so I'm setting the status to active.
I see two problems with the current state of affairs (as 6.x-1.x-dev currently stands).
Problem 1: Duplicate content is frankly a 404 issue. If node 4 is English, then ex.com/es/node/4 should be a 404 - it simply is not a Spanish node. This is how a path alias works (eg. if only the English alias ex.com/contact exists, then ex.com/es/contact produces a 404). In my opinion this is simple, there are no 2 ways about it and the current behaviour (provided by core) is in error (but, I'm very happy to listen to other opinions). If it's agreed that I'm wrong - then I hope that the issue can at least be addressed by globalredirect - avoiding duplicate content is the idea of globalredirect anyway. However I'm not able to test that as globalredirect currently has a critical i18n-related bug (#216271: Endless loop with translation (D6)). I agree with the original poster - the node number or path takes priority in determining the correct language (e.g. if node/3 is Spanish it must be accessed using the Spanish domain or path). Only exception: using browser language negotiation - this would have to be worked in.
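To make the 404 rule in Problem 1 concrete, here is a small sketch of the check, written in Python as a neutral illustration rather than as Drupal core or i18n code. The node table, the default language and the `status_for` function are hypothetical and only cover the path-prefix case, not browser language negotiation.

```python
# Hypothetical site data: node 3 is Spanish, node 4 is English.
NODE_LANGUAGE = {3: "es", 4: "en"}
DEFAULT_LANGUAGE = "en"       # used when the path has no language prefix
PREFIX_LANGUAGE = {"es": "es"}  # path prefix -> language code


def status_for(path: str) -> int:
    """Return the HTTP status a node path would get under the proposed rule."""
    parts = path.strip("/").split("/")

    # Work out the language requested by the path prefix, if any.
    lang = DEFAULT_LANGUAGE
    if parts and parts[0] in PREFIX_LANGUAGE:
        lang = PREFIX_LANGUAGE[parts[0]]
        parts = parts[1:]

    # Only node/<nid> paths are handled in this sketch.
    if len(parts) == 2 and parts[0] == "node" and parts[1].isdigit():
        node_lang = NODE_LANGUAGE.get(int(parts[1]))
        # A missing node, or a node in another language, is simply not found.
        return 200 if node_lang == lang else 404
    return 404
```

Under these assumptions /es/node/4 gives a 404 while /node/4 gives a 200, mirroring how a language-specific path alias already behaves.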
Problem 2 - Jose is right that modifying the language switcher would break navigational context for most users - however it would be fair to say that the current behaviour is also totally confusing from a user's point of view. The suggestion proposed in #8 seems quite good - or just add a class of "no-translation" to the actual link so "display: none" can be used to hide these if desired. However that only works if problem 1 is fixed using globalredirect, because Google will see all the links.
I like the behaviour of node translation links: if there is a translation, the link appears. If not, it doesn't. Is there a reason why we are showing links to untranslated content in the language switcher block? If not, just turn it off like for the node translation links and the problem 2 is solved. This surely makes the most sense? If there is a good reason to show these links, then the solution to this problem is to make it a configurable option for the language switcher block, whether links to untranslated content are shown or not.
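A rough sketch of how such a configurable switcher could behave, again as a Python illustration and not as the actual i18n block code. The `switcher_links` function and its `show_untranslated` option are hypothetical; the "no-translation" class is the one suggested in #8.

```python
from html import escape
from typing import Dict, List


def switcher_links(
    translations: Dict[str, str],
    languages: Dict[str, str],
    show_untranslated: bool = False,
) -> List[str]:
    """Build language switcher items for one page.

    translations maps language code -> path of the existing translation.
    languages maps language code -> label shown in the block.
    """
    items = []
    for code, label in languages.items():
        path = translations.get(code)
        if path is not None:
            # A translation exists: render a normal link.
            items.append(f'<a href="{escape(path, quote=True)}">{escape(label)}</a>')
        elif show_untranslated:
            # No translation: render a non-link placeholder that CSS can hide.
            items.append(f'<span class="no-translation">{escape(label)}</span>')
        # Otherwise the language is omitted, like the node translation links.
    return items
```

With the option off, the block only lists languages that have a translation; with it on, untranslated languages appear as plain text instead of links, so no crawler is led to a duplicate page.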
Comment #10
fletchgqc commented
This issue is being dealt with by pathauto. See #201675: Redirect to version in native language. Yay!
Comment #11
fletchgqc commented