Redirect to version in native language

greggles - December 17, 2007 - 13:59
Project:Global Redirect
Version:6.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:Unassigned
Status:needs work
Description

There are some scenarios in Drupal 6's content translation system where you can be looking at, for example, the Hungarian version of a page inside the English version of the site. This confuses users and also causes SEO issues (duplicate content).

It would be great if Global Redirect could handle these scenarios (note they are not entirely simple - see http://drupal.org/comment/reply/146084#comment-661963 ).

This is basically a question about whether that seems like it belongs in this module or not. If so, I can provide more concrete steps to reproduce and the desired actions on the part of the Global Redirect module.

#1

nicholasThompson - December 17, 2007 - 14:16

Greggles - thanks for the request. I don't know if you're aware, but there is a known incompatibility between GR and i18n. I assume the issues with D6 are going to be along the same lines.

To be honest, I don't really know where to start on this - or, indeed, how the system handles the request. Does it take the URL (e.g., 'de/node/1' for German node 1), simply pop arg(0) (i.e., 'de') off it and keep a note to render the German translation?
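With path-prefix negotiation, the first path segment is compared against the configured language prefixes; on a match it is stripped off and the remainder is treated as the internal path (Drupal 6 core does its own version of this in includes/language.inc). A simplified standalone sketch of the idea, where gr_split_prefix() and the plain array of prefixes are hypothetical:

```php
<?php
// Hypothetical helper: split a leading language prefix off a request path.
// $prefixes lists the configured prefixes, e.g. ['de', 'fr']. An empty
// prefix (as English often has) never matches here.
function gr_split_prefix(string $path, array $prefixes): array {
  $parts = $path === '' ? [] : explode('/', $path);
  $first = $parts ? $parts[0] : '';
  if ($first !== '' && in_array($first, $prefixes, true)) {
    array_shift($parts);
    return [$first, implode('/', $parts)];
  }
  // No known prefix: no language in the path, internal path unchanged.
  return [null, $path];
}

// Usage: 'de/node/1' is German node 1.
list($prefix, $internal) = gr_split_prefix('de/node/1', ['de', 'fr']);
echo $prefix, ' ', $internal, "\n"; // de node/1
```

A path without a recognised prefix comes back unchanged with a null language, which is the case the site default language then has to cover.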

Global Redirect's initial spec was a lightweight module which can handle the redirection from non-aliased to aliased pages (ie, node/1 to node_with_alias.html).

This problem isn't really covered by that project goal as the translation pages are technically DIFFERENT pages... although they are the same content, they are translated and therefore targeting a different audience.

I'm MORE than open to ideas and (even better) solutions though! Languages (as in English, French & German, etc) are not really my strong point - I can just about handle English! In fact I might be better at PHP than English... Hmmm...

#2

greggles - December 18, 2007 - 12:30

1. install drupal6
2. add in a second language
3. on admin/settings/language/configure use path prefix only
4. enable content translation module
5. in workflow on admin/content/types/page enable multilingual support with translation
6. create a page in english (node 1), click the translate tab, create it in your second language (node 2)
7. Visit example.com/en/node/2 and you see the content in the second language and all the menu stuff in the first language

This is where it would be nice if Global Redirect could forward the user to example.com/hu/node/2.

Here is some very weak code that does this:

  // Redirect to the node's own language if it differs from the current one.
  global $language;
  $node = node_load(arg(1));
  if ($node->language != $language->language) {
    drupal_goto(drupal_get_path_alias($node->language .'/node/'. $node->nid));
  }

It assumes that you are using a path prefixing mode and it assumes that you are on a node/NID page.

Does something like this even have a chance of working if we can fix those two assumptions?

#3

greggles - December 18, 2007 - 12:34

Also, if there were a version of the module compatible with 6.x I'd be interested in working on a patch to do this. I checked CVS and HEAD is missing a .info file and there is no DRUPAL-6--1 branch.

I'd rather see this get fixed/committed for HEAD/Drupal6 and then see about backporting it. In Drupal6 we have the benefit of i18n being in core where it is more stable/more of a known entity.

#4

nicholasThompson - December 18, 2007 - 13:22

HEAD is a little out of date; I tend to work on the dev versions for each branch (being a bit of a CVS n00b)...

I'll get a DRUPAL-6--1 branch sorted out early this afternoon which will basically be a DRUPAL-5--1 copy.

Your suggestion sounds sensible, although I'd make one suggestion: instead of redirecting straight away to {language}/node/{nid}, make that the new source path and THEN do a lookup, so that you 301 to the ALIAS directly instead of redirecting from SrcA to SrcB and then to AliasB.
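The point of that suggestion is that the visitor gets a single 301 to the final URL instead of a chain. A minimal sketch, assuming a hypothetical in-memory alias table keyed by language and source path (in Drupal the alias would come from the database lookup), a hypothetical gr_redirect_target() helper, and path prefixes equal to language codes:

```php
<?php
// Hypothetical alias table: language => [source path => alias].
$aliases = [
  'hu' => ['node/2' => 'oldal'],
  'en' => ['node/1' => 'page'],
];

// Build the final destination in one step: source path, then alias,
// then language prefix, so there is no SrcA -> SrcB -> AliasB chain.
function gr_redirect_target(string $lang, int $nid, array $aliases): string {
  $source = 'node/' . $nid;
  $path = $aliases[$lang][$source] ?? $source;
  return $lang . '/' . $path;
}

echo gr_redirect_target('hu', 2, $aliases), "\n"; // hu/oldal
```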

#5

dvaernewijck - April 29, 2008 - 16:21

On motogp.com, when you click on another language you get redirected to the home page in that language; e.g., when you click "FR" you are redirected to http://www.motogp.com/fr instead of the current page in the other language.

Is there a special way to set this up?

brgds

#6

nicholasThompson - May 25, 2008 - 20:45
Status:active» postponed

I believe this overlaps STRONGLY with the i18n issue (#153950: Endless loop with i18n), even though the i18n one is for D5 and not 6. I'm marking this as postponed until the D5 issue is solved. Once the D5 one is solved we can port the fix into D6.

#7

netgenius - July 26, 2008 - 13:42

Re. #5, that, I think, is a matter of which Languages Menu (block) you set up - there are two, one from i18n and the other from Locale (again, I think; not sure).

Please see my comment here - http://drupal.org/node/153950#comment-937525 - about what should happen when the url specifies a language but points to a node written in a different language. To expand on that, I can see some situations in which you might want to automatically redirect, but other situations in which you definitely wouldn't want to. It would have to be configurable, probably at a user preference level.

Example:
User tries to go to www.site.com/en/page_in_french
... the big question is WHY? A Site Admin or Site Translator may want to check the French version of something without having all other content (menus, blocks, etc.) switched into French - which is what would happen if you redirected them to www.site.com/fr/page_in_french. Or, you could redirect them to www.site.com/en/english_version_of_page - again, probably not what Site Admins or Translators want, but possibly best for normal users and search engines. But what will you do if the translation doesn't exist? Display a "not found" (in English, I suppose)? Go to the French page (and which URL, the en/ one or the fr/ one)? Or display another page offering the user a choice of the various options?

I think it's important to be clear about what the address www.site.com/en/page_in_french actually means. To my mind it means "Show me the page_in_french page regardless of what language it may be in, and keep showing me everything else in English." I think this definition makes most sense when dealing with a human visitor - they have used that address for some reason; if it is due to an incorrect link elsewhere, that issue should be fixed at the source. If it came from a search engine, then that should be fixed by making sure search engines don't index these "mixed language" URLs.

So, for Search Engines, the address www.site.com/en/page_in_french is one you do not really want them to see nor index. I haven't got my head around whether such links would become visible or not via simple crawling - I don't think they are for a properly constructed site, but if they are then something needs to be put in place to stop them from being visible or followed.

Another point: www.site.com/en/page_in_french should probably show a link at the top of the text to let the user get to the English translation if it exists. This can be added manually using my Language Sections module - http://drupal.org/project/language_sections . Maybe it would be better if it were fully automatic.

All of this seems to me to be way beyond the scope of Global Redirect, unless the author wants to take it on!?

#8

nicholasThompson - August 4, 2008 - 15:19
Version:HEAD» 6.x-1.x-dev

netgenius, thanks for that explanation. The issue is actually more complex than it sounds, isn't it? You are right to bring up the question of what should actually be done, e.g.
www.example.com/en/french-page

Should this redirect to /en/english-page or /fr/french-page?

Following on from Greggles post in #2, here is a slightly expanded theory...

<?php
// If the Content Translation module is enabled, check that the path is correct.
if (module_exists('translation') && arg(0) == 'node' && is_numeric(arg(1)) && arg(2) == '') {
  global $language;
  switch (variable_get('language_negotiation', LANGUAGE_NEGOTIATION_NONE)) {
    case LANGUAGE_NEGOTIATION_PATH_DEFAULT:
    case LANGUAGE_NEGOTIATION_PATH:
      $node = node_load(arg(1));
      // Skip missing nodes and language-neutral nodes (empty language).
      if ($node && $node->language != '' && $node->language != $language->language) {
        drupal_goto(drupal_get_path_alias($node->language .'/node/'. $node->nid));
      }
      break;
  }
}
?>

I added this just before the clean URL test towards the end of the module. It checks that translation is enabled, that arg(0) is 'node', that arg(1) is numeric and that arg(2) is empty (i.e., we're not on an edit page, for example).
It then checks that the language negotiation is on path or path/language default (rather than NONE or DOMAIN).

#9

Freso - August 6, 2008 - 15:01

Is this still "postponed"?

Anyway, regarding what should be done, here's my take:

Case 1.
1) You have en/node/1 with no translation.
2) You go to da/node/1.
3) GR should redirect to en/node/1.

Case 2.
1) You have en/node/1 with translation da/node/2.
2) You go to da/node/1.
3) GR should redirect to da/node/2.

I think those are pretty much the options as it stands. I'll go poke at the proposed code now. :)
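The two cases can be expressed as one decision function. This is a minimal sketch, not the patch from #10: gr_translation_redirect() is hypothetical, $translations stands in for the node's translation set (language => nid), and it assumes each language's path prefix equals its language code (not true for an empty English prefix) and ignores language-neutral nodes.

```php
<?php
// $translations maps language => nid for the node's translation set
// (an empty array if there is none). Returns the path to redirect to,
// or null when no redirect is needed.
function gr_translation_redirect(string $requested, int $nid, string $node_lang, array $translations): ?string {
  if ($node_lang === $requested) {
    return null; // Already in the right language.
  }
  if (isset($translations[$requested])) {
    // Case 2: a translation exists in the requested language.
    return $requested . '/node/' . $translations[$requested];
  }
  // Case 1: no translation, so fall back to the node's own language.
  return $node_lang . '/node/' . $nid;
}

echo gr_translation_redirect('da', 1, 'en', ['en' => 1, 'da' => 2]), "\n"; // da/node/2
echo gr_translation_redirect('da', 1, 'en', []), "\n"; // en/node/1
```

The isset() check also covers a requested language with no entry in the translation set, which simply falls through to case 1.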

#10

Freso - August 6, 2008 - 16:32
Title:redirect to version in native language» Redirect to version in native language
Status:postponed» needs review

This patch is based on the code in comment #8, but has been expanded somewhat upon. It should work as described in my comment #9. Thanks to agentrickard for trying to help me find stuff! :D

AttachmentSize
201675_redirect_to_version_in_native_language-10.patch 1.99 KB

#11

Freso - August 7, 2008 - 12:20

Also, the patch is live (applied to DRUPAL-6--1-0) on freso.dk if you feel like seeing how it works. :)

#12

netgenius - August 7, 2008 - 12:00

IMHO Case 2 is ok, but not Case 1 for reasons described in my longer posting above. Forcing a whole-page language change could be a real problem if the user doesn't understand the menu, can't see where to login or logout, etc. My golden rule would be: never redirect from a url which specifies a language to a url which specifies a different language - how content gets displayed is a separate topic.

The only thing that I think you can safely redirect is mysite.com/something to mysite.com/en/something (where en is the site default or user preference already set.) I'm currently using my own custom version of Global Redirect which does this. So this way, Google sees only one copy of a page rather than two. I also thought that maybe it would make more sense to do the reverse, i.e. redirect mysite.com/en/something to mysite.com/something (same content, different url). The only advantage I can think of is that the url looks shorter/neater. Problem is that if you ever changed your site default language (highly unlikely of course) then Google's view of things would get messed up.
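The redirect from an unprefixed path to the prefixed one might look like the following minimal sketch; gr_add_prefix() and its arguments are hypothetical, and a real version would pick $default_prefix from the user's language preference where one is set.

```php
<?php
// Hypothetical: if $path carries no known language prefix, return the
// prefixed path to 301 to; otherwise return null (no redirect needed).
function gr_add_prefix(string $path, string $default_prefix, array $prefixes): ?string {
  $first = explode('/', $path)[0];
  if (in_array($first, $prefixes, true)) {
    return null; // Already prefixed.
  }
  return $default_prefix . '/' . $path;
}

echo gr_add_prefix('something', 'en', ['en', 'da']), "\n"; // en/something
```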

I guess different sites and different users have different needs, so any generic solution is going to need a good level of configurability, and cater for some individual user preferences too.

#13

Freso - August 7, 2008 - 12:22

@netgenius:
First, my patch in #10 will use the prefix specified by the user. By default, English's prefix is (still; see #146084: Default path prefix for English (and DBTNG it)) '' (i.e., an empty string). This means that English content would be available at "foo", while localised content would be at "lang/foo".
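The effect of an empty English prefix on generated paths, as a small sketch (gr_localized_path() and the $prefixes table are hypothetical; falling back to the language code for languages missing from the table is an assumption):

```php
<?php
// Hypothetical prefix table: English keeps the empty default prefix.
$prefixes = ['en' => '', 'da' => 'da'];

// Build the public path for an internal path in a given language.
function gr_localized_path(string $lang, string $path, array $prefixes): string {
  $prefix = $prefixes[$lang] ?? $lang; // Assumed fallback: the language code.
  return $prefix === '' ? $path : $prefix . '/' . $path;
}

echo gr_localized_path('en', 'foo', $prefixes), "\n"; // foo
echo gr_localized_path('da', 'foo', $prefixes), "\n"; // da/foo
```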

Second, you say "So this way, Google sees only one copy of a page rather than two." while opposing my case 1. If you don't do case 1, you'll end up having the same content in multiple places: en/node/1, da/node/1, fr/node/1, ... - depending on how many languages you've set up. This is what I want to avoid ("at all costs").

I do agree it's not a perfect solution (one has to navigate to a different page to change the language), but it's one that lets Google and other search engines see content at one address, and one address only. This could possibly be toggled by a site variable/admin configuration though, so each site can set it as they want it. This shouldn't be too difficult to add to the patch, if Nicholas agrees it is needed.

#14

greggles - August 7, 2008 - 15:25

Rather than discussing what it shouldn't do, let's discuss what it should do.

1) You have en/node/1 with no translation.
2) You go to da/node/1.
3) GR should...

Option A) Redirect to en/node/1
Option B) Redirect to the example.com/da home page with a message "Content not available in your language. Do you want to see it in English?" (where "English" is linked to en/node/1)
Option C) Display a 404
Option D) something else

Personally, I'd be happy with any of these. I agree with Freso's comment that the most important thing (both for SEO and usability) is that content is available at only one URL.
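If the behaviour for the no-translation case were made configurable (the site variable mentioned in #13), choosing between options A, B and C could reduce to a single switch, which keeps the per-request cost small. A hypothetical sketch; the constant names and return shape are illustrative only, and it assumes path prefixes equal language codes:

```php
<?php
// Hypothetical policy values for the "no translation" case.
const GR_NO_TRANSLATION_REDIRECT = 'A'; // Redirect to the original language.
const GR_NO_TRANSLATION_NOTICE = 'B';   // Language home page with a notice.
const GR_NO_TRANSLATION_404 = 'C';      // Not found.

// Decide what to do when $requested has no translation of node $nid,
// whose own language is $node_lang.
function gr_no_translation_action(string $policy, string $requested, string $node_lang, int $nid): array {
  $original = $node_lang . '/node/' . $nid;
  switch ($policy) {
    case GR_NO_TRANSLATION_REDIRECT:
      return ['action' => 'redirect', 'path' => $original];
    case GR_NO_TRANSLATION_NOTICE:
      // Home page in the user's language; the notice links to the original.
      return ['action' => 'notice', 'path' => $requested, 'link' => $original];
    default:
      // Option C, and any unknown setting, is treated as "not found".
      return ['action' => 'not_found'];
  }
}
```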

#15

greggles - August 7, 2008 - 15:41

One more thing:

"I guess different sites and different users have different needs, so any generic solution is going to need a good level of configurability, and cater for some individual user preferences too."

We have to be careful about this in GR since it runs on every page load. Adding too much complexity to this module will slow down sites. I guess that if people have translation enabled then they are prepared for some slowness already, but it's worth considering.

#16

nicholasThompson - August 7, 2008 - 16:09

Greggles - very good points all round...

I would opt for.. err.. option B. This does a number of things...
1) It definitely stops the likes of Google indexing that URL
2) It stops the user wondering why they are getting an English site when they requested a Danish page.
3) It gives control to the USER about what they do next

I also would prefer to keep this module lightweight... However bear in mind that:
a) the configuration could be put in an include which the new D6 menu system could include when needed
b) the locale stuff (ie all this issue is talking about) only applies when module_exists('locale')

#17

Freso - August 7, 2008 - 17:24

And since the patch in #10 pulls in a list of all translations, they could(/should) all be listed.

Hm. Is there a way to do something akin to drupal_set_message() that will always be shown? (I.e., that won't be hidden if the user is anonymous or similar.) If so, that might be another alternative: setting a message that the user has been redirected to another language. At least, I'd rather have that than option B. But then Greg raised the point of the possibility of the content being indexed with this notice. Hohum. Is it possible to make a new page available without first defining it outside the if {}?

In the meantime, here's an update of the #10 patch. I realised there was a small bug when there wasn't a $language->language translation available when looking up $node_translations[$language->language]. Also, it's now changing the entire $language variable instead of only the prefix, just in case.

AttachmentSize
201675_redirect_to_version_in_native_language-17.patch 1.97 KB

#18

Freso - August 7, 2008 - 17:28

And here's a patch which includes a drupal_set_message() before it changes the language. Just in case you want to try and play around with it. =)

AttachmentSize
201675_redirect_to_version_in_native_language-18.patch 2.35 KB

#19

netgenius - August 7, 2008 - 20:48

Ok, agree, "B" is best, so I'll update my "golden rule" to include "without the user's prior agreement" :)

Freso, I was not in any way saying you're *wrong*, only that my needs are different!

Performance - keeping it lightweight... perhaps the answer is caching. What I mean is: given a URL, do lots of processing to figure out where to redirect it to, then cache the result so that next time it's very lightweight - URL A redirects to URL B, that's all. OK, there are issues: the cache needs clearing under some circumstances - probably on node update, maybe in other cases.
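The caching idea can be sketched as a lookup table with per-node invalidation. This hypothetical class only lives for one request; a real version would store entries in a persistent cache (for example via Drupal's cache_set() and cache_get()) and would need to clear every node in a translation set when one of them changes.

```php
<?php
// Hypothetical redirect cache: maps a requested path to its computed
// target and remembers which node each entry came from.
class GrRedirectCache {
  private $targets = [];
  private $byNode = [];

  public function get(string $path): ?string {
    return $this->targets[$path] ?? null;
  }

  public function set(string $path, string $target, int $nid): void {
    $this->targets[$path] = $target;
    $this->byNode[$nid][] = $path;
  }

  // Call on node update or delete to drop that node's cached redirects.
  public function clearNode(int $nid): void {
    foreach ($this->byNode[$nid] ?? [] as $path) {
      unset($this->targets[$path]);
    }
    unset($this->byNode[$nid]);
  }
}
```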

In summary, I don't think GR can be a "one size fits all" solution without offering configurable options, at least at the system level and possibly at the user level.

#20

netgenius - August 8, 2008 - 10:04

Keeping it lightweight yet flexible, how would this be? ...

1) You have en/node/1 with no translation.
2) You go to da/node/1.
3) GR should...

-> Redirect to da/special-page?url=da/node/1

... where special-page is either a hard-coded url or configurable. Either way, it's up to each site admin to put something there (probably some PHP) that handles the rest. So, there you could provide whatever message you want to display to the user (in the appropriate language), a link to the English version of the originally requested page, or some complex routing.
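Building the redirect netgenius describes is simple; the only detail is encoding the originally requested path in the query string. A sketch with a hypothetical gr_special_page_url() helper; note that http_build_query() percent-encodes the slashes:

```php
<?php
// Hypothetical: build the URL of a site-configurable "special page" that
// receives the originally requested path as the 'url' query parameter.
function gr_special_page_url(string $lang, string $special_page, string $requested): string {
  return $lang . '/' . $special_page . '?' . http_build_query(['url' => $requested]);
}

echo gr_special_page_url('da', 'special-page', 'da/node/1'), "\n";
// da/special-page?url=da%2Fnode%2F1
```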

This to me would seem to meet the needs of "lightweight" and "fully-configurable", and moves the complexity away from this module. I'm probably missing something?

#21

nicholasThompson - August 8, 2008 - 10:22

I can't see what's wrong with having a settings page. The menu system in D6 can include files based on the callback path. We could use this to include "globalredirect.admin.inc" or something like that...

#22

Freso - August 8, 2008 - 21:51
Status:needs review» needs work

@ Nicholas:
If you're referring to my comment about a page, I'm not referring to an /admin page, but the page where one can choose (to either go back or) to go to a different language to see the content. But... that might very well be possible to add in a menu callback as well.

#23

sparkey85 - August 14, 2008 - 08:19

This patch works really nicely if I use direct links to the nodes, but if I request the node through an alias, /en/Germansite, it does not go to /en/Englishsite. Is it possible to extend the patch with alias handling?

#24

Freso - August 14, 2008 - 08:53

@sparkey85:
I'm assuming you use Pathauto here, but it might apply even if you're just using core's Path module: when you create a node and select a language for it, Pathauto (and/or possibly just plain Path) will save the accompanying alias with the language code - this is to prevent aliases for e.g. English "foobar" and German "foobar" from conflicting with each other, as they would have the same alias! (See #269877: path_set_alias() doesn't account for same alias in different languages for a conflict due to Path not using this language information.)

So, in short: to solve this, either the aliases need to be saved without language information, so they apply "universally", or Global Redirect has to hit the database and get information about aliases in all languages.

The first version is obviously problematic, as I started out by explaining, and the second approach is problematic in a similar way. Say we have a node with a Danish alias "foobar" and another node with German alias "foobar" (for the purpose of this example, no English node with this alias exists, and the Danish and German nodes aren't translations of the same content), you then go to "en/foobar"... should it redirect to the Danish or German node?
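The second approach, looking an alias up in every language, has to treat several matches as ambiguous. A minimal sketch over a hypothetical in-memory list of alias rows (in Drupal these would come from the url_alias table); gr_lookup_any_language() is hypothetical:

```php
<?php
// Hypothetical alias rows: [alias, source path, language].
$rows = [
  ['foobar', 'node/5', 'da'],
  ['foobar', 'node/6', 'de'],
  ['about', 'node/7', 'da'],
];

// Look an alias up in every language. Return the source path only when
// exactly one match exists; with several candidates (the Danish/German
// "foobar" case) there is no safe choice, so return null.
function gr_lookup_any_language(string $alias, array $rows): ?string {
  $matches = [];
  foreach ($rows as list($a, $source)) {
    if ($a === $alias) {
      $matches[] = $source;
    }
  }
  return count($matches) === 1 ? $matches[0] : null;
}
```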

#25

sparkey85 - August 14, 2008 - 12:03

@Freso

I have Pathauto, and I use aliases without a language prefix. It's possible to give the same alias to the translations of a node, because the core Locale module handles it with its prefixing (6.3), and it works fine, as I need. Only translated aliases make my life difficult.

In your example in #24 (en/foobar), it should not redirect, because there are no definite translations for this item.

Example settings:
Home › Administer › Site building › URL aliases:

Alias     System   Language
start     node/1   German
start     node/2   English

inhalt    node/3   German
content   node/4   English

Some independent aliases (no translation relation):
foobar    node/5   German
foobar    node/6   English
foobar2   node/7   German

Let's look at the use cases with the current module:
Browser/selected language: English
Expected:
start -> en/start (node/2) (ok)
de/start -> de/start (node/1) (ok)
content -> en/content (node/4) (ok)
foobar -> en/foobar (ok)
de/foobar -> de/foobar (ok)
foobar2 -> does not exist (ok)
Translate en/start to de/start (ok)
Translate en/content to de/inhalt (ok)
Translate de/foobar to en/foobar and back: ok (no translation)

What's not working:
1. inhalt -> en/content (not ok)
2. de/content -> de/inhalt (not ok)

My imagined algorithm:
1. Check the nodes referenced by the alias.
2. Are they translations of each other?
3. If yes, select the node in the requested language (prefix); if no node exists in the selected language, select the node in the default language.

I think it should work, but supposably i have error in reasoning :)
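sparkey85's algorithm can be sketched over the translation-set ID (tnid) that Drupal 6 stores for translated nodes. Everything here is hypothetical and in-memory; as #26 points out, the language prefix has to be stripped before the alias is looked up, so $candidates are the nids the bare alias resolves to:

```php
<?php
// Hypothetical node table: nid => [language, tnid]; tnid identifies the
// translation set (0 when the node has no translations).
$nodes = [
  1 => ['de', 1], 2 => ['en', 1], // "start"
  3 => ['de', 3], 4 => ['en', 3], // "inhalt" / "content"
  5 => ['de', 0], 6 => ['en', 0], // unrelated "foobar" pages
];

// Given the nids an alias points to, return the nid to show for
// $requested, falling back to $default; null if the candidates are not
// all translations of each other.
function gr_pick_translation(array $candidates, array $nodes, string $requested, string $default): ?int {
  $tnids = array_unique(array_map(function ($nid) use ($nodes) {
    return $nodes[$nid][1];
  }, $candidates));
  if (count($tnids) !== 1 || reset($tnids) === 0) {
    return null;
  }
  $tnid = reset($tnids);
  $fallback = null;
  foreach ($nodes as $nid => list($lang, $set)) {
    if ($set !== $tnid) {
      continue;
    }
    if ($lang === $requested) {
      return $nid;
    }
    if ($lang === $default) {
      $fallback = $nid;
    }
  }
  return $fallback;
}

echo gr_pick_translation([3], $nodes, 'en', 'de'), "\n"; // 4 ("inhalt" -> "content")
```

Unrelated nodes sharing an alias (the foobar case) return null, so no redirect happens, which matches the expectation in #25.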

#26

Freso - August 18, 2008 - 21:48

@sparkey85: "supposably i have error in reasoning" - Yeah. The language prefix isn't part of the alias. If you go to http://freso.dk/en/2008/08/13/9_years_on_the_web the alias you're accessing is "2008/08/13/9_years_on_the_web", not "en/2008/08/13/9_years_on_the_web".

Anyway, apart from this: since we're all pretty much agreed on my case 2, I figured I could do a patch with just this for now, while we agree on the proper course of action for case 1. No? A hand-edited (and untested, but I can't see why it shouldn't work) version of the patch from #17 is attached, which only handles the case 2 scenario and does nothing (i.e., the same as the current behaviour) in case 1.

AttachmentSize
201675_redirect_to_version_in_native_language-26.patch 1.46 KB

#27

fletchgqc - August 30, 2008 - 21:44

This might be against the feelings of most commenting here, but regarding Greggles' comment #14 options, I am strongly in favour of Option C) Display a 404.

If you stop, step back, and think about it logically for a second, then this is the answer that makes most sense. There is no Danish content called "node 1". It really is as simple as that. There is no reason that anyone should ever visit that URL, and that URL should never return anything. If you want to make it possible for your users to switch the language of what they are reading, then leave the default language switcher node links on (or whatever) - don't rely on them attempting to hack the path!

The core aim of GR is to avoid duplicate content - to ensure that URL paths don't exist on your site that could somehow be found out and have a negative bearing on SEO. Let's stick to that aim. GR's aim is not to attempt to figure out what users were trying to do when they entered a weird URL path, nor to make all sorts of weird URL path combinations available to redirect to a sensible page. The simplest and most correct way to solve this i18n problem is to issue 404s for anything which is not a valid URL.

#28

Freso - August 31, 2008 - 16:44
Status:needs work» needs review

@ fletchgqc: Try and take a peek at http://freso.dk/. I'm guessing you will land at the English version, so the first listed/most recent entry is node/25 - which is Danish content. (And using a patch I've uploaded earlier, going there will redirect to the Danish version.) Also note that once you're on the Danish version, it will no longer be called "node/25" but, eh... "2008/08/23/spam_anti_nigeria_scam". I do agree though, that getting a listing of translations might be preferable to being ruthlessly redirected to another language.

Also, it would be great if people would review my patch at #26 to have at least part of the solution (on which no one seems to disagree with the approach) committed.

#29

fletchgqc - September 2, 2008 - 18:31

Hmmm... I wasn't aware that on the English front page, a link to Danish content will link with the path /en/node/25 rather than linking directly to /da/node/25. That seems to me like core handles things the wrong way, but I don't know enough about i18n to debate that. I don't understand all the complexities yet. In any case I can definitely understand why you would not want to send 404s.

The bottom line for me is that we don't have duplicate content. If you guys want to put in tricky solutions to redirect certain pages to other places with 301s, you obviously know why you are doing it so go for it and don't let me stand in the way. I.e. I'm therefore OK with all this redirecting discussed above.

Freso, I will test your patch (the one that you are running on freso.dk) if you expand it to deal with domain language negotiation. According to the comments above, domain language negotiation has been ignored ("It then checks that the language negotiation is on path or path/language default (rather than NONE or DOMAIN)."). This definitely needs addressing (and I can test it) - i.e. the problem that de.ex.com/node/1 = en.ex.com/node/1. Is there any chance that your patch could be expanded to address this?
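For domain negotiation the same check has to compare hosts instead of path prefixes. A minimal sketch of the case 1 behaviour only, with a hypothetical language-to-domain table and gr_domain_redirect() helper; a translation lookup would follow the same pattern as for prefixes:

```php
<?php
// Hypothetical domain table for domain-based language negotiation.
$domains = ['en' => 'en.ex.com', 'de' => 'de.ex.com'];

// If $host does not belong to the node's language, return the URL on the
// right host; null when no redirect is needed or no domain is configured.
function gr_domain_redirect(string $host, string $path, string $node_lang, array $domains): ?string {
  $target = $domains[$node_lang] ?? null;
  if ($target === null || $target === $host) {
    return null;
  }
  return 'http://' . $target . '/' . $path;
}

echo gr_domain_redirect('en.ex.com', 'node/1', 'de', $domains), "\n"; // http://de.ex.com/node/1
```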

#30

wrwrwr - September 4, 2008 - 22:12

@Freso: I don't understand why you wouldn't use the "Language neutral" setting for content that you'd like to be accessible to both English and Danish users. As it is now, you can choose whether you want a "404" or a redirection to another language version; with such a patch you can't.

#31

fletchgqc - September 12, 2008 - 23:55
Status:needs review» needs work

@Freso are you still on the case and able to do this? The idea is definitely valid.

#32

Freso - September 13, 2008 - 08:47
Status:needs work» needs review

I'm not "assigned", but yeah, I'm still on the case. I just got back from Turkey a few days ago, bringing home a messed-up stomach... so I'm currently trying to deal with that, meaning that I haven't spent a lot of time on the computer(s).

Anyway, there's still the patch in #26 which I believe should be reviewed and possibly checked in, while we're discussing the other use case. Nicholas?

Also:
@fletchgqc: The reason for the language-specific linking is most likely that it doesn't look up the alias using the language, and yes, this might be considered a bug, as it does this in other places. (And if you file or find a bug on this, please do send me a link. :))
I also think it should be doable to add support for multiple domains, though I have no experience with this... so I'd definitely appreciate feedback from testing such things.
Also, I don't like the 404 idea. :) I really like the "Multiple Choices" page idea though - perhaps with HTTP codes 303 and 300, depending on whether there is one or multiple options? If not, then at least 307 instead of 301, as the content may become translated in the meantime. I should implement this in the current patch...
@wrwrwr: Ideally, all my content would be available in both English and Danish. I just don't have the time/energy to both write the entries and translate them. But then, I might one day. Making them language neutral would mean more work once I'm translating the content. It would also not allow Drupal to make HTML that tells the browser (and search engines) the language of the content in question (e.g., Danish content appearing on the English section, would be misinterpreted as English).
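The status-code choice floated in the comment above (303 or 300 for a choices page, otherwise 307 rather than 301) fits in a few lines. gr_wrong_language_status() is hypothetical and assumes at least one option exists:

```php
<?php
// Hypothetical: choose the HTTP status for a "wrong language" request.
function gr_wrong_language_status(int $options, bool $choices_page): int {
  if (!$choices_page) {
    // 307 is temporary, unlike 301, so the original URL stays valid in
    // case a translation appears later.
    return 307;
  }
  // 300 Multiple Choices for several translations, 303 See Other for one.
  return $options > 1 ? 300 : 303;
}
```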

#33

fletchgqc - September 13, 2008 - 13:04
Status:needs review» needs work

@Freso:
I tested patch #26 and it has no noticeable effect when domain language negotiation is being used. Therefore I don't think that it should be applied. On the other hand, domain language negotiation is not a core feature AFAIK, only an i18n module feature so based on that argument perhaps you could get away with it. But personally I think the patch should address both cases... last I heard they want to put domain negotiation in core 7.x anyway. So I'm marking as "code needs work".

Freso, I don't really agree with your rationale for issuing patch #26. The way I see it, lots of people will have a say but very few will do the actual work of writing and testing code. That's fine - everyone's opinion is appreciated and there's no obligation to contribute code to Drupal; however I think the opinion of people that do actually write code should have a lot more weight. Otherwise momentum to get things done will get lost amongst heaps of feature suggestions that no-one has the motivation to write.

Therefore, since you are the only one that has managed to write any code to address this issue yet, I think we should push ahead with your patch #17. The behaviour which you say it creates sounds excellent and does solve the problem. I think we should get it committed, and any other ideas discussed here (some of which may indeed be very good) should be addressed as future feature requests.

I applied patch #17, but the result for domain language negotiation was exactly the same as the above patch - no effect. I think we should try and fix this and then get it committed (forget #26).

Anyway, that's just my opinion. Am of course open to hear what you or others think.

#34

gagarine - September 18, 2009 - 11:57

track

 
 
