MSIE sends accept language headers that look like this "en-nz,en-us;q=0.7,sv-se;q=0.3" but language_from_browser() wont match that to either 'en' or 'sv' which is what is in the language table
This drop-in replacement code for language_from_browser() works better. (I may try to send a patch later).
function language_from_browser() {
// Specified by the user via the browser's Accept Language setting
// Samples: "hu, en-us;q=0.66, en;q=0.33", "hu,en-us;q=0.5"
// MSIE 7: "en-nz,en-us;q=0.7,sv-se;q=0.3"
$browser_langs = array();
if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
preg_match_all('"(((\S\S)-?(\S\S)?)(;q=([0-9.]+))?)\s*(,\s*|$)"',strtolower($_SERVER['HTTP_ACCEPT_LANGUAGE']),$browser_accept);
for ($i = 0; $i < count($browser_accept); $i++) {
// The language part is either a code or a code with a quality.
// We cannot do anything with a * code, so it is skipped.
// If the quality is missing, it is assumed to be 1 according to the RFC.
if(!empty($browser_accept[2][$i])) $browser_langs[$browser_accept[2][$i]] = ($browser_accept[6][$i]? (float) $browser_accept[6][$i] : 1.0);
if(!empty($browser_accept[3][$i]) && empty($browser_langs[$browser_accept[3][$i]])) $browser_langs[$browser_accept[3][$i]] = ($browser_accept[6][$i]? (float) $browser_accept[6][$i]-0.01 : 0.99);
}
}
// Order the codes by quality
arsort($browser_langs);
// Try to find the first preferred language we have
$languages = language_list('enabled');
foreach ($browser_langs as $langcode => $q) {
if (isset($languages['1'][$langcode])) {
return $languages['1'][$langcode];
}
}
}
Comments
Comment #1
Gábor HojtsyWell, a patch would indeed be great.
Comment #2
iliphil CreditAttribution: iliphil commentedhere is the patch
Comment #3
Gábor Hojtsy- There are some coding style issues in the patch (no spaces after commas, bad indentation, no brackets for the ifs)
- You match on \S\S-\S\S which means two char language and variant codes. Are we sure that we can only see these? (I am not)
- Is it guaranteed that the language codes are in descending priority order? The current code expects this (while the original code expected unique list of language codes).
Comment #4
iliphil CreditAttribution: iliphil commentedApologies for the style
The match is on (\S\S-?(\S\S)?) so the second \S\S is not required for a match. en, en-us, en-gb will all match.
It doesn't require that languages be in descending priority order so it doesn't matter if they aren't.
I can only say try it out.
arsort($browser_langs);
print_r( $browser_langs);
Comment #5
j.somers CreditAttribution: j.somers commentedI filed a duplicated entry for this bug.
Any changes of it getting fixed in a next release?
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commentedWell, both the parsing and the logic of
language_from_browser()
is broken. According to RFC 2616 section 14.4:So we have to go thru all enabled languages, assign them a qweight using the matching logic described above and select the language that fit the most.
Here is a full test case for language_from_browser() that outline the issue in the current implementation, along with a new implementation that fully pass the test case.
Comment #7
Damien Tournoud CreditAttribution: Damien Tournoud commentedComment #8
j.somers CreditAttribution: j.somers commentedCan I safely use this patch on my D6 installation or is it going to cause conflicts.
Right now I started using the previously provided patch which seems to work pretty OK for what I need it.
Comment #9
Damien Tournoud CreditAttribution: Damien Tournoud commentedHere is the same patch for D6.
Comment #10
Damien Tournoud CreditAttribution: Damien Tournoud commentedOh, and the comparison between language tags should also be case insensitive (RFC 2616 section 3.10). Here is an updated patch.
Comment #11
j.somers CreditAttribution: j.somers commentedThank you.
You have just been promoted to my monthly hero!
Comment #13
lilou CreditAttribution: lilou commentedSee: #335122: Test clean HEAD after every commit and http://pastebin.ca/1258476
Comment #15
Damien Tournoud CreditAttribution: Damien Tournoud commentedReroll.
Comment #17
Bodo Maass CreditAttribution: Bodo Maass commentedHi Damien,
Great work on this issue. I just realized that this is still broken in D6 when I played with the language settings of my browser and always got the wrong language. While your code is a great step in the right direction, I think it still doesn't work correctly.
I set my browser (Firefox 3.5) to accept languages in the following order: de-de, en, de, en-us
This results in this header: "de-de,en;q=0.8,de;q=0.5,en-us;q=0.3"
My site has the languages 'de' and 'en' enabled. According to the setting, my site should be displayed in German (de). What happens is that your code iterates the languages available on the site ('en' and 'de'). When it finds a match for the single language code, it skips the double language code, which has a higher qvalue but gets ignored. So in this example the code incorrectly ignores de-de and finds that en has a higher value than de. Hence the incorrect result is 'en', while it should be 'de'.
Comment #18
Bodo Maass CreditAttribution: Bodo Maass commentedsorry, didn't mean to change the title of this issue
Comment #19
Bodo Maass CreditAttribution: Bodo Maass commentedHere is a slightly modified patch that works correctly for double language codes.
Comment #20
Damien Tournoud CreditAttribution: Damien Tournoud commented@Bodo: if you think you found a bug in my patch, please extend the test case.
But in that case, I believe you are wrong: you are saying that your site offers 'en' and 'de'. Not 'en-US' and 'de-DE', the generic 'en' and 'de'. In that case, the correct match for...
... is really 'en', because your site doesn't offer the 'de-DE' specific language.
Comment #21
Bodo Maass CreditAttribution: Bodo Maass commentedHi Damien,
I haven't had time yet to look at the D7 testing framework, but will do so and then update the test.
About the issue: As far as I understand, drupal sites don't normally offer compound locales such as 'en-US' or 'de-DE', but always just simple locales such as 'en' and 'de'. Is that correct? I guess it is possible to set up compound locales, but all the translations that ship with drupal are only simple locales.
If so, the compound locales come from the browser settings of the user.
I believe the intention of the code was that when the users browser requests a compound locale such as 'en-US', and the drupal site supports the main locale for this (in this case 'en'), there should be a match. Is that correct? I think the original code didn't work like that, but I agree that the tests should be used to define the behaviour.
Comment #22
Damien Tournoud CreditAttribution: Damien Tournoud commented@Bodo: the patch above is supposed to support serving compound locales.
Anyway, if you only have a main locale 'en' and 'de', the correct match a strange language configuration like:
is definitely 'en', because your site doesn't provide the specific 'de-de' language.
Comment #23
Bodo Maass CreditAttribution: Bodo Maass commented@Damien:
Fair enough about serving 'en' in the previous case.
What should happen if the site only has 'en' and 'de', and the client browser is set to accept 'de_de' and 'en_us' in that order, and nothing else? (Accept-Language: de-de,en-us;q=0.5)
My user expectation would be that the site will be served in 'de', although there is no direct match of locales. Is that what the spec would imply?
I guess the answer is NO, but this is unintuitive for a non-technical user. IE and FF also offer no help whatsoever for this setting beyond 'select your preferred languages here'. FF makes this worse by offering a multitude of compound locales that might get accidentally selected instead of the main locale.
Comment #24
Damien Tournoud CreditAttribution: Damien Tournoud commentedRFC2616 section 14.4 (language-range is the thing returned by the browser, language-tag is the server provided languages):
ie. 'en' sent by the browser matches 'en-US' provided by the server, but 'en-US' sent by the browser doesn't match 'en' provided by the server.
In the case you described, no language would match, so the server will its default language.
Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commentedIf you are not convinced already, RFC2616 has the following node at the end of the same section:
(emphasis mine)
Comment #26
Bodo Maass CreditAttribution: Bodo Maass commentedHi Damien,
I'm already convinced, and your code seems to work according to the spec. It just took me a while to understand that selecting any of the compound locales without the main locale did not do what I expected (as the RFC helpfully points out).
As I said, all browsers I checked (FF, IE, Opera) offer no help at all for this, so I might not be the only one who got this wrong. The browser should probably add the main locale (or at least suggest to do so) if it is not already present, but that is not a Drupal issue.
In any case, thanks for your explanation. I guess your original code is correct after all.
Comment #27
Bodo Maass CreditAttribution: Bodo Maass commentedI just realized how I got started with this whole confusion. My installation of IE7 doesn't offer two letter locales at all. Instead, only compound locales are available for selection, which is bound to produce unexpected results as discussed above. I'll have to check if this is fixed in IE8, and if this is normal for IE7 or if my install is somehow corrupted.
Comment #28
Plascual CreditAttribution: Plascual commentedI'm running into this problem. Is there any new development ?
All browsers work as expected, but IE7, IE8 and Safari.
Even if IE or Safari has fr-CA as prefered language, it displays the front-page in the default language (English).
Does this patch apply to D6 (14) ?
Comment #29
markus_petrux CreditAttribution: markus_petrux commentedFor those insterested in a fix for D6, without the need to patch core, so until the fix is committed, I'm using the following code snippet at the bottom of settings.php
In IE8, I've found language entries that start with es_xxx, or es-ES, and the above snippet rebuilds the variable $_SERVER['HTTP_ACCEPT_LANGUAGE'] to ensure everything is lower cased, and the main language prefix exists in the list (ie. when it finds 'es-ES', it makes sure 'es' exists), before it reaches language_from_browser() in Drupal core.
Comment #30
GiorgosKHad the problem described above with IE8 and drupal6
this code in settings.php seems to solve the problem
Comment #31
Damien Tournoud CreditAttribution: Damien Tournoud commentedRerolled one last time.
Comment #33
Damien Tournoud CreditAttribution: Damien Tournoud commentedOups. I was returning a language object instead of a language code (and the unit test did the same).
This one should pass :)
Comment #34
markus_petrux CreditAttribution: markus_petrux commentedI think the preg_match_all() operation and the next foreach loop could be simplified a little if strtolower() is applied to the Accept-language header.
Comment #35
chx CreditAttribution: chx commentedWe talked on IRC and Damien will rewrite his C code in Drupal-style PHP.
Comment #36
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis should do it.
Comment #37
chx CreditAttribution: chx commentedisnt
shorter?
Comment #39
drubage CreditAttribution: drubage commentedSubscribing. For my purposes just adding this line works for now:
Although better logic would be to first search on the full langauge (like for Brazilian Portuguese it would look for "pt-br" and then, if not found, it would search for just the base code "pt").
-Drew
Comment #40
c960657 CreditAttribution: c960657 commentedIt took me a while to grasp the RFC, but I think the patch follows it closely—with one exception: A language-tag may contain any number of subparts, e.g. en-GB-oed (Oxford English Dictionary spelling), and there are a few of these in the official language registry. I don't know if they have any use in practice, though. I guess we can ignore them for now—any sane user would supply en-GB in addition to en-GB-oed.
Elsewhere in core (with one exception) we write “RFC 2616”, i.e. with a space.
This should use
elseif
instead.I think $weight is a bad name for a variable containing a q-value. The term weight is used in a comment in the same function to describe the priority of the enabled languages.
In the database, {languages}.weight is an integer column. I think we should use integers here too.
How about a few tests that contain both fr and fr-CA. Contrary to en/en-US, the most specific has the highest priority.
Comment #41
Bodo Maass CreditAttribution: Bodo Maass commentedI'm still not convinced that Internet Explorer 8 implements the RFC correctly. On my new install of a german Windows 7, IE8 lists only the following language preference:
Deutsch [de-DE]
That's it. No [de] to fall back on. The interface will now let me add Deutsch [de] if I go to the language settings of the browser, but the average user probably won't do that, and is probably even less likely to have read (and understood) the RFC.
Interestingly, my IE8 install on German Windows XP, which was upgraded from IE6 and then IE7, does not list the single language entries as options in the settings.
In conclusion, I'm all for supporting standards, but the reality is (as we all know), that few site owners would insist on following the standard at the cost of breaking the site for IE users. Hence I strongly request (if that's not happening yet) that some of the tests reflect the default installations of IE8 to make sure that those users get to see the correct language.
Comment #42
markus_petrux CreditAttribution: markus_petrux commentedYes, I also saw non-standard things in IE8 with Spanish related options. Quoting myself from #29:
I think the parser should be prepared to accept "_" as a separator between "major-minor" language codes, also, it should be ready to deal with "aa-XX" or "aa_XX" with a higher q than "bb", and fallback to "aa" when necessary.
Comment #43
tomsm CreditAttribution: tomsm commentedsubscribing
In Belgium we have the same problem. We have "nl-BE" or "fr-BE" as preferred languages in our browser.
Drupal does not recognize this and shows the English frontpage.
I added the code of #29 to settings.php. Works great with Firefox 3.6 and IE8.
Comment #44
Damien Tournoud CreditAttribution: Damien Tournoud commentedFollowing c960657 review (thanks!), here is an improved patch with (1) fixed code style, (2) support for language codes with multiple dashes, (3) improved test coverage.
I replaced the strict regexp with a more lenient (but more readable) one. I'm not sure we need to add special cases for IE8, at least not until we have proof that such broken behavior is actually in the wild.
Comment #45
CKoch CreditAttribution: CKoch commentedI can say that my fresh install of Windows 7 with IE8 does not use any single language codes, only fully qualified locales.
My browser sends:
["Accept-Language"]=>
string(5) "en-CA"
Testing out on a VM for testing,
I get the following:
["Accept-Language"]=>
string(5) "fr-CA"
The patch will need to address IE Behaviour of sending only locales and not any pure language codes.
Comment #46
plachretitling
Comment #47
xecstasyx CreditAttribution: xecstasyx commentedComment #48
tomsm CreditAttribution: tomsm commented#29 works fine on my site (running on IIS), but when I enable Performance >> Page Cache >> Normal the language is not changed correctly. When I clear the cache it works one time, then it is broken again.
When I disable page cache, it works fine.
How can resolve this?
On my local WAMP server I do not have this problem. So it has something to do with caching and Windows IIS.
Comment #49
plach@tomsm: are you are using D6? If so you are probably experiencing a known bug: #339958: Cached pages returned in wrong language when browser language used.
In D7 core browser language detection is skipped when page cache is enabled.
Comment #50
tomsm CreditAttribution: tomsm commented@plach: Yes, I am using the latest version of Drupal 6.
I have tested it again on WAMP locally. I also have the issue with WAMP.
So yes, I think that this is the reason.
Disabled in D7? Isn't page cache an important feature to improve performance?
I thought that page cache was only used for anonymous users in D6.
My default language is English, but I have many Dutch and French anonymous visitors.
Is disabling the page cache the best solution?
Comment #51
plachSure, but browser language is a form of user preference which does not play well with the very concept of page cache, AAMOF we cache pages only for anonymous users. See #339958-85: Cached pages returned in wrong language when browser language used for details.
AFAIK it's the only solution available if you want to use the browser preference. You might want to check #339958: Cached pages returned in wrong language when browser language used to find more answers.
Anyway, please let's continue in the other issue, we are off topic here.
Comment #52
tomsm CreditAttribution: tomsm commented@ plach: thanks for the info!
Comment #53
Damien Tournoud CreditAttribution: Damien Tournoud commentedNow implementing a fix for Internet Explorer: when the browser send a specific language tag (fr-CA) without the corresponding generic tag (fr), we implicitly add this generic tag. We take the lowest value of the specific tags as the value of the generic tag.
Comment #54
toemaz CreditAttribution: toemaz commentedD6 backport of #53 Thx Damien!
Comment #55
idflood CreditAttribution: idflood commentedHere is a patch file from D6 backport of #53. The backport seems to be working well but I haven't tested the 7.x-dev version. Thanks to Damien and toemaz!
Comment #56
GiorgosKtested #55
tested with cache off
OPERA 10.62 - en / el ("el-GR" not recognized as "el")
Chrome 6 - en / el
IE8 en / el / el-GR
IE7 en / el / el-GR
but this behavior was same as before the patch (me thinks)
tried with cache on normal
first visit to homepage determines language shown to every other visitor
is the caching compatibility handled by another patch ??
ALSO
how do you test #53 when i18n is not working yet for D7, i18n_variables are not working in D7 ?
Comment #57
plachBugs are fixed in the development version first, backported then.
The patch needs a reroll.
Comment #58
plach#53: 221712-fix-browser-language-detection.patch queued for re-testing.
Comment #59
plach@GiorgiosK:
See #339958: Cached pages returned in wrong language when browser language used.
Comment #60
loganfsmyth CreditAttribution: loganfsmyth commentedHere's a rerolled version against current HEAD and with -p1.
Comment #61
loganfsmyth CreditAttribution: loganfsmyth commentedTagging D8MI because we want to reuse this logic for #1260716: Improve language onboarding user experience.
Comment #62
Gábor HojtsyThanks for the reroll. This is an issue I've seen referenced at many places, and reviewing the code it seems to have covered the above concerns and includes a very good set of test coverage. IMHO good to go and still needs to be committed to D7 too (which has the same code, so should be easy to commit there too).
Comment #63
Lars Toomre CreditAttribution: Lars Toomre commentedLooking over the patch in #60, I see that the new test has been added to a DrupalWebTestCase type. It really belongs in a unit test type. If such a class does not exist yet for the locale module, I would suggest we create one here; otherwise, this test should be moved to an existing unit test class.
Comment #64
Gábor Hojtsy@Lars: perfect, can you help with updating the patch for that? (Locale module does not seem to have unit tests yet)?
Comment #65
Lars Toomre CreditAttribution: Lars Toomre commentedSorry I cannot roll a revised patch right now. I am trying to help by at least reading submitted patches and submitting comments on how they might be improved.
Comment #66
Lars Toomre CreditAttribution: Lars Toomre commentedLooking at the tests in #60 again, I am confused about how the weight variable is defined and being used in tests. I thought that the Drupal way to use weights was that items with heavier weights dropped in importance.
Hence, in the test in this patch, I would expect that 'en-US' would be preferred over 'en'. That the test results all pass suggest that my previous understanding was wrong. Is this use of weight correct?
Comment #67
Gábor Hojtsy@Lars Toomre: yeah, good find. The weights in the $languages array are in reverse order compared to how Drupal would apply them. *However*, if you look, locale_language_from_browser() will not look at the weights themselves, it will just assume that the weights are in the "right priority order". And the $languages arguments that this function gets comes from language_provider_invoke(), which uses language_list(), which returns languages in ascending weight order. So by the time a real language list arrives at the function, its in the right weight order. The test indeed lists weight numbers in reverse order, however, at this point only the order of the array items matter for it to work well, so it still proves it works well.
I've also noticed that ordering the browser languages by qvalue is not really useful with the current logic + we put in elements (at the end) after we ordered it, so we break up the order anyway. The code later is not dependent on order of $browser_langs (as far as I see), so we can just skip ordering it.
So looks like todo here:
1. Add a DrupalUnitTestCase extending test class to locale.test and put this method there.
2. Reverse the order of weights in the $languages list in the test or remove the weights altogether to avoid misunderstandings.
3. Remove the arsort() on $browser_langs.
@loganfsmyth: can you look into these?
Comment #68
Gábor HojtsyAll right, here is a quick update with those changes and also removing the enabled property on languages, which the function does not check either. It just assumes it gets the right data (because it does :), which I've documented in the phpdoc now. Good now? (Tests pass on my machine).
Comment #69
Gábor Hojtsy@Lars: any more feedback?
Comment #70
Lars Toomre CreditAttribution: Lars Toomre commented@Gábor - Sorry, I did not notice the revised patch before. I did not test this so only commenting from a logical read standpoint. This is looking quite good. I would suggest a couple of things here to polish the patch off:
1) Type hinting for docblocks are now encouraged and will become mandatory soon for D8. I would suggest this patch include type hinting for locale_language_from_browser() function.
2) Leave in comment explaining what $_SERVER['HTTP_ACCEPT_LANGUAGE'] is.
3) I believe you want to initialize $best_match_langcode to FALSE instead of NULL to match documentation. If $languages = array() in unlikely event were passed to routine, NULL now would be returned.
4) Change s/Unit tests for the language_from_browser() function./* Unit tests for the locale_language_from_browser() function./
5) As whole, unit test looks really good and is faster than a DrupalWebTest class. I would suggest adding tests for 'en ', ' en', 'EN', and 'english' for completeness. I am thinking that the first two should pass because we should be using a trim($_SERVER['HTTP_ACCEPT_LANGUAGE']) in the function.
6) I have not checked what various browsers pass back for $_SERVER['HTTP_ACCEPT_LANGUAGE']. Do they always used mixed case ASCII strings? Should we be doing comparisons always on lower-case strings?
7) Looking at the 'de,pl' case, I am now wondering if the desired return from the function is NULL instead of FALSE as documented.
8) Also now thinking start of function should be:
As I said above, the current patch looks good. These are just final polishing to make air-tight.
Comment #71
Gábor HojtsyAll right, here is an updated patch:
1) Please link in agreed standards. If its still in discussion, I prefer not to apply them, we have enough to debate about in this patch, looks like.
2) The patch does not remove any comments on that, in fact it adds several lines already above to explain what is there. I've added even more explanation, hope it looks good now: "The Accept-Language header contains information about the language preferences configured in the user's browser / operating system."
3) Initialized it to FALSE and changed tests to expect FALSE too as documented.
4) Done.
5) Added trim() and test coverage for case differences and whitespace even though the spec does not allow whitespace like you assume, we can cover that.
6) Yes, language tags are defined as mixed-case, see http://www.w3.org/International/articles/language-tags/. We do lowercase both the values coming from the header and the values in our language array, so we can compare both lowercased.
7) See 3)
8) Drupal does not babysit broken code like that. If we'd do that, we can just as well put a type hint in the function signature, right? We don't check for the internal structure of the individual language objects either. I did change the !isset() check to empty() though which should speed this up if it was not provided or was provided empty.
What do you think of this now?
Comment #72
Lars Toomre CreditAttribution: Lars Toomre commentedThanks Gábor. This patch now looks great!! I think it is ready to be committed assuming all comes back green.
The coding standards change for type hinting is detailed in #711918: Documentation standard for @param and @return data types. It is pretty much resolved, but reading that issue again I see that it will be only recommended for a period before being required. Everyone is encouraged to start doing so in the meantime.
Comment #73
Gábor HojtsyAll right, let's get this in then.
Comment #74
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe arsort() was required here, because of the logic that is described in the comment (we want "the lowest value of the specific tags"). Please restore it and extend the test cases to cover this.
Comment #75
Gábor HojtsyCan you suggest a test case to cover that?
Comment #76
Gábor HojtsyDiscussed this with Damien in IRC. We figured out that the arsort() was indeed "bugos" and that it picked the least important q value for the language instead of the most important one (which is IMHO what we want). Damien suggested some excellent test cases for this, which I've included in the code and added back the arsort() with updated comments.
Updated patch attached.
Comment #77
Gábor HojtsyDamien figures out with even more digging that we should take the lowest value for generic tags to conform as close to the 1.1 HTTP spec as possible. Updated comments, test and code to comply. Comment looks like this now:
Comment #78
Damien Tournoud CreditAttribution: Damien Tournoud commentedNice to see this moving forward. This looks like a good compromise between a fully conforming implementation and the real world :)
Comment #79
webchickWow, this issue is crazy! :)
This is an API change, but it looks like it is required in order to fix the problem here.
Committed and pushed to 8.x and 7.x. Thanks!
Does this deserve an API change notice?
Comment #80
Gábor Hojtsy@webchick: I don't know where this would be an API change. There are certainly no arguments or parameters changed anywhere. The patch fixes language detection to be more accurate to browser settings, that is it. No new APIs provided, no APIs made different.
Comment #81
webchickOk cool. I was confused by:
but it looks like I just read it too fast. Thanks!
Comment #83
Gábor HojtsyTagging for base language system.
Comment #84
Gábor HojtsyTagging for language negotiation too.
Comment #85
JonMcL CreditAttribution: JonMcL commentedI'm just wondering if anyone has had success applying this to D7? Maybe with some manual finessing to get it to path?
Comment #86
JonMcL CreditAttribution: JonMcL commentedNever mind my question above. It appears the patch is in place with v7-12!
Comment #87
Sephi77 CreditAttribution: Sephi77 commentedIf I have the following languages defined in my browser (in the following order):
en-zw
fr
en
Then the function returns fr instead of en, which is incorrect imo. Also this case is not covered by the tests.
Comment #88
Gábor Hojtsy@Sephi77: the order in itself is not enough to decide. Do you have the same q values for them too? In short, what is your full language preference header in a request?
Comment #89
Sephi77 CreditAttribution: Sephi77 commentedThis one causes "fr" to be taken (in my browser, en-zw is set to take precedence over fr):
en-zw,fr;q=0.7,en;q=0.3
But this one also causes "fr" to be taken instead of "en":
en-zw;q=0.8,fr;q=0.7,en;q=0.3
Comment #90
Gábor HojtsyWe have some follow-up issues for Chinese and language precedence in general. Please see #365615: Language detection not working correctly for most Chinese readers (and add a user interface for all browser language mappings) and review! Thanks.
Comment #91
Damien Tournoud CreditAttribution: Damien Tournoud commentedWe should consider backporting this to D6.
I just marked #1037900: language_from_browser failes on Chrome as a duplicate.
Comment #92
lmeurs CreditAttribution: lmeurs commentedIn the Accept-Language specification I could not find whether the order of language tags represents the importance of one language tag over another, but it seems logical to me for language tags with the same qvalue.
When ie.
Accept-Language: nl,en
(occurred on stock Android browser on Samsung Galaxy Tab), both languages get a qvalue of 1 inlocale_language_from_browser()
(includes/locale.inc, Drupal v7.19). Then the site's languages are iterated in the order of the languages' weights and the first language with qvalue 1 will be set as$best_match_langcode
. This way the order of Accept-Language language tags is ignored.I suggest Drupal should not assign qvalues of 1 to both languages, but ie. 2 for the first language tag and 1.9 for the second. I understand that the specification states that 1 is the top value, but with a higher value of 2 it will take a while before it might interfere with language tags that do have a qvalue.
The next code:
could become:
Does this make sense?
Comment #93
Gábor Hojtsy@lmerus: the spec says the qvalues are used to specify different priorities; with languages in the same priority, I did not find any specs either. I assume since you equally find those languages good, we can pick based on what the site prefers, which is what we do. If others believe this is not a good approach, we need to go back all the way up to Drupal 8 and fix there and then port downward.
Comment #94
lmeurs CreditAttribution: lmeurs commented@Gábor Hojtsy: I understand your point: I hadn't thought of when languages without a qvalue weigh equally, it's up to the website to decide which language to serve.
The reason why I had not thought of it and the reason why I read into this subject is that a client's tablet had a language tag of 'nl, en', without any qvalues. His tablet's UI was completely in Dutch, but his website was being served in English. It appeared to me that the order of languages should play part, but it makes reason not to.
My guess is that his tablet should have provided a qvalue for at least the 2nd langcode in his language tag. Is this behavior seen often?
By changing the order of languages (for languages of equal qvalues, 'nl' is preferred) and setting 'en' as default language (when a visitor has 'nl' nor 'en' in his language tag, 'en' is served) at
admin/config/regional/language
my problem seems sorted out without my modifications in the code.Thanks for the effort!
Comment #95
Garrett Albright CreditAttribution: Garrett Albright commented#77 causes some interesting but perhaps not incorrect behavior, as #89 mentions. To elaborate, given an Accept-Language header of "en-us,ja;q=0.7,en;q=0.3", which is silly but not impossible (particularly since Firefox lets you define language preference order in its preferences instead of just using the operating system's settings, and still has this option in its preferences even though it threw out the checkbox to disable JavaScript because we don't want those silly users to "break the web," scheez even Safari for iOS lets me toggle JavaScript on a whim, but I digress, and this isn't even under the Advanced tab BUT I DIGRESS)…
AND MOVE UP AND MOVE DOWN BUTTONS FOR MOVING LIST ITEMS IN THE TINY UNRESIZABLE LIST WIDGET BECAUSE DRAG AND DROP IS FOR CHUMPS.
Anyway, if your user agent does this and you visit a site that can serve pages in English ("en") and Japanese ("ja"), you'll get a page in Japanese. Now I suppose this is technically correct, since the site can't serve you a page in en-us and the browser is specifying that you would prefer to see ja more than en. Nonetheless it's sort of surprising behavior.
Should we bother doing something about it?
Comment #96
xjm