Problem/Motivation

There's inconsistency in the way core modules are rendering the language code LANGUAGE_NONE.

  • The locale module renders LANGUAGE_NONE as "All" (in locale_language_name(), locale.module line 739) or "All languages" (in locale_form_path_admin_form_alter(), locale.module line 315). This shows up e.g. when viewing path aliases at admin/config/search/path, where aliases with language code "und" are listed as having language "All"; or when adding a path alias, where the selection "All languages" corresponds to option code "und".
  • The node module renders LANGUAGE_NONE as "Language neutral", e.g. at admin/content. In fact it adds special cases (e.g. in node_filter_form(), node.admin.inc line 161) to use "Language neutral" in favor of locale_language_name()'s label "All".
  • Even the locale module adds exceptions to render LANGUAGE_NONE as "Language neutral", when it's associated with the node module (locale_form_node_form_alter(), locale.module line 372).
  • In at least one place (locale_translation_filters(), locale.admin.inc line 828), the locale module uses an ad hoc language code "all" to stand for "All languages".

The D7 API says that LANGUAGE_NONE is used to mean a language of "no language specifically assigned", or "undetermined". This seems to me to be distinct from "all". In the first case the language is unknown; it may be e.g. Vietnamese, we don't know. In the second case, the content is known to apply to all languages.

Proposed resolution

I think there should be separate codes for "not specified", "not applicable", and "multiple".

Parent issue

#1260488: META: Introduce real language APIs

User interface changes

There are no UI changes in this issue. The UI part will be handled in follow-up issues. Here's a related issue where the content type's settings UI is being reworked: #258785: Provide more flexible settings for initial language on content types

Also related: #1314250: Allow filtering/configuration of which languages apply to what (UI, nodes, files, etc)

API changes

None except the constant changes.

Remaining tasks

The patch in #36 just changes LANGUAGE_NONE to LANGUAGE_NOT_SPECIFIED, and introduces, but does not yet use LANGUAGE_NOT_APPLICABLE and LANGUAGE_MULTIPLE. After that patch lands, the next step will be to convert places that use LANGUAGE_NOT_SPECIFIED, but should use one of the other two, to do so.

Follow up issues once this one lands:

#1471432: Rework language_list(), let people use more special languages
#1471444: Review use of LANGUAGE_NOT_SPECIFIED vs. LANGUAGE_NOT_APPLICABLE vs. LANGUAGE_MULTIPLE

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Freso’s picture

Version: 7.0-beta2 » 7.0-rc1

Perhaps the ISO 639-2 code "mul" could be used for LANGUAGE_ALL? "all" shouldn't be used at least, as there is a language assigned to that code already.

droplet’s picture

Version: 7.0-rc1 » 7.x-dev
plach’s picture

Gábor Hojtsy’s picture

Category: bug » task
Issue tags: +D8MI

Tagging this under the umbrella of the Drupal 8 Multilingual Initiative. I agree LANGUAGE_NONE is confusingly used for two purposes at places. It would be useful to separate the two meanings. #1266318: Make English a first class language also introduces LANGUAGE_SYSTEM (with the value of 'system' solely to be used for interface translation purposes). However, LANGUAGE_NONE has no meaning in terms of interface translation, does it?

plach’s picture

However, LANGUAGE_NONE has no meaning in terms of interface translation, does it?

No, atm is used only for content.

Gábor Hojtsy’s picture

Ok, well, I've been thinking about this quite a bit recently. Depending on how you see it, LANGUAGE_NONE (undetermined) is what we mean in both cases mentioned (nodes and paths). Or rather, neither case :) for nodes, we mean "no language is applicable to associate", that is "there is nothing in this node that is in some language". For paths, we mean "this path applies to all languages", which is one kind of saying "no language is applicable to associate", right? So maybe we can say "not applicable" or "undetermined" in both cases? For the paths, we do use them for all languages, for nodes, it totally depends on your views and other content displays, not really in control of core at all. So maybe some people would use LANGUAGE_NONE as "all" for nodes, just like paths, depending on how they set up their views and apply this meaning. It is really not in our control. So I think using more neutral wording (like we did with "language neutral") is best. However, people don't really get "language neutral" IMHO.

What do you think?

Gábor Hojtsy’s picture

Copying @plach's feedback from :

@bojanz: for the use cases where you have language independent entities, that is what LANGUAGE_NONE is for and the language would default to that unless told otherwise I think, so you should not be bothered that it is there.

Slightly OT but I think we will need also a LANGUAGE_UNSUPPORTED (zxx) value for this, see http://www.w3.org/International/questions/qa-no-language#nonlinguistic.

Introducing one or more "no language" constant will surely let us distinguish between the different cases of no language, such as applied to "all languages", "independent of language", "unknown language", that might or might not be explained with the same concept. I'd expect this will bring some problems for DX (and also UX when we expose multiple "no language" options on the UI such as for fields).

What is the set of "no language" values we need? The following values were suggested (with some constant names that floated around):

- LANGUAGE_UNDEFINED?, LANGUAGE_NONE? - und
- LANGUAGE_NOT_APPLICABLE?, LANGUAGE_UNSUPPORTED? - zxx
- LANGUAGE_MULTIPLE?, LANGUAGE_ALL? - mul

?

Gábor Hojtsy’s picture

#1414314: Make node and path depend on language module only for language support, get rid of locale_language_name just standardized paths and nodes without language to simply say 'None' for 'und' (and '- None -' in the options). Should we repurpose this issue for introducing more no-language varieties now that the naming is standardized or do we already have an issue for that (we might, I could not find one).

@plach?

plach’s picture

I'm not aware of any open issue about this. I think we should introduce the concept of unsupported language for the reasons explained in http://groups.drupal.org/node/197848#overview (bullet #2), however probably we won't need to expose multiple no-language choices to our users, at least for the use case outlined there.

Gábor Hojtsy’s picture

Title: LANGUAGE_NONE is sometimes "Language neutral", sometimes "All" » Introduce more no-language options for different cases
Issue tags: +language-base

Repurposed this for that direction now. I think we might be better off opening a new issue though.

Gábor Hojtsy’s picture

Status: Active » Needs review
Issue tags: +sprint
FileSize
157.59 KB

Ok here is an initial patch to stir some conversation. Changes:

- changed LANGUAGE_SYSTEM => LANGCODE_SYTEM (same value)
- changed LANGUAGE_NONE => LANGCODE_UNDETERMINED (same value)
- added LANGCODE_NONLINGUISTIC with value zxx (but did not use it yet anywhere)
- added LANGCODE_MULTIPLE with langcode mul (but did not use it anywhere)

Just did the name changes whoesale in the code as per the two renames. I think most of the time we had LANGUAGE_NONE before, we actually meant the zxx non-linguistic meaning (ie. we specifically say it does not have language meaning) vs. the und langcode which means we don't know about its language. So sounds like most places we'd want to actually use LANGCODE _NONLINGUISTIC instead of LANGCODE_UNDETERMINED (and update existing data accordingly)? (Also sounds like we'd want to expose some UI for that right? and describe it in language_name() for the UI).

The patch was mostly a mass search and replace, and with places using LANGUAGE_NONE identified, we can think about how to change them.

effulgentsia’s picture

My first reaction to #11 is a big +1. Seems like this is a great first step towards following the ISO standard better, and I recommend we try to get this patch to land as quickly as we can, and then open follow-ups for changing from LANGCODE_UNDETERMINED to LANGCODE_NONLINGUISTIC where appropriate. Here are some minor things I found, but leaving as "needs review", because I think this patch still needs more substantive feedback from plach and/or others heavily involved in the D8MI work.

+++ b/core/includes/bootstrap.inc
@@ -162,18 +162,32 @@ const DRUPAL_AUTHENTICATED_RID = 2;
 /**
- * System language (only applicable to UI).
+ * Special system language code (only applicable to UI language).
  *
  * Refers to the language used in Drupal and module/theme source code.
  */
-const LANGUAGE_SYSTEM = 'system';
+const LANGCODE_SYSTEM = 'system';

This comment is an improvement, but still wasn't clear to me. I had to ping Gabor, and he explained to me that this refers to "Drupal English", and that a site's "English" language might be a customization of this if the locale_translate_english variable is set. Anyway, I wonder if we can add this information to this comment concisely.

+++ b/core/includes/bootstrap.inc
@@ -162,18 +162,32 @@ const DRUPAL_AUTHENTICATED_RID = 2;
- * The language code used when no language is explicitly assigned.
+ * The language code used when no language is explicitly assigned (yet).

I think this comment can be further improved too.

+++ b/core/includes/bootstrap.inc
@@ -162,18 +162,32 @@ const DRUPAL_AUTHENTICATED_RID = 2;
+/**
+ * The language code used when multiple langugaes could be applied.
+ *
+ * Defined by ISO639-2 for "Undetermined".
+ */
+const LANGCODE_MULTIPLE = 'mul';

Would it be useful for us to add an example to this comment (e.g., a PDF file of a manual written in multiple languages) or should we defer that until the issue where we first use this new constant in core?

Also, "Undetermined" is not correct here, is it?

+++ b/core/modules/field/field.multilingual.inc
@@ -17,10 +17,10 @@
- *   installed languages with the addition of LANGUAGE_NONE. This default can be
+ *   installed languages with the addition of LANGCODE_UNDETERMINED. This default can be

Line now exceeds 80 chars, so please wrap.

+++ b/core/modules/field/field.multilingual.inc
@@ -87,7 +87,7 @@ function field_language_delete() {
- * languages will be returned, otherwise only LANGUAGE_NONE will be returned.
+ * languages will be returned, otherwise only LANGCODE_UNDETERMINED will be returned.

Line now exceeds 80 chars, so wrap.

+++ b/core/modules/locale/locale.test
@@ -2699,7 +2699,7 @@ class LocaleCommentLanguageFunctionalTest extends DrupalWebTestCase {
-      $langcode_none = LANGUAGE_NONE;
+      $langcode_none = LANGCODE_UNDETERMINED;

s/$langcode_none/$langcode_undetermined/.

+++ b/core/modules/menu/menu.test
@@ -622,7 +622,7 @@ class MenuNodeTestCase extends DrupalWebTestCase {
-    $language = LANGUAGE_NONE;
+    $language = LANGCODE_UNDETERMINED;

s/$language/$langcode/

+++ b/core/modules/translation/translation.module
@@ -140,7 +140,7 @@ function translation_form_node_form_alter(&$form, &$form_state) {
-      $options = array($groups[1] => array(LANGUAGE_NONE => t('- None -')));
+      $options = array($groups[1] => array(LANGCODE_UNDETERMINED => t('- None -')));

Is this where we want LANGCODE_NONLINGUISTIC instead? If that needs to be done in a separate issue (e.g., upgrade path complications), can we at least add a @todo here?

2 days to next Drupal core point release.

plach’s picture

@effulgentsia:

I didn't review the patch yet, but from the summary I'd say it's what we want :)

[...] and then open follow-ups for changing from LANGCODE_UNDETERMINED to LANGCODE_NONLINGUISTIC where appropriate.

Many places currently using LANGUAGE_NONE should not become LANGUAGE_NONLIGUISTIC, AAMOF many fields which in D7 are (initially) marked as non translatable, and thus getting the 'und' language code, are actually linguistic fields, for instance taxonomy terms.

I tried to explain how I would approach this matter in http://groups.drupal.org/node/197848#overview.

sun’s picture

... become LANGUAGE_NONLIGUISTIC, ...

Totally no pun intended on @plach ;), but the constant name is not only hard to remember, it's also very hard to type correctly. I totally know it's the official label for the langcode, but could we find a simpler constant name? :)

Gábor Hojtsy’s picture

@sun: I was thinking of a simpler name for a while but did not find one + did not want to repeat the mistake we did with LANGUAGE_NONE where it is understood both as "the user did not tell" and "cannot be related to language". If this constant would be NO_LANGUAGE let's say, it would have the same double meaning no? Maybe LANGCODE_NOT_APPLICABLE? I'm not sure we are best served by going that far from the standard.

effulgentsia’s picture

I'd be okay with LANGCODE_UND and LANGCODE_NA for distinguishing undetermined from not applicable, if people think that's better DX. I'm not yet fully convinced that the reduced typing is worth the slight increase in obscurity though.

Gábor Hojtsy’s picture

Any other opinions on constant names?

plach’s picture

Maybe LANGUAGE_NOT_SPECIFIED and LANGUAGE_NOT_APPLICABLE?

I'd stick with the LANGUAGE prefix instead of the LANGCODE one as LANGCODE_NOT_SPECIFIED (and friends) sounds odd to me: it's the language that is not specified, not the language code.

Gábor Hojtsy’s picture

I agree LANGCODE_NOT_SPECIFIED looks odd, since you specify it as a langcode :) However these might also look pretty odd no?

$langcode = LANGUAGE_NOT_SPECIFIED; 

something(LANGUAGE_NOT_SPECIFIED);
function something($langcode) {
}

foreach ($something as $langcode => $value) {
  if ($langcode == LANGUAGE_NOT_SPECIFIED) {
  }
}

Looks like we need to choose non-ideal naming anyway. One plus argument is that LANGUAGE_* would keep the constant names in line with other names in the language system.

plach’s picture

I agree that on a first sight the line below

  if ($langcode == LANGUAGE_NOT_SPECIFIED) {

reminds our old inconsistent pattern, but in my mind this reads as "if the language code is/equals to/means 'language not specified'" which in turn makes perfect sense. Moreover the LANGUAGE_* prefix automatically defines the constant as part of the language module/namespace.

Gábor Hojtsy’s picture

Ok, I'm fine with going LANGAUGE_*, we still need to fix the remaining parts of the constants though :) Let's see what others think about your proposals.

effulgentsia’s picture

Hm. In the {users} table we now have a "preferred_langcode" column. That sounds less natural to me than "preferred_language", but I though being consistent with when to use langcode vs. language was important.

Moreover the LANGUAGE_* prefix automatically defines the constant as part of the language module/namespace.

The constants are defined in bootstrap.inc (not language.inc), and used all over the place, whether language.module is enabled or not.

effulgentsia’s picture

That said, we still have LANGUAGE_LTR and LANGUAGE_RTL constants that do not refer to a $language object. Maybe for constants, it's implied and therefore okay for LANGUAGE to not refer to an object.

plach’s picture

The constants are defined in bootstrap.inc (not language.inc), and used all over the place, whether language.module is enabled or not.

Yes, that's why I mentioned the language namespace: the only reason those constants are currently defined in bootstrap.inc is that they need to be available very early, but ideally they belong to language.inc and thus to the language subsystem.

Maybe for constants, it's implied and therefore okay for LANGUAGE to not refer to an object.

I think this could be a sensible way to present the distinction between the two patterns.

Gábor Hojtsy’s picture

Ok, let's keep LANGUAGE_* then. Here is a new patch with LANGUAGE_NOT_APPLICABLE and LANGUAGE_NOT_SPECIFIED as per the discussion. Also made the changes @effulgentsia suggested (lot more docs on the constants, more reference to W3C docs - also cross-checked the W3C docs about these language codes). Also wrapped lines where he noted and changed variable names.

I did not change any of the use of the 'und' value, I think that we could add @todo all around core, basically we should revisit every use of LANGUAGE_NONE as it was before to see if our extended support for special language codes now still make that the best choice or not. So I think the new constant name in itself is a TODO, no? :) Also, we'll need to make UIs available for people to assign things like 'mul', 'und' and 'zxx' separately.

Also found some newly added LANUAGE_NONEs since the last patch, converted those too. It is a huge patch so would be nice to get it in sooner than later, but agreement would be vital first :)

Gábor Hojtsy’s picture

Found two typos in the 'mul' phpdoc block. Heh :)

Status: Needs review » Needs work

The last submitted patch, more-special-langugae-options-26.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
161.54 KB

Don't know how I missed node.test, likely brand new stuff :)

effulgentsia’s picture

Title: Introduce more no-language options for different cases » Change LANGUAGE_NONE to LANGUAGE_NOT_SPECIFIED; add LANGUAGE_NOT_APPLICABLE and LANGUAGE_MULTIPLE
Assigned: Unassigned » jhodgdon
Status: Needs review » Reviewed & tested by the community

Thanks. RTBC and assigning to jhodgdon for final sanity check of the new constant names.

effulgentsia’s picture

Issue tags: +Coding standards

Also tagging with "coding standards", since I think changing a widely used constant name relates to that, sort of.

Dave Reid’s picture

I feel like this could use an issue summary as well since it doesn't seem to be at all current to the latest patch.

jhodgdon’s picture

Assigned: jhodgdon » Unassigned

I think the constant names look good.

I took a quick scan throught the patch, at least the top. One thing I noticed was that these types of lines look very suspicious to me:

+        elseif ($langcode > LANGUAGE_NOT_SPECIFIED) {
+          elseif ($langcode < LANGUAGE_NOT_SPECIFIED) {

These codes are all strings... At least if you are going to compare them with < you might add a code comment explaining why that makes sense?

jhodgdon’s picture

Issue summary: View changes

Add parent.

effulgentsia’s picture

Issue summary: View changes

Updated issue summary.

effulgentsia’s picture

Re #31, I updated the last paragraph of the summary to clarify.

Re #32, those are pre-existing comparisons not being changed by this patch, only the constant name is being changed. The drupal_lookup_path() function they're in does have inline comments explaining the rationale:

// Always get the language-specific alias before the language-neutral
// one. For example 'de' is less than 'und' so the order needs to be
// ASC, while 'xx-lolspeak' is more than 'und' so the order needs to
// be DESC.
effulgentsia’s picture

Issue summary: View changes

Updated issue summary.

effulgentsia’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Assigned: Unassigned » xjm

This change makes way too much sense. :)

I found a couple of lines of documentation over 80 chars. I'll reroll to fix them:

+++ b/core/modules/field/field.multilingual.incundefined
@@ -163,13 +164,13 @@ function _field_language_suggestion($available_languages, $language_suggestion,
+ * The languages that may be associated to fields include LANGUAGE_NOT_SPECIFIED.

@@ -254,7 +255,7 @@ function field_valid_language($langcode, $default = TRUE) {
+ * display language to be used is just LANGUAGE_NOT_SPECIFIED, as no other language code

+++ b/core/modules/simpletest/tests/upgrade/drupal-7.language.database.phpundefined
@@ -534,7 +534,7 @@ db_update('system')->fields(array(
+// is a simple node with LANGUAGE_NOT_SPECIFIED as language code. The second node is a

There's also lots of assertion messages being translated, but most of those are just assertion lines being updated with the new constants, so leaving them alone.

xjm’s picture

Oh and tagging per @Dave Reid's request.

xjm’s picture

Assigned: xjm » Unassigned
FileSize
3.28 KB
161.92 KB

Rewrapping the documentation lines as above, plus one other small fix to the affected lines.

effulgentsia’s picture

Thanks, xjm. Removing tag, because prior to #33, I already updated the last paragraph of the issue summary.

Gábor Hojtsy’s picture

I've added #1471432: Rework language_list(), let people use more special languages and #1471444: Review use of LANGUAGE_NOT_SPECIFIED vs. LANGUAGE_NOT_APPLICABLE vs. LANGUAGE_MULTIPLE as followups to do once this lands. This is a huge patch, so would be great to see in so we can proceed using/exposing these :)

Gábor Hojtsy’s picture

Issue summary: View changes

Updated issue summary.

klonos’s picture

Thanx Gábor. I've updated the issue summary:

- "Conform" to the issue summary standards (no "Original report by [username]" section though because there were only minor changes and some moving around to place things in the proper sections AFAICT).
- Linking to related & follow-up issues.
- Changed patch reference to link to the latest patch in #36.

Please check to see if I got everything right and also someone update the "API changes" section (I couldn't possibly know what to write there). Thanx in advance.

klonos’s picture

Issue summary: View changes

...look at #39 for the changes.

Gábor Hojtsy’s picture

@klonos: updates to summary look good, noted in the API changes that except the constant changes there is nothing else (but the constants are part of the API).

klonos’s picture

Thanx for taking the time to go through it.

...but the constants are part of the API.

Just to get this one straight: the ultimate conclusion is that this is not to be backported to 7.x. Right?

Gábor Hojtsy’s picture

@klonos: right, Drupal 7 does not allow API changes like this. Contrib modules are full of LANGUAGE_NONE and assumptions about what it means in D7.

catch’s picture

+++ b/core/modules/book/book.testundefined
@@ -237,7 +237,7 @@ class BookTestCase extends DrupalWebTestCase {
-      $this->assertRaw(check_markup($node->body[LANGUAGE_NONE][0]['value'], $node->body[LANGUAGE_NONE][0]['format']), t('Node body found in printer friendly version.'));
+      $this->assertRaw(check_markup($node->body[LANGUAGE_NOT_SPECIFIED][0]['value'], $node->body[LANGUAGE_NOT_SPECIFIED][0]['format']), t('Node body found in printer friendly version.'));

This patch would be so, so much smaller if we were using field_get_items() to check field values consistently, I'll open a follow-up. The actual change looks fine though.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: -sprint

@catch: we have an issue for introducing getters/setters on entities for that, so moving to an interim API does not seem like the best idea. #1277776: Add generic field/property getters/setters (with optional language support) for entities is the issue to follow/help for that.

I've seen you committed this at http://drupalcode.org/project/drupal.git/commitdiff/b9e924417127c7e0e3d0... so added changelog note at http://drupal.org/node/1473200 and closing this one.

catch’s picture

Issue tags: +sprint

Gabor pointed out we have #1277776: Add generic field/property getters/setters (with optional language support) for entities. Committed/pushed to 8.x.

I think we could actually backport the LANGUAGE_NOT_APPLICABLE constant to 7.x so people can start using it if they want - just set it to the same value as LANGUAGE_NONE.

Gábor Hojtsy’s picture

Issue tags: -sprint

Yes, maybe we can backport the introduction of the LANGUAGE_NOT_APPLICABLE constant itself, but leave LANGAGE_NONE around and used as always. I think just leaving LANGUAGE_NONE and not introducing the new constant might be better though since then we'll not have differences in different modules which would lead to misunderstandings, "this module is only compatible with Drupal 7.Z+" type of issues, etc...

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Gábor Hojtsy’s picture

All right, well, LANGUAGE_MULTIPLE tested horribly and it served to confuse users in all kinds of different ways. Therefore: #1869292: Remove confusing "multiple" language from core. (Yay for experimentation, space for testing ideas and correction when failures are found).

geerlingguy’s picture

Status: Closed (fixed) » Needs work
Issue tags: +Needs change record

Could we get a change notice for this? I spent about 20 minutes trying to figure out why a particular test case was failing before finding that the constant was changed—there seems to be no mention of the change currently: http://drupal.org/list-changes/drupal?to_branch=8.x

Gábor Hojtsy’s picture

Status: Needs work » Closed (fixed)
Issue tags: -Needs change record

@geerlingguy: a change notice was added almost a year ago, see above (http://drupal.org/node/1473200). Anything missing from there?

Gábor Hojtsy’s picture

Issue summary: View changes

Fill in API changes part.