Problem

  • Too many things are called "language". That's very confusing for someone changing or using the APIs.

Goal

  • Use clearly and consistently separated names everywhere, by renaming some of the names.

Details

  • $language is a global object that contains "all details" for the (negotiated) language of the current request.

    This global might go away with the D8 context initiative. But we need to have a name for it in the meantime.

  • $language is a language identifier string in some locations. E.g., locale.module, locale.test, and some Field API functions.
  • $language->language is a language code if $language is an object.
  • $langcode is a language identifier string in most existing places.

    Alternatives would be $language_code or $language_tag. But the variable name should ideally be identical to schema column names.

  • 'language' is the language identifier string in database schema columns. E.g., {language}, {node}, {locales_target}, {date_format_locale}, etc.
  • $language is a language object but not the global $language in some locations.
  • Competitive analysis showed that most other frameworks call a language identifier "locale" -- which may be technically incorrect, since it is a "locale identifier" as well.
  • The W3C calls language identifiers "language tags". This terminology could be easily confused with HTML tags.
  • We need a proper analogy to the following pattern:
    $uid     => {language identifier}
    $user    => {global language}
    $account => {a language}
    

    as well as:

    $account = user_load($uid);
    $language = language_load($language);
    
    $account = user_load($node->uid);
    $language = language_load($node->language);
    
  • We need names for the following pieces of data:
    Language identifier
      Currently called $langcode, $language, $language->language, and $object->language in code, and always 'language' in the database schema. Has implications for most objects in Drupal: entities, fields, nodes, users, etc.
    Global language object
      Currently called $language. Prevents using $language with a different meaning in the same local code scope.
    Language object
      Currently called $language. Too easily mistaken for the global language object.

Proposed resolution

  • // Language code ('en', 'de', 'pt-br'):
    $node->langcode;
    
    // Loading language object (object based on {language} table columns by language code):
    $language = language_load($langcode);
    
    // Language code on various objects in Drupal:
    $language->langcode;
    $node->langcode;
    $user->langcode;
    
    // Language object for global consumption
    global $language_interface;
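
A minimal sketch of how the proposed names line up with the $uid / $account pattern. The example_ functions and the in-memory array standing in for the {language} table are hypothetical, not existing core APIs; only the langcode property and the shape of a language_load($langcode) call come from the proposal above.

```php
<?php
// Hypothetical stand-in for rows of the {language} table, keyed by language code.
function example_language_table() {
  return array(
    'en' => array('langcode' => 'en', 'name' => 'English'),
    'pt-br' => array('langcode' => 'pt-br', 'name' => 'Portuguese, Brazil'),
  );
}

// Sketch of a loader: takes a language code string, returns a language object
// (or FALSE when the code is unknown).
function example_language_load($langcode) {
  $rows = example_language_table();
  return isset($rows[$langcode]) ? (object) $rows[$langcode] : FALSE;
}

// The identifier lives on the referencing object under the same name,
// just like $node->uid mirrors $account->uid.
$node = (object) array('nid' => 1, 'langcode' => 'pt-br');
$language = example_language_load($node->langcode);

print $language->langcode . "\n";
print $language->name . "\n";
```

With distinct names, the identifier ($langcode) and the loaded object ($language) can no longer be confused in the same scope.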
    

Related issues

#1215716: Introduce locale_language_save()
#1222194: Rename global $language to $language_interface
#1220964: Number field prefix/suffix get t()'ed through format_plural()
#1218650: Separate language and locale. Proper languages for content translation.

Some of these issues depend on the outcome of this discussion.

Comment  File                      Size       Author
#4       langcode-to-locale.patch  408.53 KB  Gábor Hojtsy

Comments

Jose Reyero’s picture

I'd suggest:

- $language for a language object
- $locale for a language code

Jose Reyero’s picture

On top of those two, we could use a function to refer to the global language object, so it can take some parameters and return the right object in each case.

language(), global language object
language('content')
language('interface')

So if we want to get the language code for the interface language it would be:

language('interface')->locale
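
A rough sketch of what such an accessor could look like. Everything here is an assumption for illustration: the example_language() name, the hard-coded values standing in for language negotiation, and the choice to default to the interface language when no type is passed.

```php
<?php
// Hypothetical accessor returning the negotiated language object for a
// language type, instead of reading a global variable directly.
function example_language($type = 'interface') {
  static $negotiated = NULL;
  if ($negotiated === NULL) {
    // Assumed values; a real implementation would run language negotiation.
    $negotiated = array(
      'interface' => (object) array('locale' => 'de'),
      'content' => (object) array('locale' => 'en'),
    );
  }
  return isset($negotiated[$type]) ? $negotiated[$type] : NULL;
}

print example_language('interface')->locale . "\n";
print example_language('content')->locale . "\n";
```

Because callers never import a global, a local variable named $language cannot collide with the request-wide object.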

Gábor Hojtsy’s picture

On $locale, wow so simple, how could we have missed that so far? :) /me banging head onto desk

Gábor Hojtsy’s picture

Status: Active » Needs review
FileSize
408.53 KB

Here is a quick langcode -> locale mass replace patch for kicks. I did fix one or two occasions of $languages where it was actually $locales, but this patch is 99.9% automatically generated with search and replace in 2 minutes. Wondering how it fares for tests :)

Note that this *does not* update the db column or the language code property in language objects, both of which are called language; that is not as easily automated, and this is already an impressive one :)

sun’s picture

- $language for a language object
- $locale for a language code

Sorry, but that's completely off for me.

  1. Language means a spoken or written language (i18n).
  2. Locale summarizes many different aspects of localized formats (l10n); i.e., dates, numbers, currency, etc.
  3. Both have a language code. Whereas for Locale, the country part in a language code is more important (if any).
  4. A language code only means a string literal denoting a certain language name, as defined in ISO 639.
  5. Locale rules can be the same for multiple languages.
  6. A language can be the same for multiple countries and locales.

Thus, calling a language code "locale" is utterly wrong IMHO.

I'd agree however with renaming all instances of $language to $langcode where only a language code is passed. Especially Field API introduced lots of inconsistencies here.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

@sun: ok, that would mean you'd rename the language column I guess to langcode in the languages table as well?

One of the reasons I have opened this issue is that one of the things Drupal does not do if at all possible is to shorten names. There is no "usrname" or "accnt" or "pgtitle" or "ntype", etc. But there is a langcode. And numerous places in the code avoid using langcode because it looks dumb, and it's against rules Drupal uses consistently at other places. It is also not at all consistent with how the database names it 'language' in the languages table, in the locales_target table, in url_alias, field tables, etc. There is no langcode anywhere in schemas that I found on a quick review (which made the above ginormous search and replace pass tests just so nicely :).

There is some "locale" in schemas, like the table name date_format_locale (amazingly), but that then has a "language" column. Haha.

The fundamental problem is that we call "language" both the short string value (the language code) in the database and many places in the APIs and the full language object. We don't do this anywhere else in Drupal that I know. A node is an object, with an identifier that is nid. A vocabulary is an object with identifiers vid and machine_name.

Theoretically "langcode" (language as a short string) is just a machine name for the language, right? So if we'd follow how recent modern subsystems were done, we'd call it a machine_name. That is only workable in the language table though, and we in fact need a short name to call this everywhere a language might appear, which is more and more places, and is already far and wide (and currently called "language" at all places). The conceptual problem is that we read in a "language" value from the database but then we still need to load the "language" to get a "language" object out of it, right?

Finally, the problem is worsened by the global $language, which makes it dangerous to actually use the variable $language for a language object you loaded from the database (like the classic $user and $account problem). There are quite a few places this could be an issue in code.

So now we have a language column in the database, which we read in, but any data that we compare it to should be called $langcode to be "consistent". Then to get an actual language, we still need to load the language object, but we should not call that $language either due to the global. Looking for a good name for the language code and then either changing the global or looking for a suggestion for local language object names. Just trying to replicate/apply the $uid, $user, $account pattern here. Currently for languages the equivalent of $uid can be $language or $langcode, $user is $language and $account can be $language.

(BTW I do think locale module's locales_source and locales_target do not make sense as names either, even if you put aside the embedded grammar problem, but that will be a different issue).

Gábor Hojtsy’s picture

Chatted with @sun about this in IRC and we did agree that we do not fit the definition for 'locale' well even with locale module. I wanted to rework locale module's 4 sides for a while as well, but that in itself will not solve this in any way. I do not agree 'langcode' is a good name; 'language_code', 'locale_id', etc. are possible but also ugly names :) We do not usually call machine names code and we do not usually call string names IDs in Drupal. I've opened a discussion and put this into perspective with overall plans for locale module that I'm thinking about at http://groups.drupal.org/node/161589. Hope to get more feedback here.

janusman’s picture

IIUC this issue is mostly about Developer eXperience =)

I think sun is correct in that "locale" means a different thing than "language code". We should use "$langcode" arguments and local variables for cases when it means the ISO-????-compliant string for language. Maybe we should even call it $langcode_isoXYZ (I forget the ISO number for that) if that indeed is what it is, and to make the distinction between other common language code formats (like "es_ES").

Re: the column 'language' under the languages table... I lean towards renaming to "language_id" or "lid" (like we have nid and vid for node & taxonomy vocs.) as gábor mentions that it's just a way to identify a language object. However, we also have 'lid' in locales_source/target tables.

Gábor Hojtsy’s picture

@janusman: lid is incorrectly used in the locales_* tables. I'd say it is unlikely that the locales_* tables will keep their names or their columns as-is in Drupal 8, so we can consider that a non-issue. As for calling the language code an ID: in other places where Drupal says something is an ID, it is a sequential number (nid, uid, vid, tid, etc.). Language codes are not like that. So it sounds bad for DX to say it is an ID, doesn't it? Some other named identifiers are called "machine name" while others like the 'node_type' table call it 'type' and then that is used in the node table as 'type' as well. It is not a node type id or ntid or something :) It's just type. Maybe that helps illuminate some of the background.

kika’s picture

Both $langcode and $language_code make sense, the latter is longer but bearable.

plach’s picture

If we limit this discussion strictly to the matter introduced by the OP, my suggestion to match the $uid, $user, $account pattern is to start using the name the w3c uses for language codes, i.e. language tags. We could then have a tag column instead of the language one. This would lead us to $tag and $language->tag. The global name clash is still there but, since language types are module definable, we could simply switch to an official language_ prefix and have something like: $language_ui, $language_content, $language_url. To recap:

$uid, $user, $account
$tag, $language_ui, $language

Anyway, I'd like to see us introducing (at least very basic) built-in locale support, and love to have a $locale global allowing us to do something like:

$locale->language->tag
$locale->date_format->short
$locale->currency
$locale->timezone
...

Probably the WSCCI will need to address this.

plach’s picture

While working on translatable fields I always wondered if being able to store field languages as numeric language ids could boost performance: this might be another reason to introduce a $languages.lid column, which would lead us to:

$uid, $name, $user, $account
$lid, $tag, $language_ui, $language

Btw, I think that the {languages} table should be renamed to {language} for consistency.

Gábor Hojtsy’s picture

@plach: Yes, the table should be renamed. On introducing a numeric id, well, that might indeed speed up lookups a bit, however, you'll need to look up the language data for the id then in userspace (of PHP), which might negate the effect (and also complicate DX), no? Pretty hard to tell and a sizable patch to work on just to be able to tell the performance implications.... Hm. At least in terms of DX I think always needing to look up $lid vs the language code in one way or the other sounds like a pain. You usually want to work with language codes in templates, etc. Also what do you mean with the $name vs $tag pair? You suggest $langcode becomes $tag?

Edit: missed you had two comments, disregard the $tag question here, seen it above.

Gábor Hojtsy’s picture

@plach: for that locale support question, do you have an issue for that? Sun said "you are working on it" which might mean a wide range of things :) I'd love to have some questions on that if you have a place.

plach’s picture

@gabor:

Pretty hard to tell and a sizable patch to work on just to be able to tell the performance implications.... Hm.

Yes, the gain might not match the effort after all.

You suggest $langcode becomes $tag?

Or $language_tag which is one char shorter than $language_code ;)

sun’s picture

There are two factors that make the 'language' as integer abstraction obsolete (or not measurable):

  1. Field data is cached. We don't query/filter the data on most requests to begin with.
  2. The 'language' column is indexed, and there's only small amount of possible string values. Storage engines can make full use of the index, reducing result sets in no time.

    I don't have exact numbers at hand, but you can try it out yourself: Insert one billion rows with a small variation of (short) 'language' values into a table and use a language in WHERE. Then update the data to add many arbitrary 'language' values -- at some point, the index will blow up and the WHERE reduction will resort to a filesort.

plach’s picture

Ok, let's skip lids then, what about tags? ;)

@gabor (#14):

No code, just ideas like the one in #11, I can post an issue if you think this might fit into d8mi.

Gábor Hojtsy’s picture

Some competitive analysis:

- ezPublish's component library ezComponents calls language codes locales: http://ezcomponents.org/docs/tutorials/Translation (note that this is now Zeta components under the Apache Foundation: http://incubator.apache.org/zetacomponents/documentation/trunk/Translati... - same code)

- Zend Translate also says locales: http://framework.zend.com/manual/en/zend.translate.using.html

- WordPress calls them locales: http://codex.wordpress.org/Translating_WordPress

- CodeIgniter does not seem to have a name for it at all.. http://maestric.com/doc/php/codeigniter_i18n

- CakePHP calls them locales but uses ISO 639-2 (three letter) language codes: http://www.sanisoft.com/blog/2007/06/09/multilingual-apps-with-cakephp/

- Symfony 1 calls them *cultures*, yes: http://www.symfony-project.org/book/1_0/13-I18n-and-L10n

- Symfony 2 (current version) calls them locales: http://symfony.com/doc/current/book/translation.html

So it looks like the ones that use locales agree that locales are language code + '_' + (optional) country code. Drupal supports that just as nicely; so far there was only a request for 3 variants of Portuguese and for British English. So in this sense en_GB is not a langcode at all either :) Technically it is not a locale but an identifier for a locale, but it sounds like the competition leans heavily towards calling it a locale as is (and if you look at the links, most of them in fact have no support for currency, decimal points and such).

Jose Reyero’s picture

I tend to like the idea of having a $language object and a $locale object.

So, new suggestion:

$language_code for a variable containing a language code.
$language->code for the Language object's language code.
'language' for any other db field being a language code because it references that, a language.

Then there's the question of whether we would like to have different content for locale variations of languages or just for languages. Maybe life would be easier if we had both concepts clearly separated and:
- English content 'en'
- English/US interface language (locale, en_US)

Since we are reworking the whole thing, let's get it right.

sun’s picture

Then there's the question of whether we would like to have different content for locale variations of languages or just for languages. Maybe life would be easier if we had both concepts clearly separated and:
- English content 'en'

Note that language codes including country also apply to content language -- for example, a content can be written in American English (en) and British English (en-gb). While that might sound edge-casey and nitpicky, I've seen some excellent sites in the wild that actually provide this.

Also note that, to my knowledge, the country extension is officially (as in RFC) separated with a hyphen, not an underscore.

Jose Reyero’s picture

@sun,

Yes, that (American / British English) content may be a feature. Which would be nice if we could support the more usual case of having English content and American/British localized UI.

That is not the case though and it looks like we'd need to choose between both options.

Supporting edge cases is nice if you already support the most common ones.

(I've never seen such a site btw, nor Argentina/Spain Spanish content either).

plach’s picture

I don't understand why tag is not even being taken into consideration: it's the closest thing we have to a standard specification. The fact that most other platforms call it locale does not sound like a compelling argument to me:

Terminology
In this article we refer to the value of a language attribute such as fr-CA as a language tag. The fr and CA parts are referred to as subtags when described as parts of a tag. When described as members of an ISO list of languages or countries, fr and CA are referred to as codes.

(from http://www.w3.org/International/articles/language-tags/)

Some more links:

http://www.rfc-editor.org/rfc/bcp/bcp47.txt
http://www.w3.org/International/core/langtags/rfc3066bis.html
http://en.wikipedia.org/wiki/IETF_language_tag

I'd like to read a good reason to choose another name.
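
To make the tag / subtag terminology quoted above concrete, here is a deliberately simplified sketch. The example_split_language_tag() helper is hypothetical and only recognizes the language, an optional 4-letter script and an optional region; it does not implement the full BCP 47 grammar (variants, extensions, private use). Accepting underscores as input is an assumption made so that POSIX-style identifiers such as pt_BR can be compared.

```php
<?php
// Split a language tag into its subtags (simplified, see assumptions above).
function example_split_language_tag($tag) {
  // Normalize POSIX-style underscores to the hyphen used by language tags.
  $subtags = explode('-', str_replace('_', '-', $tag));
  $parts = array(
    'language' => strtolower(array_shift($subtags)),
    'script' => NULL,
    'region' => NULL,
  );
  foreach ($subtags as $subtag) {
    if ($parts['script'] === NULL && $parts['region'] === NULL && preg_match('/^[A-Za-z]{4}$/', $subtag)) {
      // Four letters: script subtag, conventionally title-cased.
      $parts['script'] = ucfirst(strtolower($subtag));
    }
    elseif ($parts['region'] === NULL && preg_match('/^([A-Za-z]{2}|[0-9]{3})$/', $subtag)) {
      // Two letters or three digits: region subtag, conventionally upper-cased.
      $parts['region'] = strtoupper($subtag);
    }
  }
  return $parts;
}

print_r(example_split_language_tag('fr-CA'));
print_r(example_split_language_tag('zh-Hant'));
```

In the article's terms, 'fr-CA' is the tag, while 'fr' and 'CA' are its subtags.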

Gábor Hojtsy’s picture

@plach: do you mean 'tag' or 'language_tag' would be the column name in the node table, fields tables, url_alias, etc? Would this be clear to developers? I do see how the W3C standardized on this terminology, but have not seen it in implementations anywhere. That sounds like it affects the DX for people coming from any similar software. We can of course be the leading force here if people agree, but we should accept the education component involved then IMHO.

plach’s picture

@gabor:

do you mean 'tag' or 'language_tag' would be the column name in the node table, fields tables, url_alias, etc? Would this be clear to developers?

As I was saying above, IMO the language tag terminology should replace the language code terminology. For the sake of brevity I used $tag everywhere, that's a mistake: I should have used $language_tag, since that's not ambiguous and IMO fairly understandable even for non-i18n people.

That said, we might want to change all the 'language' columns into 'language_tag' columns for consistency, since they are identifiers (we used to have a 'nid' column to reference nodes and not a 'node' one), but that might be overkill since a language tag is far more readable than a nid. If I had to choose I'd be with Jose and leave language reference column names alone.

Perhaps we might want to make an exception for the {language} table and have a tag column there, since there the scope would be unambiguous. This would allow us to have $language->tag which would be shorter than what we have now.

Long story short: replace 'code' with 'tag' in #19.

I do see how the W3C standardized on this terminology, but have not seen it in implementations anywhere.

This is slightly misleading, but http://en.wikipedia.org/wiki/Locale says that Microsoft is standardizing on BCP47 for locale identifiers. POSIX platforms still go for the usual language[_territory][.codeset] form for now.

We can of course be the leading force here if people agree, but we should accept the education component involved then IMHO.

Hey, we would not be switching to Klingon :) I think 'language tag' is more or less as understandable as 'language code'.

Edit: Obviously if we switch to tags we have to properly support them, validating user input for custom languages and changing the language tag column length and this can be annoying.

Jose Reyero’s picture

While I don't mind the terminology that much, I think the issue of locale vs language is worth discussing, so I've created this new thread #1218650: Separate language and locale. Proper languages for content translation.

sun’s picture

Speaking of language and locale information on monolingual sites... or rather, using language/locale meta info for other purposes than t() string translation:

#1220964: Number field prefix/suffix get t()'ed through format_plural()

Gábor Hojtsy’s picture

As per @plach's suggestion in #11 and #12, that nobody disagreed with to my understanding, I've opened and started #1222194: Rename global $language to $language_interface which should at least help with coming clean with our language object names in #1215716: Introduce locale_language_save(). It does not yet solve the language identifier problem (that he suggests we call tag, Jose suggests we call locale). That would still need to be resolved to be able to do a _load() API for languages, so we know how to call our identifier :) Would be great to get past agreeing on the basics :) The underlying terminology for a language_load() should not be this hard, guys :)

Gábor Hojtsy’s picture

All right, based on discussion with @sun in IRC, I've marked #1215716: Introduce locale_language_save() and #1222194: Rename global $language to $language_interface both postponed on this one. Until we can figure out what to rename global $language to and what to call the language identifier, we cannot code a clean language CRUD (there is no ID to call it with and $language is a reserved variable). Without a clear way to name our language ID, we cannot propagate that to other APIs in the system, and we clearly want to spread language awareness far and wide, so this is a fundamental thing we need to solve.

In my opinion, we don't need to agree on one set of names to set in stone *forever*. I think we could just as well come up with unique names to use, which would (a) enable us to work on the real problems instead of pondering over naming schemes and (b) give us unique names to rename later, which would make our life much easier. Of course if/once we decide to rename the language column in schemas to something else, we are in for some upgrade code, and if we want to rename them again... That is why I started off with just renaming global $language in #1222194: Rename global $language to $language_interface, but @sun has a great point that we should try to not fragment the discussion.

catch’s picture

Apart from machine_name being Drupal jargon, is there a particular reason not to use that instead of $langcode/$language?

en, en-GB etc. should all be unique string identifiers, which is how machine_name works too in other places.

That would be inconsistent with other platforms, but seems very internally consistent within Drupal itself - sometimes we need to choose one or the other.

$langcode always seems like a natural abbreviation to me - like using $nid instead of $node_id, but I know we try to avoid abbreviations in general and I couldn't find evidence that anyone else uses it.

$tag - the main issue is confusion with tagging, but if it's part of a spec then that overrides the overloading (which is not a Drupal-specific issue at all).

I'd also probably be fine with $language_code as variable name and {language}.code as column name, but calling a db column 'code' seems a bit off.

plach’s picture

I'm ok with #28, and I'm not holding my position about language tags at any cost, it just looks like the best choice to me if we have to address this subject cleanly. However if we just want to proceed and perhaps revisit this later with clearer minds, I'm totally ok with retaining the 'code' terminology proposed in #19:

$language_code
$language->code (local object)
$language_suffix->code (global object)

No special preference about how the global ui language should be called, the only two alternatives that come to my mind are:

$language_interface (which was in D7 core for a while)
$language_ui (which is shorter and perhaps more clear)

plach’s picture

Crossposted with @catch:

$tag - the main issue is confusion with tagging, but if it's part of a spec then that overrides the overloading (which is not a Drupal-specific issue at all).

Please note that the correct way to adhere to the spec is using $language_tag, which is longer but unambiguous.

catch’s picture

OK so for me:

machine_name - unless there's a specific reason not to, seems imperfect but workable.

$language_tag - it is good to comply with specs, and seems OK.

$locale - not keen on this so much, other projects are using it but do they have a whole module called locale?

$language->code, not keen on having {language}.code, {language}.{language_code} is a bit verbose, $language->code itself seems fine. That's minor nitpicking.

catch’s picture

Issue summary: View changes

Better summary

Gábor Hojtsy’s picture

@catch, @plach: let's consider how you'd call $node->language and $user->language as well. $node->code and $user->code do not work. Neither do $node->tag or $user->tag. Ideally we'd have a complete name that we can put on (nearly) all objects in Drupal core to tell which language they are in (or in case of user prefer) as well as use internally in the language system. Qualified names like 'language_tag' or 'language_code' or 'language_id' work. We can call it something different on $language, however $node does not call ->nid or ->uid different to how $user calls ->uid or others call ->nid. I think for DX, we'd ideally find a name which can be used both internally in $language as well as externally as a property of other objects, and it would still be unambiguous. This is atm ->language, which is part of the problem. E.g., we have $language->language. Machine_name is something that could not work as an external ID on other objects.

Where Drupal needs an externally referencable name for something global, it uses either uid, nid, tid (term), vid (vocabulary), etc. or as another more local example it uses type (node type) to reference a type name in node type tables and node tables. It does not have a globally referenced textual identifier kind of thing, that we could look at as an example. All the globally referenced identifiers like uid, nid, etc. are numbers and have these very short and unique names. Now we want to spread language support so we want to spread the language identifier even more so as a globally referenced identifier.

sun’s picture

Everyone, please note the updated issue summary. There's one essential question in it, namely: Is there a difference between global $language and a loaded $language? Aside from that:

AFAICS, we need at least a two-phase process; stop-gap fix in order to make progress with the other issues, and a more far-reaching long-term fix (i.e., major API change, possibly requiring schema changes + lots of lots of lots of code to be rewritten).

Stop-gap fix proposal:

- $langcode for language identifier
- $language for global $language, as well as a loaded $language
- $language->language (== $langcode)

$node->language
$node->language == $langcode
$language = language_load($langcode);
$language->language
$language->language == $langcode
global $language_interface

Still ambiguous regarding global $language and $language object, but at least no far-reaching API change for now.

Long-term fix:

Regarding DX, retaining the current 'language' schema columns and object properties might be favorable. It's easy/trivial to understand, and also makes sense as an $object->language property/value. You don't expect a language object there. And if you call something a "language" in written/spoken language, then you normally mean a language identifier. In other words, this would make some sense and resolve a big part of the inconsistency already:

$langcode => $language

But obviously, this leads to the question of how to name a language object then. No ideas for that. Except for perhaps explicitly using $language_content and $language_interface everywhere (...which, might, clarify some confusion even...)

We should revisit the "locale" proposal though. Recent discussions changed my perspective. While "locale" indeed summarizes many language and territory specific factors, these locale rules seem to map 1:1 to language codes, as defined in _locale_get_predefined_list(). In case multiple choices for individual locale rules are possible for a certain language, then those will be configurable on the language object. Just like date formats are bound to $langcode already.

So meanwhile, I could as well imagine the following long-term fix to work:

$node->locale
$language = language_load($locale);
$language->locale
global $language

Gábor Hojtsy’s picture

@sun: for local $language vs. global $language, the problem is you have this very nice and logical piece of code:

foreach ($languages as $language) {
  // do something with language
}

Now you realize you need to treat the current UI language different in the code and use it in some conditions for example:

global $language;
foreach ($languages as $language) {
  // do something with language
  // oops this overwrites global $language for the rest of the request
}

Now you either need to rename the global $language or the local $language. (Once again, this is the global $user vs. local $account thing: we use $account consistently for local users for the same reason, since we sometimes need to treat local users differently from the global user.) Of course global $language and local $language can be very different! Of course we have lots of code that iterates through languages, that uses other languages which are not equal to the global (such as when sending emails), etc!
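
The clobbering described above can be reproduced in plain PHP. This is a minimal sketch: example_clobber() and example_safe() are hypothetical helpers, and the stand-in objects only carry a language property.

```php
<?php
// Stand-in for the negotiated global language object (assumed shape).
$GLOBALS['language'] = (object) array('language' => 'en');

// Iterating with the same name as the imported global overwrites it.
function example_clobber(array $languages) {
  global $language;
  foreach ($languages as $language) {
    // Each iteration assigns to the global, not to a local variable.
  }
}

// A distinct local name leaves the global untouched.
function example_safe(array $languages) {
  global $language;
  $codes = array();
  foreach ($languages as $item) {
    $codes[] = $item->language;
  }
  return $codes;
}

$languages = array(
  (object) array('language' => 'de'),
  (object) array('language' => 'fr'),
);

example_safe($languages);
print $GLOBALS['language']->language . "\n"; // Still 'en'.

example_clobber($languages);
print $GLOBALS['language']->language . "\n"; // Now 'fr'.
```

The `global` statement binds the local name to the global by reference, so the loop variable writes straight through to the request-wide object.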

So what about?

$node->locale
$language = language_load($locale);
$language->locale
global $language_interface

sun’s picture

Status: Needs work » Needs review

Sure, the global $language_interface adjustment in #35 would work for me, too.

What do others think about this? Given an agreement, we could start to compile a list of @todos that would have to happen based on the proposal (ideally in the issue summary). But I don't want to start thinking about technical implementation specifics before reaching an agreement.

catch’s picture

Given so many projects use locale, I'd also be happy with:

$node->locale
$language = language_load($locale);
$language->locale
global $language_interface

Also the argument about locale and things like en-gb makes sense - not only spellings but date formats are different, however they're neither different languages nor dialects so what else do you call that?

catch’s picture

Issue summary: View changes

Updated issue summary.

wmostrey’s picture

Isn't that simply locale? We've worked on this before in #310520: Introduce country-appropriate date handling and #318008: locale based handling and rendering for example.

catch’s picture

Sorry that was a rhetorical question, that's why I think $locale is a decent choice.

The only problem with it is that 'locale' module doesn't actually deal with this stuff, but that's not a good reason to avoid using it here.

fgm’s picture

The global $language is one place where we could reuse the culture term since it has been avoided, and this is an fully loaded object with lots of locale/culture-related information, and this is not an unfamiliar terms for other uses cases beyond Drupal.

Also, there were several mentions of ISO639, but we need to take into account, should we want to use something like $iso639 instead of $langcode, that :

  • the familiar "en", "fr" langcodes are only one of the facets of the standard, which also defines other codes (specifically 3-character ones)
  • the language/locale to ISO 639 mapping is not completely 1-to-1:
    • several Spanish languages are mapped to ast (iso-639-[23]) and have no iso-639-1 code
    • some languages have more than one mapping, like Armenian having both arm and hye for iso-639-[23], but only hy for 2-letter iso-639-1
    • all languages without an official code are lumped together under mis in 639-3 and have no official 639-1 code
Gábor Hojtsy’s picture

@fgm: no, Drupal has long abandoned calling its language codes ISO-639 codes. This is the description on the Drupal 6 and 7 language code field:

t('<a href="@rfc4646">RFC 4646</a> compliant language identifier. Language codes typically use a country code, and optionally, a script or regional variant name. <em>Examples: "en", "en-US" and "zh-Hant".</em>', array('@rfc4646' => 'http://www.ietf.org/rfc/rfc4646.txt'));

RFC 4646 is the latest "standard" for defining language codes (yes, RFC 4646 calls them language tags in concert with the W3C that considers RFC 4646 as the current up to date standard for language identifiers as @plach referred to).

sun’s picture

Briefly talked to @fgm in IRC in order to understand and clarify #40:

He thinks that Locale should denote an object having various properties to deal with specifics to a culture.

His proposal would thus be:

$node->language
$locale = language_load($language);
$locale->language
global $culture

---
My personal thoughts and feedback on that:

We additionally have content language (in $language_content), which is mainly about language, but might possibly also contain "culture" related locale information in the future. For example, content language applies to field values -- Input filters on text fields, as well as field formatters on number and currency fields require locale/culture information to format field values correctly. So while $locale as an object would make some sense to me, the $culture proposal does not.

andypost’s picture

subscribe

Gábor Hojtsy’s picture

I think $culture is way too ambitious. It is like if we call our views $worlds :) I think it is a few levels up and sideways from what we do.

wmostrey’s picture

Ok then I think we agree on what $language and $locale entail. I would propose this change though:

Instead of:

$node->locale // "en-gb"
$language = language_load($locale); // object containing all language information
$language->locale // "en-gb"

To have this:

$node->locale // "en-gb", could also be named ->lid
$locale = locale_load($node->locale); // object containing all locale information
$locale->language // "English" or "en"

And possibly:

$language = language_load($node->locale); // "English" or "en"

$locale would then contain variables like language, region, decimal mark, date/time format, ...

The idea is that one specific locale will always contain the same values (so $locale->language will always be the same), while one language can come from multiple locales (so $language->locale can be different depending on whether the locale is en-gb, en-us or en-ca).

sun’s picture

The idea is that one specific locale will always contain the same values (so $locale->language will always be the same), while one language can come from multiple locales (so $language->locale can be different depending on whether the locale is en-gb, en-us or en-ca).

By normalizing language from locale (i.e., mapping multiple locales to one language [without territory/culture]), we'd essentially remove a currently existing feature; namely, being able to translate content into "localized language".

Might not make too much sense for American English, British English, and Canadian English; but I can only guess it makes a huge difference for localized languages like Simplified Chinese and Traditional Chinese, as also discussed in #1218650-2: Separate language and locale. Proper languages for content translation..

OTOH, I can see the point of setting up two different locales, say, en-us and en-gb, but only one English (en) language. That would allow you to use proper localized formats for dates, currencies, etc. for each locale, while only having one language to translate content into.

In turn, you could have only two language switcher links, say, English (en) and German (de), which may decide on interface and content language, but then you might be able to additionally use URL or browser language negotiation to show the interface and content in English (en), but everything that can be localized in the appropriate locale (en-us or en-gb).

But regardless of that, you'd need a default/standard locale per language, so in the end, that entire functionality would merely lead to extending the language/locale object:

$language->language = 'en'; // Pure convenience, not used anywhere.
$language->locale = 'en-us'; // Default language/locale.
$language->optionalLocales = array('en-gb'); // Alternative locales.
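
As a rough illustration of the default-plus-optional-locales idea above, the lookup could be sketched like this. The helper name example_resolve_locale() and the sample values are assumptions made up for this sketch; nothing like it exists in core.

```php
<?php
// Hypothetical language object using the property names proposed above.
$language = new stdClass();
$language->language = 'en';
$language->locale = 'en-us';
$language->optionalLocales = array('en-gb');

/**
 * Hypothetical helper: picks the locale to use for a language.
 *
 * Returns the requested locale if the language allows it (as its default or
 * as one of its optional locales); otherwise falls back to the default.
 */
function example_resolve_locale($language, $requested) {
  if ($requested === $language->locale || in_array($requested, $language->optionalLocales, TRUE)) {
    return $requested;
  }
  return $language->locale;
}
```

With this shape, content is still translated into a single "en" language, while formats could follow whichever of the allowed locales was negotiated.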
wmostrey’s picture

I like the idea of a default locale per language, with optional locales attached. I also believe this is a solution for #1218650: Separate language and locale. Proper languages for content translation., I'll follow up there.

So my vote also goes to

$node->locale
$language = language_load($locale);
$language->locale
global $language_interface
Gábor Hojtsy’s picture

Looks like we are getting agreement on

$node->locale
$language = language_load($locale);
$language->locale
global $language_interface

#1222194: Rename global $language to $language_interface deals with the last item, while the second is dealt with in a monster patch above. The other two will need massive schema changes, and should be in other (follow up) issues, once these two land I think.

plach’s picture

Status: Needs review » Needs work

Honestly I don't like where this discussion is heading, IMHO if we take the direction outlined in the latest posts as the long-term fix, we are going to create further confusion wrt terminology and inconsistency with official specifications.

What really annoys me of the latest proposal is using the term locale to identify a language:

$node->locale
$language = language_load($locale);
$language->locale

A locale is a totally different thing from a language; the fact that they may use a similar identifier, just because both use a scheme involving geographic information, does not mean they can be used interchangeably.

Moreover, locale and language identifiers don't always match: there are lots of attempts to classify them, and language tags are only one of them, although apparently the most popular atm. Suffice it to say that POSIX systems use locale identifiers that differ from language tags.

Apparently only Microsoft is using language tags (i.e. language identifiers) to identify locales, which makes much more sense than the opposite.

This is the wikipedia definition of Locale (emphasis mine):

In computing, locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a language identifier and a region identifier.

I cannot see this apply to nodes in any way: would that imply that a node has a country, a date format or a currency? Nodes, like any other content, have a language not a locale, we cannot make things this confusing to solve an ambiguity in our current terminology.

If we want to introduce locale support in core (and I would be totally fine with that), we should have a global $locale object (or any context-provided equivalent) holding all the properties that a locale is supposed to provide, including language. Identifying locales through languages would make much more sense to me, since the language is perhaps the most characteristic aspect of a locale (and that's why probably their identifiers tend to be so similar).

This would lead to something like:

<?php
$locale = locale_load($language_tag); // or
$locale = locale_load($language_code); // and then

$locale->language->language_code
$locale->language->prefix
$locale->timezone
// ...
?>

I'm not stressing the language tag terminology anymore, although it's the currently recommended standard, because, as the wikipedia article above points out, language code is a more general terminology that also includes language tags.

Another thing I'd like to outline is that locale pertains to user interface, hence in the scenario above I'd imagine the $locale global to be initialized from interface language:

<?php
global $language_interface;
$locale = locale_load($language_interface->language_code);
?>

About what @sun said in #42:

We additionally have content language (in $language_content), which is mainly about language, but might possibly also contain "culture" related locale information in the future. For example, content language applies to field values -- Input filters on text fields, as well as field formatters on number and currency fields require locale/culture information to format field values correctly.

This does not make any sense to me: content language should be used only to select which content has to be displayed. In the case of a currency field we should be using content language to determine the amount to display, while interface language should be used to choose the currency to be used (if not explicitly negotiated).

About the need of sharing content among different languages, I'd like to quote what I was saying in the related issue:

I'd argue that since two languages, albeit very similar, might actually differ in some forms (I'm not taking into consideration the scenario in which different content has to be provided for different locales), this is a degenerate case of fallback: if I understand Italian perfectly and I understand (British, American, whatever) English reasonably well, I might want to read content in English if not available in Italian. With the same scenario in mind, an editor not wishing to provide a "British English" version of an "American English" content (for obvious reasons) might want both locales to access it: an American user would see it "natively", while an English user would see it because some smart fallback rule decided that was the right content to display.

British English and American English are different languages if we are using a language identifier that involves geographic areas. If the scenario above is the exception, IMO it should be solved with the fallback approach or some advanced content language detection method; if it's the rule, the language identifier is simply too granular.
Alternatively this could be solved by having the ability to enable different languages for each language type, we then might be able to say that our interface languages are [en-US, en-GB, es-ES, es-AR] and our content languages are [en, es].

This makes me think that if we introduce locale support we might simply want to drop the interface language and merge it into the $locale object.

To sum up, IMO the way to go should be:

$node->language_tag // or language_code
$language = language_load($language_tag);
$language->language_tag
global $language_interface

we might also have:

$node->language->language_tag

since in many places now we require a language object and not a language code.

plach’s picture

After reading #49 again, also

<?php
$locale = locale_load($language_tag); // or
$locale = locale_load($language_code); // and then
?>

does not make much sense to me, since language identifiers might not hold a country identifier. If we have locale support we must use separate locale identifiers. However those could be determined through the same information used to negotiate language.
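
To make the point about missing country identifiers concrete, here is a minimal sketch that splits a language tag into its parts. It is a deliberate simplification of the RFC 4646 subtag rules (first subtag is the language, a 4-letter subtag is a script, a 2-letter or 3-digit subtag is a region); extended language subtags, variants, extensions and private-use subtags are ignored. The function name example_language_tag_parts() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: splits a language tag into language, script, region.
 *
 * Simplified: only the primary language, script and region subtags are
 * recognized; parsing stops at the first single-character subtag, which
 * introduces an extension or private-use sequence.
 */
function example_language_tag_parts($tag) {
  $subtags = explode('-', $tag);
  $parts = array(
    'language' => strtolower(array_shift($subtags)),
    'script' => NULL,
    'region' => NULL,
  );
  foreach ($subtags as $subtag) {
    if (strlen($subtag) === 1) {
      break;
    }
    if ($parts['script'] === NULL && $parts['region'] === NULL && preg_match('/^[A-Za-z]{4}$/', $subtag)) {
      $parts['script'] = ucfirst(strtolower($subtag));
    }
    elseif ($parts['region'] === NULL && preg_match('/^([A-Za-z]{2}|[0-9]{3})$/', $subtag)) {
      $parts['region'] = strtoupper($subtag);
    }
  }
  return $parts;
}
```

Tags such as "en" or "zh-Hant" come back with no region at all, which is why a language identifier alone cannot always stand in for a locale identifier.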

gbentley’s picture

Whilst discussing semantics, remember what many developers want: the ability to create multi-lingual sites with specific content for particular country/language combos. I can do this simply in eZ Publish because they chose a solution, and it works. I'd like to do the same in Drupal.

Gábor Hojtsy’s picture

@gbentley: you've stumbled into a low level API discussion, and these types of discussions are needed to solve pressing developer experience problems. That does not make higher level (real user facing) issues less relevant at all.

@plach: in short, you are saying that most other systems listed in #18 (ezPublish, Wordpress, CakePHP, Symphony, etc) do it wrong entirely by calling their language codes locales? We've already discussed above that $locale would be technically wrong for a language identifier/tag/code, since it is in fact a *locale identifier* not a whole *locale* data set per se. I think your two layered language/locale architecture looks interesting but is not something *I* was planning to work on for Drupal 8. I think we have a lot more pressing problems in terms of content translation and especially configuration translation, that should be our larger focus, and *I* did not plan on a complete rearchitecture of how Drupal handles locale/language specific information. Do you? (Judging by how slow we are progressing with even these basic low level things, I'm really trying to be conservative about the goals and try and focus on the real user facing problems that served the reason for this initiative).

The basic reason I've opened this issue is that we call language codes, language objects and the global language object all $language. I'm trying to find a terminology fix that makes sense *in this framework*, so we can move on to the still less interesting problems of no clean APIs to work with these objects and then the most interesting problems of language storage and translation for all kinds of Drupal data, which is the primary goal of the initiative to solve.

Of course I'm not a boss to enforce people to work on stuff, so I'm not going to tell anybody not to work on exciting improvements not necessarily in the plans if they want to. (This makes it hard to publish *the* plan, since it is being formed by what people actually will end up working on). Are you interested in solving the two layered language/locale issue as you've explained, and how can we work together on that? If this is more like a conceptual suggestion from you, how can that be brought closer to the reality of the current framework and to the topic of this discussion to find better terminology for the pieces?

The systems cited in #18 are clearly not afraid of calling their language/locale identifiers 'locales' (and then mostly not supporting any details from the definition of locale, like currencies or date formats). Should we be?

plach’s picture

@Gábor Hojtsy:

in short, you are saying that most other systems listed in #18 (ezPublish, Wordpress, CakePHP, Symphony, etc) do it wrong entirely by calling their language codes locales?

I'm not an authority in this matter but from the documentation I read this is exactly my position. It may be silly or wrong, but I did not read a single line to prove that yet.

I think we have a lot more pressing problems in terms of content translation and especially configuration translation, that should be our larger focus

I think we agreed to have a short-term fix to allow us to go on and a long-term fix to solve things in a consistent way. IMHO #48 does not fly as a long term fix, and I cannot see how it can be a short term fix, so that's what I'm discussing.

However I'm not so interested in this issue to make everyone waste their time on it, I just meant to post my feedback since I'm listed in the maintainers list for this very subject. I'm sorry I cannot agree, but I don't think this should block development if everybody else does.

A stop gap fix could be:

$node->language
$language = language_load($langcode);
$language->language
global $language_interface

Any other clean up could wait for a more defined context.

fago’s picture

$uid     => {language identifier}
$user    => {global language}
$account => {a language}

I don't think this is an example we should aim for. Imo the user example is not at all sound and consistent either. The entity type is 'user' but instances are $account, still there is a certain special instance we call $user (but is not really a regular user entity..) .. wtf?

We should not call the variable different just because it is the global one. I know why we do that, but wouldn't the need for "global $foo" go away anyway with the context initiative? So, imo every language object should be just called $language.

For now we could just go and use $GLOBALS['language'] as we have it already, but invent a simple getter function to avoid variable naming issues, e.g. $language = language_global();. That way overwriting $language would not overwrite $GLOBALS['language'] + we have a name less to care for + 1 inconsistency less.
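
A minimal sketch of how such a getter might behave, assuming the function name language_global() from the suggestion above (it does not exist in core) and a made-up stand-in for the negotiated language object:

```php
<?php
// Stand-in for the negotiated language that bootstrap would normally set up.
$GLOBALS['language'] = (object) array('language' => 'en', 'direction' => 0);

/**
 * Hypothetical getter, named after the suggestion above.
 */
function language_global() {
  return $GLOBALS['language'];
}

/**
 * Hypothetical caller that works with a language other than the global one.
 */
function example_mail_langcode($langcode) {
  $language = language_global();
  // Rebinding the local variable leaves $GLOBALS['language'] untouched.
  $language = (object) array('language' => $langcode, 'direction' => 0);
  return $language->language;
}
```

Note that this only protects against reassigning the variable. Because PHP passes objects by handle, changing a property on the returned object (for example $language->language = 'de') would still change the global object.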

Then, for $language I think we should just follow the pattern we use elsewhere. Imho we have two patterns:

1. Invent abbreviations for properties (nid, uid, ..)
-> The same way we could just invent langcode (as we partially have already done) and use $language->langcode and $langcode - just as we use $node->nid and $nid.

2. Use a general property name ('name') + prefix it if used stand-alone
We have some properties named generally, like vocabulary 'machine_name', or term 'name'. Once we are passing them as separate variable we usually prefix the "container-name", like $vocabulary_machine_name or $term_name.
-> In the case of languages I think the general property name should probably be 'code', not 'name'. So what about using $language->code, and $language_code if used standalone?

Personally, I'd not mind if we have language codes that look like locale identifier. If they are really just used as language identifiers, we should call it that way. If we want to support more localization options (decimal separation, ..?), we might want to call the whole object $locale though ($locale->code and $locale_code?). Mixing $language and $locale doesn't make much sense to me.

Gábor Hojtsy’s picture

I still think having $language->language == $langcode looks crap (and it will keep us debating when to call it language and when to call it langcode). Is that really the best we can do in Drupal 8?

catch’s picture

Using GLOBALS to avoid the collision is a good suggestion, much prefer that to declaring global $foo and we already use that some places on core, also that the global should go away with context object.

plach’s picture

+1 for #54 (especially point 2)

Gábor Hojtsy’s picture

@fago, @catch, @plach: exactly on the assumption that globals will be replaced by context object values and will work much differently, my view is that we should get rid of $language as a reserved word for now, rename it to something that makes sense for the global scope and then leave its integration and rework in contexts to the context initiative flow. By freeing up $language to be used at other places, we can freely use that variable name at places. #1215716: Introduce locale_language_save() is full of places where this makes sense. I have not seen $GLOBALS['user'] or $user = user_get_global(); as a pattern in core, can someone show where that is introduced? I don't need a fight to fight here, since the only thing we need to do there is to free up our namespace, and I don't think we care about the exact details for now, since we all assume it will just go away later with context. It is all about enabling us to use $language as a name in the meantime, so we don't need to wait for them. Looks like a simple global rename is the quickest path to there, no?

plach’s picture

Totally agreed.

Let's recap pros and cons of what has been proposed in #54 corrected with #58:

Language identifier
We choose the 'code' terminology, so:
$language->code
$language_code
$language->code == $language_code
$node->language_code == $language->code

Pros: it's easy to read, it's not too long and close to what we have now.
Cons: it introduces a difference between the object field and the variable name, although globally they look very similar. @fago points out that we have similar cases in core already. Node syntax is not very friendly but neither $node->nid is after all. Personally I'd be ok even with:

$node->language == $language->code // more readable but less consistent and ambiguous
$node->language->code == $language->code // more consistent but more verbose
Local object
We keep the $language form.
Pros: This is perfect, exactly what we want. Everyone is happy, angels sing and so on.
Cons: no one known.
Global object
Short term fix: rename it $language_interface. Long term fix: get it from the request context.
Pros: it's consistent with the other language globals.
Cons: It's very long (who really cares?).

Pros and cons of what has been proposed in #48:

Language identifier
We choose the 'locale' terminology, so:
$language->locale
$locale
$language->locale == $locale
$node->locale == $language->locale

Pros: it's easy to read, it's short and very consistent.
Cons: it might clash with a (local or global) $locale object, if we end up introducing it; it's misleading and error prone: locale identifiers are required to have a region identifier (at least by reading the wikipedia definition), whereas language identifiers are not.

The rest is the same.

plach’s picture

IMHO

It is all about enabling us to use $language as a name in the meantime, so we don't need to wait for them. Looks like a simple global rename is the quickest path to there, no?

should be enough to unblock the language CRUD issue. The language identifier issue is minor compared to the global one.

fago’s picture

>exactly on the assumption that globals will be replaced by context object values and will work much different, my view is that we should get rid of $language as a reserved word for now, rename it to something that makes sense for the global scope and then leave its integration and rework in contexts to the context initiative flow.

Yes, but the point is we do not need to discuss and come up with another name for the global as it's about to go away anyway. Also having another name for it because the object stems from $GLOBALS is just silly and something inconsistent new developers need to get used to first.

>I have not seen $GLOBALS['user'] or $user = user_get_global(); as a pattern in core,...

Me neither. But we could use it until $context arrives, so we can save the whole discussion about the alternative name of $GLOBALS['language']. Search/replace "global $language" with "$language = $GLOBALS['language'];", demonize the use of "global $language" and $language is freed. I'd personally favour a helper like "$language = language_get_global()" or maybe language_current() though.

>I still think having $language->language == $langcode looks crap..
I haven't suggested that. The variant 1 I outlined was using $language->langcode and $langcode, see #54.

catch’s picture

catch@catch-laptop:~/www/8$ grep -r "GLOBALS\['user" *

includes/common.inc:  $GLOBALS['user'] = drupal_anonymous_user();
includes/common.inc:  $GLOBALS['user'] = $original_user;
includes/form.inc:    if ($GLOBALS['user']->uid) {
modules/tracker/tracker.module:  return $account->uid && ($GLOBALS['user']->uid == $account->uid) && user_access('access content');
modules/user/user.tokens.inc:    $account = user_load($GLOBALS['user']->uid);
modules/user/user.test:    $global_account = user_load($GLOBALS['user']->uid);
modules/user/user.module:        if ($account->uid == $GLOBALS['user']->uid) {
modules/user/user.module:      if ($GLOBALS['user']->uid) {
modules/user/user.module:  return !$GLOBALS['user']->uid || !empty($GLOBALS['menu_admin']);
modules/user/user.module:  return (bool) $GLOBALS['user']->uid;
modules/user/user.module:    if ($GLOBALS['user']->uid == $uid || user_access('administer users')) {
modules/user/user.module:  return (($GLOBALS['user']->uid == $account->uid) || user_access('administer users')) && $account->uid > 0;
modules/user/user.module:  return ((($GLOBALS['user']->uid == $account->uid) && user_access('cancel account')) || user_access('administer users')) && $account->uid > 0;
modules/user/user.module:      drupal_goto('user/' . $GLOBALS['user']->uid . '/edit');
modules/user/user.module:    $uid = $GLOBALS['user']->uid;
modules/user/user.module:  return empty($arg) || $arg == '%' ? $GLOBALS['user']->uid : $arg;
modules/user/user.module:    $uid = $GLOBALS['user']->uid;
modules/node/node.admin.inc:    if (user_access('view own unpublished content') && $own_unpublished = db_query('SELECT nid FROM {node} WHERE uid = :uid AND status = :status', array(':uid' => $GLOBALS['user']->uid, ':status' => 0))->fetchCol()) {
modules/node/node.module:    if (user_access('view own unpublished content') && $own_unpublished = db_query('SELECT nid FROM {node} WHERE uid = :uid AND status = :status', array(':uid' => $GLOBALS['user']->uid, ':status' => NODE_NOT_PUBLISHED))->fetchCol()) {
modules/node/node.module:    $account = $GLOBALS['user'];
modules/profile/profile.module:  if (user_access('administer users') || $GLOBALS['user']->uid == $account->uid) {

catch@catch-laptop:~/www/8$ grep -r "GLOBALS\['lang" *
includes/common.inc.orig:    $langcode = $GLOBALS['language_content']->language;
includes/theme.inc:  $variables['language']          = $GLOBALS['language'];
includes/theme.inc:  $variables['language']->dir     = $GLOBALS['language']->direction ? 'rtl' : 'ltr';
includes/theme.inc:  $variables['language']          = $GLOBALS['language'];
includes/theme.inc:  $variables['language']->dir     = $GLOBALS['language']->direction ? 'rtl' : 'ltr';
includes/theme.inc:  $language = isset($GLOBALS['language']) ? $GLOBALS['language'] : language_default();
includes/common.inc:    $langcode = $GLOBALS['language_content']->language;
includes/menu.inc:  $cid = 'links:' . $menu_name . ':all:' . $mlid . ':' . $GLOBALS['language']->language . ':' . (int) $max_depth;
includes/menu.inc:    $cid = 'links:' . $menu_name . ':page:' . $item['href'] . ':' . $GLOBALS['language']->language . ':' . (int) $item['access'] . ':' . (int) $max_depth;
includes/menu.inc:  $tree_cid = 'links:' . $menu_name . ':tree-data:' . $GLOBALS['language']->language . ':' . hash('sha256', serialize($parameters));
includes/menu.inc.orig:  $cid = 'links:' . $menu_name . ':all:' . $mlid . ':' . $GLOBALS['language']->language . ':' . (int) $max_depth;
includes/menu.inc.orig:    $cid = 'links:' . $menu_name . ':page:' . $item['href'] . ':' . $GLOBALS['language']->language . ':' . (int) $item['access'] . ':' . (int) $max_depth;
includes/menu.inc.orig:  $tree_cid = 'links:' . $menu_name . ':tree-data:' . $GLOBALS['language']->language . ':' . hash('sha256', serialize($parameters));
modules/comment/comment.module:    $langcode = $GLOBALS['language_content']->language;
modules/comment/comment.module:    $langcode = $GLOBALS['language_content']->language;
modules/user/user.module:    $langcode = $GLOBALS['language_content']->language;
modules/user/user.module:    $langcode = $GLOBALS['language_content']->language;
modules/taxonomy/taxonomy.module.orig:    $langcode = $GLOBALS['language_content']->language;
modules/taxonomy/taxonomy.module:    $langcode = $GLOBALS['language_content']->language;
modules/node/node.module:  $cid = 'node_types:' . $GLOBALS['language']->language;
modules/node/node.module:    $langcode = $GLOBALS['language_content']->language;
modules/node/node.module:    $langcode = $GLOBALS['language_content']->language;
Gábor Hojtsy’s picture

@fago: well, if we rename global $language, we need to teach people its new name; if we make up a new mandatory way to bring global $language to the local scope, we need to teach people that. It is a change either way. By renaming it, people are forced to act on it, while just telling them not to use it as they used to with other objects will either fly or not.

I'm trying to aim for 3 consistencies here: (1) internal consistency in locale module/language system (2) consistency with other subsystems in Drupal (3) consistency with other systems outside Drupal. Looks like (3) will not be achieved much since all our competition uses $locale, and that did not fare well here. All right. Now we have (1) and (2) then.

The language subsystem has language globals for language types, and $language is just one of those types. Depending on whether you need language for a content item, a link or an interface piece, you should use the right $language_* value and NOT $language. The sole reason the global $language is called language is legacy and it gives you false comfort that you actually take language into account if you just rely on that. From the top of bootstrap.inc:

/**
 * The type of language used to define the content language.
 */
define('LANGUAGE_TYPE_CONTENT', 'language_content');

/**
 * The type of language used to select the user interface.
 */
define('LANGUAGE_TYPE_INTERFACE', 'language');

/**
 * The type of language used for URLs.
 */
define('LANGUAGE_TYPE_URL', 'language_url');

For internal consistency within the language system, we'd not have one designated language type as *the* language, would we? Drupal 6 only supported one type of language, while Drupal 7 supports 3 built-in (and any number of others can be defined by contrib). Drupal core uses the right language based on content or url (see catch's grep above). Why not make this consistent?
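
As an illustration of picking the right language per type, the following sketch assumes that each language type's value doubles as the name of the global holding the negotiated language object (as the grep output above suggests for 'language' and 'language_content'). The helper example_language_for_type() and the sample language values are hypothetical.

```php
<?php
// Constants as quoted from bootstrap.inc above.
define('LANGUAGE_TYPE_CONTENT', 'language_content');
define('LANGUAGE_TYPE_INTERFACE', 'language');
define('LANGUAGE_TYPE_URL', 'language_url');

// Stand-ins for the negotiated language objects; the values are made up.
$GLOBALS[LANGUAGE_TYPE_INTERFACE] = (object) array('language' => 'en');
$GLOBALS[LANGUAGE_TYPE_CONTENT] = (object) array('language' => 'de');
$GLOBALS[LANGUAGE_TYPE_URL] = (object) array('language' => 'en');

/**
 * Hypothetical helper: returns the language object for a language type.
 */
function example_language_for_type($type) {
  return isset($GLOBALS[$type]) ? $GLOBALS[$type] : NULL;
}
```

Here the interface and content languages differ, which is exactly the case where relying on the plain global $language for content would pick the wrong one.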

sun’s picture

Realized that #34 didn't contain a code example mimicking the short-term proposal. Updated that with:

$node->language
$node->language == $langcode
$language = language_load($langcode);
$language->language
$language->language == $langcode
global $language_interface

So regarding the stop-gap fix, we still seem to be in line; @plach stated the identical code in #53. It allows us to move forward with all of the currently postponed issues for now.

Regarding the long-term proposal, it looks like our major pain point is the language identifier name.

  • The name of the language identifier has a direct impact on the names of objects, globals, properties, and schema columns, since there is already a name clash between language identifier and aforementioned other things, and any change to the language identifier name will lead to name clashes, too.
  • We therefore would have to figure out first what the proper name for the language identifier should be. This cannot be discussed and decided isolated on its own, since names like "locale" potentially imply a whole different meaning, preventing us from actually implementing and using that other meaning in the future (without renaming everything once again).

I'll try to capture this as well as #59 in a cleaner way in the issue summary.

Gábor Hojtsy’s picture

@fago: can you agree on global $language_interface based on the above? If I read right, others would be comfortable with that, and that is going to go under the context initiative later anyway with the other two built in $language_* types and any others defined by contrib. (I assume they better have a unique name for each language type there anyway or lump it under a language "collection" with individual pieces but that is way above this discussion :).

Gábor Hojtsy’s picture

@sun: ok, then the suggested plan is that language code is *always* $langcode if a standalone variable, however if it is a property of an object or an element in an array, it is ->language and 'language' respectively. There are no places in core where we need to have a language object as a property of another object or a keyed element of an array or we just don't want to think about that for the stop-gap scope?

fago’s picture

@#65:
I don't think it makes sense to rename it just for the sake of avoiding the "name-clash" when using "global $language", which I thought was the primary reason for renaming. But given we already have three global languages it probably helps cleaning that up anyway. So I'm fine with that too.

ad #64:
I think it's fine to have $node->language containing just the language identifier, compared to $comment->pid,cid,uid,nid,.. it's much more descriptive. I could see us going with that way with other entity properties too, e.g. $comment->node containing a nid + an easy way to access loaded objects.

Still, we should avoid $language->language though as it makes it totally unclear whether a $language parameter is the object or the id... The $language->code and $language_code variant makes that clear.

plach’s picture

@fago:

Still, we should avoid $language->language though as it makes it totally unclear whether a $language parameter is the object or the id...

I think everybody here agrees this is not ideal. Are you talking about the stop-gap fix or the long-term fix? Because fixing it sounds very long-termish ;)

sun’s picture

Still, we should avoid $language->language though as it makes it totally unclear whether a $language parameter is the object or the id...

Sorry for leaving that out, but the stop-gap fix proposal would potentially also involve to possibly fix all instances of $language throughout Drupal core, which actually contain a $langcode (and not a language object). I'm mainly thinking of hook and callback function signatures here.

And now that you've read that paragraph, you might wonder about the extreme subjunctive wording. While I'd personally highly welcome if someone would run over Drupal core and change all $language to $langcode where no actual $language is passed, the stop-gap fix proposal primarily intends to resolve the variable naming problem for code that is being touched in the list of related and currently blocked/postponed issues.

In other words: An agreement on the stop-gap fix proposal implies a new rule for all upcoming core patches (from now on) to use $langcode and $language appropriately in code that is being changed. Violation of this rule means "needs work" (with a pointer to this issue). However, it does not necessarily mean we have to make all of core consistent immediately.
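
A minimal sketch of what that rule could look like in touched code, assuming made-up names (demo_language_load() and demo_greeting() are hypothetical stand-ins, not core functions): a parameter named $langcode always carries the identifier string, and a variable named $language always carries the loaded object.

```php
<?php
// Hypothetical stand-in for a language loader; not a Drupal core function.
// Returns a language object for a known identifier, or FALSE otherwise.
function demo_language_load($langcode) {
  $records = array(
    'en' => array('language' => 'en', 'name' => 'English'),
    'de' => array('language' => 'de', 'name' => 'German'),
  );
  return isset($records[$langcode]) ? (object) $records[$langcode] : FALSE;
}

// Hypothetical callback: the signature takes the identifier string, so the
// parameter is named $langcode, never $language.
function demo_greeting($langcode) {
  // $language is reserved for the loaded object.
  $language = demo_language_load($langcode);
  return $language ? 'Language: ' . $language->name : 'Unknown language';
}

print demo_greeting('de') . "\n";
```

Under the stop-gap proposal the object property stays ->language, which is why the sample records still use that key.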

fago’s picture

I think everybody here agrees this is not ideal. Are you talking about the stop-gap fix or the long-term fix? Because fixing it sounds very long-termish ;)

I think that's something we need to address in the long-term, as changing it too often probably just creates unnecessary headaches.

#69 sounds like a good way to proceed.

plach’s picture

I'm ok with #69 too.

wmostrey’s picture

#69 seems like a very good plan, agreed.

Gábor Hojtsy’s picture

All right, since we clearly have agreement on that, I've reopened #1222194: Rename global $language to $language_interface for continuing work on the global $language then.

Note that while I was looking through the 'language' values in code, I stumbled into:

- url(), l(), token_replace(), etc. take $options as 'language' => [a language object]
- t(), field storage, date formats, etc. use 'language' => [a language code]

You clearly need a reference sheet to be able to tell that you need to do url(..., array('language' => $language)) but t(..., array('language' => $langcode)).

Gábor Hojtsy’s picture

Two other less common examples: hook_mail() gets language object in $message['language'] vs. drupal_get_path_alias() takes language code in $path_language.

Gábor Hojtsy’s picture

Since there is such wide agreement on #64/#69, I went ahead and put that into practice in #1222194: Rename global $language to $language_interface and #1215716: Introduce locale_language_save(). I think it shows some of the weaknesses of the stop-gap plan though. Some examples from the patched code:

Langcode becomes language for data used in url():

  private function checkUrl($langcode, $message1, $message2) {
    $options = array('language' => $langcode);

Langcode gets value from a LANGUAGE_* constant:

  $langcode_none = LANGUAGE_NONE;

Language gets value from langcode again:

    $language = (object) array(
      'language' => $langcode,
    );
    locale_language_save($language);

String is $language_ while value is $langcode_ (would make it awkward for _string values to be $langcode_ wouldn't it?):

    $language_browser_fallback_string = "In $langcode_browser_fallback In $langcode_browser_fallback In $langcode_browser_fallback";
    $language_string = "In $langcode In $langcode In $langcode";

The ultimate question is really how deep the langcode terminology should go. Should constants like LANGUAGE_NONE become LANGCODE_NONE or LANGUAGE_LANGCODE_NONE, or what? Option keys like in url()? The language object property? Other object properties? I think the stop-gap proposal we have so far is really just scratching the surface and still does not make the langcode/language thing clear.

Gábor Hojtsy’s picture

I think my last 3 comments properly highlighted that the stop-gap idea of trying to distinguish between 'language' as sometimes the language code and sometimes a language object in the API and schema is not really doable. I don't think it's a good short-term goal, since solving the inconsistencies with language/langcode requires massive changes, and doing that for an interim terminology does not work.

Drupal used the 'locale' terminology for language code up to and including Drupal 5 as both the global $locale and the schema column locale, like all the other competitors do that I could find as listed above. See http://drupalcode.org/project/drupal.git/blob/refs/heads/5.x:/includes/c... and http://drupalcode.org/project/drupal.git/blob/refs/heads/5.x:/modules/lo...

Drupal 6 introduced this indecision with 'language' and 'langcode' in the APIs and schemas, trying to avoid 'locale' for some reason. Minus the installer, which still has $locale for language codes even in Drupal 8, see http://drupalcode.org/project/drupal.git/blob/refs/heads/8.x:/includes/i... as well as install_find_locale(), install_select_locale(), etc, see http://drupalcode.org/project/drupal.git/blob/refs/heads/8.x:/includes/i...

In a very funny way (not that I have the nerve to ROTFL in this discussion), the installer in fact uses $locales and $locale not just for language codes but sometimes for a list or a piece of custom file information object type as well (partly returned from file_scan_directory() then partly mangled), which then however does have a $locale->langcode property to make it even more fun. I'm feeling that is a little bit ironic.

And yes, I understand I was a key guy in introducing this $%#!, and I'm really sorry, I'm trying really hard to figure out a better way as you can see, let's collaborate on that!

Gábor Hojtsy’s picture

Sorry for the "post-bombs", but I'm just gathering more info as I go along and want to preserve the examples for future cleanups. Some more interesting things I found while grepping for locale, langcode and language:

- the locale UI translation code also uses $locales and $locale for a totally unrelated data type that we have not yet mentioned here, UI translations for specific strings
- _field_invoke_multiple() has this very interesting piece of code: foreach ($languages as $langcode)
- field_attach_form(), etc. take $langcode but then generate a structure with '#language' => $langcode
- Similarly field language context takes a $langcode but uses it in 'language', and returns a langcode from the variable $display_language (which variable name is also used at other places to denote a langcode)
- field_valid_language(), field_available_languages() return a langcode / langcodes
- locale_date_format_form() uses 'language' for display of a language name(!) in the form
- users get a ['locale']['language'] form item to select a language
- system module uses 'locale' in its form to set the site default country and first day of week (!)
- node_view(), user_view(), etc add '#language' => $langcode to the built array
- t() does take options as array('langcode' => $langcode), while url() takes array('language' => $language_object) (contrary to what I said above)
- bootstrap.inc does use $lang at places for a language object
- installer, locale module uses $languages to designate a custom data structure returned by _locale_get_predefined_list()
- language_from_default() and other language providers return/provide language codes, not language objects
- _locale_translate_language_list() uses $limit_language as a langcode
- locale_get_localized_date_format() uses $languages as a list of langcodes and $language as langcode
- book-export-html.tpl.php actually misdocuments that it has $language as a langcode, it has it as an object
- in _field_language_suggestion($available_languages, $language_suggestion, $field_name), the first two arguments are an array of langcodes and a langcode

(I wanted to put these into a table with usage and variable name axes but it gets pretty complex fast, so does not look feasible). In short Drupal uses $locale, $locales, $language, $languages, $langcode, array('language' => ...) and $obj->language and $obj->langcode all to designate a language code or language codes at different places.
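
To make the option-key mismatch concrete, here is a small self-contained sketch. demo_t() and demo_url() are hypothetical stand-ins that only mimic the option shapes described in the list above (a 'langcode' string for t(), a 'language' object for url()); they are not the core implementations, and the URL building is heavily simplified.

```php
<?php
// Stand-in mimicking the shape of t(): expects $options['langcode'] to be
// a language code string.
function demo_t($string, array $options = array()) {
  $langcode = isset($options['langcode']) ? $options['langcode'] : 'en';
  if (!is_string($langcode)) {
    throw new InvalidArgumentException("'langcode' must be a language code string.");
  }
  return "[$langcode] $string";
}

// Stand-in mimicking the shape of url(): expects $options['language'] to be
// an object whose ->language property holds the language code.
function demo_url($path, array $options = array()) {
  if (isset($options['language']) && !is_object($options['language'])) {
    throw new InvalidArgumentException("'language' must be a language object.");
  }
  // Simplification: use the code itself as the path prefix.
  $prefix = isset($options['language']) ? $options['language']->language . '/' : '';
  return '/' . $prefix . $path;
}

$langcode = 'de';
$language = (object) array('language' => 'de', 'name' => 'German');

// Same concept, two different keys and two different value types.
print demo_t('Home', array('langcode' => $langcode)) . "\n";
print demo_url('node/1', array('language' => $language)) . "\n";
```

Passing the code where the object is expected (or the reverse) is exactly the kind of mistake a caller makes without a reference sheet; the stand-ins throw in that case only to make the mismatch visible.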

Gábor Hojtsy’s picture

For the record, Roger Pfaff from Munich suggested (at https://twitter.com/#!/rogerpfaff/status/96211010691661824) that we name these $language_isocode-s. I've pointed out that these are not ISO codes, as our language codes are locale identifiers / language tags possibly containing the language name AND the country code. It's not an ISO code. The file name includes/iso.inc and the code comments on http://api.drupal.org/api/drupal/includes--iso.inc/function/_locale_get_... are very misleading on this matter. Those should not be trusted and will need to change in Drupal 8. Submitted an issue for that at #1231402: Drupal does not use ISO language codes, iso.inc is misleading.
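
A small sketch of why "ISO code" is a misnomer: a code such as pt-br carries a second subtag after the primary language part. demo_split_langcode() is a hypothetical helper and the splitting is deliberately naive; it is not a full language tag parser.

```php
<?php
// Hypothetical helper: naively split a language code into its primary
// language part and any further subtags (country, script, ...).
function demo_split_langcode($langcode) {
  $parts = explode('-', strtolower($langcode));
  // The first part is the primary language; whatever remains are subtags.
  $language = array_shift($parts);
  return array(
    'language' => $language,
    'subtags' => $parts,
  );
}

// A bare two-letter code has no extra subtags...
print_r(demo_split_langcode('de'));
// ...while 'pt-br' combines a language part with a country part.
print_r(demo_split_langcode('pt-br'));
```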

plach’s picture

It seems our double-fix strategy is not performing that well. Is it under discussion again? Gabor, are you suggesting to find a unique definitive (OMG:) fix?

In this light I have a proposal that at first might sound silly, but perhaps could solve many of the issues reported in #75 and #77:

<?php
$node->language
$node->language == $language
$language_data = language_data_load($language); // or $language_object or $language_properties or anything you like better :)
$language_data->language
$language_data->language == $language
global $language_interface // this might be a little inconsistent, but it should be only a temporary fix
?>

Let's RRRUMBLE again!

plach’s picture

Another one, not sure it'll work :)

<?php
class Language {
  protected $code;
  protected $native;
  // ...

  function __construct($language_code) {
    $this->code = $language_code;
    // ...
  }

  function __toString() {
    return $this->code;
  }
}

$language = new Language($language_code);
// Loose comparison against the stored code, relying on __toString().
$node->language == $language;
?>
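
For what it is worth, a self-contained check of how such a class would behave (DemoLanguage is a hypothetical name, and the node is a plain stdClass stand-in): loose comparison works through __toString(), but strict comparison does not, so any existing === check against a language code would need an explicit cast.

```php
<?php
// Hypothetical value class; mirrors the idea above with made-up names.
class DemoLanguage {
  protected $code;

  public function __construct($language_code) {
    $this->code = $language_code;
  }

  // Lets the object stand in for its code wherever a string is expected.
  public function __toString() {
    return $this->code;
  }
}

$language = new DemoLanguage('de');
$node = (object) array('language' => 'de');

// Loose comparison converts the object to a string via __toString().
var_dump($node->language == $language);
// Strict comparison also compares types, so string vs. object is FALSE.
var_dump($node->language === $language);
// An explicit cast restores strict equality.
var_dump($node->language === (string) $language);
```
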
Gábor Hojtsy’s picture

@plach: what's the plural for $language_data? $language_datas?

$language_datas = locale_language_data_load_multiple(...);

(Note that right now locale manages languages, so language loading will/should be in the locale_* namespace).

plach’s picture

F**k, $data is already plural, it's from Latin (singular datum, plural data) :(

Aside from this detail, do you like the idea of renaming local objects? If so we can try to find another name.

Gábor Hojtsy’s picture

@plach: yeah, although it feels natural to name the object $language and find a good name we can use universally for language identifiers (lid, langtag, langcode, language_tag, language_code, locale, etc. were suggested above). However, changing language identifier terminology will probably mean LOTS more API changes and especially LOTS of schema changes. If we change the name for the object, the schema is left untouched and lots of the API will stay intact. That's a huge plus, if we can have a good name for the object that is :) We derive names from the object name though, $language_* globals are derived and will be misleading if not renamed to use the object name then (such as $language_data_content instead of $language_content if it would be a good name :). This probably gives a perspective as to what environment should the object name stand.

plach’s picture

We derive names from the object name though, $language_* globals are derived and will be misleading if not renamed to use the object name then (such as $language_data_content instead of $language_content if it would be a good name :). This probably gives a perspective as to what environment should the object name stand.

Yes, this is what I meant with inconsistency in #79, but can't we assume that those globals will go away with the WSSCI? I can imagine the following scenario:

<?php
$language_data = context()->get(LANGUAGE_TYPE_INTERFACE);
?>

What about $language_info?

Gábor Hojtsy’s picture

Well, thinking more of it and stepping back a bit, if we name the language identifier THE language, then things like "name of the language" sound pretty awkward ($language_name and such would become $language_info_name?). Once again, we never have these questions for nodes, terms, users or whatever because we have distinct names for their pieces. We always know if we encounter a $node or a #node that it is a node object, and never, never ever something else. Why is this so hard for languages? Are we too purists? We have clearly valid reasons against all proposals :)

I don't see how we can have $language itself as the language identifier and then derive $language_X from that, since language identifier itself derives from the language concept. $language (object/concept) is broken down to $language_code ($language_id, $language_tag, $language_locale, whatever, but NOT $language_language and NOT $language), $language_name, $language_direction, etc, like $node_title and such appear in templates, tokens, etc. Let's keep those in mind! Language identifier is a sub-concept of language. Users, nodes, etc. do that with $uid, $nid, etc, expressing they are identifiers and sub-user, sub-node, etc.

$language_SOMETHING, $langSOMETHING, $lSOMETHING would express this hierarchy, this derived meaning. Drupal usually does not do abbreviation like $langX, you don't see $entid for an entity ID or $usrname for a username, but we use $langcode extensively, so let's keep that a possibility. So applying the ideas so far to this concept:

| Prefix | Remark | code | tag | locale | id |
| --- | --- | --- | --- | --- | --- |
| Notes | | Strictly speaking not correct terminology, our language IDs are not just language codes, depending on how you define code | Correct terminology according to W3C/IETF, can be confusing with HTML tags, not used much in industry | Not correct terminology, locale itself is a concept that can have an ID; many competitors use this and D5 used this though for the ID itself | Not correct terminology at least per similar Drupalisms, because language ids are not numbers |
| $language | Long, looks odd on $language object ($language->language_code) | $language_code | $language_tag | $language_locale | $language_id |
| $lang | Drupal does not do abbreviations like this at other places | $langcode | $langtag | $langlocale | $langid |
| $l | Might want to avoid this for a $locale object concept in the future (that could be $l as well) | $lcode | $ltag | $llocale | $lid |
| Standalone | Only works if you have a good standalone name | N/A | N/A | $locale | N/A |

The more I think about applying a new name to the language concept, the more it looks like an ever bigger change, since then we'd need to change everything that derives from that, while for language identifier, we "only" need to change those things that use/derive from that.

My favorites in each column are $lcode, $ltag, $locale and $lid respectively, depending on which compromise do we make (based on the other Drupalisms $nid, $uid, $rid, $tid, etc, which you all need to learn working with Drupal, so $lX would not be a stretch at all).

plach’s picture

Perhaps this needs a feedback from Dries? I feel rather stuck :(

webchick’s picture

Something that would help on my side is if the issue summary could be edited with an example of what each of these things would output through echo/var_dump().

webchick’s picture

Updated issue summary.

Gábor Hojtsy’s picture

Update with more explanation

Gábor Hojtsy’s picture

@webchick: attempted to update. In short, we are looking for names for two things. An object as loaded based on a record from the {language} table and the identifier of that object. Similar pairs include $node - $nid, $user - $uid, etc. (Enough to assume for this that $node and $user are objects of records coming off of the node and user tables). The matrix in #85 outlines possibilities for identifiers of that object if we assume the object is $language. The main problem is that we use 'language' interchangeably on object properties, in arrays, on form API elements, in field API etc. for either the object or the identifier of the object (and we less often use $locale as well for both of those).
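
To illustrate the pair in the spirit of the echo/var_dump() request, a sketch with made-up data: the record fields below are illustrative assumptions rather than the actual {language} schema, and example_language_load() is a hypothetical loader.

```php
<?php
// Illustrative records keyed by identifier; field names are assumptions.
function example_language_records() {
  return array(
    'en' => array('language' => 'en', 'name' => 'English', 'direction' => 0),
    'de' => array('language' => 'de', 'name' => 'German', 'direction' => 0),
  );
}

// Hypothetical loader, analogous in shape to user_load($uid).
function example_language_load($identifier) {
  $records = example_language_records();
  return isset($records[$identifier]) ? (object) $records[$identifier] : FALSE;
}

// The identifier: a plain string, comparable to $nid or $uid.
$identifier = 'de';
var_dump($identifier);

// The object: everything known about that language, comparable to $node
// or $user.
$language = example_language_load($identifier);
var_dump($language);
```

The first var_dump() shows a bare string and the second a full object, which is the distinction the naming scheme has to make visible.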

plach’s picture

My favorites in each column are $lcode, $ltag, $locale and $lid respectively, depending on which compromise do we make (based on the other Drupalisms $nid, $uid, $rid, $tid, etc, which you all need to learn working with Drupal, so $lX would not be a stretch at all).

About $locale I already said what I think, about $lx we'd have:

<?php
function field_attach_view($entity_type, $entity, $view_mode, $lcode = NULL) // or
function field_attach_view($entity_type, $entity, $view_mode, $ltag = NULL) // or
function field_attach_view($entity_type, $entity, $view_mode, $lid = NULL)

// and

$node->lcode == $lcode
$node->ltag == $ltag
$node->lid == $lid
?>

I can already hear @chx screaming ;)

I'm afraid I'm still (very personally) leaning towards longer but more readable schemes such as:

$language->code == $language_code
$node->language_code == $language_code
function field_attach_view($entity_type, $entity, $view_mode, $language_code = NULL)
$form['#language_code']

For the record, tag is still my favorite terminology, but this discussion is exhausting me. At this point I'll gladly accept any choice with joy (except for $locale ones :).

Gábor Hojtsy’s picture

@plach: I don't think @chx is screaming about $nid, $uid, $mlid (menu link id), or any of the other Drupalisms (or we should call out @chx on this). I don't think we should stop introducing Drupalisms ($langcode is/was/going to be a Drupalism as well). $user does not have a $user->user_identifier or $user->user_id either. I think it's just as much DX to apply Drupalisms similar as in other areas of the system for consistency while in parallel trying to look for human understandable names. Most of the object identifiers are not human understandable as-is in Drupal, yet widely used in the system.

webchick’s picture

(Note: I haven't read this full discussion, just the issue summary and the #85. This is "by design" because I'm trying hard not to be influenced.)

As a general Drupalist, and not a Drupal language API expert, this is what makes the most sense to me.

$language: This is always the full language object, basically a SELECT * FROM language.

$language->lid: The primary key of the language table ('en' or 'de')
$node->lid: The language assigned to this node ('en')
$user->lid: The language assigned to this user ('de')

(I realize this would mean changing the intuitively-named 'language' column in the user/node tables to the Drupalism-named 'lid' column, but again the key is consistency.)

Xid is what we (almost) always use to refer to a table's primary key, or a foreign key to another table. user.uid, node.nid, node_revision.nid. This same pattern is used pretty much everywhere. I think above there were concerns that Xid is usually autoincrement, numeric primary key, but this isn't always the case. For example, the cache tables use 'cid' and this is a string like 'image_styles'.

$llid, $ltype is not a pattern used anywhere. Neither is $language_foo, $language_bar, $language_the_other_thing. We do $node->nid, $node->name, etc. when we want to refer to specific properties.

So yes, I'd prefer we stick with well-established Drupalisms here, rather than trying to introduce a new pattern which might arguably be more intuitive for newcomers to Drupal, but would be non-standard to everything else and would cause newbies to run afoul the first time they encountered nodes, users, comments, taxonomy vocabularies, or basically anything else in Drupal. (Which, let's face it, they're far more likely to encounter first and have expectations set by than the inverse.)

sun’s picture

Reading @webchick's comment and double-checking with the table in #85, I'm

-1 on "lid": Too many chances for name clashes with other "lids" in modules (e.g., Length ID, Loop ID, Large ID, whatever), and not descriptive enough for third-party function signatures (e.g., hook_node_view($node, $view_mode, $lid) -- WTF??).

+1 on "langid": In this entire discussion, we actually used the term "language identifier" in order to be able to communicate with each other in the first place. ;) So a $langid directly maps to "language identifier", doesn't come with code/tag/WTF, and is also crystal clear in database schemas ({node}.langid) as well as on objects ($language->langid, $node->langid). For some reason, the table in #85 notes that IDs would always be integers, but that is not really the case, or at least it's an extremely far stretch of trying to project a "rule" to current $*id variables, which also doesn't hold water if you consider the real-world term and usage of "ID" (e.g., social security numbers, passport numbers, UUIDs (!), etc).

Gábor Hojtsy’s picture

@sun: really, why is $uid, $nid or $tid or $mlid not a WTF? Because you are used to it! They were never named any different! $lid is/would be/could be new, and it had an inconsistent (langcode) and inconsistently applied (never in schema) pattern. Yes. In this sense, $lid might clash with existing ids in contrib. With that thinking we'll never be able to introduce ids in the same pattern that most core objects use with this. There never can be an $xid, $zid or $wid or $kid in core for any new concept with fear there might be something with x, z, w or k in contrib then? Is this a rule?

sun’s picture

why is $uid, $nid or $tid or $mlid not a WTF?

They are less of a WTF, because they are the IDs of the object at hand. $node » $nid, $term » $tid, $user » $uid, etc.

A $lid is out of context in a $node context. It could be the node's "last-revision-author-id" or whatever other thing you can think of and which you can abbreviate with "lid".

In this sense, $lid might clash with existing ids in contrib. With that thinking we'll never be able to introduce ids in the same pattern that most core objects use with this. There never can be an $xid, $zid or $wid or $kid in core for any new concept with fear...

The situation is different, since the context is different. We can certainly introduce new objects in core that use $lid, $xid, $zid, or $kid variables and keys within their scope and context, and there is going to be zero impact on any other module.

But as soon as we pass on something to other functions, we use a clear name. That's why we have $entity_type, $entity_id, $entity, etc and not $type, $id, ...

Very similar to language identifiers are text formats, which are also scattered throughout core and contrib, and across objects and table schemata. We also don't use $fid there, and neither do we store $field[0]['value'] with $field[0]['fid'].

A Drupal using $fid and $lid (for stuff like text fields even within the same context, yay!) would be an absolute horror scenario in my mind.

Gábor Hojtsy’s picture

All right, a node table has nid, vid and uid. The taxonomy_index table has a nid. Who would have thought that it is about node taxonomy relations? (I'd have assumed it relates to entities). Anyway, I know now. Because nid is universal. Same goes for uid in the watchdog table, I know its a user id. How do I know that? Well, that is the name we use it for everywhere. We are talking about very basic core concepts like nodes and users. Language strives to become one like that.

Image effects and image styles use ieid and isid. watchdog uses wid, etc. Core already has inconsistent ID names like fid for flood table (flood id) and file table. Or comment uses pid, while taxonomy_term_hierarchy uses 'parent'.

And yes, field API uses field_id and entity_id. Is that where Drupal is heading? Then why not $language_id? We are trying to have consistency here, right? Drupal used to be consistent in Xid's. That is clearly not 100% applied to some systems, but is our most common pattern for universally used identifiers. We want language to be a universal thing like that. It will be eeeeeverywhere. If our new pattern is in the field API, then are we going to ever have user_id and node_id in core? Is that the new consistency we are striving for or is entity/field API an exception, not a rule?

How can core be consistent? Especially how can it be consistent if we introduce yet another scheme with langid (shortened name, but not as short as the initials and no underscore). There is either a full name underscore pattern or an initial letter no underscore pattern. That's what I tried to express in #85 (explicitly saying the $lang type of abbreviation is nowhere to be seen in Drupal).

webchick’s picture

I would personally handle this by changing

hook_node_view($node, $view_mode, $lid)

to

hook_node_view($node, $view_mode, $language)

and just pass the entire $language object in for context. Ambiguity solved.
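
A sketch of that signature, with hypothetical names throughout (mymodule_node_view() is an example hook implementation, and the objects are plain stdClass stand-ins): the implementation receives the whole object and reads whichever property it needs, so there is no question of whether the third argument is an object or a string.

```php
<?php
// Example hook implementation receiving the whole language object.
function mymodule_node_view($node, $view_mode, $language) {
  // The identifier is read from the object; 'language' is the property
  // name used on language objects in this discussion.
  $node->content['language_note'] = array(
    '#markup' => 'Viewed in ' . $language->name . ' (' . $language->language . ')',
  );
}

// Stand-in objects with made-up values.
$node = (object) array('nid' => 1, 'language' => 'de', 'content' => array());
$language = (object) array('language' => 'de', 'name' => 'German');

mymodule_node_view($node, 'full', $language);
print $node->content['language_note']['#markup'] . "\n";
```

The trade-off is that every caller must have a loaded object at hand, even when it only knows the identifier.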

catch’s picture

If a node has a language and you load the node, unless we munge it the column name is going to be $node->langcode or whatever the string is. For function arguments and the current global I think we're assuming that all gets replaced by context object.

Gábor Hojtsy’s picture

In anticipation of a resolution to this issue soon, I've created http://drupal.org/sandbox/goba/1233384 to coordinate the massive work on this change. Pushed a languageid branch to http://drupalcode.org/sandbox/goba/1233384.git/tree/refs/heads/languageid. Happy to give people commit access who'd like to help. (BTW used this excellent guide from @sun to set up the sandbox: http://drupal.org/node/1181472, thanks @sun for writing that up!).

fago’s picture

-1 on lid too (see the linked issue for reasons).

I agree with Gabor that we should aim for consistency. Thus, let's rethink the naming pattern once, then apply it. As this isn't only about languages I've opened #1233394: [Policy, no patch] Agree on a property naming pattern for that. Please comment.

Gábor Hojtsy’s picture

Clearly I did not want to open such a big can of worms, but this issue is really a cornerstone to move forward on many of the meaningful parts of the multilingual initiative. We can keep architecting APIs without any standards whatsoever for naming stuff regarding language, sometimes calling things language, langcode and locale depending on how we woke up, but that does not sound nice really. Figuring out a standard is important to start the real work. It is step 0, and it does not get close to any user-facing changes, which are our ultimate goal. So while I appreciate the motion to open an even bigger monster thread for a more generic question, I'm just trying to underline that having a standard we can code towards would be important to have for my initiative soon. The OP for this issue was a language object load API. You know, language load and language load multiple, like other things do in Drupal. Sounds pretty simple, eh? How do you name the id you load languages by? We don't know that yet, so we cannot even do a language load API (without needing to rewrite it again soon). It's such a low-level thing that we should hopefully move on from it sooner rather than later.

plach’s picture

I was thinking about posting something like #100 too, but then realized that it might be detrimental to the D8MI: there is no guarantee that #100 will be accomplished in a useful timeframe. If we instead agree on a coherent pattern at least in the language scope, it should be easy to conform to any newly approved standard coming out of #100 later. This would let us proceed with the other tasks (step > 0), leaving the discussion open for better and more consistent object identifiers and property names.

So from me +1 to any proposed scheme that does not involve 'locale'.

Gábor Hojtsy’s picture

Problem with "interim decisions" here is that we are going to do lots of API and schema work. There are already lots of APIs and schemas referring to language stuff, we'll expand on that considerably. Working with an interim naming scheme we'll just generate even more work for us to migrate to something else later. Is this delayed time-waste strategy the best we can do in the current situation to keep D8MI moving? (I tried to fend off the "stop-gap solution" suggestions above, because they need lots of schema/API changes at least as we work on the APIs piece by piece, and they do not guarantee we actually had an agreement and we'll need to revisit these big changes again in a major way in this same release cycle).

fago’s picture

Sry, my intention was not to hold anything up. However, I think this is something that needs to be discussed, and as outlined in #103, the sooner the better.
So what about giving the discussion a certain period of time (a week?) and then moving on, even if we were not able to reach consensus? Then someone has to make decisions to move on, and I think this is why it's good to have leaders or initiative owners.

Gábor Hojtsy’s picture

Issue tags: +terminology

plach’s picture

Probably I did not express myself correctly: what I meant is that we should find a (possibly) definitive fix for language identifiers. If a different pattern emerges from #100, massive changes throughout Drupal core will be performed anyway, but at least having standardized on a consistent pattern in the language scope should help the global consistency issue to be accomplished.

That said, @fago's proposal of setting a deadline after which we move on anyway might make sense to avoid doubling the effort.

Gábor Hojtsy’s picture

Looks like it's best for us for now to just use confusing names as always, since we cannot agree on improving in any direction. That is what basically happens while this issue is on the side and we are trying to make some actual progress instead of debating this for months.

However I've opened #1293304: Break up locale.module, but how? for the locale module split question, which in itself might possibly be interleaved with this discussion a bit at least (ie. locale/language question), but I believe can hopefully stand as its own discussion.

Gábor Hojtsy’s picture

Language / langcode problem also discussed as question on arguments at #1305378: Tokens should use $options['langcode'] and not need a language object.

IceCreamYou2’s picture

I just read this entire issue, although I admit skimming the last 50 or so posts. For reference, I try to avoid anything language-related, and only deal with it when it's relevant for a module I'm writing -- and currently I find language support perennially confusing. So here's what I would do:

global $page_language; // The language the current page was requested in. $language_interface sounds like a default language regardless of user settings.
$language = language_load(); // $language is a full language object
$language->langcode;
$langcode;
$object->langcode;
{language}.langcode

For langcodes:

  • "Locale" means "A place where something happens or is set, or that has events associated with it." It has nothing to do with languages. I don't care what other systems do. "Locale" is also the name of a module, and it feels like something named "locale" would be an object associated with the locale module.
  • "Tags" is confusing. Even if you use "language_tag" everywhere (including in the {languages} DB column and $language objects) it still feels like you are "tagging" languages (and you could, with terms like "Western," "Latin," "LTR," etc.). Additionally note that what I'm calling langcodes here is not a standard; they're not langcodes because they use underscores, so it's just a convention that doesn't have a name (albeit a common convention).
  • "Language" is obviously confusing.
  • "language_id" kind of makes sense, but everything else with "id" in Drupal is a number, and we don't use the "[string]_id" pattern very often.
  • "machine_name" is Drupal-y and obvious what it means. I like it. The only problem is it doesn't work on objects -- you can't write $node->machine_name. We should be consistent everywhere.
  • "langcode" is also obvious. Even the current DB schema describes {language}.language as "Language code." We should always use "langcode" and never just "code." In my (admittedly limited) experience with languages I have never heard this called anything other than a language code. Also note: 824 results for "language code," 81 results for "language tag," 833 results for "langcode" (and "locale" is obviously not comparable). The one argument AFAICT against "langcode" is that coding standards usually dislike abbreviations. However, we already violate this all over the place in about a bazillion ways. To me "langcode" is most analogous to "nid"/"uid" which are also abbreviations.

Additionally, according to Wikipedia:

  • "In computing, locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a language identifier and a region identifier." (Emphasis mine. Although to be fair, that page confuses the concept of locale a bit too.)
  • "A language code is a code that assigns letters and/or numbers as identifiers or classifiers for languages. These codes may be used to organize library collections or presentations of data, to choose the correct localizations and translations in computing, and as a shorthand designation for longer forms of language-name." There are a bunch of examples of language code standards on that page (Drupal doesn't use any of the standards there) including the IETF Language Tag spec.
  • The Wikipedia pages for language codes like en-GB often explain it as a "language code."

All that said, I don't want to hold up actual work. Code is golden. If we standardize on something, it will be easier to change it later (though the chances we actually do are probably pretty low). So, if the bikeshed is over, forget I said anything and just get on with it. :-)

Gábor Hojtsy’s picture

@IceCreamYou2: just commenting on two things :)

1. Drupal does use one of the standards referenced on the Wikipedia language codes page, namely IETF language tags. That is BCP 47, which currently refers to RFC 5646 (as underlying standards are updated, BCP 47 will still be BCP 47 but the referenced RFC will change). For user friendliness, Drupal 8 links to the W3C language tags documentation, see http://www.w3.org/International/articles/language-tags/. This should be the best and most up to date guide to link to for any web app :)

2. I think the reason for the current gridlock over what name to choose is that the more general break-out discussion at #1233394: [Policy, no patch] Agree on a property naming pattern proposes to get rid of nid, uid, etc. altogether and unify property naming instead. According to that, we'd never abbreviate langcode again, ids would be named "id", while machine names would be named "name", and labels would be named "label". So the current $language "name" property would become the "label" property, the current "langcode" property would become the "name" property, and other objects would probably refer to that as "language_name", I guess. So anyway, regardless of specifics, #1233394: [Policy, no patch] Agree on a property naming pattern has set the goal of turning the whole naming scheme into something very different from what we have now, so it seems very hard to justify unleashing a huge renaming effort to rename all "language" schema items and argument/property values to "langcode" at the moment, when it might need to be renamed again to something entirely different. So it looks like, since people want to redo the naming of everything but don't yet agree how, this is kind of on hold. However, that whole renaming effort might not go anywhere in practice, in which case we'll still need to figure this out or risk releasing D8 with similar confusions as D7 and D6 had in terms of developer experience.

IceCreamYou2’s picture

Ooh, thanks Gábor.

Re 1), my interpretation of IETF is that language tags use hyphens, and Drupal uses underscores, e.g. the standard is en-US and Drupal uses en_US. Same pattern, but that makes it different in my mind.

Re 2), thanks for clarifying. I like that approach. Would have saved me a whole lot of time if #1233394: [Policy, no patch] Agree on a property naming pattern was referenced in the issue summary. :-)

Gábor Hojtsy’s picture

@IceCreamYou2: Re 1) where have you seen Drupal using underscores instead of hyphens? I'm afraid you are mixing something up there.

IceCreamYou2’s picture

#8, #18, #19... I feel like somewhere in this thread someone mentioned something about it but I can't find it now and I may have read too much into it.
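
For reference, the hyphen/underscore distinction discussed in the last few comments can be sketched as below. This is only an illustration: example_locale_to_language_tag() is a hypothetical helper, not a Drupal API, and it assumes the input is a POSIX-style locale identifier such as en_US.UTF-8. It does not validate subtags or change their case.

```php
<?php

/**
 * Hypothetical helper (not part of Drupal): turns a POSIX-style locale
 * identifier into a hyphen-separated, BCP 47-style language tag.
 */
function example_locale_to_language_tag($locale) {
  // Drop an encoding or modifier suffix such as ".UTF-8" or "@euro".
  $locale = preg_replace('/[.@].*$/', '', $locale);
  // Language tags separate subtags with hyphens, not underscores.
  return str_replace('_', '-', $locale);
}

// An underscore-style identifier becomes a hyphenated tag.
$tag = example_locale_to_language_tag('en_US.UTF-8');
// A value that is already hyphenated passes through unchanged.
$same = example_locale_to_language_tag('en-GB');
```

The point is only that "en_US" is a locale identifier, while "en-US" is the language tag form that the langcode values discussed here follow.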

Gábor Hojtsy’s picture

Just to summarize: we still don't have any consensus, so our new code introduces "nice" snippets like this (from #1260716: Improve language onboarding user experience):

$browser_langcode = locale_language_from_browser($enabled_languages);
//....
if ($language->language == $browser_langcode) {
  // ...
}

(In fact, @good_man, who is working on the patch there, got pretty confused as to what $language->language is even used for, and the actual patch has other issues due to that.)

Gábor Hojtsy’s picture

Status: Needs work » Fixed

All right, well, issues like #1260716: Improve language onboarding user experience really necessitate we move some way from our pile of confusion, so langcode be it. The installer issue moves from 'locale' GET/POST param to 'langcode' and makes similarly appropriate changes to obliterate 'locale' from everywhere it had it. It also introduces a "langcode" key in install profile .info files in sync with that. That is a clear move to the summary/plan outlined above by @IceCreamYou2. Since nobody argued with @IceCreamYou2 and previous statements were in support of a move like this + #1233394: [Policy, no patch] Agree on a property naming pattern is going nowhere, this is our guideline:

$language = language_load(); // $language is a full language object
$language->langcode;
$langcode;
$object->langcode;
{language}.langcode
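
To make the guideline concrete, here is a minimal sketch of how the browser-language comparison from #1260716 quoted earlier would read under it. Everything prefixed with example_ is hypothetical: example_language_load() is a stand-in written for this illustration, the language data is made up, and the real loader may look different.

```php
<?php

/**
 * Hypothetical stand-in for a language loader, for illustration only.
 */
function example_language_load($langcode) {
  // Made-up sample data keyed by langcode.
  $names = array('en' => 'English', 'hu' => 'Hungarian');
  if (!isset($names[$langcode])) {
    return FALSE;
  }
  // $language is a full language object.
  $language = new stdClass();
  $language->langcode = $langcode;
  $language->name = $names[$langcode];
  return $language;
}

// $langcode-style variables always hold a string identifier.
$browser_langcode = 'hu';
// Analogous to $account = user_load($uid).
$language = example_language_load($browser_langcode);

// Both sides are now visibly langcodes, unlike
// $language->language == $browser_langcode.
$matches = ($language->langcode == $browser_langcode);
```

With this naming, a variable called $language is never a string, and anything called langcode is never an object.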

plach’s picture

Works for me, although it's not my preferred solution among the ones presented here. I guess this might be revisited if the general issue gets somewhere. As said above, having a fixed schema to rely on will ease the task in this case.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Make sure PHP code wrapping is ok.

Gábor Hojtsy’s picture

Issue tags: +API clean-up, +langcode

Tagging up so we can find the langcode API change related issues later.

Active work on implementing "langcode" universally is going on in

- #1357918: Missing update for language_default in language langcode update and
- #1357912: Convert path language code schema to langcode

Please help there! The installer was converted earlier to langcode from locale in #1260716: Improve language onboarding user experience.

Ongoing, langcode related core issues can be found at http://drupal.org/project/issues/search/drupal?issue_tags=langcode

Gábor Hojtsy’s picture

Issue tags: +language-base

Tagging for base language system.

Gábor Hojtsy’s picture

Issue summary: View changes

Update summary for up to date decisions.