The list of languages in the Add language dropdown is very impressive. However, for most of those languages you don't get any community-provided backing (e.g. translations), because the list of languages on localize.drupal.org is not as wide as in stock Drupal. Even some languages hosted on localize.drupal.org are barely started, but at least we have point persons for them.

For Drupal 8, the plan is that Drupal would "update" its known list of languages from localize.drupal.org (keeping any custom languages added locally) on some schedule (cron, on demand, etc.). See #568986: Dynamically update standard language list from localization server. For that, it would be best if Drupal started off with the languages that are available on localize.drupal.org, so we have a clear starting point. To that end, I've compared the two language lists in core and on localize.drupal.org and made sure core only has the languages that we have on localize.drupal.org. Also, I've looked up http://www.omniglot.com/language/names.htm and Wikipedia articles for the specific languages to figure out the native names for the many languages that did not have one yet.

All in all, this should be a complete list of the languages on localize.drupal.org with their native names included, and no other languages on top of that. People can always add the other, less common languages as custom languages.
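To make the planned update step more concrete, here is a minimal sketch of how a refreshed list from the server could be combined with languages added locally. It is written in Python for brevity rather than Drupal's PHP, and everything in it is an assumption for illustration: the function name `merge_language_lists`, the `custom` flag, the dictionary layout, and the `x-custom` language code are hypothetical and are not taken from the patch or from #568986.

```python
def merge_language_lists(local, remote):
    """Return the refreshed list: server entries plus locally added custom languages.

    Both arguments map a language code to a dict of properties.
    Assumption: local entries carry a "custom" flag when they were added by hand.
    """
    # Start from the server's list; these are the predefined languages.
    merged = {code: dict(info, custom=False) for code, info in remote.items()}
    for code, info in local.items():
        if info.get("custom"):
            # Custom languages added on the site are never overwritten or dropped.
            merged[code] = dict(info)
    # Predefined local entries that the server no longer lists are left out.
    return merged


local = {
    "hu": {"name": "Hungarian", "native": "Magyar", "custom": False},
    "x-custom": {"name": "Site dialect", "native": "Site dialect", "custom": True},
    "old": {"name": "Retired entry", "native": "Retired entry", "custom": False},
}
remote = {
    "hu": {"name": "Hungarian", "native": "Magyar"},
    "de": {"name": "German", "native": "Deutsch"},
}
merged = merge_language_lists(local, remote)
print(sorted(merged))
```

In this sketch a custom language wins over a server entry with the same code; whether that is the right policy is a design decision the real implementation would have to make.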

Parent issue

#1260690: META: Improve multilingual user experience in Drupal 8

Comments

Lars Toomre’s picture

There are many changes in this list, as well as native names specified where they were missing. I worry that when such a list of data is specified, a spelling mistake or misinterpretation might slip in. Can we thus get several sets of eyes to review this important data array? Or perhaps even print a table in this issue with each language and its proposed native name?

Thanks Gábor for this detailed work!!

Gábor Hojtsy’s picture

I think it's irrational to expect all those teams to come and verify their values. I did my best and we can always make improvements, so I think there is absolutely no harm done. The only thing that is volatile in terms of Drupal use is the language code, and I did not change any language codes whatsoever in this list. All of them are as they were before. If we are to wait for verification from those languages, this is going to sit here for eternity. Mistakes in this patch are as easy to fix later as they are to make, IMHO, and having it in would let us show it to the respective communities as part of the ongoing work.

What do you think?

Gábor Hojtsy’s picture

Status: Needs review » Needs work

A few new languages were just added to localize.drupal.org (http://localize.drupal.org/node/4124), so the patch will need to be updated for them. Otherwise I think our best shot is to get this in soon and then get people to review it on the UI, as it is going to appear in the installer and the config screens. I don't think we can get this covered pre-commit.

Lars Toomre’s picture

Status: Needs work » Reviewed & tested by the community

Sounds like a great plan...

Gábor Hojtsy’s picture

Agreed.

catch’s picture

Status: Reviewed & tested by the community » Needs work

Why not just go straight to #568986: Dynamically update standard language list from localization server? The patch was outdated by changes external to core within three weeks of being posted, so this seems like it'd be impossible to keep up to date efficiently - either way it at least needs to be up to date before commit.

Gábor Hojtsy’s picture

@catch: there are various reasons we need to have this "locally cached". Consider this list a primed cache of the localize.drupal.org list. Drupal should not require an internet connection to be installed; if you do install it without an internet connection, you should be able to have a .po file locally and select that language, which requires us to have that information available (.po files don't have all the information we need about the language, like its writing direction). I'll update the list momentarily for commit :)
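A rough sketch of the "primed cache" idea, in Python rather than Drupal's PHP: try the server, fall back to the bundled list when offline, and use the bundled data to fill in what a .po file cannot tell us. All names here (`BUNDLED_LANGUAGES`, `load_language_list`, `language_for_po_file`) and the `<langcode>.po` file naming are assumptions made for illustration, not Drupal APIs or conventions confirmed in this issue.

```python
# Hypothetical bundled list, standing in for the array shipped with core.
BUNDLED_LANGUAGES = {
    "hu": {"name": "Hungarian", "native": "Magyar", "direction": "ltr"},
    # Native name written with escapes to keep right-to-left text out of the source.
    "he": {"name": "Hebrew", "native": "\u05e2\u05d1\u05e8\u05d9\u05ea", "direction": "rtl"},
}


def load_language_list(fetch_remote=None):
    """Prefer a fresh list from the server; fall back to the bundled copy."""
    if fetch_remote is not None:
        try:
            return fetch_remote()
        except OSError:
            # No network (or the server is unreachable): use the primed cache.
            pass
    return dict(BUNDLED_LANGUAGES)


def language_for_po_file(filename, languages):
    """Look up language metadata for a local .po file.

    Assumption: the file is named "<langcode>.po".
    Returns None for languages the list does not know about.
    """
    langcode = filename.rsplit(".", 1)[0]
    return languages.get(langcode)


def offline_fetch():
    # Simulates installing without an internet connection.
    raise ConnectionError("no network")


languages = load_language_list(offline_fetch)
print(language_for_po_file("he.po", languages)["direction"])
```

The point of the fallback is that the writing direction is still available for `he.po` even though nothing could be fetched.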

Gábor Hojtsy’s picture

Status: Needs work » Needs review

Updated.

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

catch’s picture

Status: Reviewed & tested by the community » Needs work

So how is the list going to be kept up-to-date then? Will you be rolling patches to update it every month? Why couldn't that list be moved to the packaging script and 'cached' there? We could require people who git clone to have an internet connection to get a list of languages.

Gábor Hojtsy’s picture

Status: Needs work » Needs review

Well, we can theoretically consider the language list not part of the Drupal package, like .info file version numbers are not. That would mean that if you are working with a git checkout of core (say you are on a plane; one of the great things about git is that you can work offline, right?), you won't have a language list with your Drupal. That is definitely not the end of the world; however, @sun argued in #1231402: Drupal does not use ISO language codes, iso.inc is misleading that the language list is useful for all kinds of reasons, not just for the locale module / translation thing, and we are following that thinking in #1293304: Break up locale.module, but how? as well.

Anyway, if we want to have this generated by the build scripts, I assume it would not be in includes/* in any case. Regardless of whether it's executable PHP or a non-executable data format, we need a place in core which does not collide with your git checkout (i.e. you won't commit it accidentally, etc.), so it's not in the source tree. I assume eventually the l.d.o-provided language list will be cached locally in some form like that. For now, we could use a database table, a variable with a dump of an array, etc., but I assume we'll use configuration storage as provided by D8 (and probably not provide a UI for the list of "predefined" languages, since we have a UI for the list of custom-defined ones). All this sounds like it is much farther along in our process though. We don't yet have configuration storage, and we also have various other issues to resolve to get there for the l10n_update module integration process.

In the meantime we can (a) keep the list of languages outdated or (b) remove it altogether (AKA remove support for predefined languages), but I thought having it up to date and more in line with what we expect the language list to contain and look like would be very useful for working on other issues like #1260716: Improve language onboarding user experience, without the need to solve the syncing and storage problem for predefined languages first. The primary motivation for me to come in and add native language names to all languages was that we want to display them for language selection in #1260716: Improve language onboarding user experience, and then I figured we can just as well remove those that we do not support on l.d.o anyway, since those are ("exotic") languages Drupal is not used in at all to our knowledge, and we don't want to "distract" the installer UI with them.

So I think this patch has value currently in that it provides #1260716: Improve language onboarding user experience with the kind of data it expects to have available. This patch and #1260586: Consolidate .po file import to one directory would unblock #1260716: Improve language onboarding user experience, so work on it can start. Without the kind of data we need for the installer, we'd postpone that indefinitely until we get there with the l10n_update integration process, which is still pretty far off in the future (lots of other huge dependencies there like #1189184: OOP & PSR-0-ify gettext .po file parsing and generation and #361597: CRUD API for locale source and locale target strings).

Hope this helps shed some light on the motivation.

catch’s picture

Discussed this in irc with Gabor.

My main concern here is that we're adding something which pretends to keep up to date with l.d.o (but would not without manual review and patching), in place of something that is completely out of sync but doesn't make any such claim. Gabor's counter-argument was:

- this is going to be completely removed in favour of the l.d.o stuff (even if we 'cache' a copy of that in Drupal core somewhere it could still be generated the same way).

- because we're planning to get rid of it, there is no need to keep it up-to-date manually.

- making the list closer to l.d.o now unblocks other patches working towards the end goal.

This also makes Lars' point about checking the list a bit less important - since eventually we'll be able to just correct things on l.d.o and regenerate.

So I think I'm fine with committing this, but it'd be nice to have a third reviewer here.

Gábor Hojtsy’s picture

Discussed this with @catch in IRC. He says it would be good to have one more reviewer to help verify the native names and the plan, but otherwise he accepts the goals. :)

gdd’s picture

Gabor asked me to chime in here about the CMI implications. I don't see any problem storing this info in our current config system. This is a pretty simple structure to represent in XML, and if our API can't handle updating it on the fly then we've got big problems.

Gábor Hojtsy’s picture

Rolled patch with this comment addition to explain the "Left to right marker" comments. Asked for by @hejrocker.

 * The "Left-to-right marker" comments and the enclosed UTF-8 markers are to
 * make otherwise strange looking PHP syntax natural (to not be displayed in
 * right to left). See http://drupal.org/node/128866#comment-528929.

No other changes, so tests should still pass.
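For readers unfamiliar with why such a marker helps, here is a small sketch (Python, not part of the patch) that inspects Unicode bidirectional classes. Hebrew and Arabic letters are strongly right-to-left, quotes and parentheses are direction-neutral, and U+200E LEFT-TO-RIGHT MARK is an invisible, strongly left-to-right character, so placing it after right-to-left text keeps the following punctuation in left-to-right order when the source is displayed. The helper `ends_with_rtl` is a hypothetical name introduced for illustration.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: invisible, strong left-to-right


def ends_with_rtl(text):
    """Return True if the last strongly directional character is right-to-left."""
    for ch in reversed(text):
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return True
        if bidi == "L":  # Latin letters, and also the LRM itself
            return False
    return False  # only neutral characters, or an empty string


hebrew = "\u05e2\u05d1\u05e8\u05d9\u05ea"  # the native name of Hebrew, as escapes

print(unicodedata.name(LRM))
print(ends_with_rtl(hebrew), ends_with_rtl(hebrew + LRM))
```

A value that ends with right-to-left text can pull the neighbouring quote and comma into right-to-left display order; appending the marker gives the display algorithm a left-to-right anchor without changing what is visible.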

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

Plan verified, code even cleaner, so RTBC again IMHO.

catch’s picture

Status: Reviewed & tested by the community » Fixed

OK. Committed/pushed to 8.x. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Gábor Hojtsy’s picture

Issue tags: +language-base

Tagging for base language system.

Gábor Hojtsy’s picture

I've opened #1632236: Convert built-in language list to CMI to convert the list to CMI :)

Gábor Hojtsy’s picture

Found no support for converting the list to CMI, so closed down #1632236: Convert built-in language list to CMI. Looks like it is going to stay in a PHP array as-is.

Gábor Hojtsy’s picture

Issue summary: View changes

Add parent