Updated: Comment #10

Problem/Motivation

Drupal's core/lib/Drupal/Core/Locale/CountryManager.php currently uses data from the Debian project, whose data is derived from ISO 3166-1, which in turn takes its country names from the United Nations. The UN list is politically charged, partly because member states decide on the list and non-member states' input is ignored. The goal of the UN is to create a list that is acceptable to the governments of all member states. Clarity of naming is not a consideration.

For example, the ISO/UN data lists "Korea, Democratic People's Republic of" and "Korea, Republic of" instead of the more commonly known "North Korea" and "South Korea", respectively.

Basically, the interests of parties in the UN do not align with the interests of software developers (and open-source developers in particular), so the UN list is not a good source of country/territory data. See also http://en.wikipedia.org/wiki/ISO_3166-1#Naming_and_code_construction

CLDR (the Unicode Common Locale Data Repository) seems to be the emerging standard for localization and internationalization, used by many organizations that make software.

From the CLDR website:

The Unicode CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks.

Proposed resolution

Start using the CLDR data as the upstream source of the country list. That data can be found by:

  1. Going to CLDR's downloads page at http://cldr.unicode.org/index/downloads
  2. Clicking on the "Data" link on the latest release.
  3. Choosing the json.zip file. (For example, at: http://unicode.org/Public/cldr/23.1/ )
  4. Using the appropriate country list data. The English language country data is found in main/en/territories.json

Update/rename the update-iso-3166.sh import script created in #1068840: core/includes/standard.inc contains inaccurate country data to use the latest CLDR data.

Remaining tasks

Patch review!

User interface changes

Instead of using Debian's data (derived from ISO 3166-1 which is derived from the United Nations list of countries), we'd use the data provided by CLDR. This would change the text shown in Drupal 8’s installer for the list of countries.

API changes

None.

Original issue where Debian was used as upstream source of data: #1068840: core/includes/standard.inc contains inaccurate country data
Proposal to possibly remove the entire list from core: #1933614: [META] Locale settings in Drupal make little (UX) sense


Comments

JohnAlbin’s picture

Title: Integrate/use cldr unicode data (http://cldr.unicode.org/) » Regression: Replace incorrect Debian country data with cldr unicode data (http://cldr.unicode.org/)
Category: feature » bug

From #1068840: core/includes/standard.inc contains inaccurate country data:

Because ISO data is not readily available, and Debian is doing a great job at maintaining accurate, ISO-compliant lists of language, territory, currency, script codes (*and* their translations in many languages).

Oh, god, no. Debian is doing a terrible job at maintaining a country code list. They are tone deaf to changes in the country list and are militant in their conformance to the ISO list, which is a copy-and-paste of the UN's list of countries, despite the fact that that list is a politically-biased list maintained by member states of the UN with no voice for governments who have lost their seat at the UN, like Taiwan.

I spent weeks arguing with Debian regarding Taiwan's entry. Alas, their bug tracking website is so gawd awful you can't see any of the numerous comments from numerous different people who disagree with Debian's position. Otherwise, I'd provide a link.

The selection "Taiwan, Province of China" is very problematic. To some people it would be like relabeling the USA as "US, province of Britain" or Germany as "Germany, former state of USSR". There's a somewhat rambling Wikipedia article on the subject: http://en.wikipedia.org/wiki/Taiwan,_China

And if you think I'm being over-sensitive, I'd point out that the government of Taiwan sued the ISO for using "Taiwan, Province of China" in their standard. http://www.chinapost.com.tw/taiwan/2007/10/02/124980/Taiwan-sues.htm

Automated process for getting up-to-date country lists: +1000!
Using Debian as the up-stream source for this data: -1,000,000

Using CLDR looks promising. I'll update the issue summary with links to their data.

JohnAlbin’s picture

Issue summary: View changes


JohnAlbin’s picture

Title: Regression: Replace incorrect Debian country data with cldr unicode data (http://cldr.unicode.org/) » Regression: Replace ill-suited Debian country data with CLDR unicode data

Updated title and issue summary

JohnAlbin’s picture

JohnAlbin’s picture

Issue summary: View changes

Expanded issue summary

droplet’s picture

Choosing the json.zip file. (For example, at: http://unicode.org/Public/cldr/23.1/ )

Where's the country list? I only saw the timezone list.

JohnAlbin’s picture

Where's the country list? I only saw the timezone list.

As posted in the issue summary, the English-language region list is in main/en/territories.json. There is one such file for each of several languages.

{
  "main": {
    "en": {
      "identity": {
        "version": {
          "@cldrVersion": "23.1",
          "@number": "$Revision: 8671 $"
        },
        "generation": {
          "@date": "$Date: 2013-05-03 14:17:48 -0500 (Fri, 03 May 2013) $"
        },
        "language": "en"
      },
      "localeDisplayNames": {
        "territories": {
          "001": "World",
          "002": "Africa",
          "003": "North America",

*snip*

          "TH": "Thailand",
          "TJ": "Tajikistan",
          "TK": "Tokelau",
          "TL": "Timor-Leste",
          "TL-alt-variant": "East Timor",
          "TM": "Turkmenistan",
          "TN": "Tunisia",
          "TO": "Tonga",
          "TR": "Turkey",
          "TT": "Trinidad and Tobago",
          "TV": "Tuvalu",
          "TW": "Taiwan",
          "TZ": "Tanzania",
          "UA": "Ukraine",
          "UG": "Uganda",
          "UM": "U.S. Outlying Islands",
          "US": "United States",
          "US-alt-short": "U.S.",
          "UY": "Uruguay",
          "UZ": "Uzbekistan",
          "VA": "Vatican City",
          "VC": "Saint Vincent and the Grenadines",
          "VE": "Venezuela",
          "VG": "British Virgin Islands",
          "VI": "U.S. Virgin Islands",
          "VN": "Vietnam",
          "VU": "Vanuatu",
          "WF": "Wallis and Futuna",
          "WS": "Samoa",
          "XK": "Kosovo",
          "YE": "Yemen",
          "YT": "Mayotte",
          "ZA": "South Africa",
          "ZM": "Zambia",
          "ZW": "Zimbabwe",
          "ZZ": "Unknown Region"
        }
      }
    }
  }
}
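For illustration, a minimal PHP sketch of turning this structure into a code => name list. It assumes the nesting shown above (main → en → localeDisplayNames → territories) and keeps only two-letter codes, so numeric regions such as "001" and alternate entries such as "TL-alt-variant" are dropped. The function name cldr_territory_list() is hypothetical; this is not the update script from the patch.

```php
<?php
// Hypothetical helper: build a code => name list from the contents of a
// CLDR territories.json file (CLDR 23.1 JSON layout assumed).
function cldr_territory_list(string $json, string $langcode = 'en'): array {
  $data = json_decode($json, TRUE);
  if (!isset($data['main'][$langcode]['localeDisplayNames']['territories'])) {
    throw new InvalidArgumentException('Unexpected territories.json structure.');
  }
  $countries = [];
  foreach ($data['main'][$langcode]['localeDisplayNames']['territories'] as $code => $name) {
    // Keep plain two-letter codes only. This skips numeric regions
    // ("001" World, "150" Europe, ...) and alternates like "US-alt-short".
    if (preg_match('/^[A-Z]{2}$/', $code)) {
      $countries[$code] = $name;
    }
  }
  ksort($countries);
  return $countries;
}
```

Note that "ZZ" (Unknown Region) still passes the two-letter check, so a real import would need an explicit exclusion rule on top of this.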
junedkazi’s picture

I also see some more inconsistencies in the current core list, such as names that seem incomplete:

'TZ' => t('Tanzania, United Republic of'),
'VE' => t('Venezuela, Bolivarian Republic of'),

Why is there nothing after "Republic of"?

catch’s picture

That's just the way to write 'United Republic of Tanzania' so it shows up alphabetically as Tanzania.

JohnAlbin’s picture

Status: Active » Needs review

It turns out the update-iso-3166.sh script was already broken by the move from core/includes/standard.inc to core/lib/Drupal/Core/Locale/CountryManager.php. :-\

Ok. This patch updates the script name to be update-countries.sh, adds instructions on how to use the script and also includes the changeset on core/lib/Drupal/Core/Locale/CountryManager.php after running the script.

The script includes a code stub for $alt_codes if later we want to use any of CLDR's alternate territory names instead of the default territory names. I've attached CLDR's latest territories.json for your perusing convenience so you can see what the dataset looks like.
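A rough sketch of how such an $alt_codes stub could be applied, assuming it maps a territory code to the CLDR alt suffix to prefer (e.g. "short" for "US-alt-short"). The array shape, the function name apply_alt_names() and the example data are assumptions for illustration, not the code in the patch.

```php
<?php
// Hypothetical: $alt_codes maps a territory code to the CLDR alt type to use,
// e.g. ['GB' => 'short'] to pick "GB-alt-short" over "GB". Empty = defaults only.
function apply_alt_names(array $territories, array $alt_codes): array {
  $countries = [];
  // Start from the default names; alternate entries are never listed on their own.
  foreach ($territories as $code => $name) {
    $code = (string) $code;
    if (strpos($code, '-alt-') !== FALSE) {
      continue;
    }
    $countries[$code] = $name;
  }
  // Swap in an alternate name only where one was requested and actually exists.
  foreach ($alt_codes as $code => $alt_type) {
    $alt_key = $code . '-alt-' . $alt_type;
    if (isset($territories[$alt_key])) {
      $countries[$code] = $territories[$alt_key];
    }
  }
  return $countries;
}
```

Keeping the default as an empty array means the script produces plain CLDR names unless someone deliberately opts a code into an alternate.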

JohnAlbin’s picture

Issue summary: View changes

Updated issue summary.

jimyhuang’s picture

I agree that "Taiwan, Province of China" is very problematic. Rather than being a "Province of China", we at least elected our own president in Taiwan in 2012. How can a province have a president?

As an Asian user, I'm sure this patch "fixes" many country names,
such as "Korea, Democratic People's Republic of" and "Korea, Republic of". I can't imagine people from other countries being able to tell which one is South Korea and which is the other.

JohnAlbin’s picture

Issue summary: View changes

Updated issue summary.

JohnAlbin’s picture

I've added Jimmy's example of North/South Korea to the issue summary. It's not just about the naming of Taiwan (the thing that got me to write the patch), but about normal users being able to recognize the country names. Yes, in the installer you only need to recognize your own country, but the CountryManager.php API is supposed to be used by any functionality needing a country list, so having recognizable country names is essential. The ISO/UN data does not provide that clarity.

JohnAlbin’s picture

Issue summary: View changes

Add Korean examples.

amourow’s picture

The correction for Taiwan in the #8 patch is right.
Taiwan, officially the Republic of China, has never been a "Province of China". ISO 3166-1 doesn't reflect the actual situation of Taiwan.

The problem occurs often on the Internet. Google also helped by removing "province of China" from Google Maps.
http://news.ebrandz.com/miscellaneous/2005/433-taiwans-province-tag-foll...

Thanks to @JohnAlbin for the patch.

jamesliu78’s picture

The #8 patch is great.

The "…, Republic of" style is not friendly for end users. The clear names are also shorter, easier and more comfortable to read.

Thanks to JohnAlbin for the issue.

droplet’s picture

The script looks good. A few improvements could be made:

+++ b/core/scripts/update-countries.sh
@@ -0,0 +1,93 @@
+$data = json_decode(file_get_contents($uri));

+++ /dev/null
@@ -1,72 +0,0 @@
-// Fall back and default to original Debian source.

The old script falls back to online sources. For the new script, I think we need a file existence check.

+++ b/core/scripts/update-countries.sh
@@ -0,0 +1,93 @@
+    list($code, ) = explode('-', $code);

It can be simplified to:
$code = strtok($code, '-');
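Both suggestions sketched in PHP. load_territories_file() is a hypothetical name, and is_readable() stands in for the existence check since it also fails for missing files.

```php
<?php
// Hypothetical loader: check the file first so a missing download fails
// with a clear message instead of json_decode() receiving FALSE.
function load_territories_file(string $uri): string {
  if (!is_readable($uri)) {
    throw new RuntimeException("Cannot read $uri. Download CLDR's json.zip and point the script at main/en/territories.json.");
  }
  return file_get_contents($uri);
}

// strtok() returns everything before the first '-', which is what
// list($code, ) = explode('-', $code); extracts for keys like "US-alt-short".
$code = strtok('US-alt-short', '-');
```

One difference to keep in mind: strtok() skips leading delimiters, so for a string starting with '-' it returns the next token, where explode() would yield an empty first element. CLDR territory keys in the data above never start with '-', so the two behave the same here.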

JohnAlbin’s picture

Old script falls back to online sources.

Unfortunately, the territories.json file is not accessible directly from the web. It's only available as part of a downloadable .zip file.

This patch incorporates the changes droplet mentioned above.

agrozyme’s picture

The page http://en.wikipedia.org/wiki/ISO_3166-1 only records what happened from 2007 to 2009.
In 2009, the Federal Supreme Court of Switzerland declined to hear the case.
So the problem still has no resolution.

But we can ask the people who live in Taiwan: What is the name of your country?

tim.plunkett’s picture

+++ b/core/lib/Drupal/Core/Locale/CountryManager.php
@@ -151,7 +156,7 @@ public static function getStandardList() {
-      'IR' => t('Iran, Islamic Republic of'),
+      'IR' => t('Iran'),

@@ -164,12 +169,12 @@ public static function getStandardList() {
-      'LA' => t("Lao People's Democratic Republic"),
+      'LA' => t('Laos'),

@@ -182,16 +187,16 @@ public static function getStandardList() {
-      'MD' => t('Moldova, Republic of'),
+      'MD' => t('Moldova'),
...
-      'MO' => t('Macao'),
+      'MO' => t('Macau SAR China'),

@@ -224,17 +229,18 @@ public static function getStandardList() {
-      'PS' => t('Palestine, State of'),
+      'PS' => t('Palestinian Territories'),
...
-      'RU' => t('Russian Federation'),
+      'RU' => t('Russia'),

Some of these could be just as touchy as the change we're fixing here...

droplet’s picture

Code side, RTBC!!!
Political issue, no comments.

JohnAlbin’s picture

Some of these could be just as touchy as the change we're fixing here...

Actually, no. You're comparing the ISO data added in March of this year to this patch. If you compare the Drupal 7 country list to this patch, you'll find they are much more similar. If they were more touchy, we'd have issues in the Drupal 7 queue already. Here's the diff between the patch and D7:

+      'AC' => t('Ascension Island'),
       'AD' => t('Andorra'),
       'AE' => t('United Arab Emirates'),
       'AF' => t('Afghanistan'),
@@ -58,7 +59,7 @@ public static function getStandardList() {
       'AT' => t('Austria'),
       'AU' => t('Australia'),
       'AW' => t('Aruba'),
-      'AX' => t('Aland Islands'),
+      'AX' => t('Åland Islands'),
       'AZ' => t('Azerbaijan'),
       'BA' => t('Bosnia and Herzegovina'),
       'BB' => t('Barbados'),
@@ -73,6 +74,7 @@ public static function getStandardList() {
       'BM' => t('Bermuda'),
       'BN' => t('Brunei'),
       'BO' => t('Bolivia'),
+      'BQ' => t('Caribbean Netherlands'),
       'BR' => t('Brazil'),
       'BS' => t('Bahamas'),
       'BT' => t('Bhutan'),
@@ -81,30 +83,33 @@ public static function getStandardList() {
       'BY' => t('Belarus'),
       'BZ' => t('Belize'),
       'CA' => t('Canada'),
-      'CC' => t('Cocos (Keeling) Islands'),
-      'CD' => t('Congo (Kinshasa)'),
+      'CC' => t('Cocos [Keeling] Islands'),
+      'CD' => t('Congo - Kinshasa'),
       'CF' => t('Central African Republic'),
-      'CG' => t('Congo (Brazzaville)'),
+      'CG' => t('Congo - Brazzaville'),
       'CH' => t('Switzerland'),
-      'CI' => t('Ivory Coast'),
+      'CI' => t('Côte d’Ivoire'),
       'CK' => t('Cook Islands'),
       'CL' => t('Chile'),
       'CM' => t('Cameroon'),
       'CN' => t('China'),
       'CO' => t('Colombia'),
+      'CP' => t('Clipperton Island'),
       'CR' => t('Costa Rica'),
       'CU' => t('Cuba'),
-      'CW' => t('Curaçao'),
       'CV' => t('Cape Verde'),
+      'CW' => t('Curaçao'),
       'CX' => t('Christmas Island'),
       'CY' => t('Cyprus'),
       'CZ' => t('Czech Republic'),
       'DE' => t('Germany'),
+      'DG' => t('Diego Garcia'),
       'DJ' => t('Djibouti'),
       'DK' => t('Denmark'),
       'DM' => t('Dominica'),
       'DO' => t('Dominican Republic'),
       'DZ' => t('Algeria'),
+      'EA' => t('Ceuta and Melilla'),
       'EC' => t('Ecuador'),
       'EE' => t('Estonia'),
       'EG' => t('Egypt'),
@@ -137,12 +142,13 @@ public static function getStandardList() {
       'GU' => t('Guam'),
       'GW' => t('Guinea-Bissau'),
       'GY' => t('Guyana'),
-      'HK' => t('Hong Kong S.A.R., China'),
+      'HK' => t('Hong Kong SAR China'),
       'HM' => t('Heard Island and McDonald Islands'),
       'HN' => t('Honduras'),
       'HR' => t('Croatia'),
       'HT' => t('Haiti'),
       'HU' => t('Hungary'),
+      'IC' => t('Canary Islands'),
       'ID' => t('Indonesia'),
       'IE' => t('Ireland'),
       'IL' => t('Israel'),
@@ -183,14 +189,14 @@ public static function getStandardList() {
       'MC' => t('Monaco'),
       'MD' => t('Moldova'),
       'ME' => t('Montenegro'),
-      'MF' => t('Saint Martin (French part)'),
+      'MF' => t('Saint Martin'),
       'MG' => t('Madagascar'),
       'MH' => t('Marshall Islands'),
       'MK' => t('Macedonia'),
       'ML' => t('Mali'),
-      'MM' => t('Myanmar'),
+      'MM' => t('Myanmar [Burma]'),
       'MN' => t('Mongolia'),
-      'MO' => t('Macao S.A.R., China'),
+      'MO' => t('Macau SAR China'),
       'MP' => t('Northern Mariana Islands'),
       'MQ' => t('Martinique'),
       'MR' => t('Mauritania'),
@@ -223,14 +229,15 @@ public static function getStandardList() {
       'PK' => t('Pakistan'),
       'PL' => t('Poland'),
       'PM' => t('Saint Pierre and Miquelon'),
-      'PN' => t('Pitcairn'),
+      'PN' => t('Pitcairn Islands'),
       'PR' => t('Puerto Rico'),
-      'PS' => t('Palestinian Territory'),
+      'PS' => t('Palestinian Territories'),
       'PT' => t('Portugal'),
       'PW' => t('Palau'),
       'PY' => t('Paraguay'),
       'QA' => t('Qatar'),
-      'RE' => t('Reunion'),
+      'QO' => t('Outlying Oceania'),
+      'RE' => t('Réunion'),
       'RO' => t('Romania'),
       'RS' => t('Serbia'),
       'RU' => t('Russia'),
@@ -250,10 +257,13 @@ public static function getStandardList() {
       'SN' => t('Senegal'),
       'SO' => t('Somalia'),
       'SR' => t('Suriname'),
-      'ST' => t('Sao Tome and Principe'),
+      'SS' => t('South Sudan'),
+      'ST' => t('São Tomé and Príncipe'),
       'SV' => t('El Salvador'),
+      'SX' => t('Sint Maarten'),
       'SY' => t('Syria'),
       'SZ' => t('Swaziland'),
+      'TA' => t('Tristan da Cunha'),
       'TC' => t('Turks and Caicos Islands'),
       'TD' => t('Chad'),
       'TF' => t('French Southern Territories'),
@@ -272,11 +282,11 @@ public static function getStandardList() {
       'TZ' => t('Tanzania'),
       'UA' => t('Ukraine'),
       'UG' => t('Uganda'),
-      'UM' => t('United States Minor Outlying Islands'),
+      'UM' => t('U.S. Outlying Islands'),
       'US' => t('United States'),
       'UY' => t('Uruguay'),
       'UZ' => t('Uzbekistan'),
-      'VA' => t('Vatican'),
+      'VA' => t('Vatican City'),
       'VC' => t('Saint Vincent and the Grenadines'),
       'VE' => t('Venezuela'),
       'VG' => t('British Virgin Islands'),
@@ -285,6 +295,7 @@ public static function getStandardList() {
       'VU' => t('Vanuatu'),
       'WF' => t('Wallis and Futuna'),
       'WS' => t('Samoa'),
+      'XK' => t('Kosovo'),
       'YE' => t('Yemen'),
       'YT' => t('Mayotte'),
       'ZA' => t('South Africa'),

As you can see, there are very few changes with my patch. It was the ISO patch in March that was touchy. See that list of changes in the Issue Summary of #1068840: core/includes/standard.inc contains inaccurate country data

In addition, the new update-countries.sh script has stub code to use any alternate country names that we wish to use. Take a look at the territories.json data I attached in the comment above. So, for example, if the Drupal community members in the "United Kingdom" say we should be using "U.K." instead, we can easily add that option to our script.

JohnAlbin’s picture

But we can ask the people who live in Taiwan: What is the name of your country?

You just did. http://drupal.org/user/32095 :-D

Also, of the commenters above, amourow, jamesliu78 and jimyhuang live in Taiwan.

The page http://en.wikipedia.org/wiki/ISO_3166-1 only records what happened from 2007 to 2009.
In 2009, the Federal Supreme Court of Switzerland declined to hear the case.
So the problem still has no resolution.

Yep, the courts in Switzerland threw the case out because it was a "political matter" and not a legal one, or something, something BS.

But that just highlights why Drupal using the Debian data source is so perverse. As of right now, Drupal core says "fix this problem in the upstream Debian code". Debian says "We just use the ISO standard. We won't fork the standard." The ISO says "take it up with the UN". But the government of Taiwan can't enter the UN and can't sue the ISO. So…

It's literally impossible to make any changes to ANY country unless we switch the data source.

tim.plunkett’s picture

Ah, the comparison in #18 is very informative. That makes me worry much less. Thanks @JohnAlbin

agrozyme’s picture

Maybe we can think of another solution.
If Drupal 8 is released and the list still shows "Taiwan, Province of China",
should we have a hook function to replace the list?

If we have the hook function, we can write a "Taiwan Patch" module to fix it.
In Taiwan, when we use Drupal to build sites for the Taiwan government, we must use the name "Taiwan" or "Republic of China".

Of course, I don't want to write that module....

tim.plunkett’s picture

There is already a hook_countries_alter() you can use.
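For reference, a minimal implementation of that hook. It assumes the signature hook_countries_alter(&$countries), where $countries is the code => name array built from CountryManager's list; "mymodule" is a placeholder module name.

```php
<?php
/**
 * Implements hook_countries_alter().
 *
 * "mymodule" is a placeholder; inside Drupal, t() is provided by core.
 */
function mymodule_countries_alter(&$countries) {
  // Replace the label for one code; all other entries are left untouched.
  $countries['TW'] = t('Taiwan');
}
```

Because the list is passed by reference, every site that needs a different name has to ship such a module, which is the downside raised in the replies below.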

adammalone’s picture

As a commenter who lives outside any country affected by this patch, I find this issue perhaps less emotive.

That being said, I'm of the opinion that the country list should expose options that citizens of, and those residing in the countries would recognise as the name of the country.

Admittedly a huge political issue although the diff in #18 does show what I would consider more widely used names for said countries.

jamesliu78’s picture

I don't think a hook function is a good way to fix it.
That would just lead to a lot of modules, each fixing the list for its own country.

We already have a patch here, so why not just make it better?

tim.plunkett’s picture

Oh I agree we should proceed with the patch, I'm not suggesting the alter hook is a solution. Just that it exists to address the suggestion in #21.

droplet’s picture

Status: Needs review » Reviewed & tested by the community

Not a win-win game. Who knows what Kim Jong-un wanted ?

RTBC to me.

Who uses CLDR?

Some of the companies and organizations that use CLDR are:
Apple (OS X & applications; iOS for iPhone, iPad, iPod touch; Safari for Windows; Apple Mobile Device Support in iTunes for Windows; …)
Google (Web Search, Chrome, Android, Adwords, Google+, Google Maps, Blogger, Google Analytics, …)
IBM (DB2, Lotus, Websphere, Tivoli, Rational, AIX, i/OS, z/OS,…)
and many others, including:
ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache, Appian, Argonne National Laboratory, Avaya, BAE Systems Geospatial eXploitation Products, BEA, BluePhoenix Solutions, BMC Software, Boost, BroadJump, Business Objects, caris, CERN, Debian Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, Firebird RDBMS, Free BSD, Gentoo Linux, GroundWork Open Source, GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS, Jikes, Library of Congress, Mathworks, Mozilla, Netezza, OpenOffice, Oracle (Solaris, Java), Lawson Software, Leica Geosystems GIS & Mapping LLC, Mandrake Linux, OCLC, Progress Software, Python, QNX, Rogue Wave, SAP, SIL, SPSS, Software AG, SuSE, Symantec, Teradata (NCR), ToolAware, Trend Micro, Virage, webMethods, Wine, WMS Gaming, XyEnterprise, Yahoo!
To suggest additions or corrections, please file a ticket.

Follow Up task:
To suggest additions or corrections, please file a ticket.

agrozyme’s picture

The hook function solution should be the last resort.
In fact, the Taiwan government wants to join the UN, but we are not a member of the UN now.
This is because the UN does not recognize the Republic of China, which governs Taiwan, and considers the territory to be part of the People's Republic of China.

Maybe we can't understand what Kim Jong-un wanted, but we are not Kim Jong-un.
We can discuss this issue and get a better result.

I was born in Taiwan in 1974 and I live in Taiwan today.
I know this is very problematic, but I must say: Taiwan or Republic of China is the name of our country.

tky’s picture

I am not a coder but I know one thing clearly about internet: code is law.
You cannot separate legal or political issues from code with this or that easy excuse. When you are coding, you are making laws or extending the reach of real-world law into virtual space.

I agree with the comments made by JohnAlbin, jimyhuang, amourow, jamesliu78 and agrozyme. Whatever name Taiwanese people use for their country, Taiwan or ROC, the country is simply not a province of China, legally or politically.

If you know there is something wrong in the code, correct it and make things right.

Pancho’s picture

I absolutely agree with this being RTBC: both from a political perspective and usability-wise, CLDR data is much better.
And as a bonus it includes so much more locale data that we might want to leverage at a later point.
It would be even nicer if built into Update Manager, but that's beyond this issue's scope.
So another clear +1 from me.

Pancho’s picture

As the bash script is being changed and renamed, I'm just providing a diff -u of the two files for easier review.

[edit:] Didn't know files with a .diff extension are sent to testbot, so please ignore the meaningless test results. #14 remains RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 1938892-compare-bash-scripts.diff, failed testing.

Damien Tournoud’s picture

No concerns on my side either. We are not maintaining the list of countries ourselves, which is all that matters from my perspective.

That said, the documentation of CountryManager::getStandardList() clearly references ISO 3166-1, which we are not following anymore. This needs to be fixed.

Damien Tournoud’s picture

Title: Regression: Replace ill-suited Debian country data with CLDR unicode data » Switch from ISO country data to CLDR unicode data

I was quoted above, so I feel like I should answer.

Because ISO data is not readily available, and Debian is doing a great job at maintaining accurate, ISO-compliant lists of language, territory, currency, script codes (*and* their translations in many languages).

Oh, god, no. Debian is doing a terrible job at maintaining a country code list. They are tone deaf to changes in the country list and are militant in their conformance to the ISO list, which is a copy-and-paste of the UN's list of countries, despite the fact that that list is a politically-biased list maintained by member states of the UN with no voice for governments who have lost their seat at the UN, like Taiwan.

While I can agree that the ISO list is not a silver bullet, you have your logic totally backward here. Debian is doing a great job at maintaining an "accurate, ISO-compliant lists of language, territory, currency, script codes (*and* their translations in many languages)". The keyword here is ISO-compliant.

Damien Tournoud’s picture

Title: Switch from ISO country data to CLDR unicode data » Switch from ISO-3166-1 country data to CLDR unicode data
Damien Tournoud’s picture

I also removed the "regression" tag, because Drupal always claimed to follow ISO-3166-1 but never actually did. The Drupal 8 list is an accurate ISO-3166-1 list, so the change to it can only be seen as an improvement, not a regression.

That said, I'm in favor of switching to CLDR.

droplet’s picture

Status: Needs work » Reviewed & tested by the community

#14 is the correct patch.

@Pancho,
end with do-not-test.patch or interdiff.patch next time :)

Damien Tournoud’s picture

Status: Reviewed & tested by the community » Needs work

#32 points out some work that still needs to happen.

JohnAlbin’s picture

@Damien “The keyword here is ISO-compliant.” Fair enough! Your original statement about Debian is accurate then. Sorry about that!

I'll re-roll the patch to update the code comment about ISO.

JohnAlbin’s picture

Status: Needs work » Needs review

Actually, "ISO 3166-1" is mentioned twice in the code comments. Nice catch, Damien! The latest patch fixes those.

droplet’s picture

Status: Needs review » Reviewed & tested by the community

Thanks @Damien.

OK, re-tested: the two new documentation changes, the alt-short names and the country-code exclusions all work.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 8.x. Thanks!

JohnAlbin’s picture

Thanks, Dries! :-)

Alan D.’s picture

Um, did anyone check the data?

Diego Garcia is part of the British Indian Ocean Territory
Ceuta and Melilla are Spain's two autonomous cities
Canary Islands is one of Spain's 17 autonomous communities
Saint Martin is an island divided roughly 60/40 between France and the Kingdom of the Netherlands. Isn't this almost like listing "North America" instead of Canada, the US and Mexico, just on a different scale?
Outlying Oceania.

+Falkland Islands
-Falkland Islands (Malvinas)

Politically dropping iso for cldr, the question should be: who do we want to piss off? Britain or Argentina? Same for Taiwan, Palestine and probably many other countries world wide...

Fairly ugly; wouldn't one want to stick with the official name?
- 'MM' => t('Myanmar'),
+ 'MM' => t('Myanmar [Burma]'),

Not going to re-open, but really?

Pancho’s picture

I'm not going to reopen this for now, but want to respond to Alan's comment #43, starting with the two disputed names:

1. Myanmar / Burma:
In most languages this country is simply called something similar to "Myanmar" (see CLDR Territories); in Burmese it is simply "မြန်မာ" (Myanmar).
However, in English the old name "Burma" is at least as commonly used as "Myanmar"; for example, the Wikipedia article is titled "Burma". See also:

Burma continues to be used in English by the governments of many countries, including the United Kingdom and Canada. The United States uses both Myanmar and Burma.

(http://en.wikipedia.org/wiki/Burma#Etymology)
So while politically we might find it slightly ugly, adding Burma as an alternative name just depicts reality in a fairly politically correct way, namely by showing Burma only as an alternative name for Myanmar.
Note also that Sanmyanmar, a Myanmar-based IT company, is an associate Unicode member, and still I can't find a ticket against CLDR asking for removal of the "Burma" alternative name. So the policy seems to be acceptable to all.

2. Falkland/Malvinas:
This is a very complicated issue. By customary English-language usage there is probably no reason to add "Malvinas" as an alternative name, while in Spanish, Portuguese and French the name is translated as "Islas Malvinas", "Ilhas Malvinas" and "Îles Malouines", because that is customary.
While it might be more sensible to always include the other name in brackets, I think that's acceptable for us, and if not, we can file a ticket with CLDR to change the rule on http://cldr.unicode.org/translation/country-names

The other disputed territories are actually a question of inclusion rules. I agree that we didn't sufficiently address this aspect. This doesn't seem to be a deficiency of CLDR but of the way we use it. So certainly no reason for a rollback. However, we might have to file a followup.
I will cover this aspect separately, as I need to do some more research on how to get it right, and then will probably file a followup.

Alan D.’s picture

Sorry if the above comment was rushed; it was a quickie before heading out. But diverging from the standard is going to bring up a real potpourri of issues.

1) What is a country?
The United Kingdom is a realm consisting of four countries: England, Northern Ireland, Scotland, and Wales.

The People's Republic of China (PRC) contains five autonomous regions, Guangxi, Inner Mongolia, Ningxia, Xinjiang and Tibet plus the two special administrative regions of Hong Kong and Macau.

2) Who defines a country?
Palestine / Israel
The State of Palestine has received recognition from only 132 states. Israel is not recognised as a state by 32 UN members and by the SADR
Sahrawi Arab Democratic Republic has received nearly no recognition outside of Africa.

3) Whose definition of the name should you use?
Taiwan was the example used here;
Macedonia was one of the more contested ones: http://en.wikipedia.org/wiki/Macedonia_naming_dispute

So the main question is: do we want to diverge from an internationally recognized standard? I'm guessing that everyone has settled on a fuzzy definition: the ISO standard's codes with CLDR's naming.

Under the ISO standard, the alpha-2 codes AA, QM to QZ, XA to XZ, and ZZ are user-assigned and should be excluded from the import. Maybe this is where some of these territories came from?
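A sketch of such an exclusion filter, using the ranges listed above; the function names are hypothetical. Note that CLDR's "XK" (Kosovo) falls inside XA to XZ and "QO" (Outlying Oceania) inside QM to QZ, so this filter would drop both, while codes such as AC, CP, DG, EA, IC and TA are exceptionally reserved rather than user-assigned and would pass through unchanged.

```php
<?php
// Hypothetical: TRUE for ISO 3166-1 alpha-2 user-assigned codes
// (AA, QM-QZ, XA-XZ, ZZ).
function is_user_assigned_code(string $code): bool {
  return $code === 'AA'
    || $code === 'ZZ'
    || preg_match('/^Q[M-Z]$/', $code) === 1
    || preg_match('/^X[A-Z]$/', $code) === 1;
}

// Hypothetical: remove user-assigned codes from a code => name list,
// preserving keys and order of the remaining entries.
function exclude_user_assigned(array $countries): array {
  return array_filter($countries, function ($code) {
    return !is_user_assigned_code((string) $code);
  }, ARRAY_FILTER_USE_KEY);
}
```

Whether XK should be kept as an explicit exception is exactly the kind of inclusion-policy question that a filter like this cannot answer on its own.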

Pancho’s picture

Good points.
Created #2036219: [policy] Inclusion criteria for CLDR territories in CountryManager::getStandardList() as a critical followup issue, so it would be great to continue discussion over there.

Pancho’s picture

Note that both trunk data and releases are in fact available in an online repository:

Pancho’s picture

Retroactively tagging.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.

sun

Awesome to see this! Evolution++

My main goal for the previous issue was to automate the update of country codes as much as possible, disregarding the actual data being imported (there was no clear data source before). In a sense, I was separating the concern of updating from the concern of which data source to use.

It's great to see that we've further improved the data source now. (And thanks for updating the script! :))

The more we do this in the future, the better our toolset will become. The same mechanism could, for example, be applied to the list of language codes. And who knows, perhaps we even want to move these declarations into XML, JSON, or YAML files at some point. :)

matsbla

It looks like 'AN' (Netherlands Antilles) is still in core:
https://github.com/drupal/drupal/blob/8.0.x/core/lib/Drupal/Core/Locale/...
Should it be there? It is not a code in either CLDR or ISO (though it appears to have been an ISO code before).
- 'AN' => t('Netherlands Antilles'),

'QO' (Outlying Oceania) is also in core:
https://github.com/drupal/drupal/blob/8.0.x/core/lib/Drupal/Core/Locale/...
It looks like it is used as a "subcontinent" together with 'Eastern Africa', 'Caribbean', 'Central Asia', etc.:
http://www.unicode.org/cldr/charts/latest/supplemental/territory_contain...
It is listed as a "private use subtag" in the 'Core Specification':
http://cldr.unicode.org/core-spec
Should this be part of the country list?
- 'QO' => t('Outlying Oceania'),

Another thing: in the latest CLDR, square brackets indicating alternative names have been changed to parentheses:

-      'CC' => t('Cocos [Keeling] Islands'),
+      'CC' => t('Cocos (Keeling) Islands'),
-      'MM' => t('Myanmar [Burma]'),
+      'MM' => t('Myanmar (Burma)'),

This change is mentioned in the release note for CLDR v24 under "Formatting":
http://cldr.unicode.org/index/downloads/cldr-24
I don't know about you, but I think it looks nicer.

The release note for CLDR v27 mentions that "effort was made to clean up country names":
http://cldr.unicode.org/index/downloads/cldr-27
Maybe now would be a good time to implement the previously mentioned "automated process for getting up-to-date country lists"?
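As a rough idea of what one step of such an automated import could look like in PHP, here is a minimal sketch. It assumes the cldr-json layout, where the English names sit under main/en/localeDisplayNames/territories in territories.json; the archive layout has varied between CLDR releases, so the path should be checked against the release actually used. The function name `cldr_country_list()` is hypothetical, and the sample JSON is made up from names quoted in this thread.

```php
<?php
// Sketch: turn a CLDR territories.json document into a code => name list.
// Assumes the cldr-json nesting main/<locale>/localeDisplayNames/territories.
// Hypothetical helper, not an existing Drupal or CLDR API.
function cldr_country_list(string $json, string $locale = 'en'): array {
  $data = json_decode($json, TRUE, 512, JSON_THROW_ON_ERROR);
  $territories = $data['main'][$locale]['localeDisplayNames']['territories'] ?? [];
  $countries = [];
  foreach ($territories as $code => $name) {
    // Keep two-letter region codes only. This skips numeric groupings
    // such as "001" (World) and "-alt-" name variants. Numeric keys like
    // "150" become integers after json_decode(), hence the string cast.
    if (preg_match('/^[A-Z]{2}$/', (string) $code)) {
      $countries[$code] = $name;
    }
  }
  // Sort by display name; asort() keeps the code keys intact.
  asort($countries);
  return $countries;
}

$json = '{"main":{"en":{"localeDisplayNames":{"territories":{'
  . '"001":"World","MM":"Myanmar (Burma)","CC":"Cocos (Keeling) Islands",'
  . '"HK":"Hong Kong SAR China","HK-alt-short":"Hong Kong"}}}}}';
print_r(cldr_country_list($json));
// Keeps CC, HK and MM; drops "001" and "HK-alt-short".
```

Note that asort() compares bytes, so names starting with accented letters (e.g. Åland Islands) end up after Z; the intl extension's Collator::asort() is one option if locale-aware ordering is wanted.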

The abbreviation 'St.' has been adopted extensively:

-      'BL' => t('Saint Barthélemy'),
+      'BL' => t('St. Barthélemy'),
-      'SH' => t('Saint Helena'),
+      'SH' => t('St. Helena'),
-      'KN' => t('Saint Kitts and Nevis'),
+      'KN' => t('St. Kitts & Nevis'),
-      'LC' => t('Saint Lucia'),
+      'LC' => t('St. Lucia'),
-      'MF' => t('Saint Martin'),
+      'MF' => t('St. Martin'),
-      'PM' => t('Saint Pierre and Miquelon'),
+      'PM' => t('St. Pierre & Miquelon'),
-      'VC' => t('Saint Vincent and the Grenadines'),
+      'VC' => t('St. Vincent & Grenadines'),

I'm not sure why, but 'Sint Maarten' remains unchanged.

'and' has been replaced with '&' in several places:

-      'WF' => t('Wallis and Futuna'),
+      'WF' => t('Wallis & Futuna'),
-      'TC' => t('Turks and Caicos Islands'),
+      'TC' => t('Turks & Caicos Islands'),
-      'TT' => t('Trinidad and Tobago'),
+      'TT' => t('Trinidad & Tobago'),
-      'SJ' => t('Svalbard and Jan Mayen'),
+      'SJ' => t('Svalbard & Jan Mayen'),
-      'GS' => t('South Georgia and the South Sandwich Islands'),
+      'GS' => t('South Georgia & South Sandwich Islands'),
-      'HM' => t('Heard Island and McDonald Islands'),
+      'HM' => t('Heard & McDonald Islands'),
-      'EA' => t('Ceuta and Melilla'),
+      'EA' => t('Ceuta & Melilla'),
-      'BA' => t('Bosnia and Herzegovina'),
+      'BA' => t('Bosnia & Herzegovina'),
-      'ST' => t('São Tomé and Príncipe'),
+      'ST' => t('São Tomé & Príncipe'),

Note that this change also applies to St. Kitts & Nevis, St. Vincent & Grenadines, and St. Pierre & Miquelon, which are mentioned above.

Personally, I would also simplify these names, which I find clumsy and a little strenuous:

-      'HK' => t('Hong Kong SAR China'),
+      'HK' => t('Hong Kong'),
-      'MO' => t('Macau SAR China'),
+      'MO' => t('Macau'),
-      'PS' => t('Palestinian Territories'),
+      'PS' => t('Palestine'),

I guess these are more sensitive issues, but the short forms are much more aligned with the names people use in everyday informal language. For almost all other names we now ignore the official version and use the simple, short version, so why make exceptions for these three names?
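If short forms were preferred for a handful of codes, CLDR's territory data includes alternate entries keyed like `HK-alt-short` for some territories; whether a given code has one depends on the release, so that is worth verifying in the actual data. Below is a minimal sketch of a name selector that falls back to the default name when no short variant exists. The function `prefer_short_names()` and the sample input are made up for illustration.

```php
<?php
// Sketch: prefer CLDR "-alt-short" display names for selected codes,
// falling back to the default name when no short variant is present.
// Hypothetical helper; the sample data below is illustrative only.
function prefer_short_names(array $territories, array $codes): array {
  $names = [];
  foreach ($territories as $code => $name) {
    $code = (string) $code;
    // Only two-letter codes become list entries; "-alt-" keys are lookups.
    if (!preg_match('/^[A-Z]{2}$/', $code)) {
      continue;
    }
    $short = $code . '-alt-short';
    $names[$code] = (in_array($code, $codes, TRUE) && isset($territories[$short]))
      ? $territories[$short]
      : $name;
  }
  return $names;
}

$territories = [
  'HK' => 'Hong Kong SAR China',
  'HK-alt-short' => 'Hong Kong',
  'PS' => 'Palestinian Territories',
  'BA' => 'Bosnia & Herzegovina',
];
print_r(prefer_short_names($territories, ['HK', 'MO', 'PS']));
```

Here HK switches to its short form, while PS keeps its default name because the sample input has no PS-alt-short entry, and BA is untouched because it is not in the selected list.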

bojanz

Yeah, for commerceguys/intl we have the following country filtering:

$ignoredCountries = [
    'AN', // Netherlands Antilles, no longer exists.
    'BV', 'HM', 'CP', // Uninhabited islands.
    'EU', 'QO', // European Union, Outlying Oceania. Not countries.
    'ZZ', // Unknown region
];
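For illustration, a list like `$ignoredCountries` can be applied to a code => name array in one call with array_diff_key(). This is a sketch with made-up sample data, not the commerceguys/intl implementation.

```php
<?php
// Sketch: drop ignored codes from a code => name list.
// $countries stands in for CLDR-derived data (sample values only).
$ignoredCountries = [
  'AN', // Netherlands Antilles, no longer exists.
  'BV', 'HM', 'CP', // Uninhabited islands.
  'EU', 'QO', // European Union, Outlying Oceania. Not countries.
  'ZZ', // Unknown region
];

$countries = [
  'AN' => 'Netherlands Antilles',
  'BA' => 'Bosnia & Herzegovina',
  'QO' => 'Outlying Oceania',
  'TT' => 'Trinidad & Tobago',
];

// array_flip() turns the ignore list into keys, so array_diff_key()
// can remove every matching country code in a single pass.
$filtered = array_diff_key($countries, array_flip($ignoredCountries));
print implode(', ', array_keys($filtered)) . "\n"; // BA, TT
```

Keeping the exclusions as an explicit, commented list makes each inclusion decision reviewable, which fits the policy discussion in the followup issue.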

CLDR changed quite a lot from v24 to v28, so it's a good time to reimport the list.

All of this deserves a fresh issue.

matsbla