This is an issue that was split from #1068840: core/includes/standard.inc contains inaccurate country data.
These are some of the opinions taken from the previous thread on how to handle an obsolete country:
- To rename it in standard_country_list() with no update / defer the update
- To remove it from standard_country_list() with no update.
- To remove it from standard_country_list() and write an update script to handle the deletion, which could run either:
  - During a major release
  - During a minor release
Also, the logic for some updates is trivial, like a code change ("XA is now XB") or a country merge ("XA and XB are now XC"). However, it is sometimes impossible to handle an update without user input, as when a country splits, like the Netherlands Antilles.
The Netherlands Antilles was dissolved on 10 October 2010. Curaçao and Sint Maarten became two new constituent states with their own ISO codes, while Bonaire, Sint Eustatius and Saba became special municipalities listed under a single ISO code. So we get "AN splits into BQ, CW and SX". However, when Drupal 7 was released, the country list used AN and CW and excluded BQ and SX.
Country handling options when doing updates include:
- Leave the country in the list indefinitely.
- Leave the country in the list but rename it to mark it as removed, e.g. "Netherlands Antilles (historical)".
- Update the country to point to the most populated state and allow users to update manually as required.
- Introduce a new element using a user-assigned code (the series of letters AA, QM to QZ, XA to XZ, and ZZ) that represents an unknown country, and update the entry to this. ZZ is used by the Unicode Common Locale Data Repository for "Unknown or Invalid Territory", so it would probably be the best choice of code.
- Provide a more in-depth update (somehow)
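The difference between trivial updates (renames, merges) and splits that need user input can be sketched as a lookup table. This is only an illustration, written in Python rather than Drupal's PHP. The names `SUCCESSORS`, `UNKNOWN_COUNTRY` and `migrate_country_code` are hypothetical and not part of any Drupal API, and only the AN entry is taken from this issue.

```python
# Hypothetical sketch: decide what to do with a stored country code.
# Only the AN -> BQ, CW, SX split is taken from this issue.
SUCCESSORS = {
    "AN": ["BQ", "CW", "SX"],  # Netherlands Antilles: a split, no single successor
}

# User-assigned code that CLDR uses for an unknown territory.
UNKNOWN_COUNTRY = "ZZ"


def migrate_country_code(code, current_codes):
    """Return (new_code, needs_review) for a stored country code."""
    if code in current_codes:
        return code, False  # still valid, nothing to do
    successors = SUCCESSORS.get(code, [])
    if len(successors) == 1:
        return successors[0], False  # rename or merge: unambiguous
    # Split, or a removal with no known successor: fall back to ZZ
    # and flag the value so a user can correct it manually.
    return UNKNOWN_COUNTRY, True
```

This sketch picks the "unknown country" option for splits. Choosing the most populated successor instead would only change the fallback branch.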
Comments
Comment #1
Xano
The problem with ISO 3166 is that using it will invalidate data that was perfectly valid in the past, once a country gets removed from the list. It's as if ISO believes that if a country no longer exists, nobody should care about it anymore.
No. The country's name does not include the word "historical". This is metadata and should be handled as such.
Comment #2
Xano
During a quick search on the removal of countries from ISO 3166 I found the following explanations (source):
Apparently, this is where ISO 3166-3 comes in, which has four-letter country codes. This means that we can actually update old country data, but we need to find a way to do this programmatically.
Comment #3
sun
Perhaps a silly idea, but it would definitely be KISS:
Why don't we simply add an additional array key along the lines of
'hidden' => TRUE
for each country that no longer exists? In other words:
Comment #4
Damien Tournoud
@sun: you are assuming that country codes are not reused. I don't know if that is true.
Comment #5
sun
True. I don't know that either.
OTOH, that slightly sounds like a separate issue to me (for which there might not be any kind of solution at all, unless we'd implement a full-blown CRUD API + hooks for managing country data, so derived values that are being stored elsewhere could be updated accordingly, which would be quite a monumental stretch for something as simple as this ISO data).
I think our main focus here should be to "get rid" of the obsolete elements for new sites, while still retaining the data for backwards-compatibility.
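A minimal sketch of how such a flag might behave, written in Python rather than PHP. This is not sun's original snippet: the data layout and the helper names `selectable_countries` and `country_name` are hypothetical. Hidden entries are left out of the list offered for new selections but can still be resolved for values that are already stored.

```python
# Hypothetical country data; the "hidden" flag mirrors the 'hidden' => TRUE idea.
COUNTRIES = {
    "AN": {"name": "Netherlands Antilles", "hidden": True},
    "CW": {"name": "Curaçao"},
    "SX": {"name": "Sint Maarten"},
}


def selectable_countries(countries):
    """Codes and names offered for new selections: hidden entries are skipped."""
    return {
        code: info["name"]
        for code, info in countries.items()
        if not info.get("hidden", False)
    }


def country_name(countries, code):
    """Resolve a stored code, including hidden (obsolete) entries."""
    info = countries.get(code)
    return info["name"] if info else None
```

As noted in comment #4, this only works as long as a hidden code is never reassigned to a different country.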
Comment #6
Xano
Country codes can be reused. I don't know if this actually happened before, but the specs say that if countries cease to exist, their codes are altered from 2 letters to 4, and the old 2-letter codes are available to be reused in the future.
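As an illustration of the four-letter scheme: an ISO 3166-3 code starts with the former two-letter code, followed by two more letters. The sketch below lists a few ISO 3166-3 entries with informal short names (not the official ISO names); the helper `former_uses` is hypothetical. It shows that CS was held by two different former countries, so reuse has in fact happened.

```python
# A few ISO 3166-3 codes for former countries (illustrative subset,
# informal short names rather than the official ISO names).
FORMER_COUNTRIES = {
    "ANHH": "Netherlands Antilles",
    "CSHH": "Czechoslovakia",
    "CSXX": "Serbia and Montenegro",
    "DDDE": "German Democratic Republic",
    "YUCS": "Yugoslavia",
}


def former_uses(alpha2):
    """Return the ISO 3166-3 codes whose first two letters are alpha2."""
    return sorted(code for code in FORMER_COUNTRIES if code[:2] == alpha2)
```

For example, `former_uses("CS")` returns both the Czechoslovakia and the Serbia and Montenegro entries, which is why a two-letter code alone cannot identify historical data.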
Comment #7
JohnAlbin
We are using CLDR instead of ISO for our country list now. See #1938892: Switch from ISO-3166-1 country data to CLDR unicode data
Comment #8
Pancho
I know that this is only part of the problem, but let's see what data CLDR has for deprecated country codes (an illustrative list):
There are cases where the country code simply changed for various reasons:
Then there are countries that dissolved with more than one successor. A list of successors is given, sorted by population:
Note that in the case of YU -> CS -> RS/ME, the most current successors are always given.
Then there are countries that merged into a completely new country, which is self-explanatory:
And countries that merged into another country (code):
More complicated is the case of a split, where one of the countries keeps the former country code.
This would be the case with Czechoslovakia (CZ), which split into the Czech Republic (CZ) and Slovakia (SK).
Another case would be the pre-1992 Federal Republic of Yugoslavia, which lost some of its republics in 1992 and only later dissolved completely.
For this we need the numeric territory code of the former country:
Note also that with the numeric code data we can continue tracking the territory of former West Germany (= the former territory of the "DE" country code). Why would we want to throw away data that we have? Core might not need the distinction anymore, but contrib might require it.
Now, while territoryAlias data resides in supplementalMetadata.xml, the full list of numeric codes resides in supplementalData.xml.
Some example entries:
So if you combine this with the replacement dataset, you can see that DE (280) & DD (278) merged into DE (276).
And you can see that CZ (200) split into CZ (203) and SK (703).
Conclusion:
So we actually have everything we need: we can update our country list at any time without losing data (Netherlands Antilles => /dev/null), without falsifying data (Czechoslovakia => Czech Republic) and without diminishing data (West Germany => post-1990 Germany).
However, we need to store the 3-digit ISO 3166-1 numeric code that will always change when the territory changes and will never be reassigned.
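The idea of keying on the numeric code can be sketched as follows, in Python rather than PHP. The table layout and the helper names `successors` and `predecessors` are hypothetical; the numeric codes are the ones quoted in this comment.

```python
# Territory history keyed by numeric code, using the values quoted above.
# Maps a former numeric code to the numeric codes of its successors.
SUCCESSIONS = {
    280: [276],       # DE (280) merged into DE (276)
    278: [276],       # DD (278) merged into DE (276)
    200: [203, 703],  # CZ (200) split into CZ (203) and SK (703)
}

# Two-letter code attached to each numeric code, as given in the comment.
ALPHA2 = {280: "DE", 278: "DD", 276: "DE", 200: "CZ", 203: "CZ", 703: "SK"}


def successors(numeric):
    """Successors of a former territory, as (numeric, alpha-2) pairs."""
    return [(n, ALPHA2[n]) for n in SUCCESSIONS.get(numeric, [])]


def predecessors(numeric):
    """Former territories that merged or split into the given one."""
    return sorted(old for old, new in SUCCESSIONS.items() if numeric in new)
```

Because 280 and 276 are distinct keys that both map to "DE", data stored against the numeric code keeps the distinction that the two-letter code loses.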
Comment #9
Alan D.
Doesn't dropping the upgrade path make this issue obsolete?
Comment #17
Liam Morland
A key question that needs to be answered is what happens with CountryManager::getStandardList(). Should codes be removed from this or not?
standard_country_list() is obsolete. Are there any functions besides getStandardList() which are related to this issue?
Comment #19
Anybody
Updating this to 9.3.x as the issue is still present in Drupal 9. I guess for country codes like "cz" (Czech), see above, we could rate this as a bug. But I'm not the right one to decide that.
Concrete example: We have a customer in the Czech Republic who is quite unhappy that the old "CS" (display) / "cs" (language shorthand) is still being used and displayed. Drupal makes a bad impression in those countries.
@Alan D. documented the situation for the Czech Republic very well in this countries module issue: #3117578: Czechia, Slovakia and North Macedonia