This issue was created in the hope of opening up discussion on the implementation of a locale [regional] formats framework that could be used to collect and automatically format/convert non-textual data such as measurements, dates, telephone numbers, etc. There are a number of modules that extend Drupal's multilingual capabilities, but they function independently and often have limited implementations. Some of the standard formats that could be provided include (2 samples provided for brevity, there could be many):

  • Number formats: 1,000,000.00 vs. 1000000,00
  • Date Formats: MM/DD/YYYY vs. DD/MM/YYYY
  • Timezone
  • Units of measure:
    • Length: cm vs. inches
    • Dry mass: kilograms vs. pounds
    • Liquid volume: litres vs pints
    • many other units
  • Currencies: UK Pound vs. US Dollar
  • Telephone formats: (999) 999-9999
  • Dialing country codes: +44 for the United Kingdom (dialed as 011 44 from the US)
  • Postal codes: 99999-9999

I see this as a basic framework provided by core that can be extended by contrib modules. Core could be responsible for detecting and providing the end user's locale in the form of a country code. This could be obtained from browser info, stored user profile preferences, site defaults, maybe IP detection etc. Contrib modules could extend the locale info by providing formats for specific data stored in fields. For example, a locale_date module would provide date formats used around the world, keyed by country code. It would also provide a date field that uses the site default date format or a user-selectable date format for collecting the data and storing it. Finally, the module would provide formatters that would detect the end user's country code and format the date appropriately.
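To make the split between core and contrib more concrete, here is a rough, language-neutral sketch in Python. The priority order (profile, then browser, then IP lookup, then site default), the function names and the two sample formats are all assumptions for illustration only, not an existing Drupal API.

```python
from datetime import date
from typing import Optional

# Illustrative strftime patterns keyed by country code (assumed sample data).
DATE_FORMATS = {"US": "%m/%d/%Y", "GB": "%d/%m/%Y"}


def resolve_country(profile: Optional[str], browser: Optional[str],
                    geoip: Optional[str], site_default: str) -> str:
    """What "core" might do: return the first country code that is available.

    The order of the sources is an assumption, not something decided here.
    """
    for candidate in (profile, browser, geoip):
        if candidate:
            return candidate.upper()
    return site_default.upper()


def format_date(value: date, country: str, site_default: str = "US") -> str:
    """What a hypothetical locale_date formatter might do with that code."""
    pattern = DATE_FORMATS.get(country, DATE_FORMATS[site_default])
    return value.strftime(pattern)
```

For example, `format_date(date(2011, 7, 4), resolve_country(None, "gb", "US", "us"))` picks the browser value and formats day first.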

Existing modules of interest

  • Format Number API provides a method to configure number formats (site default and user defined) with configurable decimal point and thousand separators. Currently only 6.x. Could be used to convert stored decimal/float values into locale based formatted numbers.
  • Formatted Number CCK provides cck input of numeric types where thousands separator and decimal point are inherited from the Format Number API module. Could be used to collect locale based formatted numbers and store them as standard decimal/float values
  • Measured Value Field provides CCK input of decimal/float values together with a unit of measure such as length -> cm. It also provides formatters that can convert the value to another unit and display it with the unit symbol or name.
  • Calendar Systems provides support for different calendar systems like Iranian, Jalali, Hijri, Hebrew etc. This support is currently limited to display and data entry for date fields, and the back-end date is always Gregorian (Timestamp). 6.x & 7.x!
  • Country code provides location-appropriate path and content handling based on the user's country. 6.x - project halted due to core limitations?
  • Units API Converts between various weights and measurements using the International System of Units (SI).

Related posts

Documentation

Formatting numbers, dates/times, sizes and intervals with localization

Possible resources

Unicode

Unicode Common Locale Data Repository (CLDR) seems to have all the locale data we could want. Probably too much. Their Terms of Use seem to allow free use with allowances for modification so long as original copyright notices are included.

Here's a sample of some of the data for en_US:

<ldml>
  <identity>
    <language type="en"/>
    <territory type="US"/>
  </identity>
  <dates>
    <calendars>
      <calendar type="gregorian">
        <dateFormats>
          <dateFormatLength type="short">
            <dateFormat>
              <pattern>M/d/yy</pattern>
            </dateFormat>
          </dateFormatLength>
        </dateFormats>
        <timeFormats>
          <timeFormatLength type="full">
            <timeFormat>
              <pattern>h:mm:ss a zzzz</pattern>
            </timeFormat>
          </timeFormatLength>
        </timeFormats>
      </calendar>
    </calendars>
  </dates>
  <numbers>
    <currencies>
      <currency type="USD">
        <symbol>$</symbol>
      </currency>
    </currencies>
  </numbers>
</ldml>
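As a quick illustration of how little code is needed to pull values out of this kind of file, here is a sketch using Python's standard library. The embedded XML is a trimmed copy of the sample above, and the variable names are made up for the example; real CLDR files are much larger and use inheritance between locales, which this does not handle.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the en_US sample shown above.
LDML = """
<ldml>
  <identity>
    <language type="en"/>
    <territory type="US"/>
  </identity>
  <dates><calendars><calendar type="gregorian">
    <dateFormats>
      <dateFormatLength type="short">
        <dateFormat><pattern>M/d/yy</pattern></dateFormat>
      </dateFormatLength>
    </dateFormats>
  </calendar></calendars></dates>
  <numbers><currencies>
    <currency type="USD"><symbol>$</symbol></currency>
  </currencies></numbers>
</ldml>
"""

root = ET.fromstring(LDML)

# Build the locale identifier from the <identity> block.
identity = root.find("identity")
locale = "{}_{}".format(identity.find("language").get("type"),
                        identity.find("territory").get("type"))

# Look up the short date pattern and the USD symbol by attribute.
short_date = root.findtext(".//dateFormatLength[@type='short']/dateFormat/pattern")
usd_symbol = root.findtext(".//currency[@type='USD']/symbol")
```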

PHP 5.3+ Internationalization Functions

Pretty much all of this is useful but here are some quick call outs to useful functions.

The floor is yours

What would you like to see in framework like this?
Have thoughts about how this should be implemented?
Any pitfalls to look out for?

Comments

Issue tags:+D8MI

Adding D8MI for tracking.

Cool summary. For now, the numeric field has format options, but for multilingual sites the format should be language dependent like date formats, not field dependent. Note that there are other numeral systems used in some languages (arabic numbers: ٠١٢٣٤٥٦٧٨٩), as included in CLDR.

Right now we have two files for localisation:
standard.inc and date.inc
As I read in #1231402: Drupal does not use ISO language codes, iso.inc is misleading, it looks like this will become the standard file for locale standards. Should we add all the localisation there?

An alternative method could be to use localize.drupal.org and the translation system to maintain localisation with tokens. For example, we could add strings like "[currency symbol]" that can be translated to "€" and "[digits]" to "٠١٢٣٤٥٦٧٨٩". I know this is a wild idea. But doing so, localisation can be done by the localisation teams and we only need to build the functionality in core.

Another option is to use the built in php locale system for localisation. For date formats, Drupal decided not to use php locale date formatting (don't remember the exact reason).

Issue summary: View changes

  • +issue grouping of Indian numbers
  • lists
  • ordered list numbers
  • arabic numbers
  • Change default English numbers to Arabic numbers
  • how do I translate the numbers?
  • Documentation
  • date translation
  • latin numbers
  • +measures
  • Bengali numeral system

@tinker: thanks for the heads up by mail. :)

Re: "...the numeric field has format options, but for multilingual sites the format should be language dependent like date formats, not field dependent..."

hmm... I think pairing number formats with languages is not accurate. There are languages spoken in several countries, and countries where more than one language is spoken.

Probably, number format is closer to the country than to the language, but at least until D6, sites have been related to languages, not countries. We have a user preferred language, but not user locations.

Also, "number formatting" is related to "money formatting", but the latter needs to know about the currency, and that is not related to languages either.

Then, it is also important to think about form elements and user input. Do people need to read numbers in a format, but write them in another format in order to make computations?

So this is not as simple as it seems.

Number digits (0123456789, ٠١٢٣٤٥٦٧٨٩ etc) are language dependent, notations more or less country dependent. See http://unicode.org/repos/cldr-tmp/trunk/diff/by_type/number.symbol.html

For websites we typically have these configurations:
1. monolingual sites
2. multilingual sites
3. country dependent multilingual sites (http://www.apple.com/choose-your-country/ and http://www.microsoft.com/en-us/default.aspx?bldi=2-0)

Normal multilingual sites (2) have typically one number notation for each language, that could work the same as the localised date format system as introduced in D7.

For country dependent multilingual sites we could add a locale variable in the system, or add the locale information to the language variable (es_ES, es_PE etc). See for a locale list of languages combined with countries: http://lh.2xlibre.net/locales/

Input elements need some attention as well, that's true. Numeric data should be normalised before saving (١٫٢ -> 1.2). Converting of units (on input and/or output) is another issue, and could be done by contributed modules.
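The normalisation step (١٫٢ -> 1.2) could look roughly like this. It is only a sketch in Python: the function name is made up, and it only knows about the Arabic decimal and thousands separators (U+066B, U+066C), so western separators such as "1.000,50" would still need locale-aware handling.

```python
import unicodedata

ARABIC_DECIMAL = "\u066b"    # ٫
ARABIC_THOUSANDS = "\u066c"  # ٬


def normalise_number(text: str) -> float:
    """Convert locale digits to ASCII digits before storing the value."""
    out = []
    for ch in text.strip():
        if ch == ARABIC_DECIMAL:
            out.append(".")
        elif ch == ARABIC_THOUSANDS:
            continue  # grouping separators carry no value
        else:
            # unicodedata.decimal() maps any Unicode decimal digit to 0-9.
            digit = unicodedata.decimal(ch, None)
            out.append(ch if digit is None else str(digit))
    return float("".join(out))
```

Because the digit lookup goes through Unicode properties, the same function also accepts other digit sets (Bengali, Devanagari and so on) without a per-language table.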

For Drupal 8, we can require the intl extension (which has been bundled with PHP since PHP 5.3). I would strongly discourage building our own localization framework.

@Damien If we can rely on that extension, that would be wonderful. It solves many formatting issues and has all the data. What are the thoughts on having that as a requirement? Or can we simply start using it?

I wrote Format Number API because there was no valid alternative at the moment, but if there is one today, I would vote for it. If that is this i18n extension, then good.

Format Number API also provides client side support for data entry, though. That's also a basic feature to cover here, I think.

I'm developing a D7 site which will require locale-specific (language+country) default settings for:
* Number (decimal/separator)
* Date (short/long formats)
* Timezone (offset from UTC)
* Units of measure (length, weight)
* Language

As well... allow each user to select a locale, and possibly override the formats on a per-user basis.

I'm coming from the .NET world. This is all very robust, free and easily implemented based on selection of a locale. It just "works". A few gaps I've noticed that I will have to personally fill on my project:

* During site install, selection should specify locale (language + country). It seems the current paradigm is to pick a language, and then optionally a country. But it's not clear that these are related - they're on different pages during install. (Are they related? I don't know). An interesting side note: Windows 7 Regional settings no longer allow a user to select an invariant language - ALL the languages have a specific country in parentheses; eg - English (United States). I'll probably implement this with a custom install profile and hooking the forms. Behind the scenes I'll set site language and country.

* Similarly I'll be changing the user profile to allow selection of this combined (language+country) locale. I expect this is very custom, so I'll roll my own.

* For my scenario it doesn't make sense that location would be "unspecified" either for site or user. I wonder if it would be a good general rule to just require country instead of making it optional. This simplifies A LOT. Forcing a selection of lang+country in a single select element solves this problem nicely, and avoids a possible touchy situation of prefilling country based on language (there are 200 million Brazilians that speak Portuguese for example, nuff said :)

* On a "standard" install, if for example I select language: Deutsch - the timezone is still set to "America/New York". I understand this is coming from PHP, but maybe there is a better logic to use. I'm sorta meh about this, not a big deal.

* Another meh the OP mentions - automatic detection of locale. This is nice for new user registration, but probably not needed for site install. And I appreciate it's a "complex" problem.

* On a "standard" install profile, whether I choose English or Deutsch, the date formats are MM/DD/YYYY - this is just plain wrong. Similarly, there is a single entry in the date_format_locale table for pt-pt... Portugal again? What is this? I have no idea. What would I expect? Two things: 1) date formats for EVERY locale (lang+country) baked into the database on install, 2) Pick the correct default formats based on locale.

* Where are the short time and long time date formats ?!? Seems like a pretty common use case to just show a time of day.

* I've had to hack a lot of changes when interacting with date module (date_popup) to deal with locale. Notably: language, date format, start of week. My scenario is special since I provide user-based settings for these that date doesn't know about. But still - for site locale it should just work "out of the box". And it doesn't.

* For measurement (distance/weight), default units based on locale would be VERY nice. In my software, I allow a user to select "metric" and "us" - but the reality is much more complex. For example, in the UK they have miles (distance), kilograms (weight) and stones (human weight), and you still order a pint. I've had mixed customer feedback on kcal vs kJ - different preferences in the same country. So while a locale default is good, ultimately in practice you may want to allow user override.

Incidentally the translation of "weight" changes based on whether it is non-human or human in some languages. So there is a tricky bit too.

Issue summary:View changes

+Units API

Thanks to all that have posted. All very good input. I have checked out the Internationalization functions provided by PHP 5.3+ and think they will be very useful. I have edited the first post adding links to reference docs. Please note that this only handles display formatting and does not appear to do conversion of measured values such as length or currency.

I agree with markus_petrux that most date and numeric formatting is more country specific than language but both are required to format accurately. I think measured values need to be field dependent, stored in a unified format, and displayed in a converted format. e.g. Length field with drop down of available units (cm, m, in etc), converted to a specified default unit (cm) and stored in database, then rendered depending on user locale. This brings about an issue in that PHP Internationalization works server side but we would probably need client side input validation based on locale provided by JavaScript.

Another concern is locale URLs, caching, and SEO. Do we add a locale code to the URL (as done for language) or store it in session? Are pages cached on a per locale basis? How would search engines treat content that is in the same language but has different locale rendering of values (dupe content)?

Interesting discussion, and thanks for your input @aaronaverill! About aliasing: I think we should add this to the alias to handle issues like linking, as websites may provide different information on a per-country basis. I can imagine a complex international site might have certain products, services, contact information and legal information in one country and not in another, so we might end up with the same negotiation mechanism as the language system. Another way to localise is to use the country-based top-level domain (google.nl, google.be).

In the language and date system we already have country-specific information for some languages (pt-pt, pt-br, en-us, en-uk etc). Could we make country and language dependent on each other as @aaronaverill described? Otherwise we get aliases like drupal.org/en-uk/uk/node/1

So a rule could be: if there is only one country for a language enabled, don't add the country code (drupal.org/en), if there are multiple countries enabled for that language (uk and us), add a country specific code (drupal.org/en-uk)?
To get the language from the url, Drupal could look first if there is a country specific language enabled, and otherwise use the generic language version.
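To make the proposed rule concrete, here is a small sketch in Python. The function names and the (language, country) pairs are hypothetical; this is just the rule as described, not how Drupal's language negotiation works today.

```python
from typing import Set, Tuple

Locale = Tuple[str, str]  # (language, country)


def url_prefix(language: str, country: str, enabled: Set[Locale]) -> str:
    """Only add the country when the language is enabled for several countries."""
    countries = {c for (l, c) in enabled if l == language}
    if len(countries) > 1:
        return f"{language}-{country}"
    return language


def language_from_prefix(prefix: str, enabled: Set[Locale]) -> str:
    """Prefer a country-specific match, otherwise use the generic language."""
    language, _, country = prefix.partition("-")
    if country and (language, country) in enabled:
        return f"{language}-{country}"
    return language
```

Note that with this rule the prefix for an existing locale changes as soon as a second country is enabled for its language.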

Are there already websites built with Drupal dealing with country-specific mechanisms? I did a quick scan through the list of intergovernmental organisations using Drupal (http://groups.drupal.org/node/79093), but so far this seems purely language based. So far, most organisations build a specific site for each country (for example US Aid).

PHP5.3 extension: how is the procedure for Drupal8 to require the intl extension? Could we just simply assume it is always enabled? Can drupal8 require it? Should we provide a fallback?

In the week since I posted this, one mess I ran into is that the date format string returned from the IntlDateFormatter.getPattern() method doesn't coincide with PHP date(). Thus, you can't simply get this format string for a user's locale and pass it to, for example, theme_date_popup[#date_format] (or really any Drupal method).

I've not written a translation method yet since my regexp is rusty and... well... it's not a thrilling task.

For reference in case anyone wants to tackle this...(hint hint)
IntlDateFormatter: http://userguide.icu-project.org/formatparse/datetime
jQuery datepicker: http://docs.jquery.com/UI/Datepicker/formatDate
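For anyone picking this up, a starting point for the translation might look like the sketch below. It is written in Python to keep it short, the function name is made up, and it covers only a handful of common ICU fields; anything it does not know (time zones, eras, single "m" or "s") raises an error rather than guessing.

```python
import re

# Partial mapping from ICU pattern fields to PHP date() characters.
ICU_TO_PHP = {
    "yyyy": "Y", "yy": "y", "y": "Y",
    "MMMM": "F", "MMM": "M", "MM": "m", "M": "n",
    "dd": "d", "d": "j",
    "EEEE": "l", "EEE": "D", "EE": "D", "E": "D",
    "HH": "H", "H": "G", "hh": "h", "h": "g",
    "mm": "i", "ss": "s", "a": "A",
}

# A quoted literal, a run of one repeated letter, or any other single character.
TOKEN = re.compile(r"'((?:[^']|'')*)'|([A-Za-z])\2*|(.)", re.DOTALL)


def _escape(text: str) -> str:
    """Backslash-escape characters that PHP date() would otherwise interpret."""
    return "".join("\\" + c if c.isalpha() or c == "\\" else c for c in text)


def icu_to_php(pattern: str) -> str:
    out = []
    for match in TOKEN.finditer(pattern):
        quoted, letter, other = match.group(1), match.group(2), match.group(3)
        if quoted is not None:
            # '' inside quotes is a literal quote; an empty pair is one quote.
            out.append(_escape(quoted.replace("''", "'") or "'"))
        elif letter is not None:
            field = match.group(0)
            if field not in ICU_TO_PHP:
                raise ValueError(f"unsupported ICU field: {field}")
            out.append(ICU_TO_PHP[field])
        else:
            out.append(_escape(other))
    return "".join(out)
```

With this, the en_US short date pattern `M/d/yy` becomes `n/j/y`, and `h:mm:ss a` becomes `g:i:s A`. A similar table would be needed for the jQuery datepicker format.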

Well, moving to IntlDateFormatter for date handling in Drupal certainly has a lot of implications for all date and timezone interfaces, and needs migration paths. If we want to start simple, we could start with NumberFormatter?

I know we are talking about languages, number systems, values, units and countries here - but perhaps we need to just take a little step back and look at this like an array of culture where the culture array allows you to mix and match whatever you want. As long as the person doing the data entry (user, programmer, whatever) sets an array, then it is merely a programmatic issue of translating culture{"en", "metric", 1000, "mile", "US"} into the appropriate value in American English -> 1,000 miles or Thüringian German -> übelst weit. Haha. But seriously, if it is getting too complicated, abstract it a bit.

Issue tags:-D8MI+D8MI-meta

Move to new meta tag.

I have been making progress on coding locale based modules. They are still in a state of flux, but as soon as I lock down the structure and implement better safety checks I will upload sandbox projects. The modules I have working in D7 (I have immediate need) are:

  • Measurement Units - Provides standard units (82 defined with more coming), conversion between units, and locale lookup -> measurement systems -> units. TODO: Add custom units, test with currency conversion.
  • Number Formatter - Using PHP NumberFormatter class. Provides field formatter, locale lookup, and API for other modules. Uses only enabled locale language. TODO: complete custom override of formatting, add pattern based formats.
  • Converted Unit Field - Works much like D6 MVF module but instead of storing unit and value it converts the value to a specified unit. So the user has a selection of units to choose from, such as length: centimeter, meter, inch, foot, and all inputs are stored in the selected storage unit, e.g. meter. Then using locale lookup from session, user profile, best guess etc the display is converted to a locale unit and number formatted. e.g. user inputs 1 km, value stored as 1000 meters, and displayed as "0.621 miles" to USA web viewer.
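The store-in-one-unit, display-in-another idea can be sketched as follows. This is Python rather than the module's actual code, and the table names and the country-to-unit lookup are invented for the example; only the conversion factors are standard.

```python
from typing import Tuple

# Conversion factors to the storage unit (meter).
TO_METERS = {
    "centimeter": 0.01,
    "meter": 1.0,
    "kilometer": 1000.0,
    "inch": 0.0254,
    "foot": 0.3048,
    "mile": 1609.344,
}

# Hypothetical locale lookup: preferred display unit for long distances.
DISPLAY_UNIT = {"US": "mile", "NL": "kilometer"}


def to_storage(value: float, unit: str) -> float:
    """Convert user input to meters before saving."""
    return value * TO_METERS[unit]


def for_display(meters: float, country: str) -> Tuple[float, str]:
    """Convert the stored value to the unit preferred in the viewer's country."""
    unit = DISPLAY_UNIT.get(country, "kilometer")
    return meters / TO_METERS[unit], unit
```

So `to_storage(1, "kilometer")` gives 1000.0, and `for_display(1000.0, "US")` gives roughly 0.621 with unit "mile"; number formatting of the result is a separate step.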

Still working on:

  • Locale Country - Allows users to save their country, ties in with GeoIP country lookup, defines default country to language (as a best guess resort), contains locale based lookup info per country. This will work in a similar manner to language detection as provided by locale module.

Hi all, wondering if there is any progress in this, and if we can have some internationalisation of the number format in Drupal 8 before code freeze. Note that now we have Symfony in core, we can probably use the locale component of Symfony, so we are sure we have a intl component we can rely on (see #5 and #10).

I have continued to work on various modules that provide inputs, conversion, and formatting of locale based data but the main issue (which I have not attempted) is to have a minimum of language+country locale code being passed by core just like $langcode currently is.

Don't know who I should speak to about this or whether it's even possible at this point.

As part of my rewrite for Currency I developed bartfeenstra/cldr, which is a library designed to parse Unicode CLDR number patterns. At the moment it is only capable of formatting currency patterns (with a few other possibilities), but support for the other formats (percentages, exponents) can be added with relative ease. The main difference between this library and PHP's Internationalization extension is that the library allows and ignores characters in patterns that are not part of the Unicode guidelines, thereby allowing people to use HTML, for instance.

Apart from that, my main concern with localization in Drupal is that Drupal assumes that language == locale. Locales, as correctly stated earlier in this issue, are so much more than just languages. A language is just half of a locale's identifier (the other being a country). Locale delegation therefore encompasses language and country negotiation. There is also no such concept as "Default locale configuration". There are locales, and there is one locale that should be used as a fallback. Core and contrib can then allow users to override the locale to use for specific applications. Say I'm a Dutchman visiting the USA. This means that the locale negotiation API will set a global en_US locale, based on environment variables. However, I am unfamiliar with the American date and number notations, so in my profile I tell Drupal to use the nl_NL locale for dates and numbers. I do stick to en_US units of measurement, such as speed (miles/h rather than km/h) and temperature (degrees F, rather than C), because that makes it easier to compare these things when communicating with the locals out there.
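In code, the Dutchman example boils down to something like this. It is a Python sketch with made-up names, only meant to show one negotiated locale plus per-application overrides.

```python
from typing import Dict


def locale_for(application: str, negotiated: str,
               overrides: Dict[str, str]) -> str:
    """Use the user's override for this application, else the negotiated locale."""
    return overrides.get(application, negotiated)


# Global locale negotiated from the environment, plus profile overrides.
negotiated = "en_US"
profile_overrides = {"date": "nl_NL", "number": "nl_NL"}

date_locale = locale_for("date", negotiated, profile_overrides)
measurement_locale = locale_for("measurement", negotiated, profile_overrides)
```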

On a "standard" install profile, whether I choose English or Deutsch, the date formats are MM/DD/YYYY - this is just plain wrong.

Is it? I use all my software in US English, but I completely detest the American way of writing dates, so I stick to DD/MM/YYYY, which is the format I grew up with.

So a rule could be: if there is only one country for a language enabled, don't add the country code (drupal.org/en), if there are multiple countries enabled for that language (uk and us), add a country specific code (drupal.org/en-uk)?
To get the language from the url, Drupal could look first if there is a country specific language enabled, and otherwise use the generic language version.

I wouldn't make this too dynamic. The last thing you want is that all your URLs change when you add another country. There's no shame in having a full locale identifier in your URLs.
Also, I'm reluctant towards enabling and disabling languages and countries. We can disable translations, but we should not disallow users from selecting a particular locale, as that information can be important to more than languages, such as the aforementioned currencies and units of measurement.

I think the most basic core question is how would you define locales and their relations to languages both on the UI and in the backend system. This might depend on whether we use PHP http://php.net/manual/en/book.intl.php or not. intl docs don't seem to provide guidance as to where would locale identifiers be sourced from.

Isn't it documented in Symfony locale? Here you'll find all the combinations of locales with languages (imported from the ICU-list).
One of the implications using ICU could be that the country is always a class of a language. So you can set a site to 'en_UK', 'en_US' or 'en', but a configuration as described above as en_NL is not possible.

So that means any content targeting for regions that is not in the language for the region would not be done with this system, but yet another one?

#20 I'm working on a site (D7) right now that uses en_NL, so it has to be possible to make such combinations. If SF doesn't allow it, we'll need to extend it or use something else.

Symfony is based on the locales in the ICU project http://site.icu-project.org/. The dataset of ICU is used by Adobe, IBM, Apple and Google for internationalisation. Not by Microsoft, but they use the same mechanism for culture, as a limited list of language+country combinations, so it's pretty safe to use ICU, and the implementation in Symfony, as a base for Drupal.
As far as I know, ICU requires the language, with optionally the country and optionally a variant. The hierarchy is always required, as codes can have different meanings ('be' means Belarusian as a language code, and Belgium as a country code).
Localisation functions work with all of the official combinations. I am not sure what happens with unusual combinations, but I think it falls back to the language component as that is the highest in the hierarchy, in your case 'en'. I am not sure if unusual combinations will help us. For example, en_IR becomes problematic, as that would mean using the Iranian calendar system and date format, but the language of the calendar should be English. This becomes even more problematic for combinations like zh_IR.
Why would you want to use 'en-NL' for the user interface?
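The fallback I have in mind would be something like the sketch below (Python, hypothetical function names, and a made-up list of available locales). Whether ICU or Symfony really behave this way for unusual combinations is exactly the open question, so treat it as an illustration of the idea only.

```python
from typing import List, Set


def fallback_chain(locale: str) -> List[str]:
    """'en_NL' -> ['en_NL', 'en']: drop the most specific part at each step."""
    parts = locale.replace("-", "_").split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]


def pick_locale(requested: str, available: Set[str], default: str = "en") -> str:
    """Return the first entry of the chain that has locale data."""
    for candidate in fallback_chain(requested):
        if candidate in available:
            return candidate
    return default
```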

Awesome issue! I'm surprised I did not locate it myself or was not pointed to it earlier. Thanks people for sharing ideas and trying to get this done the right way!!

...still are we doing any of this in D8? :/

...an issue related to calendar systems and dates: #1811912: Add pluggable calendar backend to core and centralize date translation

Also, I'm almost 100% sure I've mentioned someplace before the need to "tie" Country selection to specific timezone(s) during installation (especially for small countries that only have a single timezone), but I cannot find the damn issue :/

...could only spot this one: #1942838: Default country by user Time zone

Edit: See post below...

Issue summary:View changes

Added PHP Internationalization links

Issue summary:View changes

...re-ordered related issue chronologically and added a few more

I'm not sure there is any chance of this happening anymore in Drupal 8 core. It will need to be in contrib for Drupal 8 I think. Unless someone else sees some great solution there...

This issue deals with exactly what we call “localization”.

Any overall progress?