We're probably going to use the ¤#,##0.000 format as a template to display amounts, so we need code to parse it. [#1367630]#6 has some code, but it looks like it needs more work, since it does not take any locale text into account, apart from the thousands separator and the format.
| Comment | File | Size | Author |
|---|---|---|---|
| #31 | currency_1531058_12.patch | 50.34 KB | xano |
| #28 | currency_1531058_11.patch | 12.28 KB | xano |
| #24 | currency_1531058_10.patch | 13.38 KB | xano |
| #18 | currency_1531058_09.patch | 10.87 KB | xano |
| #17 | test._php.txt | 1.51 KB | alan d. |
Comments
Comment #1
xanoThe patch depends on #1531974: Make currencies Ctools exportables. It adds Currency::format(), which accepts an amount and a format string. The format servers as the output value and the amount is rendered into it, after which some replacements are performed for 'tokens' (currency sign, code, and number) and to clean up unused characters from the format. It was deliberately not tied to the proposed CurrencyLocale system, so it serves as an API function for custom formats as well.
Comment #2
xanoThe previous patch still contained a line of debugging code.
Comment #3
xanoImproved comments.
Comment #4
xanoCleanup, code comment love.
Comment #5
xanoFML. 04 had debugging code...
Comment #7
xanoLalalaaaa.
Comment #8
xanoShould be "prefixed".
Comment #10
xanoWe can do three things to make minor units work better:
We might also want to remove every character between redundant hashes and not just periods or commas.
Comment #11
xano- Commenting issue fixed.
- Adds a
SEPARATORtoken to formats, which indicates the separation between the major and minor parts. This means formats will now work with every currency, but they should ideally have three replaceable characters (zeroes or hashes) on the right side of the separator to accommodate currencies with three decimals. The most flexible solution would be three hashes, since the format would then work for currencies with zero, one, two, or three decimals.- Currency::format() now renders the major and minor parts individually, from the separator outward.
- Currency::format() no longer renders digits for which there are no more placeholders.
- Added tests.
See the test's assert messages to get an impression of possible formatting options.
Comment #12
xanoThis should probably done for the entire rendered amount and not just for per major/minor section, because now a major format that ends in a comma or period will keep that comma or period even if there are no decimals to render.
Comment #13
alan d. commentedIt looks like you have lost support for bracketed negatives.
Further review is difficult without a test setup. Hopefully somewhere in the old threads there is a list of all of the different formats parsed from http://www.unicode.org/ or http://cldr.unicode.org/index/downloads for direct download links of the latest cldr. Info on format is here: http://cldr.unicode.org/translation/number-patterns
Just on mute point that may not be applicable, is 1000000000 parsed as 1000000,000 or 1,000,000,000 using the format string #,##0?
When testing, an Arabic (1,00,00,00), English (1,000,000,000) and Indian (1,00,00,000) format should cover all of the bases for testing the major grouping separator, unless you are getting really keen and covering Japaneses & other Asian currencies.
Comment #14
xanoAlan, thanks a lot for the review and the link to the Unicode number patterns documentation!
About the latest patch
I added support for negatives, but it's limited to a minus sign. Bracketed negatives don't really seem to exist. According to the docs there should be an entirely separate format for negatives, brackets or not.
It will be parsed as 0,000. The reason for that is that I didn't want to assume how groupings worked; I believe I saw some formats that used alternating groups of 2 and 3 digits. The idea was to force formats to have enough placeholders to accommodate the largest possible amount.
About the docs
Does this mean adding a minus sign for negative amounts is the default behavior? If so, how should the sign be added?
In short, if we go with this formatting, I suggest we implement the currency general purpose format, and, at least to start with, without support for short numbers. To this, we add a currency code placeholder. However, the standard looks a bit fuzzy and open to implementation.
Comment #15
alan d. commented"Bracketed negatives don't really seem to exist"
This is a common requirement in many systems especially in the commerce sector, and is seen in many of the cldr formats. I have lost the scripts I used last year to extract unique currency formats, but from memory I pulled out about 2 or 3 formats from the XML files that used brackets instead of a minus character. I found no other minus markers when parsing the files and no formats that used brackets for anything else, but if you want to handle two format strings as per the docs, no objections from me!
"The idea was to force formats to have enough placeholders to accommodate the largest possible amount"
The original code did do this (though maybe not i18n safe) with the minimal format strings that boiled down to one of "#,##0", "#,#0" or "#,##,##0", although for fun you could add any possible combo like "#,##,###,#,#0" and it would produce "12,34,56,78,90,12,34,567,8,90". Basically just parsed the major component from the right until there were no more patterns given and the last pattern size is used for the rest. More friendly for site admins.
Re #1 - This should only be the symbol (which may be the code), so I like the idea of either manual tokens (XXX,999) or if you want to support tokens proper, token strings like [currency:iso3] or [currency:num3] could be better.
Re #2 - This is for number abbreviations from memory so can be ignored afaick
Re #3 - Like preferred the single format with - ( or ) and just str_replacing them out, but your call if you want to handle this, see first comment above :)
Re #4 - According to the docs, the full stop is the special placeholder that is replaced by the minor separator and the comma is the major separator.
Comment #16
xanoI was just saying in terms of formatting, there is no such concept as bracketed negatives. They're just negatives, and the system couldn't care less how exactly they're formatted.
2) Not sure what you mean with that. Escaping is used in any context that uses characters with a special meaning, so it applies to currency formatting as well, because what if you want to display a literal comma and not use it as a decimal separator placeholder?
3) I'm not sure what you mean here either. Is there a default behavior when there is no format for negative amounts, or isn't there?
4) Right. I understood that and meant that my solution seems more flexible to me. I haven't been able to think of a situation in which the system wants to know what characters are used as separators. The unicode standard needs to know them for replacement, by my approach worked fine without replacements, so I don't see the added value.
//Edit: are you on IRC? It might be easier discussing a few things there than here in the queue :)
Comment #17
alan d. commentedFormat? Maybe you are talking about something completely different. In regards to formatting, open MS Excel, click a cell, select currency type, and two of the four options are for bracketed negatives. Screenshot.
#2 was in relation to:
Do whatever works for you!
After parsing every XML file last year, all formats were limited to just 9 characters: the currency symbol, hash, zero, comma, dot, dash, space and brackets and personally I would only support these simple strings myself, and maybe handle this via a theme function to allow users to markup elements. But your way does eliminate the need to know about separators.
#3 The "default" is to prefix the format with a dash.
#4 If it works, all good.
For fun I reparsed the XML files and here are the results:
And non-specified negative formats
Just 33 all up :)
Download the latest cldr zip and run the test.php (after renaming it) to set a full run down of these.
Per primary locale, en, es and ar were the most diverse:
Comment #18
xanoI know bracketing is a form of negative amount notation, but from the system's point of view, it doesn't matter how a negative is rendered. It should take care of supporting different locales, but not care about how those locales work. Just a bit of miscommunication :)
2) I opted to just go with the CLDR guidelines for consistency's sake.
3) Good, thanks! Would you have a link to a document that says that, so we can put it in our own documentation?
4) Like 2, I went with the CLDR guidelines here as well. This allows us to parse the CLDR XML files and generate format configurations based on the locale data they contains.
Why do those locales have more than two patterns? http://cldr.unicode.org/translation/number-patterns doesn't mention this.
The attached patch tries to conform to CLDR as much as possible. (Possible) differences:
- Two custom [XXX] and [999] tokens for the currency code and number.
- Every pattern is required to consist of a positive and negative fragment, separated by a semicolon. There is no default negative pattern yet.
- Every pattern fragment is required to have a period for a decimal separator.
- Every pattern is required to have a 00 decimal placeholder anywhere on the right side of the decimal separator.
Comment #19
alan d. commented3) Last paragraph in the first table:
4) Yep, this is a collective for multiple locales, en_US.xml = English USA, etc
Sorry, haven't looked at the code but seems strange to have a minor placeholder "00" for currencies like the Laos Kips where 1,000,000 LAK is only 125 USD.
I wonder if the formatter shouldn't be part of the code, something like this maybe:
Which outputs the stock standard UNICODE formats, and if someone wants to fancy formats, you could extend the class to supply whatever you want. The above two lines could be wrapped into a factory method that initiates a globally defined formatter in case someone wants something special, falling back to the default CurrencyFormatter if the class isn't found.
I'm thinking about integration with external systems / third party modules, and supplying anything other than a raw formatted code would probably break the integration as these wouldn't expect HTML.
Not on IRC but skype or gtalk are options if you want to chat
Comment #20
alan d. commentedD'uh thats sort of what you are doing.
regexp are expensive, string functions are much faster, again boils down to do you want to support complex strings.
The two pattern strings:
Comment #21
alan d. commentedI don't think you can use this.
For example:
outputs
So cast the comparison result if you really need an int value:
Comment #22
xanoFrom http://cldr.unicode.org/translation/number-patterns (first row of the second table):
I interpreted this as keeping two zeroes, which will be replaced by however many decimals the used currency has.
The idea is to let every piece of code do its own thing. This issue is literally just about parsing CLDR number patterns. Locale delegation, which will try to find the correct pattern (either stock Unicode CLDR or custom/overridden) based on available locale parameters, will be added in another issue.
Also, parsing a pattern requires a currency, but the pattern itself does not require a specific currency (e.g. should work with any currency), so passing on a Currency to CurrencyFormat for other reasons than parsing a single pattern only limits CurrencyFormat IMHO.
I just sent you an email through your contact form with my Skype handle.
Comment #23
xanoThis won't work, because it will split on escaped semicolons as well.
Good catch. Your example isn't quite right, because amounts will always be integers anyway, but the casting is indeed wrong.
Comment #24
xanoThis patch adds support for default patterns for negative amounts, and fixes the type casting issue raised in #21. It also updates tests pattern formatting using every currency available in the system that has a minor unit set (and for this, it adds minor units to some currencies).
Comment #25
xanoFor some reason I didn't find this during my first search for existing code, but apparently PHP *can* parse CLDR number patterns using its NumberFormatter class. I haven't checked closely yet, but we can at least make it use custom patterns, and I believe we can even set custom replacement characters (meaning PHP will let us override the defaults it finds in the locale data it has).
We will also need to find out how common this extension is. Based on a quick search it seems most operating systems don't ship with ICU (a requirement for the Intl extension), so using the extension might limit Currency to people managing their server OS themselves.
Comment #26
xanoA few recent findings/realizations:
(quote from http://site.icu-project.org/design/number-parsing)
Comment #27
xanoI took the liberty of putting this on Github already: https://github.com/bartfeenstra/cldr. It's released under the MIT, so we can easily commit the code to the d.o repo.
The code is a massive rework of the patches posted here. It does not only support currencies, but every number format described by CLDR, except scientific notation (currency amounts, decimals, percentages), and it even does integers. Supporting integers actually makes testing and maintenance easier. It has PHPUnit tests and Composer integration.
Comment #28
xanoThe attached patch imports bartfeenstra/cldr using Composer. This ensures that people using Currency don't have to install Composer themselves. To update the code (whenever bartfeenstra/cldr is updated), simply run
composer updateand commit the changes.Comment #30
xanoErr, where did /vendor/bartfeenstra/* go?
Comment #31
xanoComment #32
xanoComment #33
amateescu commentedCommitted and pushed to 7.x-2.x. Thanks for all the hard work on this :)
http://drupalcode.org/project/currency.git/commit/8cee081
Comment #34
xanoSee #1868952: Update bartfeenstra/cldr.
Comment #36
xano