Problem/Motivation
Using eval is bad. Bad for performance and bad for PHP source code compiler. Eval is used to proces the Gettext plural formula. This processing should therefore be removed and replaced by an alternative.
Proposed resolution
Replace the plural formula parser by a script that evaluates the Gettext plural formula and stores the result as an array of plural formula data. This data is used by both a PHP function (locale_get_plural) and a JavaScript function (Drupal.formatPlural) to determine which plural formula should be used for which (@count) value in the string.
Some references:
- #3 : http://translate.sourceforge.net/wiki/l10n/pluralforms
- Mentioned on source forge : http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
- Uses a different approach https://developer.mozilla.org/en/Localization_and_Plurals
Remaining tasks
- Fixed: - Make Drupal JS use the new plural "formula"
- Fixed: #2182265: Drupal.t() placeholder substitution doesn't always work correctly (backport part of the Javascript file translation fixes from Drupal 8)
- Fixed: #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages
How to test manually
Since the Drupal test scripts can not test JavaScript yes, the JavaScript translation can not be tested automatically. Use the steps below to test.
Although the Drupal.formatPlural() is called only twice in Drupal core, it is not easy to trigger those occurrences manually. Therefore this script is made.
(core/modules/comment/js/node-new-comments-link.js
, core/modules/contextual/js/contextual.toolbar.js
)
- Use the latest D8 and apply the latest patch of this issue.
- Enable the interface translation module.
- Download Serbian translation from http://ftp.drupal.org/files/translations/8.x/drupal/drupal-8.0-alpha10.s...
- Import the downloaded PO file at admin/config/regional/translate/import. This will add the Serbian language too.
- Add this two line to the /core/themes/bartik/bartik.info.yml file
scripts: - js/bartik.format_plural_tester.js
- Create the file "core/themes/bartik/js/bartik.format_plural_tester.js" with the following content:
Drupal.behaviors.bartikFormatPluralTester = { attach: function (context, settings) { // We make sure the source string is included in the JS translations file. // The string will be detected and included in the JS translation file. Drupal.t('1 hour'); var args = {}, options = {}, t = ''; t = Drupal.formatPlural(1, 'Some untranslated singular text.', 'Some untranslated singular text.', {}, {}); console.log(t); t = Drupal.formatPlural(2, 'Some untranslated plural text.', 'Some untranslated plural text.', {}, {}); console.log(t); for (var i = 1; i < 120; i++) { t = Drupal.formatPlural(i, '1 hour', '@count hours', args, options); console.log(t); } } };
- Open the console window of your browser
- Visit the front page with English interface language. The output should be:
Some untranslated plural text. Some untranslated plural text. 1 hours 2 hours 3 hours 4 hours 5 hours ...
- Visit the front page with Serbian interface language (http://example.com/sr). (Except for the first two lines) the console output should be the translated to Serbian.
Output should look like:Some untranslated singular text. Some untranslated plural text. 1 час 2 часа 3 часа 4 часа 5[2] часова 6[2] часова ... 20[2] часова 21 час 22 часа 23 часа 24 часа 25[2] часова 26[2] часова ... 30[2] часова 31 час 32 часа 33 часа ...
Of course you can use any other language than the Serbian language.
Original report by chx
That's seriously uncool. I will write more tests.
Comment | File | Size | Author |
---|---|---|---|
#85 | the_last_eval-1273968-85.patch | 12.81 KB | Sweetchuck |
#80 | the_last_eval-1273968-80-interdiff-75-78.txt | 4.57 KB | Sweetchuck |
#80 | the_last_eval-1273968-80-78-clean.patch | 13.6 KB | Sweetchuck |
#80 | the_last_eval-1273968-80-75-reroll.patch | 13.31 KB | Sweetchuck |
#78 | the_last_eval-1273968-78.patch | 22.74 KB | Sweetchuck |
Comments
Comment #2
chx CreditAttribution: chx commentedBecause Gabor asked (although a comment in there says so) the way we get rid of the eval is to pre-compute plurals (re-using the already existing stack validation code structure). How many? As far as I can see, only the last two digits used, ever. So we could precompute 0-99 only and use that? No, because the first few are typically exceptions. So we precompute 0-199 instead.
Later we can do various optimizations: recognizing the default, recognizing 1, recognizing if formula(N + 100) equals to formula(N) and store / retrieve only formula(N) (is 123 stored? no? let's use 23). Once all that's done, we only store about 100 numbers and even that for a relatively few languages.
Comment #3
Gábor HojtsyA short summary of some things we talked about with @chx earlier:
- eval() is evil
- locale module is one of the very few places in core still using eval(), PHP.module is another one (so unlike the patch name, we are not eliminating the last one)
- to avoid using eval, @chx proposes we precompute a lookup table of results based on the formula
- @chx looked at the formulas in l10n_pconfig module and the plural form wiki at http://translate.sourceforge.net/wiki/l10n/pluralforms and concluded that we can keep most lookup tables pretty small (if you look at the formulas they repeat at most after 100, but most repeat patterns below that)
- therefore we can apply a lookup table and if we don't find the value in there, we can mod the number and try again, which should overall be much simpler than eval()-ing the formula all the time
- no performance tests so far to prove it actually performs faster
(Note I just summarized our discussion, was not involved in the making of the algorithm/patch).
Comment #4
Gábor HojtsyThis is being looked at the extended D8MI sprint in Denver in context of making it simple to provide examples for a UI for plural translation. (#1452188: New UI for string translation). Also discussed yesterday with some people vs. making Drupal 8 better support hiphop.
Comment #5
Sutharsan CreditAttribution: Sutharsan commentedThe hard coded plural processing as in Symphonys PluralizationRules (https://github.com/symfony/Translation/blob/master/PluralizationRules.php) can be partial solution for known languages. However for unknown languages we (may) need the flexibility a populated plural table can provide.
Comment #6
Gábor HojtsyI have not tested the suggested patch but if we can (a) cover all operators supported (b) use a prepopulated table for lookup, then we can avoid both hardcoding some formulas and needing to compute for others, since we can do it all dynamic but still pretty fast. That was the conclusion of our discussion about the above with chx.
Comment #7
andypostSuppose we should provide a predefined formula|rule for languages in
standard_language_list()
and use serialized tables as chx suggested for other languages.Comment #8
Gábor HojtsyWhy maintain two ways in parallel?
Comment #9
clemens.tolboomIn #1570346: Let Symfony Translation component handle language messages we discuss symfony Translation component which has PluralizationRules.
I like the idea of hardcoded rules but PluralizationRules allows for custom rules.
Can't we use that instead?
Comment #10
clemens.tolboomAdded some links from here and other places into Issue Summary.
Comment #11
andypostWe need to re-roll this anyway, so I suggests to add serialized values for standard_language_list() and inherit them in locale_translation_plurals variable, also it's good question about how to convert it's value to CMI?
Comment #12
chx CreditAttribution: chx commentedOh of course we can use Symfony! Heaven forbid we use our own code.
Comment #13
clemens.tolboomThis is definitely not closed :)
1. Symfony has hard coded pluralization rules with support for custom.
2. We do have a PluralForms parser and Symfony does not have yet.
When we switch to Symfony Translation component we need implement a callable when we use https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Tra...
On http://localize.drupal.org/translate/languages we support for 104 languages.
Symfony has 95 langcodes in PluralizationRules.php
Comment #14
YesCT CreditAttribution: YesCT commentedchx: I know you are pretty busy. Do you have any intermediate work on this that might help someone else help you? Or maybe this is ok for someone else to pick up?
Comment #15
chx CreditAttribution: chx commentedComment #16
YesCT CreditAttribution: YesCT commentedOK. :)
This looks challenging, and next step might just be a reroll.
Comment #17
chx CreditAttribution: chx commentedA simple way of compression would be: while evaluating the stack for 0-199 (btw the patch leaves out 199), count the ocurrences of values. Store the highest occurence count as default, store everything else with their indexes. So if you have [1, 2, 3, 3, 3, ... 100 => 1, 101 => 2, 3, 3.....] then the array to be stored is only [default => 3, 0 => 1, 1 => 2, 100 => 1, 101 => 2].
array_filter($plural, function ($n) use ($default) { return $n != $default; }
will neatly throw away the defaults from the $plural array.and then:
Comment #18
Gábor HojtsyWe should not have eval() in Drupal core like this.
Comment #19
Sutharsan CreditAttribution: Sutharsan commentedPorting ...
Comment #20
Gábor HojtsyAlso put on sprint then.
Comment #21
Sutharsan CreditAttribution: Sutharsan commentedThis is an port of the #0 patch. It needs more work to get
evaluatePlural()
functioning.Comment #22
chx CreditAttribution: chx commentedGet foo functioning is not a bug report.
Comment #23
Sutharsan CreditAttribution: Sutharsan commentedBad patch in #21. Corrected version here, still needs work.
Comment #24
Sutharsan CreditAttribution: Sutharsan commentedWorking patch without the compression suggested in #17.
Needs tests.
Comment #26
Sutharsan CreditAttribution: Sutharsan commentedThis should pass the test. Additional tests are not required, the existing tests have sufficient coverage IMHO.
Comment #27
clemens.tolboomCan we have an example evaluation. The code is now hard to grasp.
whitespace (2 spaces)
Comment #28
chx CreditAttribution: chx commentedThis is what I used yesterday to find out what's wrong (see the comment on switch(TRUE); that was wrong). I am not sure we want something like this in an inline comment, it's way too long.
Comment #29
chx CreditAttribution: chx commentedAlso, I looked at the patch and it's not exactly "the top elements of the stack are evaluated." -- what we evaluate is the first operator from the top which can be pretty deep for example step 13 has
true 0 false true 1 20 >=
as the stack top. We pick the >= to evaluate because it's the first symbol and it uses 1 and 20 as arguments and evaluates to false. After evaluating we remove the arguments plus the operator itself -- typically 2 arguments and 1 element for the operator, except ?: which is 3 arguments and 2 elements for the operator and then put back the result. This is why the evaluation starts at 2 to boot -- every operator has at least two arguments.Comment #30
chx CreditAttribution: chx commentedOne more thing, we do not have any infinite loop counters for broken plural formulas and we need one. Anything as crude as adding a $counter = 50 before the while and then if (!$counter--) throw exception('broken plural formula') works.
Also, if this passes we should implement the compression algorithm from above.
Thanks for all the work!
Comment #32
Sutharsan CreditAttribution: Sutharsan commentedAdded more comments (#27, #29), added a limiter to prevent endless loops (#30), added the compression logic (#17) and a bit of extra test cases.
Finally I've added code for the upgrade. But this still fails and I did not manage to find the cause of it:
Comment #34
Sutharsan CreditAttribution: Sutharsan commentedRemoved the surplus type casting.
Upgrade test still fails with:
This seems *not* to be caused by the changes in core/includes/update.inc.
Comment #35
Sutharsan CreditAttribution: Sutharsan commentedAfter full reboot and running an upgrade test with this patch applied to HEAD the the same upgrade test passes. I guess it's too late for humans, lets see what the bot thinks.
Comment #37
Sutharsan CreditAttribution: Sutharsan commentedTests fail in a different manner locally. Not nice, give it another spin.
Comment #38
Sutharsan CreditAttribution: Sutharsan commented#34: the_last_eval-1273968-34.patch queued for re-testing.
Comment #40
Sutharsan CreditAttribution: Sutharsan commentedThis brings us one step closer. The test now (only) fails with
The plural formula could not be parsed.
.Comment #42
penyaskitoI think that the problem is that we have 'nplurals=2; plural=($n>1);' with $n instead of n in the UpgradeTest.
I am having problems with testing this locally, but could be green.
Comment #43
Gábor HojtsyNo! .po files will not have a $ in the number. The *database* data has it. The **upgrade** should remove the $ and then parse it.
Comment #44
penyaskitoOops... working on that then.
Comment #45
penyaskitoOops... working on that then.
Comment #46
penyaskitoThis should address the issue, but I'm having some issues with Upgrade tests in my system. Let's wait for testbot.
Comment #47
Sutharsan CreditAttribution: Sutharsan commentedchx noted in irc that we could filter out more values from the more $plurals array. This is what I came up with for parsePluralForms():
Instead of truncating values from the end of the array, this uses array_filter to remove all occurrences of $default from the array.
Comment #48
Sutharsan CreditAttribution: Sutharsan commentedComment #49
penyaskitoLet's if chx + Sutharsan suggestion works...
Comment #50
Sutharsan CreditAttribution: Sutharsan commentedYes it works :)
Out of curiosity I tried to filter on a few other positions ($plurals[0], $plurals[1], $plurals[189] .. $plurals[198]) but none gave a better result. On average the compression is 90%. Because I have used many exotic plural formula's for the test, in practice the compression may go up to 99%.
Comment #51
chx CreditAttribution: chx commentedWe do not need to have the absolute possible best compression, anything sane is good enough -- I think this is good, not sure who could review it now though :)
Comment #52
Gábor Hojtsy@webflo and I looked at this and we found no problems. I'm also very happy with the set of people who looked at this and worked together :) Yay!
Comment #53
chx CreditAttribution: chx commentedFixed some formatting.
Comment #54
Dries CreditAttribution: Dries commentedCommitted to 8.x. Thanks!
Comment #55
Sutharsan CreditAttribution: Sutharsan commentedThanks!
Upon suggestion of chx, I have create a follow-up issue for additional unit test.
Comment #56
Sutharsan CreditAttribution: Sutharsan commentedThe follow-up issue is #2031441: Add unit test for PoHeader::evaluatePlural
Comment #58
Gábor HojtsyRemoving sprint tag.
Comment #59
Gábor HojtsyAttempt to remove tag again.
Comment #60
Gábor HojtsyThis did not cover JavaScript unfortunately. See #2043953: Singular/plural strings don't work in JavaScript anymore.
Comment #61
catchI've rolled this back and am closing the other issue as duplicate. We can commit this again along with the js fix.
Comment #62
larowlanBecause #2043953: Singular/plural strings don't work in JavaScript anymore was critical
Feel free to bump back.
Comment #63
chx CreditAttribution: chx commentedIt was critical because it was broken but since this is rolled back, it's not a critical any more, by far. Now a) you can't disable eval easily in Zend Engine (because it is not, in fact, a function) b) Regarding HipHop, HPHP is no more and HHVM can run eval just fine. So with the original reasons gone, I am degrading this to a 9.x feature request. Nothing is fundamentally broken here that would call to focus our very limited JS resources in pursuing this dream further. Sad :(
Comment #64
Gábor HojtsyI'm not sure @catch considers having an eval() in Drupal's codebase critical, he rolled this patch back to manage the critical queue by trading a critical bug for a major one. That is rather keep an eval() in vs. having a critical about a non-complete implementation of it. I don't necessarily agree that the rollback was the best technique forward to get the eval removed and JS properly fixed, but I don't need to :)
Comment #65
Gábor HojtsyCross-post.
Comment #66
Gábor HojtsyComment #67
catchafaik there's no API change here, so I'd commit a patch to 8.x, I'm not going to play status pong though.
I hate eval() but I don't consider it a release blocker no (especially since we could backport that change anyway).
Comment #68
chx CreditAttribution: chx commentedLet's agree on a 8.x task, then. Doing this in JS couldn't be hard when the heavy lifting is assembling the array, just pass it to the JS settings and port the, what, two lines of PHP code to JS.
Comment #69
Gábor Hojtsy@catch: it is an API change in terms of how the language object is structured, which having no methods exposes its stuff via public properties. That is why the JS code broke.
Comment #70
catch@Gabor: Ok technically that's an API change but do you know anywhere other than the js that uses formula?
Comment #71
Gábor HojtsyAt least the l10n_server, potx and l10n_pconfig modules rely on the formula in some way (needing the string itself in l10n_pconfig and potx and the num of plurals in l10n_server). Which in fact points to a problem in the above patch that it does not keep storing that information additionally to the computed array.
Comment #72
nod_tag
Comment #72.0
nod_Updated issue summary.
Comment #73
Sutharsan CreditAttribution: Sutharsan commentedRerolled the #53 patch.
Comment #74
Sutharsan CreditAttribution: Sutharsan commentedNevermind :/ A proper reroll, this time.
Comment #75
Sutharsan CreditAttribution: Sutharsan commentedThis patch makes JS use the plural "formula" array instead of the old formula. In the translated js file the formula gets replaced by an array of indexes. Drupal.formatPlural() used this array to determine the right index.
I have seen it working with English plural strings in JS, but could not get translated strings. It probably needs the right JS to to demonstrate plural translation.
Comment #76
Gábor Hojtsy@Sutharsan: @Sweetchuck wanted to take this on on Drupal sprint weekend and found several other bugs in JS translation to make JS translation work in the first place... Those are now fixed in #2182265: Drupal.t() placeholder substitution doesn't always work correctly (backport part of the Javascript file translation fixes from Drupal 8), so this can now be adapted to work with JS too from a prior working JS state (as opposed to several bugs there :)
Comment #77
SweetchuckThis patch is not finished.
The JS Drupal.t() is not works with the english language, but works with every other language.
Comment #78
SweetchuckComment #79
Sutharsan CreditAttribution: Sutharsan commented@sweetchuck, I tried to review the issue but found it very hard to check your work. Your patch combines too many things in one: (unrelated) changes due to the reroll, code style changes and finally the actual functional code changes. As you did not provide an interdiff, I can not judge the patch. Could you create a new patch, with interdiff. Ideally I would like to see 1. (first) a patch of the #75 rerolled; 2. a patch with only functional changes including an interdiff and lastly a separate follow-up issue with the code style changes and the non-essential comment changes.
I do realise that this is a lot of extra work, but the #78 patch is not in a state that I can review it and therefore holds the issue up.
Comment #80
Sweetchuckthe_last_eval-1273968-80-75-reroll.patch
Rerolled version of patch #75.
the_last_eval-1273968-80-78-clean.patch
Simplified version of patch #78. The code comments and coding standards modifications has been removed.
the_last_eval-1273968-80-interdiff-75-78.txt
This is the interdiff between the
the_last_eval-1273968-80-75-reroll
andthe_last_eval-1273968-80-78-clean.patch
Comment #81
Sutharsan CreditAttribution: Sutharsan commentedI've reviewed the interdiff:
This duplicates #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages.
This change seems not to be related to this patch, it is probably code clean-up. If so, remove it from the patch.
This is duplicate code of #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages. Remove it.
This is not a functional change, but a personal preference in code style. Please remove to avoid clutter in the patch.
More code clean-up.
This is duplicate code of #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages. Remove it, or drop that issue.
More code clean-up. Please remove.
Am I correct to understand that this line is the only functional change in this patch? The other changes are 1. covered by #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages, 2. usage of Json::encode() instead of custom encoding and 3. the above commented non-essential changes.
We need to decide what to do with #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages. Those changes are completely included in the #80 patch. We either drop the first issue or remove its code from the patch. Dropping #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages simplifies the workflow and we need the combined code to fix the (plural) translation in English. And that is what we try to achieve after this issue got reverted.
Comment #82
SweetchuckComment #83
Sutharsan CreditAttribution: Sutharsan commentedJay, #2224691: JavaScript Drupal.t and Drupal.formatPlural only works for non-english languages got committed.
@Sweetchuck, Will you re-roll and update the patch as we discussed yesterday?
Tagging for D8MI sprint board.
Comment #84
SweetchuckYes. I will reroll this patch today. After the "D8 Field API - Fields reborn" session
Comment #85
SweetchuckReroll the patch #80 the_last_eval-1273968-80-78-clean.patch
Comment #86
Sutharsan CreditAttribution: Sutharsan commentedThe code looks Ok to me. Since javascript is not tested by the testbot, this patch now needs a manual test. Alternatively we could use a simple javascript script and us use that in a manual test.
Comment #87
SweetchuckHow to test manually
The Drupal.formatPlural() is called only twice in Drupal core.
core/modules/comment/js/node-new-comments-link.js
core/modules/contextual/js/contextual.toolbar.js
Not easy to trigger it manually. Therefore I wrote a small script to call theDrupal.formatPlural() as way as I want.
Of course you can use any other language not just the Polish language.
Comment #88
SweetchuckComment #89
Sutharsan CreditAttribution: Sutharsan commentedUpdated the issue summary with a general problem description and added details to the manual test.
Comment #90
Sutharsan CreditAttribution: Sutharsan commentedI have ran the manual test,but it fails. This is the output:
It looks like the default is not being applied.
Comment #91
Sutharsan CreditAttribution: Sutharsan commentedComment #92
Sutharsan CreditAttribution: Sutharsan commentedThe problem was in the test. It used a translated string, but the string was not in the JavaScript translation file. The source string is now include in the code and used in the
Drupal.t()
function, further and a more commonly used source string is being used in the test.Comment #96
Sutharsan CreditAttribution: Sutharsan commentedIgnore #93 - #95 comments, they are about old patches. The patch in #85 is the latest and still applies.
Comment #97
botanic_spark CreditAttribution: botanic_spark commentedI did the manual test, and I got all strings translated.
Comment #98
Sutharsan CreditAttribution: Sutharsan commented@botanic_spark, thanks for testing.
+1 for RTBC
Comment #99
alexpottCommitted 0ac4b91 and pushed to 8.x. Thanks!
Removed unused local variable on commit.
Comment #101
alexpottComment #102
Gábor HojtsyYay, thanks all!
Comment #104
idebr CreditAttribution: idebr at ezCompany commentedSince the arithmetic plural formula is now evaluated and then discarded, the original plural formula is no longer available. This is a problem when creating interface translation exports in .po files. This currently blocks #2611298: Port potx to Drupal 8
Can you have a look at the following issue and help determine the proposed resolution? #2882617: String version of plural formula is not available, exported .po files contain an incorrect default