Problem/Motivation

This issue is about testing the Symfony Translation component for a Gettext replacement.

As motivated in #1189184: OOP & PSR-0-ify gettext .po file parsing and generation we need a better component for Gettext processing.

Key parts that need to be addressed are:
- Can we support the installer?
- Can we load a .po file into the database?
- Can we dump from the database to a .po file?
- Can we edit translations?

The questions seems trivial but are they?

Proposed resolution

Use the Symfony components Translation and Config to Drupal to load / save PO files and handle translation.

Benefits are:
- being able to write different input/output formats to use for version control. xml, yaml, php
- no tests anymore for translations needed. These should reside within Symfony.

Cons are:
- do we loose Gettext comments and is that bad?
- dependency on the development pace Symfony provides.

Symfony uses a different format regarding handling translations compared to the Gettext format:

1. Messages are stored into a MessageCatalog which supports domains. Drupal uses Context for that.
2. Singular and Plural message source ids are disjunct. (Not really sure how Drupal does this)

When Symfony has

$messages = array(
  'One sheep' => 'Un mouton',
  '@count sheeps' => 'Un mouton|@count sheeps'
);

is uses only the @count sheeps when doing

format_plural(3, 'One sheep', '@count sheeps')

That is Symfony has it's own pluralization rules. We can add our own Rules through PluralizationRules::set($rules, $locale)

Remaining tasks

Symfony must definitely change their plural seperator which is now a '|'. This should be something like our '\03' LOCALE_PLURAL_DELIMITER.

We need the PO header for our plural calculations. Symfony does not yet parse the header. It uses PluralizationRules which is extendible by custom rules. This related closely to #1273968: remove eval from locale.module

But we do write PO files and want them to have a PO Header. See pull request https://github.com/symfony/symfony/pull/4249 discusses the progress.

Comments '#' are not yet handled by the PoFileLoader. Do we need these?

The major Symfony PR is [WIP]: Allow Drupal to use Translate component which lead to some small commits done through:
#4336 [BUG] PoFileLoader pluralhandling uses interval instead of index.
#4337 Fixed for allowing empty translation.
#3339 Allow for missing whitelines.

User interface changes

API changes

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

clemens.tolboom’s picture

Issue tags: +D8MI, +sprint, +language-base

Added tags

clemens.tolboom’s picture

Status: Active » Needs review
FileSize
481.77 KB

An initial patch to see if testbot does not crash again.

This patch let's the installer load ie a french translation which works by hand.

Note this patch includes https://github.com/symfony/symfony/pull/4249 which is not committed yet. See also https://github.com/clemens-tolboom/symfony/tree/trans2drupal

I noted the file sites/default/files/translations/install.fr.po is loaded twice by

locale_translate_batch_import()

. First through install.php and later again by index.php (?not 100% sure by index.php).

clemens.tolboom’s picture

Status: Needs review » Needs work
clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
482.48 KB

New patch with git diff --binary as the reported files are binary I guess. At least for .mo files

clemens.tolboom’s picture

Sigh ... forgot to commit local fixes for header (necessary for plural tests) and file object.

clemens.tolboom’s picture

Status: Needs review » Needs work

I updated the issue summary slighly with two remaining tasks:
- How to handle reader PO headers. Best is to use PluralizationRules (part of symfony Translation)
- Do we need to write PO headers? I have to look into PoFileDumper.

(I definitely could use a sponsor for this as I'm an independent developer)

clemens.tolboom’s picture

Issue summary: View changes

Updated issue summary.

clemens.tolboom’s picture

I just thought of what is our real batch constraint?

function locale_translate_batch_import($filepath, &$context) {
... // blazing fast reader
    $result = gettext_read_messages_from_po_file($file, $langcode);

Why not just http://nl3.php.net/array_slice our $result['messages'] on some batch parameters? Loading of the file probably not memory constraint but db constraint.

    gettext_write_messages_to_database($result['messages'], $langcode, array(), LOCALE_NOT_CUSTOMIZED, $result['header']);
    $context['results'][] = $filepath;
  }
}

This way we don't have to implement any Batch interface. Some help appreciated as batch_example.module/8 does not give me much clues :(

(Still wonder why #4 did not fail: I expected the French plural check to fail definitely)

clemens.tolboom’s picture

I added a Gettext class to symfony (pending https://github.com/symfony/symfony/pull/4249)

Another note on our batch needs. As we have all messages read into memory we can bulk load them into the database probably too. I wonder whether we need batch at all.

Attached patch works on install and import. Not export as I first have to study Symfony's PoFileDumper.

clemens.tolboom’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, core-1570346-8-use-symfony-translation-component.patch, failed testing.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
0 bytes

The Gettext class did not replace '\n' so PO Header was not multiline.

My simpletest screams Base table or view not found: 1146 Table 'drupal_d8.simpletest432981semaphore not sure what that is :(

clemens.tolboom’s picture

Status: Needs review » Needs work

The last submitted patch, core-1570346-12-use-symfony-translation-component.patch, failed testing.

clemens.tolboom’s picture

When visiting the modules page then enable testing a batch start to import the install.fr.po file.

    Notice : Undefined index: title dans _batch_progress_page_js() (ligne 127 dans /Users/clemens/Sites/drupal/d8/www/core/includes/batch.inc).
    Notice : Undefined index: error_message dans _batch_progress_page_js() (ligne 135 dans /Users/clemens/Sites/drupal/d8/www/core/includes/batch.inc).
    Notice : Undefined index: init_message dans _batch_progress_page_js() (ligne 136 dans /Users/clemens/Sites/drupal/d8/www/core/includes/batch.inc).
    Aucun traitement par lot actif.
clemens.tolboom’s picture

(#14 is probably about not running the simpletest from a French drupal site?!?)

The other tests fails due to Symfony's way of handling the messages.

This is the Symfony way to process file from locale.test LocaleImportFunctionalTest::getPoFile()

Array
(
    [__HEADER__] => Project-Id-Version: Drupal 8
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Plural-Forms: nplurals=2; plural=(n > 1);

    [One sheep] => un mouton
    [@count sheep] => {0} un mouton|{1} @count moutons
    [Monday] => lundi
    [Tuesday] => mardi
    [Wednesday] => mercredi
    [Thursday] => jeudi
    [Friday] => vendredi
    [Saturday] => samedi
    [Sunday] => dimanche
)

This shows one extra message namely the '@count sheep'.

format_plural( $count, 'One sheep', '@count sheep', array( '@count' => $count), array('langcode'=>'fr');

which should try to call t('One sheep') when count == 1 or plural-index==0 ... in other cases we need to select the appropriate version from "[@count sheep] => {0} un mouton|{1} @count moutons"

Note LOCALE_PLURAL_DELIMITER is replaced by a '|' by Symfony.

clemens.tolboom’s picture

When trying the following code

foreach (array('fr', 'en') as $langcode) {
  foreach ( array(0,1,2) as $count) {
    drupal_set_message( "$langcode - $count" . ' = ' . format_plural( $count, "Sheep", "@count sheep", array(), array('langcode' => $langcode)));
  }
}

with the rule

@count sheep => {0} un mouton|{1} @count moutons

output is

fr - 0 = un mouton
fr - 1 = 1 moutons
fr - 2 = Sheep
en - 0 = Sheep
en - 1 = 1 sheep
en - 2 = Sheep

and there is no room for Zero moutons or 'No moutons'.

Ideally we should use all Symfony Plural handling (remember the translation of "@count sheeps" is "{0} un mouton|{1} 2 moutons" loaded into our databases by PoFileLoader) and contains all translaiton options.

function format_plural($count, $singular, $plural, array $args = array(), array $options = array()) {
  $args['@count'] = $count;

  if (!isset($options['langcode'])) {
    $options['langcode'] = drupal_container()->get(LANGUAGE_TYPE_INTERFACE)->langcode;
  }
  // Get the translation string without values
  $plural_string = t($plural, array(), $options);
  $ms = new Symfony\Component\Translation\MessageSelector();
  try {
    $translation = $ms->choose($plural_string, $count, $options['langcode']);
    return t($translation, $args, $options);
  }
  catch (Exception $e) {
    // TODO should we return $singular?
    return t($singular, $args, $options);
  }
}

So now I'm puzzled with whether the PoFileLoader is in err mapping our singular to {0} or Drupal has done it wrong all time?

clemens.tolboom’s picture

Issue summary: View changes

Updated issue summary.

clemens.tolboom’s picture

Issue summary: View changes

Updated issue summary.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
512.72 KB

I misunderstood the Symfony pluralization rules. It has two options:
- indexed: that is how our Gettext msg_plural works
- interval: an awesome way to define plural form incompatible with gettext.

I still didn't managed to pass all plural tests. Hope someone else sees what's wrong.

Status: Needs review » Needs work

The last submitted patch, core-1570346-17-use-symfony-translation-component.patch, failed testing.

Anonymous’s picture

Issue summary: View changes

Added example about PoFileDumper problem.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
514.05 KB

Import failures (I skip the plural failures for now):
- Two failures are related with plural imports. Those need a +1 for number of imports (fixed)
- 1114: Another has to do with context. PoFileLoader does not know how to handle contexts. Symfony knows about domains. Can we solve that by using domains?
- 1137 | 1151: The others are probably about not getting the correct override options. We need to check those in more detail.

I'm not sure we can make it into Symfony without context support. For getting context to work we must read a PO file in full and decide what contexts are in it. The we need to iterate over these contexts to load each on separately into a domain. That is definitely not performing well. We can rephrase this by exporting PO files by context thus creating multiple files then import each into the correct domain.

What do you think?

Status: Needs review » Needs work

The last submitted patch, core-1570346-19-use-symfony-translation-component.patch, failed testing.

Gábor Hojtsy’s picture

I don't understand what is happening here. The summary does not really explain why is this being done and how.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
525.68 KB

I'll update the summary this week.

Attached patch should fix the context for the loader.

Status: Needs review » Needs work

The last submitted patch, core-1570346-22-use-symfony-translation-component.patch, failed testing.

clemens.tolboom’s picture

Status: Needs work » Needs review

(Just some notes to get integrated with the issue summary shortly.)

Gettext related notes.

header

Current implementation of header support maybe controversial as this is implemented by 'translate' __HEADER__ to a header string.

$messages['messages']['__HEADER__'] = headerstring;

Note that our Gettext default context messages are have a symfony domain of 'messages'.

comments #

As Symfony import/export several file formats we lose comments when using Symfony. I don't see a quick solution for adding comments to Symfony.

msgctxt

Adding context was not hard as Symfony understand domains with are afaik equals. Processing context from within Symfony maybe controversial as the list of contexts need to get into Symfony somehow. This is now done through the parsed messages by adding a translation like

$messages['messages']['__CONTEXT__'] = $contexts;

Thus loading ie drupal-7.13.nl.po processes first the file for messages with the Symfony domain 'messages' equivalent with the Gettext default msgctxt (empty) while scanning for all msgctxt.

  $loader = new PoFileLoader();
  $resource = $file->uri;
  $catalogue = $loader->load($resource, $langcode);
  $all = array();
  $messages = $catalogue->all('messages');

Next reload the resource for each domain.

  $domains = Gettext::getContext($messages);
  $all['messages'] = $messages;
  foreach($domains as $domain) {
    $catalogue = $loader->load($resource, $langcode, $domain);
    $messages = $catalogue->all($domain);
    $all[$domain] = $messages;
  }
  $messages = $all;

Writing PO(T) files

The Symfony PoFileDumper just calculates a string which is saved by the FileDumper. Symfony's current implementation dumps each domain into a separate file named after it's domain. We could overload the PoFileDumper::dump() method to allow for a single file dump.

Symfony currently lacks a POT dumper and loader. I'm not sure but guess that's not a big problem.

Plural handling

Symfony breaks the link between singular and plural form. The PoFileDumper can merge the back but is that good enough?

clemens.tolboom’s picture

I planned to gix all tests but now am puzzling what's going wrong with editing translations. Is it only the LOCALE_PLURAL_DELIMITER or is the more to investigage.

    $lid = db_query("SELECT lid FROM {locales_source} WHERE source = :source AND context = ''", array(':source' => "1 hour" . LOCALE_PLURAL_DELIMITER . "@count hours"))->fetchField();
    $this->assertNotNull($lid, "The lid for 1 hour found $lid");

This attached patch has more fixes on header but that's mere Symfony. I hoped to having fixed bulk db_insert but that's delayed :(

Status: Needs review » Needs work
Issue tags: -D8MI, -sprint, -language-base

The last submitted patch, core-1570346-25-use-symfony-translation-component.patch, failed testing.

clemens.tolboom’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Issue tags: +D8MI, +sprint, +language-base

The last submitted patch, core-1570346-25-use-symfony-translation-component.patch, failed testing.

Anonymous’s picture

Issue summary: View changes

Added a Symfony interval example.

clemens.tolboom’s picture

Issue summary: View changes

Added links to Symfony pull requests (PR)

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
541.49 KB

Installing by hand seems to work for the translation edits in Dutch.

Locally comments fail and some (but less) plural tests.

clemens.tolboom’s picture

I edited the Issue Summary ... but not enough.

Things to discuss @ #drupal-i18n are
- the edit workflow and database content. Hungarian test probably fails. Dunno why exactly.
- the i18n ecosystem. What impact can this rework have?

Status: Needs review » Needs work

The last submitted patch, core-1570346-29-use-symfony-translation-component.patch, failed testing.

Gábor Hojtsy’s picture

I've re-read the updated issue summary, and still don't understand what Drupal components would this replace? Does it have any benefits beyond outsourcing code? Any improvements to the current system? Comments re some concerns in the summary:

1. Loosing gettext sourced comments is not a problem IMHO.

2. I don't understand how would using domains instead of contexts change the approach of translation; if all modules belong to separate domains, then 'Home' would need to be translation tens of thousands of times over and over again (as it appears in modules). The role of context is to specifically be able to mark strings that have a special meaning and let the others be shared. If we segregate modules, then this is how many times you need to translate "Operations": http://localize.drupal.org/translate/source-details/398 Sounds fun? I'd expect Symfony to support contexts as they strive to be generic for everybody, right? It is a core part of the format.

3. Not sure about the plural lookup. The gettext format is basically a key-value mapping store. The keys are either defined by msgid or msgid in combination with msgid_plural. If you have this file, then the translations should be different:

msgid "@count sheep"
msgstr "whatever whatever"

msgid "One sheep"
msgid_plural "@count sheep"
msgstr[0] "something else"
msgstr[1] "more of something else"

This is a perfectly valid gettext file. (I tested it with msgfmt just to be double sure btw :) What does a translation lookup return for "@count sheep" in this case? It sounds like Symfony makes some assumptions about the source that people are surely not using the plural version of a string as a standalone source string elsewhere. This is just an assumption and the format allows for it perfectly, so it does not sound like something a generic framework should assume.

I went ahead a bit to try to figure out how does msgfmt consider uniqueness, and figured out that in fact the msgid for the singular version is the identifier and the msgid_plural is just supplemental. Try this with msgfmt:

msgid "One sheep"
msgstr "something"

msgid "@count sheep"
msgstr "whatever whatever"

msgid "One sheep"
msgid_plural "@count sheep"
msgstr[0] "something else"
msgstr[1] "more of something else"

msgid "One sheep"
msgid_plural "@count sheep"
msgstr[0] ""
msgstr[1] ""

You'll get both plurals generate errors as duplicate definitions of the first one. Now if you remove the first one, you "only" get the duplicate for the last one. Then if you edit the msgid_plural string for the last one to something else, you'd still get the duplicate error. So in practice, the msgid seems to govern as the unique key after all. That makes it even more wrong for symfony to assume the msgid_plural would be unique.

I've tested these with:

$ msgfmt -V
msgfmt (GNU gettext-tools) 0.17

(Drupal 8 uses the combination of the singular and plural as the unique key or the single string in case of non-plural strings).

clemens.tolboom’s picture

@Gábor Hojtsy

1. Regarding #32 #1 Gettext comments

On ie http://drupal.d8/admin/config/regional/translate I see a source location. Isn't that .po comment?

An AJAX HTTP error occurred.
core/misc/drupal.js

2. Regarding #32 #2 msgcnxt versus domains

I guess and hope you misunderstood. Symfony does as translation lookups on a domain selection which are the same as gettext msgctxt.

@see https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Tra...
@see https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Tra...

3. Regarding #32 #3 singular|plural connection

Symfony uses just a source string and a translation string. But the translation string contains all $count dependent values from which the correct version is selected by the MessageSelector

@see https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Tra...

For PoFileDumper we need indeed the msgid + msg_plural to so I added these with the PoFileLoader. Is it bad when we loose the singular | plural connection? When broken and exporting back into a Gettext format it will not help the Gettext translators.

Thus import the following

msgid "One sheep"
msg_plural "@count sheep"
msgstr[0] "Un mouton"
msgstr[1] "@count mouton"

and loosing the singular|plural connection results in

msgid "One sheep"
msgstr "Un mouton"
...
msgid "@count sheep"
msgstr "Un mouton|@count mouton"

This doesn't look too bad to me but it does not conform your #32 #3 msgfmt excercise.

I personally like the "@count sheep" translates to "Un mouton|@count moutons" construct as that complies more to a string resource ID concept.

Gábor Hojtsy’s picture

1. Yes, the location is sometimes in the comment. However, the location is sometimes a file name, or a URL even. If you import .po files from localize.drupal.org those contain no location comments. In part for making the file smaller, in part because it was found to be of no use to the translators.

2. I don't understand from that code how is the domain used or what should it be. All the docs I found was "The domain" in the file, not very descriptive. Do they have actual docs for this?

3.

I personally like the "@count sheep" translates to "Un mouton|@count moutons" construct as that complies more to a string resource ID concept.

Whether you like this or that, we are talking about implementation of the Gettext .po format here, so whatever we like, it would be prudent to follow the format.

clemens.tolboom’s picture

#29 Test results

I broke the locale.test 699 as I refactored the format_plural

// Hungarian is not imported, so it should assume the same text as
// English, but it will always pick the plural form as per the built-in
// logic, so only index -1 is relevant with the plural value.

locale.test 796 : I overlooked the comments (inserts a new entry into the db.)

// Inject a plural source string to the database. We need to use a specific
// langcode here because the language will be English by default and will
// not save our source string for performance optimization if we do not ask
// specifically for a language.
format_plural(1, '1 day', '@count days', array(), array('langcode' => 'fr'));

Most other breaks are related to the current plural PoFileLoader which adds an extra key singular|plural ... see #33

(sigh)

clemens.tolboom’s picture

Status: Needs work » Closed (won't fix)

After a 1:30 hour skype call with Symfony developer Drak we agreed on Symfony does not understand po msgctxt and Drupal does not understand domains. I further learned from Drak a Symfony 2.1 release is coming which will block RFCs probably. We should target Symfony 2.2 for this.

My conclusion for now is:
- do not use Symfony Translation component as it does not fit yet
- A domain != context
- Pull Request https://github.com/symfony/symfony/pull/4249 is too big and has no reviewers available as it need Gettext experts like Drak.
- it was a nice excercise to try and use this within Drupal.
- We switch back the development to #1189184: OOP & PSR-0-ify gettext .po file parsing and generation thus Close (won't fix) for now.

Some links about Gettext past by during the discussion:
- The place to look for PluralForms is https://translations.launchpad.net/+languages (331 languages from 398 available). Each language lists it's plural form.
- PluralForms are not all correct on http://translate.sourceforge.net/wiki/l10n/pluralforms
- Discusses bad Gettext 0.15 msgctxt implementations http://translate.sourceforge.net/wiki/toolkit/duplicates_duplicatestyle?...
- A favorite tool does not even support msgctxt http://www.poedit.net/trac/ticket/148
[edit]
- Zikula needs PO comment parsing to as https://github.com/zikula/i18n-portal/blob/master/gettext/de/core/1.3.0/...

Gábor Hojtsy’s picture

Issue tags: -sprint

I tried to explain in #32 above that domains are silos of strings whereas contexts are metainformation that you "tag" one specific string with. They are for different purposes. I understand Symfony only supports domains, and it is not in fact a full implementation, and therefore not reusable in a framework context. I also understand from your conversation notes that Symfony is not interested in implementing contexts(?). Drupal only supports contexts and consider all strings being in "one domain", but Drupal was not really built to be reusable in a framework context like that, we implemented what we needed.

Removing this from the sprint then.

clemens.tolboom’s picture

Symfony in particular Drak is interested in getting a full blown Gettext implementation. But we (Drupal) have a feature freeze making it risky to wait for ie Symfony 2.2

My Symfony PRs cover most drupal related stuff (in draft) except Gettext header plural form parser which is a beast to build. I probably continue on Symfony coding late September unlessthey asked for a sponsored version.

clemens.tolboom’s picture

Issue summary: View changes

More info?