Hi,

Does Coder have functionality to review translations in .po files? The Swedish translation team discussed this (in Swedish), and we are looking for a module to use. Coder might be that project, as it already has the UI written.

Since this functionality is not only relevant for the Swedish translation team we should probably have this as a separate project and use a Coder hook to parse files. Or?

Comments? Suggestions?

Jens

Comments

stella’s picture

It should be possible to extend Coder to do this. However, which reviews would you want to add? What checks would need to be done? Can you provide a link to the Drupal standards on .po files?

douggreen’s picture

You might want to look at the "Interface Text Translation" review that ships with the potx module (with #309875: Fix (and improve) hook_reviews).

You might also look at the new "Internationalization" review in the 6.x-1.x-dev version.

ztyx’s picture

Category: support » feature

Sorry for such a late reply. It took some time to go through and figure out what the module reviews do.

@douggreen: Thanks for the links! The potx module only looks at the existing strings/translations in PHP-files and comments on those, which is not exactly what we'd like. Also, I didn't manage to find the review you mentioned that should be in "Internationalization" (I checked both i18n-dev.tar.gz and CVS HEAD) but if it is using a Coder module review it is not really what I'm looking for.

@stella: Well, for reviewing a .po file I think a slightly different approach is needed because of the nature of its format. A .po file looks like this, i.e. it holds pairs of text strings (the original language, usually English, and another language, in our case Swedish). Therefore, a callback function taking two arguments (the strings) would be sufficient. But I can write such a callback myself in my own external module.
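As an illustration of the "pairs of strings plus a two-argument callback" idea, here is a minimal sketch in Python (chosen for brevity; it is not Coder's or potx's PHP code). The names `iter_po_pairs` and `review_po` are hypothetical, and the parser only assumes simple singular `msgid`/`msgstr` entries; plural forms and `msgctxt` are ignored.

```python
import re

# Escape sequences commonly seen inside quoted .po strings.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unquote(token):
    """Strip the surrounding double quotes and resolve backslash escapes."""
    body = token.strip()[1:-1]
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), body)


def iter_po_pairs(po_text):
    """Yield (original, translation) pairs from simple singular entries."""
    msgid = msgstr = current = None
    # The extra empty line flushes the last entry of the file.
    for raw in po_text.splitlines() + [""]:
        line = raw.strip()
        if line.startswith("msgid ") or not line:
            # A new entry or a blank line ends the previous entry.
            # The header entry has an empty msgid and is skipped.
            if msgid and msgstr is not None:
                yield msgid, msgstr
            msgid = msgstr = current = None
            if line:
                msgid, current = _unquote(line[len("msgid "):]), "msgid"
        elif line.startswith("msgstr "):
            msgstr, current = _unquote(line[len("msgstr "):]), "msgstr"
        elif line.startswith('"') and current == "msgid":
            msgid += _unquote(line)      # continuation line of msgid
        elif line.startswith('"') and current == "msgstr":
            msgstr += _unquote(line)     # continuation line of msgstr
        else:
            current = None               # comments, plurals, context: ignored


def review_po(po_text, check):
    """Call check(original, translation) per pair and collect its messages."""
    findings = []
    for original, translation in iter_po_pairs(po_text):
        for message in check(original, translation):
            findings.append((original, message))
    return findings
```

Any check that only needs the two strings can then be passed in as `check`, which is the shape of callback described above.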

Anyway, to be more concrete, I am making this issue a feature request: add the possibility for reviews to specify which file suffixes they review. That would allow reviews other than PHP reviews, making it possible to review both .po files and other files.

ztyx’s picture

Title: .po files » Setting file suffixes as per review

Changing title accordingly.

stella’s picture

ztyx: it's already possible to do different checks based on file suffixes. We already have different tests for js (not complete), patch and tests. We just need to know which checks should be done on .po files.

douggreen’s picture

The new "Internationalization" review is in the 6.x-1.x-dev version. It will be in the 6.x-1.2 version when we release it (but it's not released yet). But based on your most recent comments, I don't think that this review checks what you're asking for. The "Internationalization" review is still checking the php code for proper Drupal programming.

ztyx’s picture

Ah, well here are a few things that we would like to check in the .po-files:

  • The first letter of the translation should have the same capitalization as the first letter of the original text.
  • Punctuation (.,:;?! etc.) that exists in the translation should exist in the original, and vice versa.
  • Check translations for opening/closing tags if they contain HTML.
  • If the original text contains an HTML tag the translation should hold the same tag.
  • If placeholders like !name, @name or %name exist in the original text, they must also exist in the translation.
  • A list of blacklisted terms that we'd like to check for.
  • We also have a dictionary containing a bunch of standard translations. If one of these words shows up in the original text but not in the translation, a warning should be fired.
  • Quotation checks.
  • Parenthesis (()[]{}) checks.
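Three of the checks in the list above (placeholders, HTML tags, brackets) can be sketched as follows. This is a rough Python illustration, not code from Coder or the attached patches; `check_pair` and the regular expressions are assumptions, and the bracket check will misfire on text such as smileys.

```python
import re
from collections import Counter

# Drupal-style placeholders: !name, @name, %name (not preceded by a word
# character, so "user@example.com" or "100%sure" are not picked up).
PLACEHOLDER = re.compile(r"(?<!\w)[!@%][A-Za-z][\w-]*")
# Opening and closing HTML tags; group 1 is "/" for closing tags.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")
CLOSERS = {")": "(", "]": "[", "}": "{"}


def _tags(text):
    """Count tags by name, keeping opening and closing tags apart."""
    return Counter(slash + name.lower() for slash, name in TAG.findall(text))


def brackets_balanced(text):
    """Return True if (), [] and {} are properly nested in text."""
    stack = []
    for ch in text:
        if ch in CLOSERS.values():
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    return not stack


def check_pair(original, translation):
    """Return a list of problem descriptions for one string pair."""
    problems = []
    src = set(PLACEHOLDER.findall(original))
    dst = set(PLACEHOLDER.findall(translation))
    if src != dst:
        problems.append(f"placeholders differ: {sorted(src ^ dst)}")
    if _tags(original) != _tags(translation):
        problems.append("HTML tags differ")
    # Only complain when the original itself is well-formed.
    if brackets_balanced(original) and not brackets_balanced(translation):
        problems.append("unbalanced brackets in translation")
    return problems
```

For example, `check_pair("Hello %name (admin)", "Hej %namn (admin")` reports both a placeholder difference and unbalanced brackets.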

The problem is that some of these rules are locale specific. Therefore, maybe using a hook and implementing this project in a different module would be a better idea. I envision having some kind of system where you select your language and the right checks will be made for you. Whether these settings should be hardcoded or uploaded/downloaded as a file for easy sharing, I don't know. I guess hardcoding at first.
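A hardcoded, per-language rule set of the kind described here could look roughly like the following Python sketch. Everything in it is hypothetical: the `LocaleRules` structure, the `check_terms` function, and especially the Swedish sample terms, which are invented for illustration and are not the Swedish team's actual blacklist or dictionary.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LocaleRules:
    # Terms that should not appear in a translation.
    blacklist: set = field(default_factory=set)
    # English term -> expected translation (matched as a substring,
    # so inflected forms such as "noden" still count).
    glossary: dict = field(default_factory=dict)


# Hardcoded rules keyed by language code (sample values are made up).
RULES = {
    "sv": LocaleRules(blacklist={"sajt"}, glossary={"node": "nod"}),
}


def check_terms(original, translation, langcode):
    """Return warnings for blacklisted or missing standard terms."""
    rules = RULES.get(langcode)
    if rules is None:
        return []  # no locale-specific rules for this language
    src, dst = original.lower(), translation.lower()
    problems = []
    for term in sorted(rules.blacklist):
        if term in dst:
            problems.append(f"blacklisted term '{term}' used")
    for english, expected in sorted(rules.glossary.items()):
        in_original = re.search(rf"\b{re.escape(english)}\b", src)
        if in_original and expected not in dst:
            problems.append(f"'{english}' is usually translated as '{expected}'")
    return problems
```

Keeping the rules in plain data like this would also make it straightforward to move them into an uploadable file later.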

Also, I noticed the link to the Swedish discussion was broken. Here is a link that works.

Edit: Added more possible tests to the list.

nedjo’s picture

This would make a very valuable and powerful battery of tests. It would allow us to assess to a degree not only the completeness but the quality of a translation. We could present summary statistics for each module, showing the number of problem-free, problematic, and missing translations.
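The summary statistics mentioned here could be computed along these lines. This is a small Python sketch with hypothetical names (`summarize`, `check`), assuming the string pairs have already been extracted and that `check` returns a list of problems for a pair.

```python
def summarize(pairs, check):
    """Count problem-free, problematic, and missing translations.

    pairs: iterable of (original, translation) tuples.
    check: callable returning a list of problems for one pair.
    """
    stats = {"problem_free": 0, "problematic": 0, "missing": 0}
    for original, translation in pairs:
        if not translation.strip():
            stats["missing"] += 1        # untranslated string
        elif check(original, translation):
            stats["problematic"] += 1    # at least one check fired
        else:
            stats["problem_free"] += 1
    return stats
```

Running this once per module's .po file would give the per-module overview described above.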

Clearly this issue presents challenges because it's quite distinct from the usual coder needs in that it needs to compare specific data between sets of text files. Probably this would need to be done in a separate module, maybe one that depends on potx, since potx is already set up to deal with .pot files.

Doug, Stella, what do you think?

nedjo’s picture

Title: Setting file suffixes as per review » Tests to detect errors in transations in .po files

ztyx’s picture

Title: Tests to detect errors in transations in .po files » Tests to detect errors in translations in .po files
Attachment (new): 2.88 KB

All right, I've made an outline of the module, which I've called Translation Review for now.

I am attaching 1) the module outline (and .info file) together with 2) a bunch of test cases. You can view the test cases both as a way of proposing the functionality and as a way of setting goals for where I'm headed with this. Feel free to comment on or patch it as you like.

The function _tr_check(...) is the starting point for the checks that are to be made. Since checking the quality of the translations only requires the original text together with its translation, those are the only inputs needed. I am not too familiar with Coder hooks, but maybe it would be possible to use a callback hook per .po-file that filters out these two strings and runs them through _tr_check(...)?

@nedjo: Good summary. You mentioned potx maybe becoming a dependency. It was proposed in the Swedish translation issue too, but after browsing through the potx (6.2/CVS) code it seems as though it is only generating .pot-files from t(...) calls. In other words it's not parsing .po(t)-files in any way. A translation review will need code to parse the .po files and will not need to parse the t(...) calls at all. Am I wrong?

Also, I am curious to hear whether I should create a new project on Drupal.org or whether you are interested in taking this on in the Coder project bundle? Either way, I am learning Drupal and I'd love to contribute to this.

douggreen’s picture

Version: 6.x-1.0 » 6.x-2.x-dev
Status: Active » Needs review

Attached is a patch that does some restructuring of coder to make it simpler to add file-extension-based rules. It then takes a first stab at a callback that implements many of the requests above. I'm likely to commit part (or all) of this patch before it is really done, just to get the restructuring in place.

But we should continue to work on improving these rules, and possibly even move them to the potx review.

douggreen’s picture

Attachment (new): 15.52 KB

Oops, here's the patch...

hass’s picture

Please, don't forget to test:

1. Sentences must end with periods. (http://drupal.org/node/312523#comment-1030590)
2. Also check if people are using l() inside the placeholder array. If they do, make it a context-sensitive "error". A code example can be found at http://drupal.org/node/310852#comment-1029681.
3. Validate that people use an @ placeholder when url() is used: $output .= '<p>'. t('Go to the <a href="@contact-page">contact page</a>.', array('@contact-page' => url('contact'))) .'</p>';

douggreen’s picture

This patch attempts to catch translation issues. Aren't these latest three requests more about the potx review, and not about the translation itself?

hass’s picture

Sorry, aren't you working here on the internationalisation tests? It's not POTX only... It extends the list of checks in #7 above.

ztyx’s picture

@douggreen: Yes, you are correct.

@hass: Well, we are working on internationalization tests but this issue only deals with finding common errors in translations (by comparing original text with translated one). The checks in #7 are reviews done on translations in .po-files doing exactly this. The tests you are proposing are part of another review and should be filed with the potx-project which hosts a review for finding errors in the PHP-code related to translations.

EDIT: Wrong. See comment #17. Sorry about that.

stella’s picture

@hass: actually it shouldn't be filed with the potx project. I don't think those are appropriate cases for that module. Instead create a new issue against the coder module for those requests. They should be added to the coder built-in internationalization review. Thanks.

douggreen’s picture

Status: Needs review » Active
Attachment (new): 0 bytes

From hass's new suggestions:

1. I've added a check that if the original text ends in a period, the translation should also end in a period.
2. Please create a separate coder issue for this.
3. Please create a separate coder issue for this.

I then committed this patch. I'm leaving the issue open, so that we can add additional checks.

stella’s picture

Don't think the following review is working:

"The translation text should end in a period when the same original text also ends in a period."

It's giving me errors in .po files for lines where the original source text and the translated text both end in a period.

Secondly, how do you compare upper and lower case of the first letter of a sentence when the source text is in English and the translated text is in a different character set, e.g. Japanese, Arabic, etc.? Similarly, I'm not sure the punctuation checks will work here either.

Cheers,
Stella

hass’s picture

Well, not every string needs a period... but descriptions should have one - mostly/every time(!?)... titles - mostly not... but maybe sometimes. Not sure how we should test this 100% correctly.

stella’s picture

hass: I'm actually referring to the review of the .po files themselves, i.e. the comparison of the translated text with the source text. For example, if the English source text ends in a period, then a check is done to ensure that the French translation does too. Your comment relates more to the parsing of the strings passed to t() in the source code, which is a separate issue. This issue is just for the comparison of the translated strings with the original source text.

hass’s picture

Ahhh, well that would be cool... haven't yet thought about comparing English with the translation. Very good idea. :-)

douggreen’s picture

The upper/lower case comparison is using ctype_upper.

The period and punctuation tests aren't working because the '.' in other languages doesn't match the '.' in English. Any ideas how to solve this?

And lastly, as hass mentions above, not every sentence should end with a period. But as stella also mentions, this check only says that if the original text ends in a period, the translated text should as well.
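One possible way to handle the capitalization question for scripts without upper and lower case is to skip the check whenever either first letter has no case at all. The sketch below is in Python and is only an illustration of that idea, not the ctype_upper-based code in the patch; the function names are hypothetical, and strings that start with markup or a placeholder would need extra handling.

```python
def first_letter(text):
    """Return the first alphabetic character of text, or None."""
    for ch in text:
        if ch.isalpha():
            return ch
    return None


def is_cased(ch):
    """True if the character has distinct upper and lower case forms."""
    return ch.lower() != ch.upper()


def capitalization_mismatch(original, translation):
    """True if both first letters are cased and their case differs."""
    a, b = first_letter(original), first_letter(translation)
    if a is None or b is None:
        return False  # nothing to compare
    if not is_cased(a) or not is_cased(b):
        return False  # e.g. Japanese or Arabic: the check does not apply
    return a.isupper() != b.isupper()
```

With this, an English/Swedish pair like "Save"/"spara" is flagged, while an English/Japanese or English/Arabic pair is silently skipped.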

stella’s picture

Could we separate the review of ".po" files out from the "internationalization" review? Or at least have a flag to stop .po files from being reviewed by default as part of that review? It's just that as a developer reviewing my module's code, I don't want to see errors in relation to translation. I just want to see my coding errors.

ztyx’s picture

douggreen wrote:

The period and punctuation tests aren't working because the '.' in other languages doesn't match the '.' in English. Any ideas how to solve this?

How about giving the punctuation tests an associative array as input, where the keys are punctuation marks in English and the values are the corresponding punctuation marks in the foreign language? And the .po review could have a separate configuration page for test-specific settings? This is another argument for splitting the i18n review and the .po review into two different reviews.
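The associative-array idea could be sketched like this in Python. The mapping values are small illustrative samples, not a complete or authoritative table, and `ending_mismatch` is a hypothetical name; the sketch accepts either the English mark or its mapped counterpart at the end of the translation.

```python
# English punctuation -> corresponding mark, per language code.
# Illustrative samples only; a real table would be maintained per team.
PUNCTUATION_MAPS = {
    "ja": {".": "。", ",": "、", "?": "？", "!": "！"},
    "ar": {"?": "؟", ",": "،", ";": "؛"},
}

ENDINGS = ".,:;?!"


def ending_mismatch(original, translation, langcode):
    """True if the original ends in punctuation the translation lacks."""
    src, dst = original.rstrip(), translation.rstrip()
    if not src or not dst or src[-1] not in ENDINGS:
        return False  # nothing to check, or string is untranslated
    english = src[-1]
    # Fall back to the English mark when the language has no mapping.
    local = PUNCTUATION_MAPS.get(langcode, {}).get(english, english)
    return not dst.endswith((english, local))
```

Languages without an entry in the table simply fall back to comparing against the English mark, so Swedish needs no configuration at all.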

fgm’s picture

Version: 6.x-2.x-dev » 6.x-2.0-beta1

The period matching appears to have been included in the latest beta, although it still doesn't work.

gábor hojtsy’s picture

Title: Tests to detect errors in translations in .po files » Automated quality checking of translations
Project: Coder » Localization server
Version: 6.x-2.0-beta1 » 6.x-2.x-dev
Component: Review/Rules » Code

This is not really relevant anymore, since the new standard is now translations managed on localize.drupal.org and exported separately from d.o projects. There are to be no .po files in projects to check with coder module. However, quality checking of translations (on input and possibly on request for existing translations) would be a good feature request for localization server, so moving this issue there.

droplet’s picture

I started a similar issue a year ago: #587666: Add simple automated translation formatting checks.

My FD and I started doing something similar to save time on error checking; this is a test page: http://bit.ly/gOqUOI
(black is the original text, green follows our standard, red is errors)

In my experience, we only catch very few erroneous (non-standard) strings. Specifically, English and other Latin-language translations use spaces to split words, and much of the punctuation is the same, so there would be very few errors. (IMO it is not worth it if it takes a lot of time to code a script that does formatting checks for English, e.g. punctuation checks, space or no space, capitalization or not. But always happy to see that :)

At this point, placeholder and HTML markup checking is more important.

gábor hojtsy’s picture

Status: Active » Closed (duplicate)

Yes, in fact, we can continue on the existing issue.