Tests to detect errors in translations in .po files
ztyx - September 20, 2008 - 17:10
| Project: | Coder |
| Version: | 6.x-2.x-dev |
| Component: | Review/Rules |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
Hi,
Does Coder have functionality to review translations in .po-files? The Swedish translation team discussed this (in Swedish), and we are looking for a module to use. Coder might be that project, as it already has the UI written.
Since this functionality is not only relevant for the Swedish translation team we should probably have this as a separate project and use a Coder hook to parse files. Or?
Comments? Suggestions?
Jens

#1
It should be possible to extend Coder to do this. However, what reviews would you want to add? What checks would need to be done? Can you provide a link to the Drupal standards on .po files?
#2
You might want to look at the "Interface Text Translation" review that ships with the potx module (with #309875: Fix (and improve) hook_reviews).
You might also look at the new "Internationalization" review in the 6.x-1.x-dev version.
#3
Sorry for such a late reply. It took some time to go through and figure out what the module reviews do.
@douggreen: Thanks for the links! The potx module only looks at the existing strings/translations in PHP-files and comments on those, which is not exactly what we'd like. Also, I didn't manage to find the review you mentioned that should be in "Internationalization" (I checked both i18n-dev.tar.gz and CVS HEAD) but if it is using a Coder module review it is not really what I'm looking for.
@stella: Well, for reviewing a .po-file I think a slightly different approach is needed because of the nature of its format. A .po file looks like this, i.e. it holds pairs of text strings (the original language, usually English, and another language, in our case Swedish). Therefore a callback function taking two arguments (the strings) would be sufficient. But such a callback I can make myself in my own external module.
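As a rough illustration of the idea in this comment (not Coder's code), the sketch below reads msgid/msgstr pairs from .po text and hands each pair to a two-argument callback. It is written in Python for brevity. The names iter_po_pairs and review_po are hypothetical, and the parser assumes simple entries only: no msgid_plural, no msgctxt, and no escape handling beyond \" and \n.

```python
import re

# Matches one double-quoted string, allowing backslash escapes inside it.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def _unquote(line):
    """Return the content of the first double-quoted string on the line."""
    match = _QUOTED.search(line)
    if not match:
        return ""
    return match.group(1).replace('\\n', '\n').replace('\\"', '"')


def iter_po_pairs(text):
    """Yield (msgid, msgstr) pairs from simple .po entries."""
    msgid, msgstr, current = None, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            # A new entry starts: emit the previous one if it was complete.
            if msgid is not None and msgstr is not None:
                yield msgid, msgstr
            msgid, msgstr, current = _unquote(line), None, "msgid"
        elif line.startswith("msgstr "):
            msgstr, current = _unquote(line), "msgstr"
        elif line.startswith('"'):
            # Continuation line: append to whichever field came last.
            if current == "msgid":
                msgid += _unquote(line)
            elif current == "msgstr":
                msgstr += _unquote(line)
        else:
            current = None  # blank line, comment, or unsupported keyword
    if msgid is not None and msgstr is not None:
        yield msgid, msgstr


def review_po(text, callback):
    """Run callback(original, translation) on each translated entry."""
    warnings = []
    for original, translation in iter_po_pairs(text):
        if original and translation:  # skip the header and untranslated entries
            warnings.extend(callback(original, translation))
    return warnings
```

With this shape, a review only has to supply the callback; the file handling stays in one place.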
Anyway, to be more concrete, I am making this issue a feature request: add the possibility for reviews to specify which file suffixes they review. That way, reviews other than PHP reviews become possible, including reviews of .po-files and other files.
#4
Changing title accordingly.
#5
ztyx: it's already possible to do different checks based on file suffixes. We already have different tests for js (not complete), patch and tests. We just need to know which checks should be done on .po files.
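To make the suffix-based idea concrete, here is a minimal sketch in Python. It is not how Coder itself stores its reviews: the REVIEWS registry, its keys, and reviews_for_file are all hypothetical names chosen for illustration.

```python
import os

# Hypothetical registry: each review declares the file suffixes it applies to.
REVIEWS = {
    "style": {"extensions": {"php", "module", "inc"}},
    "javascript": {"extensions": {"js"}},
    "translation": {"extensions": {"po"}},
}


def reviews_for_file(filename, reviews=REVIEWS):
    """Return the names of the reviews whose suffix list matches filename."""
    suffix = os.path.splitext(filename)[1].lstrip(".").lower()
    return sorted(
        name for name, review in reviews.items()
        if suffix in review["extensions"]
    )
```

A file such as sv.po would then only be passed to the "translation" review, and a file with an unknown suffix would be skipped.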
#6
The new "Internationalization" review is in the 6.x-1.x-dev version. It will be in the 6.x-1.2 version when we release it (but it's not released yet). But based on your most recent comments, I don't think that this review checks what you're asking for. The "Internationalization" review is still checking the php code for proper Drupal programming.
#7
Ah, well here are a few things that we would like to check in the .po-files:
The problem is that some of these rules are locale specific. Therefore, maybe using a hook and implementing this project in a different module would be a better idea. I envision having some kind of system where you select your language and the right checks will be made for you. Whether these settings should be hardcoded or uploaded/downloaded as a file for easy sharing, I don't know. I guess hardcoding at first.
Also, I noticed the link to the Swedish discussion was broken. Here is a link that works.
Edit: Added more possible tests to the list.
#8
This would make a very valuable and powerful battery of tests. It would allow us to assess to a degree not only the completeness but the quality of a translation. We could present summary statistics for each module, showing the number of problem-free, problematic, and missing translations.
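A minimal sketch of the summary statistics described above, in Python. The function name summarize and the shape of its inputs are assumptions for illustration: pairs is a list of (original, translation) strings, and check is any callable that returns a list of warnings for one pair.

```python
def summarize(pairs, check):
    """Count missing, problematic and problem-free translations.

    pairs: iterable of (original, translation) strings.
    check: callable returning a list of warnings for one pair.
    """
    counts = {"problem_free": 0, "problematic": 0, "missing": 0}
    for original, translation in pairs:
        if not translation.strip():
            counts["missing"] += 1       # no translation at all
        elif check(original, translation):
            counts["problematic"] += 1   # at least one warning
        else:
            counts["problem_free"] += 1
    return counts
```

Running this once per module's .po file would give the three numbers per module.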
Clearly this issue presents challenges because it's quite distinct from the usual coder needs in that it needs to compare specific data between sets of text files. Probably this would need to be done in a separate module, maybe one that depends on potx, since potx is already set up to deal with .pot files.
Doug, Stella, what do you think?
#9
#10
All right, I've made an outline of the module, which I've called Translation Review for now.
I am attaching 1) the module outline (and .info file) together with 2) a bunch of test cases. You can view the test cases both as a way of proposing the functionality and as goals for where I'm headed with this. Feel free to comment or patch as you want.
The function _tr_check(...) is the starting point for the checks that are to be made. Since checking the quality of the translations only requires the original text together with its translation, those are the only inputs needed. I am not too familiar with Coder hooks, but maybe it would be possible to use a callback hook per .po-file that filters out these two strings and runs them through _tr_check(...)?
@nedjo: Good summary. You mentioned potx maybe becoming a dependency. It was proposed in the Swedish translation issue too, but after browsing through the potx (6.2/CVS) code it seems as though it is only generating .pot-files from t(...) calls. In other words it's not parsing .po(t)-files in any way. A translation review will need code to parse the .po files and will not need to parse the t(...) calls at all. Am I wrong?
Also, I am curious to hear whether I should create a new project on Drupal.org or whether you are interested in taking this on in the Coder project bundle? Either way I am learning Drupal and I'd love to contribute to this.
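For readers without the attachment, the following Python sketch shows one way a two-string entry point like _tr_check(...) could be organised. It is not the attached module: tr_check, CHECKS, and both check functions are hypothetical, and the whitespace check is only an assumed example of a rule.

```python
def check_trailing_period(original, translation):
    """Warn when the original ends in a period but the translation does not."""
    if original.rstrip().endswith(".") and not translation.rstrip().endswith("."):
        return "Translation should end in a period like the original."
    return None


def _edges(text):
    """Return the leading and trailing whitespace of text."""
    lead = text[:len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    return lead, trail


def check_surrounding_whitespace(original, translation):
    """Warn when leading or trailing whitespace differs (assumed example rule)."""
    if _edges(original) != _edges(translation):
        return "Leading or trailing whitespace differs from the original."
    return None


# Each check takes (original, translation) and returns a message or None.
CHECKS = [check_trailing_period, check_surrounding_whitespace]


def tr_check(original, translation, checks=CHECKS):
    """Run every check on one string pair and collect the messages."""
    messages = []
    for check in checks:
        message = check(original, translation)
        if message:
            messages.append(message)
    return messages
```

Keeping each rule as a separate function makes it easy to switch locale-specific rules on or off later.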
#11
Attached is a patch that does some restructuring of coder to make it simpler to add file extension based rules. It then takes a first stab at a callback that implements many of the requests above. I'm likely to commit part (or all) of this patch before it is really done, just to get the restructuring in place.
But we should continue to work on improving these rules, and possibly even move them to the potx review.
#12
Oops, here's the patch...
#13
Please, don't forget to test:
1. Sentences must end with periods. (http://drupal.org/node/312523#comment-1030590)
2. Also check if people are using l() inside the placeholder array. If they do, make it a context-sensitive "error". Code example can be found at http://drupal.org/node/310852#comment-1029681.
3. Validate if people are using @ placeholder if url() is used.
$output .= '<p>'. t('Go to the <a href="@contact-page">contact page</a>.', array('@contact-page' => url('contact'))) .'</p>';
#14
This patch attempts to catch translation issues. Aren't these latest three requests more about the potx review, and not about the translation itself?
#15
Sorry, aren't you working here on the internationalisation tests? It's not POTX only... It extends the list of checks in #7 above.
#16
@douggreen: Yes, you are correct.
@hass: Well, we are working on internationalization tests but this issue only deals with finding common errors in translations (by comparing original text with translated one). The checks in #7 are reviews done on translations in .po-files doing exactly this. The tests you are proposing are part of another review and should be filed with the potx-project which hosts a review for finding errors in the PHP-code related to translations.
EDIT: Wrong. See comment #17. Sorry about that.
#17
@hass: actually it shouldn't be filed with the potx project. I don't think those are appropriate cases for that module. Instead create a new issue against the coder module for those requests. They should be added to the coder built-in internationalization review. Thanks.
#18
From hass's new suggestions:
1. I've added a check that if the original text ends in a period, that the translation should also end in a period.
2. Please create a separate coder issue for this.
3. Please create a separate coder issue for this.
I then committed this patch. I'm leaving the issue open, so that we can add additional checks.
#19
Don't think the following review is working:
"The translation text should end in a period when the same original text also ends in a period."
It's giving me errors in po files for lines where the original source text and the translated text both end in a period.
Secondly, how do you compare upper and lower case of the first letter of a sentence when the source text is in English and the translated text is in a different character set, e.g. Japanese, Arabic, etc.? Similarly, not sure if the punctuation checks will work here either.
Cheers,
Stella
#20
Well, not every string needs a period... but descriptions should have one - mostly/everytime(!?)... titles - mostly not... but maybe sometimes. Not sure how we should test this 100% correctly.
#21
hass: I'm actually referring to the review of the .po files themselves, i.e. the comparison of the translated text with the source text. For example, if the English source text ends in a period, then a check is done to ensure that the French translation does too. So your comment is more in relation to the parsing of the strings passed to t() in the source code, which is a separate issue. This issue is just for the comparison of the translated strings with the original source text.
#22
Ahhh, well that would be cool... haven't yet thought about comparing English with the translation. Very good idea. :-)
#23
The upper/lower case comparison is using ctype_upper.
The period and punctuation tests aren't working because the '.' in other languages doesn't match the '.' in English. Any ideas how to solve this?
And lastly, as hass mentions above, not every sentence should end with a period. But as stella also mentions, this check is that if the original text ends in a period, that the translated text should as well.
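On the upper/lower case question, one possible approach is to skip the comparison whenever either first letter has no case at all. The Python sketch below illustrates that idea only; it is not the committed Coder check, and first_letter_case and case_mismatch are hypothetical names. It relies on Python's Unicode-aware str.isupper() and str.islower().

```python
def first_letter_case(text):
    """Return 'upper', 'lower', or None if the first letter has no case."""
    for ch in text:
        if ch.isalpha():
            if ch.isupper():
                return "upper"
            if ch.islower():
                return "lower"
            return None  # caseless script, e.g. Japanese or Arabic
    return None  # no letters at all


def case_mismatch(original, translation):
    """True only when both first letters are cased and the cases differ."""
    source_case = first_letter_case(original)
    target_case = first_letter_case(translation)
    if source_case is None or target_case is None:
        return False  # nothing meaningful to compare
    return source_case != target_case
```

Under this sketch, a Japanese or Arabic translation never triggers the case warning, while a Swedish translation starting with a lowercase letter still does.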
#24
Could we separate the review of ".po" files out from the "internationalization" review? Or at least have a flag to stop .po files from being reviewed by default as part of that review? It's just that as a developer reviewing my module's code, I don't want to see errors in relation to translation. I just want to see my coding errors.
#25
douggreen wrote:
How about having input for the punctuation tests as an associative array where keys are punctuation marks in English and values are the corresponding punctuation marks in the foreign language? And the .po review could have a separate configuration page for test-specific settings? This is another argument for splitting the i18n review and the .po review into two different reviews.
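A small Python sketch of the associative-array idea follows. The PUNCTUATION table, its language codes, and ending_mismatch are hypothetical; the two entries are examples only, not a vetted list. Each English mark maps to the marks that would be accepted at the end of the translation, and languages without an entry fall back to the English mark itself.

```python
# Hypothetical per-language maps: English mark -> marks accepted in the translation.
PUNCTUATION = {
    "ja": {".": ("。", "."), "?": ("？", "?"), "!": ("！", "!")},
    "el": {"?": (";",)},  # Greek writes the question mark as ";"
}


def ending_mismatch(original, translation, langcode, maps=PUNCTUATION):
    """Return a message if the final punctuation does not correspond, else None."""
    source = original.rstrip()
    target = translation.rstrip()
    if not source or not target:
        return None
    mark = source[-1]
    if mark not in ".?!:":
        return None  # only checked when the original ends in punctuation
    # Fall back to the English mark when the language has no mapping for it.
    accepted = maps.get(langcode, {}).get(mark, (mark,))
    if target.endswith(accepted):
        return None
    return "Original ends in %r but translation does not end in one of %s." % (
        mark, ", ".join(accepted))
```

A settings page, as suggested above, would then only need to edit this table per language.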