Posted by meba on January 14, 2008 at 9:16pm
| Project: | Localization server |
| Version: | 5.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
Some strings end with a space, which is not visible in translation interface and resulting translation may be cluttered up a bit, if something depends on that space (connected strings).
Comments
#1
More: this should be easy to fix, but what characters will we use for space and tab?
#2
No-no :) We should basically trim($translation) and then add leading and trailing whitespace from the source string on submission. This ensures that the translator does not need to think about this. AFAIR, it is even documented that we do this, although we do not do this yet. Let's be smart where we can instead of nagging translators about small details :)
Ps. matching leading and trailing whitespace in source and translation is crucial for valid .po files for some gettext tools, so this is an important thing to solve
#3
Matching leading and trailing whitespace in source and translation is also crucial for packaging committed translations at cvs.drupal.org.
Packaging errors occur and no tarbal or release will be made until all non-matches have been solved!
A gettext editor like poEdit can be helpfull in finding the errors. It check and reports non matching lines (only by line number) when saving the po-file. Export your translation in the 'all on one file' format, open it in a poEdit and save it. Reported errors can be saved to disk for analysis.
#4
An other related finding I post here for reference.
A leading "\n" is translated as "\r\n" causing the packaging script to generate the error like this:
"nl/modules-filter.po:62: `msgid' and `msgstr' entries do not both begin with '\n' msgfmt: found 1 fatal error"
From the po-file:
#: modules/filter/filter.module:183msgid ""
"\n"
"<p>This site allows HTML content. While learning all of HTML may feel intimidating, learning how to use a very small number of the most basic HTML \"tags\" is very easy. This table provides examples for each tag that is enabled on this site.</p>\n"
"<p>For more information see W3C's <a href=\"http://www.w3.org/TR/html/\">HTML Specifications</a> or use your favorite search engine to find other sites that explain HTML.</p>"
msgstr ""
"\r\n"
"<p>Op deze site kan HTML worden gebruikt. Terwijl het leren van de volledige HTML-code overweldigend kan zijn, is het leren van een beperkte aantal basis HTML-'tags' veel eenvoudiger. Deze tabel geeft een aantal voorbeelden van iedere tag die op deze site kan worden gebruikt.</p>\r\n"
"<p>Voor meer informatie zie de <a href=\"http://www.w3.org/TR/html/\">HTML-specificaties</a> van W3C of gebruik uw favoriete zoekmachine om een site te te vinden die uitleg over HTML geeft.</p>"
#5
l10n_server should actually do:
- trim the translation on submission
- grab the leading and trailing whitespace from the source string as it is there exactly
- add the leading and trailing whitespace to the translation
- save the translation to the DB
This can all be automated, and should not require *any* manual work whatsoever.
#6
Before your step 3 leading and trailing white space should be stripped of the translated string.
#7
Sutharsan: that was step 1: "trim the translation on submission"
#8
#9
Here's a patch.
Update script included.
If this release finally gets packaged, it will mean that it works :-)
http://drupal.org/node/225715
#10
Thanks! Look like packaging was fine. However this code looks interesting code-wise (all others looked all right to me):
"... WHERE s.sid = %d AND (t.sid IS NULL OR t.language = '%s' AND t.is_suggestion = 0 AND t.is_active = 1)", $sid, $langcodeWhat this does is to ensure the proper source string is selected with either no translation or a translation which is not a suggestion and is active. I'd add parenthesis to make sure that the processing order is right:
"... WHERE s.sid = %d AND ((t.sid IS NULL) OR (t.language = '%s' AND t.is_suggestion = 0 AND t.is_active = 1))", $sid, $langcodeThis might also show the two cases better. I am not sure the code in the patch does the same. What do you think?
#11
Rerolling against DRUPAL-5, fixed parenthesis as mentioned above.
#12
I have just applied this patch, tested and found it working well, so committed. Also removed the @todo item, which suggested that this should be done. Thanks, this will make people's live all the much easier!
#13
I think there is a minor problem with this patch. Correct me if I am wrong.
Line-breaks are taken care of and correctly formatted in database, but there is an inconsistency/mismatch when displaying these strings. If a string has a leading line-break ("\n"), this line-break is displayed (as a blue arrow) in original string, but is stripped out in translation string on view translations page. This causes confusion as a translator would see the missing blue arrow and try to fix it, but only to find out any line-break added to the beginning 'appears' (appears, because it's actually correctly inserted in database) to be automatically stripped out again.
#14
Could you provide a patch for that?
#15
I am on Windows and can't submit a patch now, but it's a simple fix. Just add /m to the regexp on line 736 of l10n_community.modules :
<?phppreg_match("/^([$chars]*).*[^$chars]([$chars]*)\$/m", $source, $matches);
?>
#16
This is a patch...
#17
meba: did you test/verify that this fixes the issue?
#18
I've taken this patch and tested it. To me the /m did not seem like making any change (although my initial gut feeling was that it will be able to break things, since it makes ^ and $ the line beginning and ending delimiters, not the string beginning and ending delimiters. What I found missing however was /s, which lets the dot match newlines, so that this also applies to multiline strings. I've also noticed that we can just simplify the matching by using \s and \S, which are made exactly for whitespace matching, and include \n \t and friends. These also help us actually grab what trim() will eventually remove. I hope this makes this issue fixed once and for all.
#19
Automatically closed -- issue fixed for two weeks with no activity.