Some strings end with a space, which is not visible in translation interface and resulting translation may be cluttered up a bit, if something depends on that space (connected strings).

Comments

meba’s picture

More: this should be easy to fix, but what characters will we use for space and tab?

gábor hojtsy’s picture

No-no :) We should basically trim($translation) and then add leading and trailing whitespace from the source string on submission. This ensures that the translator does not need to think about this. AFAIR, it is even documented that we do this, although we do not do this yet. Let's be smart where we can instead of nagging translators about small details :)

Ps. matching leading and trailing whitespace in source and translation is crucial for valid .po files for some gettext tools, so this is an important thing to solve

sutharsan’s picture

Matching leading and trailing whitespace in source and translation is also crucial for packaging committed translations at cvs.drupal.org.
Packaging errors occur and no tarbal or release will be made until all non-matches have been solved!

A gettext editor like poEdit can be helpfull in finding the errors. It check and reports non matching lines (only by line number) when saving the po-file. Export your translation in the 'all on one file' format, open it in a poEdit and save it. Reported errors can be saved to disk for analysis.

sutharsan’s picture

An other related finding I post here for reference.

A leading "\n" is translated as "\r\n" causing the packaging script to generate the error like this:
"nl/modules-filter.po:62: `msgid' and `msgstr' entries do not both begin with '\n' msgfmt: found 1 fatal error"

From the po-file:

#: modules/filter/filter.module:183
msgid ""
"\n"
"<p>This site allows HTML content. While learning all of HTML may feel intimidating, learning how to use a very small number of the most basic HTML \"tags\" is very easy. This table provides examples for each tag that is enabled on this site.</p>\n"
"<p>For more information see W3C's <a href=\"http://www.w3.org/TR/html/\">HTML Specifications</a> or use your favorite search engine to find other sites that explain HTML.</p>"
msgstr ""
"\r\n"
"<p>Op deze site kan HTML worden gebruikt. Terwijl het leren van de volledige HTML-code overweldigend kan zijn, is het leren van een beperkte aantal basis HTML-'tags' veel eenvoudiger. Deze tabel geeft een aantal voorbeelden van iedere tag die op deze site kan worden gebruikt.</p>\r\n"
"<p>Voor meer informatie zie de <a href=\"http://www.w3.org/TR/html/\">HTML-specificaties</a> van W3C of gebruik uw favoriete zoekmachine om een site te te vinden die uitleg over HTML geeft.</p>"
gábor hojtsy’s picture

l10n_server should actually do:

- trim the translation on submission
- grab the leading and trailing whitespace from the source string as it is there exactly
- add the leading and trailing whitespace to the translation
- save the translation to the DB

This can all be automated, and should not require *any* manual work whatsoever.

sutharsan’s picture

Before your step 3 leading and trailing white space should be stripped of the translated string.

gábor hojtsy’s picture

Sutharsan: that was step 1: "trim the translation on submission"

gábor hojtsy’s picture

Title: A space at the end of string should be emphasized » Automated white space formatting for translations
jose reyero’s picture

Status: Active » Needs review
StatusFileSize
new3.66 KB

Here's a patch.

Update script included.

If this release finally gets packaged, it will mean that it works :-)
http://drupal.org/node/225715

gábor hojtsy’s picture

Thanks! Look like packaging was fine. However this code looks interesting code-wise (all others looked all right to me):

"... WHERE s.sid = %d AND (t.sid IS NULL OR t.language = '%s' AND t.is_suggestion = 0 AND t.is_active = 1)", $sid, $langcode

What this does is to ensure the proper source string is selected with either no translation or a translation which is not a suggestion and is active. I'd add parenthesis to make sure that the processing order is right:

"... WHERE s.sid = %d AND ((t.sid IS NULL) OR (t.language = '%s' AND t.is_suggestion = 0 AND t.is_active = 1))", $sid, $langcode

This might also show the two cases better. I am not sure the code in the patch does the same. What do you think?

meba’s picture

StatusFileSize
new3.83 KB

Rerolling against DRUPAL-5, fixed parenthesis as mentioned above.

gábor hojtsy’s picture

Status: Needs review » Fixed

I have just applied this patch, tested and found it working well, so committed. Also removed the @todo item, which suggested that this should be done. Thanks, this will make people's live all the much easier!

dami’s picture

Status: Fixed » Active

I think there is a minor problem with this patch. Correct me if I am wrong.

Line-breaks are taken care of and correctly formatted in database, but there is an inconsistency/mismatch when displaying these strings. If a string has a leading line-break ("\n"), this line-break is displayed (as a blue arrow) in original string, but is stripped out in translation string on view translations page. This causes confusion as a translator would see the missing blue arrow and try to fix it, but only to find out any line-break added to the beginning 'appears' (appears, because it's actually correctly inserted in database) to be automatically stripped out again.

gábor hojtsy’s picture

Could you provide a patch for that?

dami’s picture

I am on Windows and can't submit a patch now, but it's a simple fix. Just add /m to the regexp on line 736 of l10n_community.modules :

  preg_match("/^([$chars]*).*[^$chars]([$chars]*)\$/m", $source, $matches);
meba’s picture

Status: Active » Needs review
StatusFileSize
new771 bytes

This is a patch...

gábor hojtsy’s picture

meba: did you test/verify that this fixes the issue?

gábor hojtsy’s picture

Status: Needs review » Fixed
StatusFileSize
new816 bytes

I've taken this patch and tested it. To me the /m did not seem like making any change (although my initial gut feeling was that it will be able to break things, since it makes ^ and $ the line beginning and ending delimiters, not the string beginning and ending delimiters. What I found missing however was /s, which lets the dot match newlines, so that this also applies to multiline strings. I've also noticed that we can just simplify the matching by using \s and \S, which are made exactly for whitespace matching, and include \n \t and friends. These also help us actually grab what trim() will eventually remove. I hope this makes this issue fixed once and for all.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.