Automated white space formatting for translations [#209980]

Comment	File	Size	Author
#18	fix-whitespace-matching.patch	816 bytes	gábor hojtsy
#16	l10n_white_space_fix.patch	771 bytes	meba
#11	l10n_server_trimming_2.patch	3.83 KB	meba
#9	l10n_server_trimming.patch	3.66 KB	jose reyero

Comment #1

meba commented 14 January 2008 at 21:17

More: this should be easy to fix, but what characters will we use for space and tab?

Log in or register to post comments

Comment #2

gábor hojtsy

he/him

Hungarian

Hungary

commented 16 January 2008 at 23:58

No-no :) We should basically trim($translation) and then add leading and trailing whitespace from the source string on submission. This ensures that the translator does not need to think about this. AFAIR, it is even documented that we do this, although we do not do this yet. Let's be smart where we can instead of nagging translators about small details :)

Ps. matching leading and trailing whitespace in source and translation is crucial for valid .po files for some gettext tools, so this is an important thing to solve

Log in or register to post comments

Comment #3

sutharsan commented 8 February 2008 at 17:42

Matching leading and trailing whitespace in source and translation is also crucial for packaging committed translations at cvs.drupal.org.
Packaging errors occur and no tarbal or release will be made until all non-matches have been solved!

A gettext editor like poEdit can be helpfull in finding the errors. It check and reports non matching lines (only by line number) when saving the po-file. Export your translation in the 'all on one file' format, open it in a poEdit and save it. Reported errors can be saved to disk for analysis.

Log in or register to post comments

Comment #4

sutharsan commented 8 February 2008 at 17:49

An other related finding I post here for reference.

A leading "\n" is translated as "\r\n" causing the packaging script to generate the error like this:
"nl/modules-filter.po:62: `msgid' and `msgstr' entries do not both begin with '\n' msgfmt: found 1 fatal error"

From the po-file:

#: modules/filter/filter.module:183
msgid ""
"\n"
"<p>This site allows HTML content. While learning all of HTML may feel intimidating, learning how to use a very small number of the most basic HTML \"tags\" is very easy. This table provides examples for each tag that is enabled on this site.</p>\n"
"<p>For more information see W3C's <a href=\"http://www.w3.org/TR/html/\">HTML Specifications</a> or use your favorite search engine to find other sites that explain HTML.</p>"
msgstr ""
"\r\n"
"<p>Op deze site kan HTML worden gebruikt. Terwijl het leren van de volledige HTML-code overweldigend kan zijn, is het leren van een beperkte aantal basis HTML-'tags' veel eenvoudiger. Deze tabel geeft een aantal voorbeelden van iedere tag die op deze site kan worden gebruikt.</p>\r\n"
"<p>Voor meer informatie zie de <a href=\"http://www.w3.org/TR/html/\">HTML-specificaties</a> van W3C of gebruik uw favoriete zoekmachine om een site te te vinden die uitleg over HTML geeft.</p>"

Log in or register to post comments

Comment #5

gábor hojtsy

he/him

Hungarian

Hungary

commented 8 February 2008 at 18:09

l10n_server should actually do:

- trim the translation on submission
- grab the leading and trailing whitespace from the source string as it is there exactly
- add the leading and trailing whitespace to the translation
- save the translation to the DB

This can all be automated, and should not require *any* manual work whatsoever.

Log in or register to post comments

Comment #6

sutharsan commented 9 February 2008 at 21:38

Before your step 3 leading and trailing white space should be stripped of the translated string.

Log in or register to post comments

Comment #7

gábor hojtsy

he/him

Hungarian

Hungary

commented 11 February 2008 at 10:43

Sutharsan: that was step 1: "trim the translation on submission"

Log in or register to post comments

Comment #8

gábor hojtsy

he/him

Hungarian

Hungary

commented 11 February 2008 at 12:47

Title:

A space at the end of string should be emphasized

» Automated white space formatting for translations

Log in or register to post comments

Comment #9

jose reyero commented 25 February 2008 at 17:36

Status:

Active

» Needs review

Status	File	Size
new	l10n_server_trimming.patch	3.66 KB

Here's a patch.

Update script included.

If this release finally gets packaged, it will mean that it works :-)
http://drupal.org/node/225715

Log in or register to post comments

Comment #10

gábor hojtsy

he/him

Hungarian

Hungary

commented 26 February 2008 at 13:58

Thanks! Look like packaging was fine. However this code looks interesting code-wise (all others looked all right to me):

"... WHERE s.sid = %d AND (t.sid IS NULL OR t.language = '%s' AND t.is_suggestion = 0 AND t.is_active = 1)", $sid, $langcode

What this does is to ensure the proper source string is selected with either no translation or a translation which is not a suggestion and is active. I'd add parenthesis to make sure that the processing order is right:

"... WHERE s.sid = %d AND ((t.sid IS NULL) OR (t.language = '%s' AND t.is_suggestion = 0 AND t.is_active = 1))", $sid, $langcode

This might also show the two cases better. I am not sure the code in the patch does the same. What do you think?

Log in or register to post comments

Comment #11

meba commented 7 March 2008 at 18:30

Status	File	Size
new	l10n_server_trimming_2.patch	3.83 KB

Rerolling against DRUPAL-5, fixed parenthesis as mentioned above.

Log in or register to post comments

Comment #12

gábor hojtsy

he/him

Hungarian

Hungary

commented 25 April 2008 at 18:19

Status:

Needs review

» Fixed

I have just applied this patch, tested and found it working well, so committed. Also removed the @todo item, which suggested that this should be done. Thanks, this will make people's live all the much easier!

Log in or register to post comments

Comment #13

dami commented 29 April 2008 at 16:07

Status:

Fixed

» Active

I think there is a minor problem with this patch. Correct me if I am wrong.

Line-breaks are taken care of and correctly formatted in database, but there is an inconsistency/mismatch when displaying these strings. If a string has a leading line-break ("\n"), this line-break is displayed (as a blue arrow) in original string, but is stripped out in translation string on view translations page. This causes confusion as a translator would see the missing blue arrow and try to fix it, but only to find out any line-break added to the beginning 'appears' (appears, because it's actually correctly inserted in database) to be automatically stripped out again.

Log in or register to post comments

Comment #14

gábor hojtsy

he/him

Hungarian

Hungary

commented 29 April 2008 at 16:16

Could you provide a patch for that?

Log in or register to post comments

Comment #15

dami commented 29 April 2008 at 17:00

I am on Windows and can't submit a patch now, but it's a simple fix. Just add /m to the regexp on line 736 of l10n_community.modules :

  preg_match("/^([$chars]*).*[^$chars]([$chars]*)\$/m", $source, $matches);

Log in or register to post comments

Comment #16

meba commented 28 July 2008 at 21:14

Status:

Active

» Needs review

Status	File	Size
new	l10n_white_space_fix.patch	771 bytes

This is a patch...

Log in or register to post comments

Comment #17

gábor hojtsy

he/him

Hungarian

Hungary

commented 24 August 2008 at 08:23

meba: did you test/verify that this fixes the issue?

Log in or register to post comments

Comment #18

gábor hojtsy

he/him

Hungarian

Hungary

commented 26 October 2008 at 15:23

Status:

Needs review

» Fixed

Status	File	Size
new	fix-whitespace-matching.patch	816 bytes

I've taken this patch and tested it. To me the /m did not seem like making any change (although my initial gut feeling was that it will be able to break things, since it makes ^ and $ the line beginning and ending delimiters, not the string beginning and ending delimiters. What I found missing however was /s, which lets the dot match newlines, so that this also applies to multiline strings. I've also noticed that we can just simplify the matching by using \s and \S, which are made exactly for whitespace matching, and include \n \t and friends. These also help us actually grab what trim() will eventually remove. I hope this makes this issue fixed once and for all.

Log in or register to post comments

Comment #19

Anonymous (not verified) commented 9 November 2008 at 15:33

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

Log in or register to post comments

Automated white space formatting for translations

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

Comment #17

Comment #18

Comment #19

News items

Our community

Documentation

Drupal code base

Governance of community