This was already discussed many times (especially in #19425): several strings (both in Drupal Core and in contrib) are simply too short not to be ambiguous in some languages. For example 'View', as in 'a view of the Views module', and 'View' as in the action of viewing something (a node, a user, etc.).

It boils down to this:

  • There are really very few cases where this happens. Thus we don't need a full featured context system, that will make lookups expensive.
  • Thus, we can simply add the context to those strings manually.
  • We need that information to be embedded inside .po files.

I suggest we come up with a convention like that one:

  • Strings passed to t() can be prefixed with an optional context. The exact way of doing that has to be discussed, but I suggest the [<context>]<string> convention, as it should not collide with existing strings, both in core and contrib.
  • For English, the context is simply automatically stripped by t().
  • For other languages, the full string, including the context is used as the lookup key. No fallback mechanism is implemented.
  • The full string is exported to .po files by potx, which thus requires no modification.

Examples:

// Short names in format_date() and such.
t("[short-week]Mon");
t("[short-week]Tue");
t("[short-week]Wed");

// View tab
  $items['user/%user/view'] = array(
    'title' => '[user-tab]View',
    'type' => MENU_DEFAULT_LOCAL_TASK,
    'weight' => -10,
  );

// etc.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Gábor Hojtsy’s picture

Format date already does something like this in a special one-off way. It might be best to standardize on a placeholder format which is always cleared out.

Gábor Hojtsy’s picture

BTW what format_date() does in Drupal 6 is a one-off !long-month-name placeholder which is replaced with empty string and then trimmed, so that the separating space is properly deleted:

$date .= trim(t('!long-month-name '. gmdate($c, $timestamp), array('!long-month-name' => ''), $langcode));
Damien Tournoud’s picture

The format_date() special case for the long month name looks like a hack, especially that added trim() :)

As I don't think we would need that inside the string, we could standardize on something like "if the first character is '#' (or any other, I don't care), remove the beginning of the string up to the first space".

$date .= t('#long-month-name '. gmdate($c, $timestamp), array(), $langcode);
Gábor Hojtsy’s picture

The space was added, so that the month name itself is easily recognized and it not part of the placeholder hack. And the ! placeholder was reused because translators know what it is. We can make up a #long-month-name type of placeholder, which is always replaced with an empty string. I think the hard part will be to get people use common context names for View (as in Views module View) and View (as in View this content).

nedjo’s picture

Yes, this would be a welcome improvement.

I think the square brackets (or some other symbol at beginning and end) are more intuitive than the space for designating where the context placeholder ends.

dami’s picture

subscribe

nedjo’s picture

Issue tags: +i18n sprint
moshe weitzman’s picture

Priority: Normal » Critical

This plagues contrib as well. OG uses some context specific words like 'group'.

Panaromic.webdesign.com’s picture

Assigned: Unassigned » Panaromic.webdesign.com
Priority: Critical » Normal

I can take on this. I propose we will go with opening and ending with square bracket. For example, [user-tab]View to indicate not-to-translate to the translator.

KarenS’s picture

It's possible strings could have single brackets in a description, so maybe add the pound sign too to be sure of the intention, like:

[#long-month-name]May

Jose Reyero’s picture

Another take on context:

a. We just need context for short strings / single words
b. Doing things with strings is dirty
c. Aim for a more useful meaningful one

Thus we can:

1. Have an optional function which possibly doesn't need variables, like tc(string, context)
2. Have the context being human readable, so translators can use it too
3. Set up specific guidelines, translators will help us with their specific troubles
Example, for spanish we have trouble with English words being both a noun and a verb, so we could use
tc('post', 'Name for content sent')
tc('post', 'Action of sending content')

This way we'll have really useful context

Damien Tournoud’s picture

@José: on most translation workflow, the context is useless if not directly in the string. Adding context in .po files comments and support for them on l10n_server would require heavy modifications, including in potx.

Plus, all our all translation system is based on the source string being the key.

Adding simple context inside brackets is the way to go for me.

Jose Reyero’s picture

> on most translation workflow, the context is useless if not directly in the string.

The context in the string itself is mostly useless for non technical translators and it is really ugly when reading the code.

I am proposing a *really useful* one that both makes code more readable and can be displayed to translators as meaningful words.

> Plus, all our all translation system is based on the source string being the key

We just need to change that. Actually we are working on adding more options as we need it for user defined string

So I just mean if we are going to go through this, let's make it a) nice b) readable c) useful

Panaromic.webdesign.com’s picture

This is my line of thinking, let me know what you think:

/* GOWRI'S COMMENTS
* $string has the format #text describing the context#STRING-NEEDING-TRANSLATION
* if $string has no preceding #....# format, the STRING is treated as if it is in the default context form, i.e. noun in English
* if $string has #....# format, translator can use this additional context description to translate more accurately
*/
function t($string, $args = array(), $langcode = NULL) {
  global $language;
  static $custom_strings;

/* GOWRI's CODE WILL GO HERE  
  if ((strstr($string,'#'))) {
     // strip the context helper string from # to # 
     // and further process the stripped $string below
     }
*/
  $langcode = isset($langcode) ? $langcode : $language->language;

  // First, check for an array of customized strings. If present, use the array
  // *instead of* database lookups. This is a high performance way to provide a
  // handful of string replacements. See settings.php for examples.
  // Cache the $custom_strings variable to improve performance.

      
 ....NO CHANGE IN THE REST OF THE FUNCTION

Panaromic.webdesign.com’s picture

/* GOWRI'S COMMENTS
* $string has the format #text describing the context#STRING-NEEDING-TRANSLATION
* if $string has no preceding #....# format, the STRING is treated as if it is in the default context form, i.e. noun in English
* if $string has #....# format, translator can use this additional context description to translate more accurately
*/
function t($string, $args = array(), $langcode = NULL) {
  global $language;
  static $custom_strings;

  $langcode = isset($langcode) ? $langcode : $language->language;

  // First, check for an array of customized strings. If present, use the array
  // *instead of* database lookups. This is a high performance way to provide a
  // handful of string replacements. See settings.php for examples.
  // Cache the $custom_strings variable to improve performance.
  if (!isset($custom_strings[$langcode])) {
    $custom_strings[$langcode] = variable_get('locale_custom_strings_'. $langcode, array());
  }
  // Custom strings work for English too, even if locale module is disabled.
  if (isset($custom_strings[$langcode][$string])) {
    $string = $custom_strings[$langcode][$string];
  }
  // Translate with locale module if enabled.
  elseif (function_exists('locale') && $langcode != 'en') {
    $string = locale($string, $langcode);
  }
  // GOWRI's COMMENT
  // Now begin parsing the returned string
  if (strpos('#',$string)) {
      $pattern = '#([a-zA-Z]+)#(\w+)';
      $replace = '$2';
      $string = preg_replace($pattern, $replace, $string);
       }
  if (empty($args)) {
    return $string;
  }
  NO CHANGE AFTER THIS LINE
Jose Reyero’s picture

Following up with the idea in #11, some more considerations (and one more reason why it is better).

  tc('post', 'Name for content sent')
  tc('post', 'Action of sending content')

Context is module dependent, so if a hundred contrib modules define the same string with different context (even if the meaning is the same, they can mark the context differently we get to this: ******* We need to handle fallbacks *******

Think of this situation (string, context): the string (post, noun) may be translated for (post, content object) and (post, verb) and as a word without context too.
1. First we search for (post, noun), no luck
2. We search for the word without context, no luck either
3. Then we should be able to search for the same word in any context. If we add the context into the string '[#verb]post', then we cannot do this without some slow 'LIKE' query.

So I think we need to:
- Add a new context colum to locales table
- Use this new tc() function or similar that may fall back to t() if the word is not found for the required context
- About the po extractor we just need to append the context to the location, or replace the location by the context

Panaromic.webdesign.com’s picture

This is my proposal:

Syntax: t('string',array('#context'=>'xxx'))

Here xxx can be only one of the pre-defined strings, so far we have identified it can be either 'noun' or 'verb'. Any other idiosyncrasies of English language can be captured here as a context with a suitable id. Let us assume the default context is noun, thus a string without context is treated as a noun in further processing steps.

Inside t() function
First, check for error in context id, i.e. noun spelled as nuon can be rejected and error message can be sent at this stage.
Second, append the string with its context, i.e string becomes [#xxx]string
Third, hand over to locale module

Inside locale module
Retrieve the translated string by joining tables locales_target and locales_source (no change is required in the current code)
If it is a new string add to the table, note the context is already part of the string.
As each string with its context appended to it is different than the same string with a different context, they all get assigned different 'lid' key in the table hence, we don't need 'LIKE' in the query either.

Inside potx.inc
While processing string with context, add an additional information next to 'msgid' field pertaining to its context as a help tip for the human translators
plus append [#xxx] inside the msgstr field to avoid any typo errors. A typical entry in .pot will look like
#: source location information
msgid "string" msgcontext "verb"
msgstr "[#verb]"

The translator has to insert the translated string after the context information in msgstr field.

Although my solution is simple in terms of code or table change, but it does hinge on code modification in potx.inc. I don't know how easy or difficult this code change is, waiting for reply from Gabor. You can follow my posting here http://drupal.org/node/369943

Gábor Hojtsy’s picture

The context would then be supported by functions like format_plural() and friends. Like format_plural($num, '@count view', '@count views') has the same problem of not telling you whether view is a verb or noun. To me it sounds more logical to add a third parameter after the variables and before the language code to specify context and leave it at NULL by default.

Also, thinking should be put into the same things for menu items and such, so a storage mechanism needs to be figured out for menu item titles and other magically translated things as well. Menu item titles can equally be short, like 'View'.

KarenS’s picture

I would caution against trying to decide ahead of time what the allowed contexts will be. There are many other possible ones besides 'noun' and 'verb' and it will be impossible to identify all of them. In the Date API I'm trying to create context for things like 'a very short (one letter) abbreviation for Monday', 'a short (two letter) abbreviation for Monday', 'the normal (three letter) abbreviation for Monday', 'the abbreviation for May', 'the full month name for May', 'The word Repeats used in a menu tab', 'The word Repeats as a verb in a sentence', 'the word second as in hour/minute/second', 'the word second as in first/second/third'. It would be best if 'context' can be any text the module needs to provide to explain how this word is used.

hass’s picture

subscribe

Jose Reyero’s picture

Agree with Gabor (#18), context is better as a separate parameter we need all around. However a shorthand function for simple t(), may help writing code faster (and using context more) for short strings (#16)

Also I agree with KarenS (#16), that we cannot predefine contexts beforehand, as each module / usage can have its own very specific needs, so a plain string will be better. We just need a good fallback system so we don't need to have always 10 translations for the same string, just for the languages/words where it makes sense.

So for now maybe we should focus on how the API is going to look, think of simple but flexible 'contexts'. Standardization can come later when we have compiled a big list of context usage, and we have a full list of requests like 'I need to know whether this is a name or a verb' for translating to xxx language.

yhager’s picture

Subscribing

Xano’s picture

What about using tags like DBTNG does?

Freso’s picture

Subscribing. :)

Damien Tournoud’s picture

Status: Active » Needs review
FileSize
38.65 KB

Courtesy of the flight back home from DC... here is a preliminary patch for adding msgctxt support in Drupal. This:

- adds a $context parameter to t(), just before $langcode to limit API changes
- adds a $context parameter to format_plural()
- load msgctxt from .po files
- save msgctxt to .po files
- tweak the string translation UI a little bit

On the TODO:

- should the empty context be saved as NULL in the database?
- upgrade path from D6 to D7 (this is tricky)
- convert all this to DBTNG (apparently not done yet, but outside of the scope of this patch)

Status: Needs review » Needs work

The last submitted patch failed testing.

hass’s picture

Status: Needs work » Needs review

Great that you are working on it. A few findings... only patch code wise...

1. Is this change intended? Looks wrong.

 -  set_exception_handler('_drupal_exception_handler');
+  // set_exception_handler('_drupal_exception_handler');

2. I thought this patch adds a context, therefore I think we do not need the below any longer... but I''m not 100% sure.

 -      $date .= trim(t('!long-month-name ' . date_format($date_time, $c), array('!long-month-name' => ''), $langcode));
+      $date .= trim(t(date_format($date_time, $c), array('!long-month-name' => ''), 'Long month name', $langcode));
hass’s picture

Status: Needs review » Needs work

crosspost

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
38.21 KB

Fixing 2 and 3 from #27. As indicated above, the upgrade path for that is tricky, as it needs to happen before update.php bootstrapped the Drupal 7 code base, in something like we did for D6 (update_d6_requirements()).

Damien Tournoud’s picture

Oups.

Damien Tournoud’s picture

This one should actually pass.

hass’s picture

How should the "context" be used in generally? I have never worked with contexts...

Damien Tournoud’s picture

@hass: the use of this has been discussed in this issue. You can also take a look at [1] for a more generic introduction.

[1] http://www.gnu.org/software/gettext/manual/gettext.html#Contexts

hass’s picture

Yeah, but I thought more about defining rules for developers - how the context need to be named. Naming a context in a way like pgettext ("Menu|Printer|", "Connect") sounds very good and individually, but we are not developing GUI software like this and context names like 'Long month name' may potentially collide with other context names.

This question have not so much to do with your patch... it's more about the docs we need to write afterwards to tell others - make sure your context depend on module names or something like this... for e.g. "views menu|admin|build|views", "date_popup next" and "date_popup prev" for example. We need to have rules for this naming like code style rules... :-)

Thank you for the link.

hass’s picture

And we shouldn't forget to explain "why" we need this and "when" it need to be used (for e.g. strings with less then equals to - one/two/three words).

Damien Tournoud’s picture

@hass: those details will be ironed-out later. The patch only implement how to technically deal with contexts, see the rest of the issue for why this is needed. The context is *always* optional.

chx’s picture

I just would like to say that reading the issue filled me with dread but then people turned from the evil of string parsing to arrays. I like Damien's patch a lot.

chx’s picture

Please note that the world is moving en masse to msgctxt support instead of string mangling. http://www.poedit.net/trac/ticket/148 has a good of list of who: KDE, Gnome, Yii framework. poedit apparently keeps the msgctxt intact at least now even if it does not display it yet.

Gerhard Killesreiter’s picture

I like the general idea a lot.

I am not sure I like the signature of t(). I think that the patch at least has a lot of NULLs in there.

Maybe t($string, $args = array(), $context = array/()) would be better? in $context there would be $context['langcode'] and $context['context'].

The laguage code is after all only a special kind of context.

One could also merge both into the first array.

Damien Tournoud’s picture

@Gerhard: there are a lot of NULLs in the patch, because we simply don't use the context parameter for now. The idea is that we will probably use much more frequently the $context parameter then the $langcode parameter, hence the suggested order.

Merging language and context into one parameter? why not.

Freso’s picture

If we're merging language and context into one parameter, why not simply make a "context" index and a "language" index in $args? The $langcode parameter is not about context, it's about what translation to try and fetch - ie., it's saying nothing about the string at hand.

Status: Needs review » Needs work

The last submitted patch failed testing.

Jose Reyero’s picture

I like a lot the idea of an array parameter for all these, so we can easily extend t().

function t($string, $args = array(), $params = array())

where $params may contain:
- context
- langcode
- textgroup
etc.....

This will also help with this other patch #361597: CRUD API for locale source and locale target strings

Xano’s picture

Beware that this will probably be a heavy performance drain (merging with default params) or a PITA for developers (no defaults).

Damien Tournoud’s picture

@Xano: since when merging default parameter is an "heavy performance drain"?

Xano’s picture

"will probably" should have been "might". This is because t() is being called dozens if not hundreds of times per page load. Handling default values (and offering the possibility to translate to multiple languages, requiring Drupal to load more than just the current language) for so many calls might decrease Drupal's performance, but we'll need benchmarks to be sure.

Damien Tournoud’s picture

Status: Needs work » Postponed

I'll reroll that once #353883: Convert locale.inc to dbtng. is in.

Gábor Hojtsy’s picture

1. I agree that an array based argument format would be better. We are getting too many parameters here. I like the signature from Jose in #43 better then naming the extra array $context. $params sounds better, not misleading.

2. For the tests to truly test what we do here, some PO importing would be in order. Import simple .po files with and without context and see if the expected strings show up in the DB IMHO.

3. The msgctx support I totally agree with. As soon as core supports it, the other tools around it would need to be updated, but that is just natural :)

stella’s picture

Status: Postponed » Needs work

The patch in #353883: Convert locale.inc to dbtng. has now been committed, so re-opening.

Damien Tournoud’s picture

Title: Add simple context to t() » Add msgctxt-type context to t()
Assigned: Panaromic.webdesign.com » Damien Tournoud
Status: Needs work » Needs review
FileSize
53.95 KB

Here is a reroll, now with an import test! Let's see how it goes.

sun’s picture

Status: Needs review » Needs work

Awesome. Plain awesome. Even comes with tests.

The only thing that prevents me from marking as RTBC is that we want to put comments above the remarked line:

+      if ($context == "MSGSTR") {   // End current entry, start a new one

Given this patch, and the need for it in non-English speaking countries, I would like to propose to commit this patch after this (simple) re-roll. Since the testbot seems to be happy, we should take the "core-breaking-phase" serious, and make sure to flesh out any resulting fixes until D7 is released.

This deserves a CHANGELOG.txt entry as well. Great work, Damien!

sun’s picture

Priority: Normal » Critical

Bumping priority.

Freso’s picture

Hm. If you take a look at the locale.inc file (or just the patch), you'll see that there are plenty of lines with "end-line" comments, so I'm not sure this is an issue here. If it is, perhaps it would be better to apply this (noting its importance), and then make a new (code style) bug against locale.inc, for the "end-line" comments to be "above line" comments?

Freso’s picture

And here's the patch from #50 with CHANGELOG updated.

hass’s picture

Status: Needs work » Needs review
Dries’s picture

Issue tags: +Favorite-of-Dries

Excellent patch. It is almost ready to be committed:

1.

+ *     - 'context' (default to the empty context) The context the source string
+ *       belongs to.

For people that are unfamiliar with msgctxt (i.e. 99% of the people), I think it is worth explaining what the "context" is used for. It would not hurt to extend the examples.

2. There is no upgrade path yet for the database change.

webchick’s picture

subscribe, per sun, although it looks like Dries beat me to it. :)

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

FileSize
56.8 KB

So here is reroll with upgrade path with db changes.

Suppose possible somewhere in core still hook_mail functions that should be examined as already
contact.module, update.module, and user.module
for t(..., $langcode)

+ fixed 1 line in user.module

For me is not clear - what this for?

+++ modules/taxonomy/taxonomy.module	3 May 2009 08:35:38 -0000
@@ -311,7 +311,7 @@ function taxonomy_check_vocabulary_hiera
       $hierarchy = 2;
       break;
     }
-    elseif (count($term->parents) == 1 && 0 !== array_shift($term->parents)) {
+    elseif (count($term->parents) == 1 && current($term->parents) != 0) {
       $hierarchy = 1;
     }
   }
andypost’s picture

Status: Needs work » Needs review

forget to change status

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
56.81 KB

another try

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
56.83 KB
andypost’s picture

FileSize
56.92 KB

One more fix - update trying to create cache_path with t() in description

There's a still broken update process

andypost’s picture

FileSize
57.11 KB

Now upgrade fixed!!!

1) Field context added in update_fix_d7_requirements()
2) locale_update_7000() updates only primary key

Damien Tournoud’s picture

Status: Needs review » Reviewed & tested by the community

Thanks andy. This is ready to go in.

I don't exclude we might have missed some t() calls. Probably not much, but it is completely possible. We will fix those later on if we find them. The test bot is happy, which is already a good sign.

andypost’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
57.36 KB

Ops, not null should be TRUE, added check in update_7000 for existence of field

andypost’s picture

Status: Needs work » Needs review
FileSize
56.7 KB

locale_update_7000() is useless because no support to HEAD updates

so all moved to update_fix_d7_requirements()

Damien Tournoud’s picture

Status: Needs review » Needs work

It is messy to put the upgrade path for one part in update_fix_d7_requirements() and for other part in locale_update_*() function. I suggest we simply catch the PDO exception in locale() and return the English version of the string in that case. This way we can move all the upgrade path in locale_update_7000() where it belongs.

That will make the updated effectively fallback to English until we have created the missing column. It doesn't matter a lot: because the new translations are not already loaded, the updated process will be mostly in English anyway.

andypost’s picture

@Damien Tournoud catching of exceptions in this situation brings nothing except performance loss!!!

Because it breaks all logic with locale()

I strongly disagreed with changing locale() on this manner (exceptions should be visible)

Patch from #69 works but with removing locale_update_7000() current installs have to visit update.php

andypost’s picture

I cant make a good documentation of functions so need help with it

 * @param $string
 *   A string containing the English string to translate.
 * @param $args
 *   An associative array of replacements to make after translation. Incidences
 *   of any key in this array are replaced with the corresponding value. Based
 *   on the first character of the key, the value is escaped and/or themed:
 *    - !variable: inserted as is
 *    - @variable: escape plain text to HTML (check_plain)
 *    - %variable: escape text and theme as a placeholder for user-submitted
 *      content (check_plain + theme_placeholder)
 * @param $options
 *   An associative array of additional options, with the following keys:
 *     - 'langcode' (default to the current language) The language code to
 *       translate to a language other than what is used to display the page.
 *     - 'context' (default to the empty context) The context the source string
 *       belongs to.
 * @return
 *   The translated string.
 */
function t($string, array $args = array(), array $options = array()) {
/**
 * Provides interface translation services.
 *
 * This function is called from t() to translate a string if needed.
 *
 * @param $string
 *   A string to look up translation for. If omitted, all the
 *   cached strings will be returned in all languages already
 *   used on the page.
 * @param $context
 *   The context of this string.
 * @param $langcode
 *   Language code to use for the lookup.
 * @param $reset
 *   Set to TRUE to reset the in-memory cache.
 */
function locale($string = NULL, $context = NULL, $langcode = NULL, $reset = FALSE) {
andypost’s picture

Reroll for current HEAD

Because get_t() cannot be changed after full bootstrap we need add new field before migration process in update_fix_d7_requirements()
Also we need to change primary key for locale_source table:

next two patches

1) t-context-noupN.patch primary key updated inside update_fix_d7_requirements() so no hook_update_N

2) t-context-up7.patch with this function listed bellow

function locale_update_7000() {
  $ret = array();
  db_drop_index($ret, 'locales_source', 'source');
  db_add_index($ret, 'locales_source', 'source_context', array(array('source', 30), 'context'));
  return $ret;
}

Suppose first much better then ugly update_N

catch’s picture

The index update should be in locale_update_7000()

Documention for update_fix_VERSION_requirements() says that you shouldn't do anything destructive in there which would prevent dropping back to D6 - since just visiting update.php at all is going to change the state of your database. IMO adding and dropping an index comes under this.

andypost’s picture

FileSize
56.94 KB

Reroll tracking changes in HEAD

As proposed by catch

+ index changes moved into locale_update_7000()
+ field addition is in update_fix_7_requirements()

Status: Needs review » Needs work

The last submitted patch failed testing.

webchick’s picture

Status: Needs work » Needs review

HEAD was broke.

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
57.01 KB

reroll

Dries’s picture

Given that DamZ kicks this off, I'd really like him to review this patch.

Damien Tournoud’s picture

One little piece was missing: we need to add context to the string overrides example in default.settings.php... The only issue is that I have no idea how to explain how this works:

/**
 * String overrides:
 *
 * To override specific strings on your site with or without enabling locale
 * module, add an entry to this list. This functionality allows you to change
 * a small number of your site's default English language interface strings.
 *
 * Remove the leading hash signs to enable.
 */
-# $conf['locale_custom_strings_en'] = array(
+# $conf['locale_custom_strings_en'][''] = array(
 #   'forum'      => 'Discussion board',
 #   '@count min' => '@count minutes',
 # );

How to explain the '' magic?

Damien Tournoud’s picture

An updated patch with the change to default.settings.php.

andypost’s picture

Yes, this change was lost

As for me it works fine (tested upgrade d6-d7 on mysql)

andypost’s picture

Status: Needs review » Reviewed & tested by the community

suppose it should be commited then needs update docs

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks!

Dries’s picture

Status: Fixed » Needs work

Marking 'needs work' until documentation was updated.

webchick’s picture

Issue tags: +Needs documentation

tagging.

Gábor Hojtsy’s picture

Three notes on this:

- Drupal.t() and Drupal.formatPlural() was not updated to support contexts; it is very likely that people would like to print "May" in Javascript too with proper contexts; also we tried to do our best so far to match up the JS translation API to the PHP one
- While the tests use the "Short month name" and "Long month name" context, the actual common.inc code only uses the "Long month name" context, and uses empty context for the short one. This might be confusing. We need to have some guidelines on context usage. Should the "default" translation have empty context and others marked with their respective contexts (like on common.inc) or should they all have contexts (like in the tests).
- It would probably make sense to add a context-less item to the test .po file and test that too.

Sorry that I did not have time to review the actual patch in more detail earlier.

Damien Tournoud’s picture

@Gabor: I suggest we open separate issues for the javascript translation. Defining guidelines for context is also a separate issue.

Gábor Hojtsy’s picture

Issue tags: -Needs documentation

@Damien: even if defining guidelines is a separate issue, we should probably not use two different approaches in the testing code and the live code. Created #488496: Implement missing msgctx context in JS translation for feature parity with t() for the missing implementation of JS support for contexts.

Also, documented this change at http://drupal.org/node/224333#locale_context with:

Added string context support to t() and format_plural(), changed parameters

(issue) Before Drupal 7, the localization system was only capable of mapping the same character sequence to the same translation for a given language, regardless of the usage context. This results in errors in translations when simple strings like "view" or "track" are translated. "Track" for example can be used as a label when tracking users or a label for a music track in a media module. While English has the same string for both cases, other languages need different translations. The same issue applies to "May" as a month name in Drupal core. English uses "May" for both the short and long name of the month, so it was impossible to translate it to a short and long version in other languages. Therefore Drupal 6 introduced the !long-month-name prefix for long month names, but that was a one-off hack.

In Drupal 7, optional translation contexts were added. To make the optional parameters to t() and format_plural() easier to use, this also resulted in the existing $langcode parameter being folded into an $options array.

In Drupal 6:

// Translate to German.
t('Welcome to our site', array(), 'de');
// Translate to German with replacement.
t('!user, welcome to our site', array('!user' => theme('username', $user)), 'de');
// Translate May in the long version to current page language.
t('!long-month-name May');
// Translate May in the short version to current page language.
t('May');
// Translate May in the long version to German.
t('!long-month-name May', array(), 'de');
// Translate May in the short version to German.
t('May', array(), 'de');

In Drupal 7:

// Translate to German.
t('Welcome to our site', array(), array('langcode' => 'de'));
// Translate to German with replacement.
t('!user, welcome to our site', array('!user' => theme('username', $user)), array('langcode' => 'de'));
// Translate May in the long version to current page language.
t('May', array(), array('context' => 'Long month name'));
// Translate May in the short version to current page language.
t('May');
// Translate May in the long version to German.
t('!long-month-name May', array(), array('context' => 'Long month name', 'langcode' => 'de'));
// Translate May in the short version to German.
t('May', array(), array('langcode' => 'de'));

To recap, $langcode becomes an $options array with the possible keys 'langcode' and 'context'. The 'context' is freely defined by your module, Drupal core only defines the 'Long month name' context which is used for the long month names. Because setting a custom 'context' disables sharing translations of your module strings with other modules not using the same context, you should think twice before introducing contexts.

Please review and correct as needed. (Also not sure we have a place to document settings.php changes and changes for translators).

Keeping on needs work for fixing the test!

Damien Tournoud’s picture

Assigned: Damien Tournoud » Unassigned
Priority: Critical » Normal
Issue tags: +Novice

Tagging as novice to fix the tests, and lowering the priority.

hass’s picture

I'm not sure why we use !long-month-name in the D7 examples if this is the currently used hack/workaround for missing context support. I would suggest that we recommend NOT to use t('May'); at all as this will cause context sensitive issues... I'm not sure if we should recommend to use contexts for all single/two word strings.

// Translate May in the long version to German.
t('!long-month-name May', array(), array('context' => 'Long month name', 'langcode' => 'de'));
Summit’s picture

Hi, Would this also be possible to use in D6, adding the msgctxttype, and langcode. I would like to get to the english locale stored equivalent msg-string through the dutch stored msg-string, while using dutch as default language in locale.
I am sorry if this is not the appropreate place to ask.

EDIT: it seems looking at http://drupal.org/node/82499 t($string, $vars, $langcode) could work..
Sorry to disturb you if this is the case.
Martijn

macgirvin’s picture

a bit late but subscribe

hass’s picture

Status: Needs work » Needs review

@Garbor: I suggest to change the D7 examples to the below for better understanding. I have introduced February with a real short and long version for better understanding and removed the D6 hack completely:

// Translate to German.
t('Welcome to our site', array(), array('langcode' => 'de'));
// Translate to German with replacement.
t('!user, welcome to our site', array('!user' => theme('username', $user)), array('langcode' => 'de'));
// Translate May in the long version to current page language.
t('May', array(), array('context' => 'Long month name'));
// Translate May in the short version to current page language.
t('May', array(), array('context' => 'Short month name'));
// Translate February in the long version to current page language.
t('February', array(), array('context' => 'Long month name'));
// Translate February in the abbreviated version to current page language.
t('Feb', array(), array('context' => 'Short month name'));
// Translate May in the long version to German.
t('May', array(), array('context' => 'Long month name', 'langcode' => 'de'));
// Translate May in the short version to German.
t('May', array(), array('context' => 'Short month name'), 'langcode' => 'de'));

Status: Needs review » Needs work

The last submitted patch failed testing.

Gábor Hojtsy’s picture

hass: As I explained above, 'Short month name' is a context mistakenly used in the .test, while it is not actually used by Drupal core. This issue is marked "needs work" to either fix the test or fix the lookup of short month names, because they are inconsistent. Otherwise you noticed perfectly, that there was a copy-paste error, !long-month-name is not in D7 :) So I've fixed that to:

// Translate to German.
t('Welcome to our site', array(), array('langcode' => 'de'));
// Translate to German with replacement.
t('!user, welcome to our site', array('!user' => theme('username', $user)), array('langcode' => 'de'));
// Translate May in the long version to current page language.
t('May', array(), array('context' => 'Long month name'));
// Translate May in the short version to current page language.
t('May');
// Translate May in the long version to German.
t('May', array(), array('context' => 'Long month name', 'langcode' => 'de'));
// Translate May in the short version to German.
t('May', array(), array('langcode' => 'de'));

Otherwise we need resolution on the contexts used for month names in test/core wise.

hass’s picture

Maybe I do not understand contexts, too - but only because of the short version is not yet used in core - why shouldn't we use or explain it? Please see my above examples. I think they should better be added to the examples.

The short version of "May" is a very bad example/idea... we have seen many more difficulties in the date module as in core. I cannot remember all, but a few are "Second" what could be a "time range" or "2." or "the second of something" (difficult to explain if something have the same word in English) or "min" for "minute" and as abbreviation of "minimum". Therefore I would never ever use single words without contexts if I cannot be 100% sure this is a unique word like "February" (however I cannot say myself if there are different meanings in other languages for this word). I'm pretty sure all 3-letter-words could easily become context sensitive issues.

Bad and could cause context sensitive issues:

// Translate May in the short version to current page language.
t('May');

Good and much saver about context sensitive issues:

// Translate May in the long version to current page language.
t('May', array(), array('context' => 'Long month name'));
// Translate May in the short version to current page language.
t('May', array(), array('context' => 'Short month name'));
// Translate February in the long version to current page language.
t('February', array(), array('context' => 'Long month name'));
// Translate February in the abbreviated version to current page language.
t('Feb', array(), array('context' => 'Short month name'));
// Translate May in the long version to German.
t('May', array(), array('context' => 'Long month name', 'langcode' => 'de'));
// Translate May in the short version to German.
t('May', array(), array('context' => 'Short month name'), 'langcode' => 'de'));
Gábor Hojtsy’s picture

hass: Adding the "Short month name" context to the example does not make it work. Core does not look up the short "May" with that context.

hass’s picture

Yes, if not used in core, but if myModule use makes use of "Short month name" as context - it is looked up if the string is shown in myModule.

Gábor Hojtsy’s picture

Well, we are providing core examples in this upgrade guide, so if let's stick to how that works. I agree that 'May' standalone might be misleading and that adding the explicit "Short month name" context could be useful.

andypost’s picture

Suppose we forget about short month names

So I think better to make "Date" strings in context or provide special handling of "Short month name" as already for "F"

    if (strpos('AaeDlMT', $c) !== FALSE) {
-      $date .= t(date_format($date_time, $c), array(), array('langcode' => $langcode));
+      $date .= t(date_format($date_time, $c), array(), array('context' => 'Date', 'langcode' => $langcode));
    }
    elseif ($c == 'F') {
      // Special treatment for long month names: May is both an abbreviation
      // and a full month name in English, but other languages have
      // different abbreviations.
      $date .= t(date_format($date_time, $c), array(), array('context' => 'Long month name', 'langcode' => $langcode));
    }
hass’s picture

There is a need to add a context to none strings. Today all this "none" and "None" strings are not translated properly (context sensitive) in German.

Damien Tournoud’s picture

@hass: that's what contexts are for, please open a separate issue!

Gábor Hojtsy’s picture

I'm in the middle of implementing parsing for contexts in potx module, so that actual translators can provide proper translations. Some interesting tidbits I found:

- menu hook strings (tab titles, descriptions) cannot support contexts, however, ambiguous tab titles such as "Track" can easily show up; only if custom title callback is used
- region names and any other .info strings are in the same situation; given that .info strings are mostly unique module names and descriptions, this should not be much of an issue
- as said above, there is no JS level support yet for contexts

This is it.

Gábor Hojtsy’s picture

Also, since st() and friends did not support anything beyond replacement variables, context support is not on install time strings either. Complicates the maze of functions quite a bit frankly.

Damien Tournoud’s picture

I would argue that tab titles, tab descriptions, region names, module names, etc. should all have their how context. Context cannot solve the confusion if two tabs are already confusing in English (for example "View" as a verb or as a noun).

Gábor Hojtsy’s picture

@Damien did you mean tabs should have their *own* context? That separates them from the rest of the strings, which would generally be bad and would not solve that there can be a tab View which is a noun and another one which is a verb. Context is exactly supposed to solve this problem, and it does solve it with a title callback, so an answer might be that people need to use a title callback in such a case.

Damien Tournoud’s picture

@Gabor: Yes, I mean that tab should have their own "Menu tab" context. As I said above: "contexts cannot solve the confusion if two tabs are already confusing in English (for example having two tabs labeled "View" as a verb or as a noun)."

Gábor Hojtsy’s picture

@Damien: I think having contexts for where it is used is not a good idea. Contexts should be used to resolve translation conflicts. A "menu tab" context will not solve this problem, it would segregate menu items and make it harder to reuse translations for menu items which are otherwise the same in page titles, etc.

@all: I've implemented initial support for contexts in the D7 branch of the template extractor (potx) module, so translators could actually extract string with context. My implementation diff is on the comment at http://drupal.org/node/369943#comment-1819430 so you can glance over that and see places where I used POTX_CONTEXT_NONE to denote things which don't support context (but there are things not marked with this one even). Things like $t, st, Drupal.t, Drupal.formatPlural, watchdog(), hook_menu() etc translate strings but are context-less. I'll go on to backport these changes to the D6 version of potx and adding context support to l10n_server so that translators can start translating D7 stuff when the time comes with context supported.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
1.8 KB

Here is the test fixes and one follow up bugfix attached.

1. As discussed above, Drupal core does not use the "Short month name" context, so it should not test for it either. That is easily misleading.
2. Also, the import code in locale.inc was missing a $. Found out while working on import support for l10n_server and ported code from this patch to that Drupal 6 module at #524112: Support for working with Drupal 7 context data.

Dries’s picture

I committed the bug fix. We can continue to discuss the other issues. :)

Status: Needs review » Needs work

The last submitted patch failed testing.

Sutharsan’s picture

The Views example Damian uses in #107 can be solved by using a linguistic context instead of a web or drupal context. "Verb" and "Noun" should be contexts to translate "View" correctly.

Gábor Hojtsy’s picture

@Sutharsan: well, view as a noun can still mean multiple stuff :) "view" as in "what a nice view you have here in Paris" or "I've just set up this database view in PostgreSQL" or "SimpleViews module just makes setting up a basic view much easier". When appearing in a module, a webcam module might have "view" as a noun in the above sense (a bit contrived I admit), a database management module could have the database view thing and Views surely has the Views view thing.

sun’s picture

Status: Needs work » Fixed

I think this issue is fixed and we should move to (at least one) separate issue to discuss potential contexts, no?

Let's discuss that stuff in a separate issue. Please add a link here. (I'd create one if I'd know all details of this thread to post a sane summary as starting point.)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

andypost’s picture

Priority: Normal » Critical
Status: Closed (fixed) » Needs review
FileSize
1.5 KB
1.03 KB

Upgrade path is broken http://cvs.drupal.org/viewvc.py/drupal/drupal/modules/locale/locale.inst...

See attached diff within locale_schema(), but we still have no db_add_foreign_key()

Patch trying to fix this and reorder update functions

If issue #221020: locales_source.location is just varchar(255) and it is not truncated changed for D6 then 7001 should go to 6-extra section

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
792 bytes

Reroll

Dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. Thanks.

andypost’s picture

Issue tags: +msgctx

Tagged

Gábor Hojtsy’s picture

If the last msgstr value in the .po file is empty, the .po file is considered broken because of the changes in the patch. Follow up bug report opened at #611786: .po file should not be considered broken if last msgstr is empty.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

moshe weitzman’s picture

Gábor Hojtsy’s picture

Issue tags: +string context

Adding tag.

hedac’s picture

subscribing.. interested also in drupal 6 version for this