Problem

  • Some info hooks require you to declare human-readable values with t(). Some others don't. Getting it wrong potentially leads to security issues.
  • Info hook values for low-level stuff defined in code and similar things are always the same, but Drupal creates different, language-specific caches for them.

Goal

  • Restore sanity.
  • Drop language-specific caches. Only translate/localize strings on presentation.

Details

  • #503550: Translated strings are cached in _info() hooks and #813370: Hook_menu_alter() not supported by potx made us (inconsistently) introduce two patterns:
    1. Info hooks that define things using t() for human-readable values
    2. Info hooks that define things NOT using t() for human-readable values

    Nonsense? Nonsense.

  • Some (most) info hooks require you to use t() now, and we cache (the same) information for each language, just translated.
  • The mere reason for doing so is that we're not able to determine translatable/localizable strings through our potx template extractor otherwise.
  • Lacking a proper mechanism for doing so, we helped ourselves by introducing weird magic for functions like watchdog() that requires you to put the log message string within the function arguments on the same line in order to be identified by potx' string extractor.

Proposal

  1. Introduce tl(), standing for "translate later/lazy", [bikeshed], to signify a string to be extracted for string translation.
  2. Remove language-specific caches for info hooks, everywhere.
Files: 
CommentFileSizeAuthor
#12 d8_transltable.patch7.95 KBfago
FAILED: [[SimpleTest]]: [MySQL] 48,960 pass(es), 0 fail(s), and 16 exception(s).
[ View ]
#3 drupal8.tl_.3.patch1.5 KBsun
PASSED: [[SimpleTest]]: [MySQL] 35,353 pass(es).
[ View ]

Comments

Restoring sanity sounds nice :)

+1

Would we use that in hook_menu() as well?

This would also be helpful for contrib which have similar patterns (not just info hooks, but e.g. writing untranslated strings to the db and translate on display like menu/watchdog).

However, wouldn't we also need something like t_dear_potx_i_know_what_i_am_doing($variable) to prevent it from throwing those warnings that it currently does if you do t($variable)?

Status:Active» Needs review
StatusFileSize
new1.5 KB
PASSED: [[SimpleTest]]: [MySQL] 35,353 pass(es).
[ View ]

Perhaps we'd just remove the warning for t($variable) for D8?

Attached patch implements tl().

Of course, there will be a lot to convert after introducing this. I'd suggest to handle every single subsystem/API on its own in separate issues, since each of those patches will have to:

  1. Add tl() to all strings in the API hook implementations.
  2. Remove the per-language caching and whatever else from the API.
  3. Adjust all forms and view/output code to call t() on the strings.

Another thing that might make use of this are exception messages.

Those are currently a huge mess in core and there are open issues for that. Simply using tl() won't be enough for them because they might have arguments which need to be passed through (and either used when the exception is displayed and/or passed to watchdog() to save them).

Docblock in the patch looks nice, probably makes sense to add a reference to t() as well?

One of the main reasons this was not done before is performance. Why introduce a function call even for the most innocent thing as to return a string, which would slow down the page a bit. If we only use these with info hooks, it would of course only be called when the info caches are regenerated but it is a small hit nonetheless. Magic comments were suggested before, but those could be ugly too. Think /* t( */ and /* )t */ or something along those lines.

Also consider that we'd need to support format_plural() type singular/plural pairs as well. These are stored and retrieved in pair, so they not equal to t($singular) and t($plural). They were not equal in D7 and much less so in D8 that we changed their storage model too earlier. So with that in mind, you'd need format_plurall() too (note double l).

Finally, if this is all only needed for potx, and we can agree on a code convention that does not require API changes (no new function), it is easily introduced in any Drupal version. Potx is a contrib module and it can react to anything in the code so long as we code it in :) So we might not need a core change at all to make this happen. We might only need a coding style change.

(This is a recurring discussion, there are likely many places in issues and g.d.o where you can find previous instances of the same discussion. The above is my short recollection of what we discussed before in previous instances).

Also not sure why this would be a critical, especially that it can be solved outside core even.

@Gábor Hojtsy: I filed this as critical, because of the first and foremost problem mentioned in the OP:

Some info hooks require you to declare human-readable values with t(). Some others don't. Getting it wrong potentially leads to security issues.

As mentioned in #3, a tl() facility cannot be simply tacked on the current code. The actual API/module code needs to actively support and be aware of it. If hook implementations don't return translated strings anymore, then the strings need to be translated later on (on presentation, when translations are actually needed).

I don't consider performance to be an issue. It's only used in cases of info hook definitions (of which all are cached), and this is actually the typical no-op function being used in performance benchmarks. Even if you'd call this a million of times, like so:

const ITERATIONS = 1000000;
$data = 'text is the argument';
$start = microtime(true);
for ($i=0; $i < ITERATIONS; ++$i) {
}
$stop = microtime(true);
echo 'nothing: ' . ($stop - $start) . ' seconds' . PHP_EOL;
$start = microtime(true);
for ($i=0; $i < ITERATIONS; ++$i) {
  $text = tl($data);
}
$stop = microtime(true);
echo 'function tl(): ' . ($stop - $start) . ' seconds' . PHP_EOL;

then the total time is negligible:
> php debug.php
nothing: 0.26343202590942 seconds
function tl(): 1.5744860172272 seconds

On format_plural(), eek. ;) I'm not sure whether that use-case actually exists within the tl() scope. I'm only aware of one issue that wanted to add format_plural() support to watchdog() strings, but I think I marked that won't fix, because no one really cares whether log messages use proper plural language rules.

Drop language-specific caches. Only translate/localize strings on presentation.

That's ... not practical? Good caching involves storing pieces of rendered HTML, you can't really disassemble that into the source strings to translate that on the fly.

-1 from me on using a noop function as a marker. That has performance implications as well as hard dependency problems. Can we not use annotations somehow?

How does anyone else do this? Or do they simply use keyed lookup files (something I know we considered but decided against for some reason...)?

Whatever solution is implemented should then be used for annotations' Translation objects. (See #1683644: Use Annotations for plugin discovery).

Coming back to this one from #1853096: Integrate symfony validation violations with Drupal translation. Over there, we need a way to pass on a translated message such that afterwards we can output the translation + the untranslated message, e.g. to log it to watchdog. Problem being, our regular way to do this (pass on message template + args) does not cope with format_plural(), what we need absolutely for validation violation messages.

Thus, I was thinking about introducing a simple class holding translatable messages

<?php
class Translatable {
  
// Provide ways to construct it with
   // - single message + args
   // - plural message + args
   // - optionally, with context also
 
public function getMessage() {
     return
t($this->message, $this->arguments);
}
?>

Once we'd have such a simple value-object we could easily
- write potx support for that
- pass it on to functions that need to postpone translation like watchdog() or violation constraints.
- put it into our caches (that's why I'm posting it here ;)

Not sure whether having those objects in caches is a performance concern - that would obviously require benchmarks. But it would be a simple way to unify per-language caches and solve potx issues. Actually, I'm surprised I've not found any talk about a solution like this one here?

StatusFileSize
new7.95 KB
FAILED: [[SimpleTest]]: [MySQL] 48,960 pass(es), 0 fail(s), and 16 exception(s).
[ View ]

I did a quick patch and performance test. I converted t() calls from system_data_type_info() and tested deserialization performance via that code snippet.

I was not able to get a measurable performance difference, e.g.
Translatable classes
1424.64ms for 10000 runs.
0.142464ms per run

with pre-translated strings
1439.51ms for 10000 runs.
0.143951ms per run

Simple patch attached. Note, that we could use that class from plugin annotations would result in a better DX.

@fago: the patch at #1813762: Introduce unified interfaces, use dependency injection for interface translation proposes to swap most of what is under t() out to be pluggable, so I think there are probably lots of opportunities to work together here, instead of making things pluggable on different levels in parallel, somehow figure out the best way :)

Thanks for the pointer I was not aware of that issue, it doesn't seem to address the problem of postponing translation though?

No it does not do that.

Status:Needs review» Needs work

The last submitted patch, d8_transltable.patch, failed testing.

Also potentially relevant: #1843798: [meta] Refactor Render API to be OO

That issue isn't targeting translation per se, but the parallels are strong.

Version:8.x-dev» 9.x-dev

Does not seem to have any chance to happen in Drupal 8.

Issue summary:View changes

Updated issue summary.