This is a forked off issue of Steven's comment on the installer localization patch. Here he suggests we should extend t() to support escaping and escaping + placeholder display of text without calling the actual functions. Much like db_query() does the escaping magic when needed.

My approach to the issue is to extend on what we have now, instead of redefining the prefixes as Steven indicated. %value would be kept for simple substitution, but %@value is introduced for placeholder substitution (so that check_plain() is called and em tags are added around the value by default). Pros of this approach is that a lot of code becomes more readable, and that the installer support would become easier (by using this syntax there, not by using the t() function itself). The con is that t() always performs a foreach on arguments, which takes a small amount of time.

The attached patch is only a small fraction of t() calls changed (only include files). Since this needs to be done manually, I would not go into modules and core themes before a method is agreed upon, so my work is not thrown out. I have also been unable to find only check_plain()-d arguments in include files, so not implemented support for them (yet?).

Comments

Steven’s picture

StatusFileSize
new280.35 KB

Actually I already have a finished conversion of core sitting here. I just got busy with eaton's install patch :/.

For those who just tuned in:

This patch tries to address the fact that nearly every call to t() uses check_plain() or theme('placeholder') on its arguments (or at least that's what people should be doing). Putting theme('placeholder') calls inside a t()-argument array means a lot of nesting, a lot of long lines and code that is hard to read.

So, why not let the conversion be done by t() itself, and make st() identical? We already implemented a similar technique in install.inc's st() function (rather on a whim), which is hardcoded to pass its arguments through theme_placeholder(). This has the benefit of reducing code bloat, but forces everyone to use the emphasized placeholder style. You might have noticed that the installer tends to have lots of italics.

So, here's an even better solution: we let the first character of the substitution key define the type of value:

  • %value: The item is passed through theme('placeholder') (and thus also check_plain()d)
  • @value: The item is passed through check_plain() (useful when you don't want to draw attention with excessive emphasizing).
  • !value: The item is inserted as is (useful for non-HTML contexts and for inserting whole chunks of HTML into HTML).

Note that contrary to Goba's solution, I chose to change all t() calls around by changing the meaning of "%value". Why? Because too many contributed modules (and some parts of core) still don't use placeholders properly. Using this convention, we can fix a bunch of possible XSS exploit vectors at once. Of course, those modules that already work will need small changes, but those authors should already be familiar with the needed steps. They can remove any theme('placeholder') or check_plain() calls of your own, or use the !-prefix.

I also picked ! as the prefix for as-is insertion because it denotes "warning" and "danger": you're not doing any escaping here.

Examples:


// Here we choose not to emphasize the type (@), but only the title and revision (%)
watchdog('content',
  t('@type: reverted %title revision %revision.',
    array(
      '@type' => t($node->type),
      '%title' => $node->title,
      '%revision' => $revision
    )
  )
);


// Here we are assembling two pieces of existing HTML. No escaping, but safe in this case.
t('!date by !username',
  array(
    '%date' => l(format_date($revision->timestamp, 'small'), "node/$node->nid"),
    '%username' => theme('username', $revision)
  )
);

The @ prefix is very useful for st() / install.php in fact where we don't want to emphasize the name of the install profile (e.g. Drupal or Civicspace).

The only downside of this scheme is that it affects tons of translatable strings.

Steven’s picture

That last example should of course be:

// Here we are assembling two pieces of existing HTML. No escaping, but safe in this case.
t('!date by !username',
  array(
    '!date' => l(format_date($revision->timestamp, 'small'), "node/$node->nid"),
    '!username' => theme('username', $revision)
  )
);
gábor hojtsy’s picture

Assigned: gábor hojtsy » Unassigned
Status: Needs review » Reviewed & tested by the community

I intended to keep backwards compatibility but if it is for the sake of XSS fixing then so be it :) Translations will need to look out for a huge amount of small string changes. Maybe we can cook a simple script to help them update their translations? Hm. Anyway, I am fine with the changes. I have reviewed the patch, and the array_walk() approach with the new _t() and _st() seems to be fine. IMHO RTBC.

moshe weitzman’s picture

seems like a good idea to me. this will cause loads of modules to review/change their escaping, but i guess that could be considered a good thing.

Jaza’s picture

Are there any precedents from other applications, programming languages, or environments, that we should be considering when making our choice of placeholder prefixes? Although I think that Steven's logic behind the choices is sound, I'd still like to see some evidence that research has done into existing use of prefixes in the wider world, before we go ahead and commit a patch that will potentially break with already-established conventions.

One precedent that comes to my mind: the '@' symbol is the 'shut-up operator' (a.k.a. the 'error control operator') in PHP. I think that our proposed use of the '@' symbol for placeholders fits reasonably well with the way that PHP uses it (i.e. overriding the default 'safe behaviour'). Can't think of a precedent for the '!' character - except that some developers might confuse it with the PHP/C/Java/etc '!' (negation, or 'bang') operator. But the argument that '!' indicates 'warning' sounds like good enough justification to me.

drumm’s picture

Status: Reviewed & tested by the community » Needs work

+1, except:

- No longer applies.
- There is line-wrapping (search for 'gator/opml').

chx’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new266.23 KB

Rerolled.

Steven’s picture

Status: Reviewed & tested by the community » Needs review

Updated with HEAD (and fixed wrapping...).

Proposed update instructions:

The t() function was changed to be able to transparently escape and format its arguments to be safe for output. The idea is that where you previously called check_plain() or theme('placeholder') on an argument before passing it to t(), this is now done for you, based on the first character of the substitution key:

// Before:
print t('%type: %title was posted', array('%type' => check_plain($node->type),  '%title' => theme('placeholder', $node->title)));
print t('Submitted on %date', array('%date' => theme('username', $node)));

// After:
print t('@type: %title was posted', array('@type' => $node->type,  '%title' => $node->title));
print t('Submitted on !date', array('!date' => theme('username', $node)));

Note that in the first print statement, we removed the calls to check_plain() and theme('placeholder') as they are now applied automatically to arguments whose key starts with respectively '@' and '%'. In the second case, there was no escaping before, so we use the '!' prefix to insert the argument as is.

Using the above code as a template for upgrading your module only works if your module used check_plain() and theme('placeholder') when appropriate. It is important to use them in output to avoid XSS security problems. If you are not sure when to use them, it is better to try '%' or '@' first, and only switch to '!' when the first is causing problems. It is advised to read the book page about dealing with text in Drupal in a secure way as it contains coding guidelines and examples of good and bad code.

And obviously, you could just be lazy and replace all '%placeholders' with '!placeholders' to get the same behaviour as before. But that doesn't help you make your code more readable.

dries’s picture

1. Before this can be committed, I'd like to understand the performance implications of this change. After all, t() is called a lot.

2. Can someone summarize the advantages of this patch? (Or the disadvantages of the current approach.) At the end of the day, st() looks like a small function and the code duplication might not that bad as the additional performance cost?

dries’s picture

(All in all I'm OK with this patch. I just want us to benchmark it properly, and weigh the advantages and disadvantages. Rushing his sounds like a bad idea.)

gábor hojtsy’s picture

@Dries, the advantage of this patch is that escaping and placeholder marking will be a lot easier. We know that XSS is so common because it needs unnecessarily extra work to prevent it. Drupal has a nice SQL injection prevention layer with escaping SQL parameters as needed. This patch introduces the same concept to help prevent XSS.

Quoting from the link above:

XSS bugs happen because in most web dev environments, the default is to not escape entities when writing variable data into HTML. If escaping were the default, and developers had to specifically un-escape variable data that was supposed to be HTML (and the parameter to do so were called something like “unsafe”) the incidence of XSS would drop by orders of magnitude. It would also become much easier to audit unsafe usages in such apps.

Unfortunaltely I am going to be missing in action until tuesday, so I cannot help in performance testing of this patch, but the development benefits are big IMHO.

dries’s picture

I agree that the benefits are big, and I can see this patch go in if there are no significant disadvantages. Unfortunately, I won't be able to do performance testing myself this weekend. Maybe early next week.

Steven’s picture

Advantages:

  • t() and st() calls become much more readable and shorter by internalizing check_plain() and theme('placeholder') calls on the arguments.
  • It is harder to forget check_plain() or theme('placeholder') when using t() and st() (xss!)
  • t() and st() now do exactly the same to their arguments, which means they can be used interchangably like this:
    $t = $is_install ? 'st' : 't';
    ...
    print $t('My string %foobar', ....);
    

    This is needed in code that runs both at installation time as well as run-time (think install-time/run-time requirements checking patch). The alternative would be for every string to make an if else or ? : construct. That's not handy.

Re performance: I don't have a good environment to benchmark at the moment.

dries’s picture

I don't have a good environment to benchmark at the moment.

We don't need a super-duper setup for this. The following might give us some gut feeling:

 -print time-
 for (int $i = 0; $i < 100000; $i++) {
   $foo = t('some stuff here');
   $foo = t('some stuff here');
   $foo = t('some stuff here', arg1);
   $foo = t('some stuff here', arg2, arg 3);
 }
 -print time-
heine’s picture

 $arg1 = 'xxxxx xx <b>xxx</b> xxxxxxxx';
 for ($i = 0; $i < 100000; $i++) {
   $foo = t('Filter by message type');      
   $foo = t('Filter by message type %arg', array('%arg' => $arg1));   
   $foo = t('Filter by message type @arg', array('%arg' => $arg1));
   $foo = t('Filter by message type !arg', array('%arg' => $arg1));
 }

Prepatch: 5923.72 ms
Postpatch: 22853.34 ms

Pre--------Post
1085.92 -> 1123.57 t(string)
1644.35 -> 7137.03 t(string %arg)
1611.61 -> 3170.20 t(string @arg)            
1658.33 -> 2481.58 t(string !arg)
1806.89 -> 8952.25 t(string %arg @arg !arg)
heine’s picture

Ignore the above. Will redo with proper prepatch checks.

heine’s picture

Now done with proper prepatch checks:

Prepatch: 11363.03 ms
Postpatch: 22853.34 ms

Pre --- Post
1133 -> 1123 t(string)
6443 -> 7137 t(string %arg) theme('placeholder')
2498 -> 3170 t(string @arg) check_plain()           
1638 -> 2481 t(string !arg)
8470 -> 8952 t(string %arg @arg !arg)
Steven’s picture

Heine: Summing the values you pasted in your last followup, gives me the following totals:

before: 20182ms
after: 22863ms

This is quite a different picture. Also, looking at the benchmark from #15, where the xss checks were absent (and the total time was only 5900ms), shows that most of the work is spent in actually performing the xss checks, not in the surrounding code. The performance impact of the changes to t() are minimal enough IMO.

heine’s picture

Steven: How embarrasing; that's indeed an incorrect before value.

I've some averages & spreads from 10x 10.000 x iteration

Prepatch              Postpatch				
  110.84 +/-   4.23     110.29 +/-   8.26	(~1)   string
  656.99 +/- 114.15     740.11 +/- 188.25	(1.13)	string placeholder
  248.99 +/-  14.30     322.24 +/-  37.87	(1.29)	string check_plain
  161.02 +/-   8.69     258.04 +/-  31.85	(1.60)	string var (no check)
  734.50 +/-  26.54     991.84 +/-  35.15	(1.35)	string pcn 
1,320.80 +/-  51.92   2,175.18 +/-  70.60	(1.65)	string pcn pcn
1,913.15 +/-  21.84   3,380.97 +/-  56.38	(1.77)	string pcn pcn pcn

time in ms
pcn = placeholder, check_plain, no check

Mostly small differences as seen by

  161.02 +/-   8.69     258.04 +/-  31.85	(1.60)	string var (no check)

+1 for this patch

gábor hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

Ready to get committed then.

dries’s picture

Status: Reviewed & tested by the community » Needs work

Wanted to benchmark this patch but it no longer applies. :/

gábor hojtsy’s picture

Component: locale.module » base system
Status: Needs work » Needs review
StatusFileSize
new270.42 KB

Dries, at your service! I checked out a CVS copy of Drupal for the date when the last patch was generated, applied the patch and then took an hour to fix all CVS conflicts. Updated patch attached.

Would be nice to get this reviewed and comitted soon. It is not easy to keep this up to date...

gábor hojtsy’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new270.43 KB

Although the previous patch would probably apply with some offsets (due to several changes in Drupal core lately), here is an up-to-date version. I would like to reiterate that this is an important security improvement, and also note that other patches are on hold awaiting the commit of this patch: Installer translations and proper watchdog translations are mine awaiting for a final decision on this.

drumm’s picture

Status: Reviewed & tested by the community » Needs review

I'll wait for Dries to do the promised benchmarks or until I hear otherwise.

Steven’s picture

StatusFileSize
new36.01 KB

Updating with latest HEAD.

dries’s picture

Did you upload the proper patch? I was going to benchmark it, but the last version doesn't apply at all. Got 20+ failures or so ...

Steven’s picture

StatusFileSize
new294.46 KB

Not sure what happened there. I think my CVS tree might have been a bit messed up.

Trying again... I verified that it applies cleanly to a fresh CVS checkout.

Steven’s picture

StatusFileSize
new294.46 KB

Properly wrapping doxygen comments...

Steven’s picture

Status: Needs review » Fixed

Dries' benchmarks showed a slowdown of less than 1% on the front page. Committed to HEAD.

eaton’s picture

Priority: Normal » Critical
Status: Fixed » Needs work

This patch seems to have hosed things. with a fresh HEAD, I see weird errors on hitting install.php:
Warning: array_walk() [function.array-walk]: Unable to call <em>Database name() - function does not exist in /home/bonsai/public_html/test/includes/common.inc on line 601

form element titles show the check_plain'd output of their HTML rather than the properly rendered text, and so on. I'm not even able to load the registration form to create the first user.

When I reverse this patch against HEAD, everything starts working again.

Anonymous’s picture

i can't confirm the install problems - it worked fine for me, but i'm also seeing the form title issues eaton mentioned.

eaton’s picture

also, the bug italicized this issue!

webchick’s picture

Better..?

eaton’s picture

Yes, better, thanks webchick :)

Having dug around, part of the problem appears to be that the case for handling "!" prefixes was completely removed from the patch when the last version of it was rolled. Is this, er, intentional?

eaton’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new482 bytes

No, I'm betting it wasn't.

webchick’s picture

Category: task » bug

+1.. confirmed that HEAD is totally broken without this patch (install.php is just a blank screen for me), and applying it fixes stuff again.

Steven’s picture

Status: Reviewed & tested by the community » Fixed


Committed to HEAD, thanks.

Anonymous’s picture

Status: Fixed » Closed (fixed)