A note in advance: Even though this includes some specifics of Twig and the idea is to implement it with Twig, most of the ideas are written in abstract form with an hypothetical "secure" phptemplate engine in mind, so that the whole community can participate in the discussion.

Motivation

The past has shown that custom themes often were prone to XSS (Cross Site Scripting), where user input has not been filtered properly.

Often this can be as easy as the following line in node.tpl.php:

echo $node->title;

which seems totally harmless (at least for those not reading the docs properly).

But also more "interesting" constructs have been found in the wild like full MySQL queries, function calls, etc. inside templates that exposed raw data and such an attacker can evaluate custom JS. An easy way to test this is setting for example title to <script>alert('xss');</script>.

Approach

One solution to this never ending spree of security problems is sandboxing and auto escaping. In sandboxing the allowed PHP commands are restricted, in auto escaping everything that is output is automatically escaped (think of each string being run through check_plain() before it is output).

Twig is a system, which allows compiling templates from a simple markup language to PHP. Therefore twig templates can be restricted in what functions are callable. Twig also allows auto escaping of output.

Problems

So restrict function set and enable auto escaping and all is well? Unfortunately not quite in the Drupal World.
(There is another problem that auto escaping needs to know the context for the proper escaping strategy, but this is for now not relevant to this issue.)

In Drupal we already escape many variables and provide them in a secure way to templates and as the current plan is to continue to support phpengine / contrib engine templates, this won't change so soon (#1537050: [meta] Should we keep / improve multiple theme engine functionality?).

The problem we are facing is essentially: Double-Escaping

Lets see the current node preprocess code:

  $variables['label'] = check_plain($node->label());

and now lets hypothetically think of that all echo's in the node.tpl.php are auto-escaped with: echo check_plain(x):
(this is essentially what twig is doing, but we simplify it here)

  <h2 class="title"><?php echo check_plain($label); 

?>

We end up with a double escaped title, which leads to for example the output of "Drupal & Friends" to appear like
&quot;Drupal &amp; Friends&quot;.

Solutions

To solve this problem we have several possible solutions, but all have different drawbacks and advantages and here I need community input:

In Twig it is possible to mark "output" as safe by wrapping it as part of a Twig_Markup object. Lets assume we had such a concept in Drupal, which is called SafeString and that we have a function to mark a string safe and check if a string is safe:

drupal_mark_safe() / drupal_is_safe()

Lets further assume for a moment that we have a function called drupal_autoescape(), which is aware of that and does not double escape already safe strings, but rather returns them as is:

  function drupal_autoescape($str) {
    if (drupal_is_safe($str)) {
      return $str;
    }
    return check_plain($str);
  }

We now have 3 different approaches to fix the double escape that will be discussed in the next three paragraphs in more detail:

  1. First - escape functions return safe strings (SafeString): Have check_plain return a string that was processed by drupal_mark_safe
  2. Second - Modules need to mark escaped strings as safe manually: Mark a string safe manually while assigning to $variables
    ($variables['label'] = drupal_mark_safe(check_plain($node->label()));)
  3. Third - The engine guesses safe strings: Mark all strings found in $variables safe for an auto escape aware engine
    (this can also happen on run-time depending on the engine)

First - escape functions return safe strings (SafeString)

The first is the most secure, but also the most complex variant ( implemented in #1712444: Twig: Activate autoescape mode as proof of concept). All strings that are escaped in Drupal are then objects of instance SafeString. Such calls to is_string are failing, while is_object calls can be misleading.

The advantage of the first variant is that besides fixing tests and code, the effort is rather simple as only all escaping functions need to return "SafeString"s. Also it is secure, because only a string that has been escaped, is not escaped again.

Second - Modules need to mark escaped strings as safe manually

The second is less secure than the first as it is more prone to errors like "I totally escaped that string in the other function. That is safe here.", but it still protects template authors of unintentionally leaking data like: echo $node->title; (as it is wrapped in drupal_autoescape). And even if core or contrib accidentally exposes an insecure variable like $title to the template ([#bartik-prone-to-xss]) as long as they did not deliberately marked it safe, it is still safe and auto escaped properly.

The disadvantages are (besides being not totally secure) that all output needs to be marked safe manually and that means changing all preprocess / process functions. While in the first case escaped, but not marked safe strings could not happen, here the template is broken and returns double escaped output till it gets fixed and marks the strings safe correctly. However as all templates are changed anyway as part of the Twig effort this might be a good opportunity to make them safe at the same time.

The advantage is that there are no new side effects of calling escaping functions like check_plain() as they return strings like normal and only templates need to deal with the (printable) SafeString objects. And it prevents a series of accidentally forgotten escaped variables.

Third - The engine guesses safe strings

In the third variant the auto escape able engine marks all strings in variables with drupal_mark_safe before passing them to the template.
(This makes sense as template authors are encouraged to use these variables as safe strings already now in templates.)

Therefore, in this third variant the theme author is only a little more protected than he is now: First, he cannot call insecure functions and second, code like echo $node->title; would still be detected as being insecure. However code like echo $title; would be not if title was insecure (like in the example in the second variant), because it was marked as safe.

Advantages are ease of implementation and less code changes. Disadvantages are the reduced security and less protected variables, which might give a false sense of security.

Refinements

There are some further issues that need to be taken into account:

TypedData Interface and SafeStrings

Using Twig_Markup objects for safe strings was objected, it was proposed to use the TypedData Interface instead and provide a SafeString extends String. However SafeString would need to implement __toString() method to work efficiently and I am not sure if that is "allowed" as part of TypedData Interface. A possibility could be to use a class SafeStringMarkup, which further extends SafeString and then provides the __toString method. Thoughts are welcome here.

Render Arrays / Output from render() / drupal_render()

node.tpl.php has for example

  print render($content);

which auto escape would transform into

  print drupal_autoescape(render($content));

Such we would need to define that output from render arrays is secure, which means that

a) render() / drupal_render() is only concat-ing safe strings in the first place
b) everything that is returned from render() is marked with drupal_mark_safe()

Given that render in essence calls theme() and template wrappers and adds #suffix / #prefix (and of course lots of nice logic), this seems reasonable, but it of course introduces possibilities for module authors and themers that change template.php to introduce XSS accidentally.

Overall however this just means that the security for these parts is not better than what we have now, but it is still note worthy.

API - Public API for these functions

I now proposed drupal_mark_safe() / drupal_is_safe() as easy to understand interface. However for performance reasons a check for $str instanceof SafeString would probably be used instead.

Drawbacks

The biggest drawback of any auto-escape solution is performance, but benchmarks in twig_engine and twig-in-core patches show that an additional function call (auto escape) doing "instanceof" and wrapping within an object for marking strings safe is really fast and that the overhead of marking something as secure is minimal compared to the overhead of escaping itself (which we _need_ to do).

Summary

I personally would love to see a secure theme system in Drupal 8. There are certain challenges to overcome. I think we can overcome them and have a much more secure theming layer by default.

I would love if all escaped strings in Drupal 8 were typed and Safe (Solution 1), but I see that this is really complex.

My second preference is Solution 2, where there is more work for the module author and/or themer, but the effort is directly paid out in security.

I will minimally integrate Solution 3 into Twig as that has no side-effects and if a string needs to be output it can still be marked safe or raw as part of the template (that is always possible like {{ obj.myproperty|raw }} in Twig or echo drupal_mark_safe($obj->myproperty); in PHP.

I think it might be a good idea to start with 3, then move to either 2 or 1 as follow-up.

Please keep this focused to the topic on hand, because this is not necessarily about Twig - it applies to any template language doing auto escape / sandboxing. And please do not discuss the advanced topic of that auto escape for HTML is wrong in a JS or CSS context, etc. This is all known and really complex but it is not within the scope of this issue to solve (see http://groups.drupal.org/node/255293). Thank you!

Thoughts, ideas, praise, objections are all very welcome!

#1825952: Turn on twig autoescape by default

CommentFileSizeAuthor
#15 xhprof-diff-core-twig-ra-opt.png36.54 KBFabianx
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Fabianx’s picture

Issue summary: View changes

change markup

chx’s picture

A few important things to consider: wholesale autoescaping all user input is not a good idea for sure because it can be rich text which is made safe elsewhere. No, the role of autoescaping and everything Twig in security is twofold: remove PHP from templates and put the onus of security on people who write PHP.

That said. While 1. is a very alluring option I started implementing it's a nightmare in reality because of strings not being strings. The complexity and side effects are really frightening.

Nowadays we already presume that variables provided for a template are made safe previously, somewhere. Some are check_plain'd in preprocess, some come from t(), some come from l(), some come from drupal_render (which is presumed to return safe things). If all of them are safe anyways, then why bother manually? That's not nice a DX.

So, my vote is on 3. It's the simplest, has very few if any side effects because it operates within the boundaries of the template system and still puts the kibosh on making XSS holes via {{ node.title }}.

chx’s picture

Issue summary: View changes

Added issue links

Fabianx’s picture

Issue summary: View changes

add summary

Fabianx’s picture

Issue summary: View changes

clarified summary

David_Rothstein’s picture

Great writeup.

#3 seems a little iffy because as soon as you have a scenario where a themer wants to print something like $node->field_whatever[0]['safe'] (which contains safe HTML), the auto-escaping will come along and break that, so they'll have to explicitly mark it as safe instead? And I think the security benefit of this becomes questionable as soon as you start having things like {{ variable|raw }} all over the templates... because then people start learning that it's a good idea to use that everywhere whenever they run into a problem.

#1 reminds me of Taint PHP (http://jaspan.com/tainted-love). I've always wondered if there was a way to do that without modifying PHP itself. But treating strings as objects definitely seems pretty iffy... I can actually imagine some ways to do it without that (keeping a registry of already-escaped strings or their hashes somewhere in memory and checking against that before auto-escaping?); the performance or memory implications of that might not be good, but I wonder if anyone has tested it?

chx’s picture

So, what if we do $GLOBALS['safestrings'][$string] = TRUE to mark safe strings? I know we hate globals but this is the perfect case for it -- if a string is safe, it is. That won't change, ever. As for testability, it's still testable. The performance implication are very likely to be super small, there's this assignement and there's one isset in Twig when wanting to escape. Looks doable to me. We need to memory benchmark this. My gut feeling is that even with the inefficiency of PHP storing things this will not be very horrific. I quickly tested it and 10 000 random strings a total of 32 912 113 characters long stored as shown above took 33 822 512 bytes. Of course it can't be known without actually testing this but if we presume that the sanitized strings total size are comparable to the HTML source size (that stands to reason if we do not produce tons of sanitized strings which arent printed) then, well, I looked at a 300 comment issue and that's 438 574 bytes. All in all, it stands to reason we are looking at worst a .5MB memory and negligible performance loss.

Crell’s picture

Globals != testable.

Also, that would likely cause all sorts of weirdness as soon as we issued a subrequest.

spekkionu’s picture

Both the Twig twig_escape_filter() method and the drupal check_plain() function are wrappers for htmlspecialchars().

htmlspecialchars() takes a 4th parameter $double_encode which according to the php docs:

When double_encode is turned off PHP will not encode existing html entities, the default is to convert everything.

This parameter is not used in either implementation. I might be missing something but is there a reason just passing false here wouldn't solve the problem of double escaping?

David_Rothstein’s picture

@spekkionu: Interesting point. However, that wouldn't help with all cases because in addition to check_plain() followed by an (auto-escaped) check_plain(), you can also having something like filter_xss_admin() (which produces a safe string but one that is expected to contain HTML), and in that case we still need to prevent the auto-escaping code from calling check_plain() on it even once.

It does look like there is a previous issue that proposed using the $double_encode parameter in Drupal: #882438: Globally prevent double encoding in check_plain() by raising minimum PHP to 5.2.3 (It was marked "won't fix" but I haven't read the arguments carefully myself.)

David_Rothstein’s picture

@chx makes a good point that a typical HTML page really only contains so many strings. Maybe this could work. It wouldn't have to be a global necessarily (it could be stored somewhere else... inside the request container, maybe?)

I did just think of a couple complications, though:

  1. We're basically assuming that code which sanitizes strings will only call Drupal functions to do it, not PHP functions directly (or functions provided by some outside library). Is that OK? I guess this issue would only ever cause double-escaping, though (never cause a problem in the other direction), so perhaps it's OK to just say that's enough of an edge case where sites that do that will need to do {{ variable|raw }} when necessary, because we can't help them.
  2. We're assuming we know how to identify that a string has been made safe. That's true for check_plain() and some other functions, but gets a little trickier with check_markup(), and even with filter_xss(). We're kind of trusting that people using those functions are using them correctly, I guess.
Fabianx’s picture

@David_Rothstein:

I would say that 1) is an acceptable tradeoff. After all it could also use the more complicated syntax of drupal_mark_safe(my_custom_xss_escape($string)). PHP authors should be able to use that.

For 2): We already trust them now. Nothing changes on that front, we are just trying to get more security into the templates, but yes it needs some thought.

One thing I thought of still was "caches". Especially for check_markup as part of the filter cache, but I guess that are exceptions we need to deal with ...

Thanks for all the great feedback so far!

chx’s picture

Yes, we trust people that stuff coming out of check_markup is safe. Our handle text in a secure fashion documentation page says explicitly "All you need to do is pass the rich text through check_markup() and you'll get HTML returned, safe for outputting." Similarly, if you add <embed> to the list of tags in filter_xss, hats off, I neither can or want to help you.

As for caches, no worries, consider that drupal_render or check_markup cache their own output which if known to be safe can be marked safe on cache retrieval as well. (Note that we already presume drupal_render returning safe strings by printing render($element) always without further escaping.) Ie. caching when handled inside a function does not need more special handling than adding it to the safe strings based on the same conditions as the normal return would be.

Fabianx’s picture

Everything I have here can be found in a fork()ed sandbox for performance testing of twig / autoescape:

http://drupal.org/sandbox/Fabianx/1819382 (to not clutter the main branches)

Benchmarks - Setup

Measuring Method

* 42 random nodes
* Testing: /node

with:

for i in {1..10}; do ab -n100 http://127.0.0.1/node/; done

All benchmarks done with the GLOBALS method outlined in #2 (because the SafeString objects method is way slower - even though the overhead is not directly visible in XHProf). Attributes and twig_render() calls still extend / return Twig_Markup if safe in these benchmarks.

Baselines

remotes/sandbox/d8-core-1712444-core : Vanilla Core (Baseline)
remotes/sandbox/d8-core-1712444-twig-node: Core with Twig running node.tpl.php from Template (Overhead)
remotes/sandbox/d8-core-1712444-twig : Core with Twig (Twig Baseline)

Autoescaping

remotes/sandbox/d8-core-1712444: V1 with function calls
remotes/sandbox/d8-core-1712444-v1: V1 optimized by inlining isset() / $GLOBALS['...'] statements into check_plain / theme / l.
remotes/sandbox/d8-core-1712444-v2: V2 having safe variables specified and marking only those as safe
remotes/sandbox/d8-core-1712444-v3: V3 marking all strings in $vars as safe (not recursively)

Extra

remotes/sandbox/d8-core-1712444-v2-node: V2 marking some vars as safe, then calling node.tpl.php from Template. (Overhead)
remotes/sandbox/d8-core-1712444-v2-ra-opt: Render Arrays optimized
remotes/sandbox/d8-core-1712444-twig-ra-opt: Render Arrays optimized

Benchmarks - Results

Baselines

Vanilla Core: 267 +/- 5.6
Overhead: Twig (node.tpl.php): 272 +/- 5.4
Baseline: Twig: 275 +/- 5.8

Autoescaping

V1 (non optimized): 288 +/- 6.0
V1: 287 +/- 5.9
V2: 279 +/- 6.0
V3: 281 +/- 5.4

Extra

Overhead: V2 with node.tpl.php: 273 +/- 5.2
Optim: V2: Render Array Optimized: 278 +/- 5.6
Optim: Twig: Render Array Optimized: 274 +/- 5.4

Interpretation: Compared to Vanilla Core | Core with Twig

Vanilla Core: 100.00% | 97.09%
Overhead: Twig (node.tpl.php): 101.87% | 98.91%
Baseline: Twig: 102.99% | 100.00 %

Autoescaping

V1 (non optimized): 107.87% | 104.73%
V1: 107.49% | 104.36%
V2: 104.49% | 101.45%
V3: 105.25% | 102.18%

Extra

Overhead: V2 with node.tpl.php: 102.24% | 99.27%
Optim: V2: Render Array Optimized: 104.12% | 101.09%
Optim: Twig: Render Array Optimized: 102.62% | 99.64%

Conclusion

* Twig in itself has overhead. Some of that comes from the Twig Engine directly; some is related to the DIC and double engine code (overall: 2.5-3% slower)
* The overhead of the template itself compared to node.tpl.php is marginal (0.74-1.1% slower)
* Autoescape costs additionally, the slowdown compared to Twig ranges from 1.45% - 4.36% depending on the used method
* Therefore in whole the overhead ranges from: 2.5-3% (without autoescape) to 4.5-8% (with autoescape)

Now we need to decide what overhead is worth the security.

chx’s picture

5% is something I am happy to pay any day for security. Still I would like to ask you to re-benchmark the base now that the compile DIC patch is in.

David_Rothstein’s picture

#8 and #9 make sense to me. And we can always add some "don't be stupid" documentation to filter_xss() :)

I haven't had time to dig into the rest, but:

remotes/sandbox/d8-core-1712444-v1: V1 optimized by inlining isset() / $GLOBALS['...'] statements into check_plain / theme / l.

Wouldn't t() need this also?

With the inlined $GLOBALS it likely wouldn't affect performance much, but I'm wondering about memory because I suspect we do call t() on a lot of strings that don't actually get printed to the page :(

Fabianx’s picture

#11: Measured it again:

Baselines with DIC in

Vanilla Core: 260 +/- 7.2
Overhead: Twig (node.tpl.php): 266 +/- 5.9
Baseline: Twig: 270 +/- 7.3

I can't really see where those 6 ms are coming from, but will check now again with XHProf ...

#12:

>> remotes/sandbox/d8-core-1712444-v1: V1 optimized by inlining isset() / $GLOBALS['...'] statements into check_plain / theme / l.

Wouldn't t() need this also?

With the inlined $GLOBALS it likely wouldn't affect performance much, but I'm wondering about memory because I suspect we do call t() on a lot of strings that don't actually get printed to the page :(

The usage of t() was not that much (for nodes at least). I'll do some memory benchmarks. I guess modules/list page has the most strings translated?

But yes, should inline everywhere probably ...

I'll do some (manual) memory benchmarks as well ...

Fabianx’s picture

New and automated (!) benchmarks, run via:

/var/www$ $HOME/benchmark-twig-ra-opt.sh

The code and all the other scripts can be found in this gist:

https://gist.github.com/3952753

This first chunk is just comparing against optimized code as per #1815250: Type twig variables in PHP and remove superfluous render_var() and getAttribute() calls from the generated code and also has APCLoader enabled to mitigate class loading overhead, which is just once and the difference is 8 ms vs. 4 ms in XHProf.

The next will be the core twig patch plus one of the variants here:

Benchmark Method

Again this is all called against /node with 42 random nodes.

The method used to make this very granular benchmarks is to use a Monte Carlo algorithm and use the minimum of 1000 runs on a dedicated machine with speedstep disabled.

That works _really_ really well and diminishes measurement error.

Re-testing baselines to make sure we have hit the minimum

=== twig-ra-opt..d8-core-1712444-core--rebase compared (508939efc2f7b..50893d208fa2c):

wt  : 371,806|361,565|-10,241|-2.8%
cpu : 368,022|360,023|-7,999|-2.2%
mu  : 12,326,648|11,608,136|-718,512|-5.8%
pmu : 12,733,320|12,016,016|-717,304|-5.6%


=== core..d8-core-1712444-core--rebase compared (5088f8fb987c9..50893d208fa2c):

wt  : 361,014|361,565|551|0.2%
cpu : 360,022|360,023|1|0.0%
mu  : 11,593,592|11,608,136|14,544|0.1%
pmu : 11,998,696|12,016,016|17,320|0.1%


=== twig-ra-opt..d8-core-1712444-twig-ra-opt--rebase compared (508939efc2f7b..50893ebd5aeee):

wt  : 371,806|371,607|-199|-0.1%
cpu : 368,022|368,023|1|0.0%
mu  : 12,326,648|12,326,648|0|0.0%
pmu : 12,733,320|12,733,320|0|0.0%


=== core..d8-core-1712444-twig-ra-opt--rebase compared (5088f8fb987c9..50893ebd5aeee):

wt  : 361,014|371,607|10,593|2.9%
cpu : 360,022|368,023|8,001|2.2%
mu  : 11,593,592|12,326,648|733,056|6.3%
pmu : 11,998,696|12,733,320|734,624|6.1%

The last one I also made a screenshot of, to show how this would be looking in a normal XHProf diff report:

XHProf Diff Core vs. Twig RA Opt

V1, V2, V3 - optimized

Against non-autoescape optimized twig and core.


=== twig-ra-opt..d8-core-1712444-v1-ra-opt--rebase compared (508939efc2f7b..508940009f863):

wt  : 371,806|383,532|11,726|3.2%
cpu : 368,022|384,024|16,002|4.3%
mu  : 12,326,648|13,326,040|999,392|8.1%
pmu : 12,733,320|13,620,056|886,736|7.0%


=== core..d8-core-1712444-v1-ra-opt--rebase compared (5088f8fb987c9..508940009f863):

wt  : 361,014|383,532|22,518|6.2%
cpu : 360,022|384,024|24,002|6.7%
mu  : 11,593,592|13,326,040|1,732,448|14.9%
pmu : 11,998,696|13,620,056|1,621,360|13.5%


=== twig-ra-opt..d8-core-1712444-v2-ra-opt--rebase compared (508939efc2f7b..508942e3490f2):

wt  : 371,806|377,583|5,777|1.6%
cpu : 368,022|380,023|12,001|3.3%
mu  : 12,326,648|12,412,424|85,776|0.7%
pmu : 12,733,320|12,827,616|94,296|0.7%


=== core..d8-core-1712444-v2-ra-opt--rebase compared (5088f8fb987c9..508942e3490f2):

wt  : 361,014|377,583|16,569|4.6%
cpu : 360,022|380,023|20,001|5.6%
mu  : 11,593,592|12,412,424|818,832|7.1%
pmu : 11,998,696|12,827,616|828,920|6.9%


=== twig-ra-opt..d8-core-1712444-v3-ra-opt--rebase compared (508939efc2f7b..508943c7883f7):

wt  : 371,806|382,068|10,262|2.8%
cpu : 368,022|380,024|12,002|3.3%
mu  : 12,326,648|12,455,400|128,752|1.0%
pmu : 12,733,320|12,871,688|138,368|1.1%


=== core..d8-core-1712444-v3-ra-opt--rebase compared (5088f8fb987c9..508943c7883f7):

wt  : 361,014|382,068|21,054|5.8%
cpu : 360,022|380,024|20,002|5.6%
mu  : 11,593,592|12,455,400|861,808|7.4%
pmu : 11,998,696|12,871,688|872,992|7.3%

Additional Benchmark to twig test class loading overhead

It is pretty much split up between loading the whole Twig toolset and loading the template.


=== twig-ra-opt..d8-core-1712444-twig-node--rebase compared (508939efc2f7b..508946279e070):

wt  : 371,806|366,149|-5,657|-1.5%
cpu : 368,022|364,022|-4,000|-1.1%
mu  : 12,326,648|12,276,480|-50,168|-0.4%
pmu : 12,733,320|12,687,496|-45,824|-0.4%


=== core..d8-core-1712444-twig-node--rebase compared (5088f8fb987c9..508946279e070):

wt  : 361,014|366,149|5,135|1.4%
cpu : 360,022|364,022|4,000|1.1%
mu  : 11,593,592|12,276,480|682,888|5.9%
pmu : 11,998,696|12,687,496|688,800|5.7%

Other benchmarks coming as soon as they are finished.

Summary

TL; DR:

V1:

WT : +22,518 +6.2%
CPU : +24,002 +6.7%
MU : +1,732,448 +14.9%

V2:

WT : +16,569 +4.6%
CPU : +20,001 +5.6%
MU : +818,832 +7.1%

V3:

WT : +21,054 +5.8%
CPU : +20,002 +5.6%
MU : +861,808 +7.4%

For 42 nodes the time used by the approaches really does not differ much ... The memory is differing, but this could very well change to the maximum of 14.9% overhead, if all templates are converted to twig.

Fabianx’s picture

webchick’s picture

Priority: Normal » Major
Status: Active » Postponed

Also raising to at least major, since this was a big win we were counting on with Twig, so makes sense to make use of it. :)

Also, I believe it's blocked on #1757550: [Meta] Convert core theme functions to Twig templates, per Jen's comment at #1825952-3: Turn on twig autoescape by default.

webchick’s picture

Curious about something, since I would really love to commit these Twig conversions that are kicking around, but haven't because of Dries's concern that it could leave core in an unreleasable state by July 1 if they're not all done *and* architecture issues like this aren't figured out (which are dependent on all the conversions getting done).

The main reason this issue is postponed on every single template file being converted is because it effectively removes the security layer from preprocess and makes it the theme engine responsible. How hard would it be to create a patch that did this paradigm shift for PHPTemplate, so they both behaved the same? So that we could hash out these really gnarly architectural issues *now* and not have to wait N months?

moshe weitzman’s picture

Crickets .... Angie's request is a good one. Anyone?

Fabianx’s picture

So, there is of course a way:

Exchange any and all "print" statements in templates with "print_secure" or short "p".

Then you can wrap this within a kind of auto-escape that is compatible with twig - using one of the methods above.

webchick’s picture

Priority: Major » Critical

Talking this over with Moshe earlier today, he really views this as a "make it or break it" issue for Twig. Auto-security was one of the primary stated benefits we get from replacing our theme system, to help counter-balance other significant disadvantages such as the deployment challenges around adding much more compiled PHP. Escalating to Critical in response.

So essentially, whatever we can do to expedite proving this capability of Twig sooner, the better off we will be. It would be bad if it turns out we can't pull this off after all.

webchick’s picture

I also spun off #1967118: Figure out recommendations for Drupal 8 deployment challenges since there's quite a bit more here than I previously knew.

David_Rothstein’s picture

It would be bad if it turns out we can't pull this off after all.

As far as I understand, it's quite possible to do this, but the question is how much of a performance and memory hit we're willing to take for it? (Probably this issue needs some more up-to-date benchmarks anyway.)

Talking this over with Moshe earlier today, he really views this as a "make it or break it" issue for Twig. Auto-security was one of the primary stated benefits we get from replacing our theme system, to help counter-balance other significant disadvantages such as the deployment challenges around adding much more compiled PHP.

I get where this is coming from (and to some extent even agree with it), but "primary stated benefits"... as far as I remember, this was discussed at length in some of the original issues and I thought it became pretty clear in that discussion that Twig itself provided no real security benefit for Drupal. Rather it was more like (exactly as stated in the issue summary here) if we can independently build a more secure theme system on the side, Twig will work very very cleanly with it.

In the original discussions, I think a lot of people felt like Twig had enough other benefits that even if there was no security improvement, it was worth doing for other reasons?

Crell’s picture

There's 2 parts to Twig's security benefits, I believe:

1) It's really really hard to throw random SQL into the template if it's a Twig file. By the time you coax it into working, you should have realized that it's hard because you're not supposed to do it. :-)
2) Twig natively handles escaping of variables automatically, and lets you specify in the template what kind of escaping to do. That's not a guarantee of correct filtering and secure output, but it's more reliable than "hope that someone got it right back in one of the 12 preprocess functions that fondled the variable before it got here".

Currently we're not using the second, AFAIK, because we're still doing escaping back in those 12 preprocess functions and don't want to double-escape everything. We want to switch over to Twig doing the escaping, but that's tricky to time.

The first point we already have by virtue of using .twig files.

thedavidmeister’s picture

Status: Postponed » Active

all the templates are now in, so this is active again.

effulgentsia’s picture

Priority: Critical » Major
Issue tags: -Security +Security improvements

Per https://groups.drupal.org/node/298298, this will not block a release, but is still very important, so adjusting priority accordingly.

chx’s picture

Priority: Major » Critical

I fundamentally disagree. If critical means "release blocking" then this IS the release blocker. Everything else is fluff. Security trumps everything.

c4rl’s picture

I'm a bit confused here. With Twig now in core, D8's theme layer is no less secure than D7's theme layer sanitization-wise, and definitely more secure syntactically, i.e. Crell's first point in #23 (in addition to any other PHP).

So we are arguably better off already than D7, and D7 shipped. Which makes me want mark this closed (won't fix).

Inexperienced devs/themers will have security-related sins elsewhere anyway, as indicated by comment 27 on #1499460-27: [meta] New theme system

I brought up this point on the Twig call yesterday, but the whole "autoescape" notion seems like a unicorn to me. :) We are potentially introducing new APIs a month before code freeze in the name of a security "problem" that exists just as prominently in D7. So, if this is a "release blocker" for D8, does that arguably demand a critical SA for D7? You see my confusion. :)

Additionally, why not just provide support for the multiple types of sanitization functions. Why not instead of playing this auto-escape guessing game, we provide support Twig, ( example {{ foo|check_markup }} or {{ bar|check_plain }})?

To respond to #20 and #21, there are eight points listed in the "Pros" of #1499460: [meta] New theme system, one of which being "Improved security," some of which we get automatically in Crell's #23. Twig is good for many other reasons, and we are already better off security-wise than D7.

jenlampton’s picture

Priority: Critical » Major

I think "closed (won't fix)" is a little extreme, but then again so is making this issue critical. If we can do autoescape (and it sounds like we can) then we should.

I'm oaky with allowing things like check_plain and check_markup as special filters (or functions?) for use in Twig templates, as long as theme devs don't need to use them.

And by "need" I mean that all of the drillable variables that core provides should be sanitized automatically, so when some poor front end dev tries to print out {{ content.field_link.0.text}} they don't need to understand the inner workings of Drupal, or guess blindly at which point they need to start sanitizing themselves.

If more advanced theme devs want to add their own stuff in preprocess to print out whatever, then I am okay with them needing to sanitize that themselves.

c4rl’s picture

all of the drillable variables that core provides should be sanitized automatically

This may be better suited for #2008450: Provide for a drillable variable structure in Twig templates, but I agree that drillables should be safe. However, that simply connotes the drillable variables (which likely come from a preprocessor) just need to be sanitized, which (as of now) would be the job of the preprocessors' function body anyway. So I agree with your quoted statement above sans what the word "automatically" connotes; the theme preprocessor can been written to decide what vars it makes available and what it doesn't. If you write your own preprocessors or callbacks can be responsible for sanitizing your data, just as you were in prior versions of Drupal.

Fabianx’s picture

Whether this issue is critical or not depends heavily on how the entity system has changed.

If it is possible from a template to access an entity object or its properties or anything else that could be in a template with an unsanitized value, then this is indeed critical. And the reason is:

* We are selling security, so people feel safe, but they are not. => This is misleading.

If all properties are safe and reachable strings are made drillable and we can sanitize them there, then this is less of an issue than before.

Much has changed since this issue had been created, so this warrants more research of how to introduce XSS or other security issues now.

The research itself however is critical to do ...

Leaving as major for now as compromise.

Fabianx’s picture

Issue summary: View changes

wording of render part

star-szr’s picture

Issue summary: View changes

Add related auto-escape issue

star-szr’s picture

Status: Active » Fixed

Seems like this can be fixed now that autoescape is in unless there are other things to address. Great work everyone!

#1825952: Turn on twig autoescape by default

Fabianx’s picture

Finally! I can't believe it took 1.5 years to get there!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.