Download & Extend

[beta blocker blocker] Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()

Project:Drupal core
Version:7.x-dev
Component:base system
Category:bug report
Priority:critical
Assigned:sun
Status:closed (fixed)
Issue tags:API change, security

Issue Summary

http://api.drupal.org/api/function/theme_more_link/7

invokes http://api.drupal.org/api/function/check_url/7

which invokes http://api.drupal.org/api/function/filter_xss_bad_protocol/7

which invokes check_plain()

but http://api.drupal.org/api/function/theme_more_link/7 additionally uses @link to insert the URL:

<?php
 
return '<div class="more-link">' . t('<a href="@link" title="@title">more</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
?>

which invokes check_plain() again.

The result is check_plain(check_plain($variables['url'])).

Two options to resolve this:

1) Use !-placeholder instead of @-placeholder.

2) Remove check_url(), keep @-placeholder.

or:

3) Do nothing and update the documentation of check_url() with proper usage explanation + example code snippets.

Comments

#1

Title:theme_more_link() escapes the URL twice» theme_more_link() and theme_more_help_link() escape the URL twice
Status:active» needs review

Let's go for 1 + 3.

AttachmentSizeStatusTest resultOperations
theme-check-url.patch1.72 KBIdlePASSED: [[SimpleTest]]: [MySQL] 17,867 pass(es).View details

#2

Status:needs review» reviewed & tested by the community

looks great.

#3

Status:reviewed & tested by the community» needs work

How about a test?

#4

Status:needs work» reviewed & tested by the community

If you insist :)

Just a trivial test addition: straight to RTBC.

AttachmentSizeStatusTest resultOperations
theme-check-url-4.patch3.4 KBIdlePASSED: [[SimpleTest]]: [MySQL] 18,726 pass(es).View details

#5

#6

Hm. Looking at the list of functions that invoke http://api.drupal.org/api/function/check_url/7, I see a couple of other candidates, for example:

http://api.drupal.org/api/function/template_preprocess_html/7 does 'href' => check_url($favicon),

but the subsequently invoked http://api.drupal.org/api/function/drupal_add_html_head_link/7 does check_plain($attributes['href']) once again.

I'm starting to question why http://api.drupal.org/api/function/filter_xss_bad_protocol/7 invokes check_plain() at all.

#7

I think the question is whether one should assume that the result of a check_*() function or a filter_*() function can be safely printed. But then they're not chainable (i.e., you can't do check_plain(check_url($foo))). Or, if each one should only do its legitimate scope and be chainable, but then developers would need to know which ones can be safely printed, and which ones must go through another check_*() or filter_*() step.

I think either way is kind of a mess. Really not sure what a good answer is.

If we keep filter_xss_bad_protocol() calling check_plain(), then we have the problem in #555830: Clean up theme_image() to use drupal_attributes() for all attributes and revisit defaults for "alt" and "title". One solution to that problem is to have drupal_attributes() not always do a check_plain(), but instead do a check_url() for some attributes (href, src), and a check_plain() for others, but that somehow feels fragile. Or to make drupal_attributes() more like t(), where you can pass in odd characters like @ and ! to control what is check_plain()'d, but man, that kind of sucks too.

#8

Title:theme_more_link() and theme_more_help_link() escape the URL twice» theme_more_link(), theme_more_help_link(), and other places escape the URL twice
Status:reviewed & tested by the community» needs work

.

#9

Subscribing. Curious about how you guys will fix this. :)

#10

Title:theme_more_link(), theme_more_help_link(), and other places escape the URL twice» Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()
Priority:normal» critical
Issue tags:+Security Advisory follow-up, +Security improvements

I'm not sure either. At the very least, I'd say that the check_plain() actually belongs into check_url(), and not into filter_xss_bad_protocol(). But that doesn't solve the unpredictable chaining problem.

The problem is not really new, but gets more apparent due to D7's multi-layer rendering system. Previously, we simply said: "Everything that gets output in a theme function has to be sanitized." Since there was only one theme function, no problem. In D7, we have an entire stack of #pre_render, #theme, #theme_wrappers functions, from which every single may - or may not - be the last instance in the rendering process. Yuck.

Therefore, bumping to critical + adding security tags.

#11

this patch is RTBC,
always use <a href="!url"> not <a href="@url"> and enclose it in check_url() when dealing with variables:

t('My website: !url', array('!url' => check_url($website)));

no need for check_url or anything else when using a string constant:
t('Drupal handbooks: !url', array('!url' => 'http://drupal.org/handbooks'));

check_url() is the function to sanitize urls, so I do not understand why someone would like to use check_plain() or @url..
#596770: !url in t()

#12

@Pasqualle: so what's the right way to add a 'src' or 'href' to a $attributes array, as in #555830-19: Clean up theme_image() to use drupal_attributes() for all attributes and revisit defaults for "alt" and "title"? drupal_attributes() calls check_plain() on all attribute values.

#13

There is no point at all in check_url() encoding the URL. It is there to check it, nothing else. I would remove this behavior from check_url() (and as a consequence from filter_xss_protocol).

#14

Thanks for your input, Damien! We definitely need some more security team members on this issue.

While I agree that it's wonky that check_url() check_plain()s, I'm not sure whether it's a good idea to remove the check_plain() from it. Looking at the list of functions that call http://api.drupal.org/api/function/check_url/7, at least a couple of them want to use the returned value directly in XML/HTML output, so those would actually have to call:

check_plain(check_url($url));

And we have the issue that all check_* functions are - as of now - supposed to return values that are safe for outputting.

To resolve the issue(s) at hand, how about moving the check_plain() into check_url(), and updating many of the current calls to check_url() to invoke filter_xss_bad_protocol() instead? (This partially means to add a check_plain() further down in the rendering chain, as in one of the given examples above.)

#15

If we move check_plain() into check_url(), then instead of code that wants to check_url() without check_plain()ing having to call filter_xss_bad_protocol(), what about adding a optional parameter to check_url()? ($sanitize=TRUE or $check_plain=TRUE)

#16

You can ignore this follow-up. Just mentioning, similar problem, weird solution: http://api.drupal.org/api/function/drupal_set_title/7

#17

Status:needs work» needs review

The following patch:

  • Removes check_url(), because its proper use is just confusing
  • Removes the check_plain() in filter_xss_bad_protocol()
  • Removes the second argument ($decode) to filter_xss_bad_protocol(), which is just not useful at all
  • Standardize everywhere on check_plain(filter_xss_bad_protocol()). Most of the theme functions could be simplified by using drupal_attributes(), but that will be for another patch
AttachmentSizeStatusTest resultOperations
715142-remove-check-url.patch19.63 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 715142-remove-check-url.patch.View details

#18

Status:needs review» needs work

I'm very worried about themers, and would highly prefer to not remove check_url(), but move the check_plain() into it. The other changes of this patch look good.

However, didn't look up whether this patch actually solves all issues identified in this thread.

#19

This still needs work to take into account the rest of what this issue is about, but here's a start that goes in the direction that I think sun wants. Also, I'm not so sure about #17's removal of the $decode parameter from filter_xss_bad_protocol(). What about the needs of _filter_xss_attributes() and potentially similar functions? This just changes the default for it and couples the check_plain() at the end to it.

AttachmentSizeStatusTest resultOperations
filter_xss_bad_protocol-fix-715142-19.patch5.39 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,104 pass(es).View details

#20

Status:needs work» needs review

CNR for bot, but still "needs work".

#21

There is no point at all in check_url() encoding the URL.

Re-reading this, I guess it's true for module-generated URLs. But it's not valid for URLs coming from user input. The latter being the reason for additionally stripping bad protocols.

Considering this, I think the latest patch makes sense, but we want to ensure that we only invoke check_url()

  1. on URLs coming from user input and
  2. only in theme functions that actually return a HTML string

#22

Like this?

Please review both the patch, and apply it and grep for remaining places where calls to check_url() were left in tact.

AttachmentSizeStatusTest resultOperations
filter_xss_bad_protocol-fix-715142-22.patch16.51 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,167 pass(es).View details

#23

+++ includes/install.core.inc 10 Apr 2010 15:44:17 -0000
@@ -723,7 +723,7 @@ function install_verify_requirements(&$i
-      $status_report .= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_url(request_uri())));
+      $status_report .= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_plain(request_uri())));

!url + check_plain() => @url

AttachmentSizeStatusTest resultOperations
filter_xss_bad_protocol-fix-715142-23.patch16.5 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,172 pass(es).View details

#24

+++ includes/common.inc 10 Apr 2010 15:52:16 -0000
@@ -1422,25 +1432,28 @@ function _filter_xss_attributes($attr) {
-function filter_xss_bad_protocol($string, $decode = TRUE) {
+function filter_xss_bad_protocol($string, $html_encoded = FALSE) {

Perhaps we don't need to change the default of 2nd param from TRUE to FALSE? What do you think? Senseless kitten killing?

114 critical left. Go review some!

#25

With #24.

AttachmentSizeStatusTest resultOperations
filter_xss_bad_protocol-fix-715142-25.patch14.98 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,167 pass(es).View details

#26

+++ includes/theme.inc 13 Apr 2010 16:44:56 -0000
@@ -2174,9 +2174,9 @@ function template_preprocess_html(&$vari
   // Add favicon.
   if (theme_get_setting('toggle_favicon')) {
-    $favicon = theme_get_setting('favicon');
+    $favicon = filter_xss_bad_protocol(theme_get_setting('favicon'), FALSE);
     $type = theme_get_setting('favicon_mimetype');
-    drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => check_url($favicon), 'type' => $type));
+    drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => $favicon, 'type' => $type));
   }

@@ -2358,9 +2358,9 @@ function theme_get_suggestions($args, $b
function template_preprocess_maintenance_page(&$variables) {
   // Add favicon
   if (theme_get_setting('toggle_favicon')) {
-    $favicon = theme_get_setting('favicon');
+    $favicon = filter_xss_bad_protocol(theme_get_setting('favicon'), FALSE);
     $type = theme_get_setting('favicon_mimetype');
-    drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => check_url($favicon), 'type' => $type));
+    drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => $favicon, 'type' => $type));
   }

I dislike a function like this needing to call filter_xss_bad_protocol() directly. This patch fixes that, enabling theme functions that add to a $attributes array to still use check_url().

This patch also adds a unit test.

AttachmentSizeStatusTest resultOperations
filter_xss_bad_protocol-fix-715142-26.patch17.28 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,167 pass(es).View details

#27

I reviewed and tested filter_xss_bad_protocol-fix-715142-26.patch and found it to work fine.

Is there a reason that the theme functions use the t() function instead of the l() function? If not, I'd be happy to reroll the patch using the l() function.

#28

As per #27, here's a patch that offers l() function usage in appropriate places.

Please review.

AttachmentSizeStatusTest resultOperations
filter_xss_bad_protocol-fix-715142-27.patch17.23 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,932 pass(es).View details

#29

Status:needs review» reviewed & tested by the community

Submitter reivewed 17k patch.
Submitter re-rolled to make changes from t() to l() function.

I reviewed with code him.
We ran automated tests.
We manually tested the theme functions.

Looks good to me. Ship it!

#30

Status:reviewed & tested by the community» needs work

+++ includes/theme.inc 22 Apr 2010 22:43:07 -0000
@@ -1881,7 +1881,7 @@ function theme_more_help_link($variables
function theme_feed_icon($variables) {
...
-    return '<a href="' . check_url($variables['url']) . '" title="' . $text . '" class="feed-icon">' . $image . '</a>';
+    return l($image, $variables['url'], array('html'=>TRUE, 'attributes' => array('class' => 'feed-icon', 'title' => $text)));

1) Missing spaces for 'html' key.

2) The 'class' attribute is always an array.

+++ includes/theme.inc 22 Apr 2010 22:43:07 -0000
@@ -1931,7 +1931,7 @@ function theme_html_tag($variables) {
function theme_more_link($variables) {
-  return '<div class="more-link">' . t('<a href="@link" title="@title">More</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
+  return '<div class="more-link">' . l('More', $variables['url'], array('attributes' => array('title' => $variables['title']))) . '</div>';

Missing t() for 'More'. Not sure whether this change is good for translatability (probably not).

Powered by Dreditor.

#31

Status:needs work» needs review

Thanks.

Incorporated suggested changes.

Ready for another round of testing.

AttachmentSizeStatusTest resultOperations
filter_xss_bad_protocol-fix-715142-28.patch17.49 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,949 pass(es).View details

#32

Looks good to me.

+    // Not check_url(), because l() is not responsible for stripping bad
+    // protocols. Code that calls l() with a path that comes from user input
+    // needs to do that.

Do we have this documented somewhere?

#33

Good thinking. That inline comment should stay there, but we additionally want to add a detailed note about this to the function docblock of l().

#34

Adding a docblock to subsequent patch. Should this go in a separate issue?

AttachmentSizeStatusTest resultOperations
715142_docblock.patch944 bytesIdlePASSED: [[SimpleTest]]: [MySQL] 19,921 pass(es).View details

#35

No, this belongs into this patch. And we need to merge this addition into the existing patch. However, that note should be more prominent and not be buried into a @param description (i.e. above the function parameters instead).

#36

Rerolling.

AttachmentSizeStatusTest resultOperations
715142_filter-xss-bad-protocol.patch17.75 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,935 pass(es).View details

#37

Status:needs review» needs work

+++ update.php 23 Apr 2010 16:47:34 -0000
@@ -317,7 +317,7 @@ function update_check_requirements() {
-    $status_report .= 'Check the error messages and <a href="' . check_url(request_uri()) . '">try again</a>.';
+    $status_report .= 'Check the error messages and <a href="' . check_plain(request_uri()) . '">try again</a>.';

+++ includes/install.core.inc 23 Apr 2010 16:47:36 -0000
@@ -723,7 +723,7 @@ function install_verify_requirements(&$i
-      $status_report .= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_url(request_uri())));
+      $status_report .= st('Check the error messages and <a href="@url">proceed with the installation</a>.', array('@url' => request_uri()));

+++ includes/theme.maintenance.inc 23 Apr 2010 16:47:37 -0000
@@ -157,7 +157,7 @@ function theme_install_page($variables)
-    $variables['content'] .= '<p>' . st('Check the error messages and <a href="!url">try again</a>.', array('!url' => check_url(request_uri()))) . '</p>';
+    $variables['content'] .= '<p>' . st('Check the error messages and <a href="@url">try again</a>.', array('@url' => request_uri())) . '</p>';

I think that wherever request_uri() is used, we need to use check_url(), since I think the URI can be easily spoofed.

+++ includes/common.inc 23 Apr 2010 16:47:35 -0000
@@ -1422,25 +1433,36 @@ function _filter_xss_attributes($attr) {
  * @param $string
- *   The string with the attribute value.
...
+ *   The string that contains one or more URLs within it. Can either be a plain
+ *   text string or be encoded for HTML.

"one or more URLs within it" ?

Not sure whether we can drop the second sentence. (The $html_encoded @param directly follows and explains it better.)

+++ includes/common.inc 23 Apr 2010 16:47:35 -0000
@@ -2136,6 +2163,10 @@ function drupal_attributes(array $attrib
+ * This function is not responsible for stripping bad protocols from URLs
+ * that come from user input. You should use check_url() function in that
+ * case.
+ *
  * @param $text
  *   The link text for the anchor tag.
  * @param $path

I'm not sure I understand how to use check_url() in combination with l(), considering not only $path, but also $options...

We want to add a @code example here.

+++ includes/theme.inc 23 Apr 2010 16:47:37 -0000
@@ -1867,7 +1867,7 @@ function theme_item_list($variables) {
function theme_more_help_link($variables) {
-  return '<div class="more-help-link">' . t('<a href="@link">More help</a>', array('@link' => check_url($variables['url']))) . '</div>';
+  return '<div class="more-help-link">' . l(t('More help'), $variables['url']) . '</div>';

@@ -1881,7 +1881,7 @@ function theme_more_help_link($variables
function theme_feed_icon($variables) {
...
-    return '<a href="' . check_url($variables['url']) . '" title="' . $text . '" class="feed-icon">' . $image . '</a>';
+    return l($image, $variables['url'], array('html' => TRUE, 'attributes' => array('class' => array('feed-icon'), 'title' => $text)));

@@ -1931,7 +1931,7 @@ function theme_html_tag($variables) {
function theme_more_link($variables) {
-  return '<div class="more-link">' . t('<a href="@link" title="@title">More</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
+  return '<div class="more-link">' . l(t('More'), $variables['url'], array('attributes' => array('title' => $variables['title']))) . '</div>';

(mapping to the above) So where is $variables['url'] filtered?

Powered by Dreditor.

#38

Assigned to:Anonymous» sun
Status:needs work» needs review
Issue tags:-Needs tests

Looks ready to fly for me.

AttachmentSizeStatusTest resultOperations
drupal.check-url.38.patch13.16 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,683 pass(es).View details

#39

This looks good to me, but I'd like someone on the security team, rather than me, to RTBC it.

#40

#41

Issue tags:+security

I'm not really sure what the different "security" tags mean, but adding the base one, in case that's the one that people best able to review this issue have bookmarked.

#42

+++ includes/common.inc 29 Apr 2010 20:12:29 -0000
@@ -1199,10 +1199,21 @@ function flood_is_allowed($name, $thresh
+ * Strip harmful protocols (e.g. 'javascript:') from a URL and optionally make it safe for output to HTML.
+ *
+ * @param $uri
+ *   A plain-text URI that might contain harmful protocols.
+ * @param $check_plain
+ *   (optional) Whether to also call check_plain() so that the result can be
+ *   safely output to HTML. Defaults to TRUE, but can be set to FALSE when the
+ *   result of this function will not be output to HTML (e.g. to a plain-text
+ *   e-mail instead), or when it will be passed to another function that expects
+ *   a plain-text string instead of an HTML-encoded string (e.g. l(), t(), or
+ *   drupal_attributes()).
+ */
+function check_url($uri, $check_plain = TRUE) {
+  $uri = filter_xss_bad_protocol($uri, FALSE);
+  return $check_plain ? check_plain($uri) : $uri;

I don't like the fact that we added the $check_plain parameter. It is better not to clobber those IMO and to call things as follows:

check_plain(check_url($foo))

That is actually more transparent. Thoughts on that? I might be the only one that thinks that.

#43

@Dries: you're not the only one who thinks that. Damien agrees with you (#13). But sun is concerned about themers (#18) and I agree with him. I think it's too much to ask themers implementing theme functions or preprocess functions to remember that check_markup() and check_plain() return strings that are safe for printing, but that check_url() does not.

check_url() in HEAD, and in D5 and D6 returns a string that has already been check_plain()'d (via filter_xss_bad_protocol()). This patch is about keeping that a default, but making it possible to not check_plain() so that the result can be passed to t(), l(), drupal_attributes(), and similar functions. IMO, changing the default of what check_url() returns, especially to something less secure, seems unwise for D7.

#44

eff nailed it.

#45

sounds like eff is supporting the patch (and not suggesting it needs work) ... so back to rtbc to see what Dries says about #43... OH, wait. Needs retesting first, and a security review before rtbc, I think.

#46

#38: drupal.check-url.38.patch queued for re-testing.

#47

Title:Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()» [needs security review] Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()

According to #40, not only is this issue critical, but it is a pre-req of another critical. So, changing the title in the hopes that it attracts attention from someone who can provide a sufficient security analysis of the patch, and if it looks good, RTBC it.

#48

Component:theme system» base system

.

#50

#38: drupal.check-url.38.patch queued for re-testing.

#51

needs a security review (others have said it's ok to rtbc after security review).

#52

Somebody needs to bring in the contexts from http://acko.net/blog/safe-string-theory-for-the-web and add to the issue at least what kind of escapes the different checking functions do. Only then can we determine whether the patch is correct. Changing check_url to check_plain makes me go hm, but I can't say for sure it's wrong.

#53

I stick with #17, we are mixing things that are way too different to be mixed:

  • Filtering "bad protocols": this is done by filter_xss_bad_protocol(). It's a security feature: we don't allow potentially harmful protocols such as javascript: to get through
  • Escaping the URL when it is output in an HTML (argument) context: this is done by check_plain() or drupal_attributes() depending on the way the argument is output. This is *not* a security feature, but just something mandatory to output any text in an HTML context.

Currently, check_url() is basically a wrapper around check_plain(filter_xss_bad_protocol($url)). This is bad, because nobody (including the colloquial "themer") understand what this means.

Most of the time, when you are properly using t() or drupal_attributes(), you don't need to take care about the check_plain() yourself:

<?php
$status_report
.= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_url(request_uri()))));
?>

should be written as:

<?php
$status_report
.= st('Check the error messages and <a href="@url">proceed with the installation</a>.', array('@url' => filter_xss_bad_protocol(request_uri())));
?>

#54

#17: 715142-remove-check-url.patch queued for re-testing.

#55

patch in #17 was suggested in #53, seeing what the testing bot thinks of it.

also, sounds like there is no consensus on what approach to take... does that make this issue needs work? or more reviews?

#56

I think this should stay as "needs review" until there's consensus on what the next re-roll needs to look like. Then it can be "needs work".

From #17:

+++ modules/comment/comment.tokens.inc
@@ -167,7 +167,7 @@ function comment_tokens($type, $tokens, array $data = array(), array $options =
-          $replacements[$original] = $sanitize ? filter_xss_bad_protocol($comment->homepage) : $comment->homepage;
+          $replacements[$original] = $sanitize ? check_plain(filter_xss_bad_protocol($comment->homepage)) : $comment->homepage;

So, that patch changes filter_xss_bad_protocol() from returning something that can be printed (HEAD) to returning something that can't be printed (patch). Is this an acceptable thing to do this late in D7? Note that filter_xss() returns something that can be printed. This can be somewhat mitigated by retaining check_url() as a wrapper for check_plain(filter_xss_bad_protocol()), but it still means any contrib code currently calling filter_xss_bad_protocol() directly and expecting back a string that's safe for printing will need to be changed.

My reading of HEAD is that we currently have two patterns: filter_xss() and filter_xss_bad_protocol() take strings that are already HTML-encoded and return HTML-encoded strings. check_plain() and check_url() take plain-text strings and return HTML-encoded strings. What we need to add is something that takes a plain-text string and returns a plain-text string, but with bad protocols stripped. Some options:

  1. Add a 2nd param to check_url() so that we can call check_url($foo, FALSE) as per #38. Dries and Damien don't like this, and I can understand why.
  2. Change the API of filter_xss_bad_protocol() to take and return a plain-text string instead of taking and returning an HTML-encoded string, as per #17. I'm concerned about such a change this late in D7, but if there's buy-in from Damien, sun, and Dries or webchick, then I'm okay with it.
  3. Leave the 2nd parameter of filter_xss_bad_protocol(), but make the check_plain() at the end only run if it's TRUE, and then use filter_xss_bad_protocol($foo, FALSE). I'm concerned that's overly cumbersome syntax to use everywhere we need to pass a filtered plain-text URL to t() or drupal_attributes().
  4. Introduce a new function (e.g., drupal_strip_dangerous_protocols())? That way, there's no expectation that it returns something safe for printing, because it does not start with filter_* or check_*.

Powered by Dreditor.

#57

Fair analysis. I'm happy with solution #4 above, if there are buy-ins from others.

#58

I would add a new placeholder to t() handling this, too. * perhaps?

Edit: and yes 4)

#59

This is a re-roll of #38, adjusted for #56.4.

I would add a new placeholder to t() handling this, too. * perhaps?

I didn't do this in this patch, because it seems like it would introduce too many permutations needing special symbols (i.e., a symbol for drupal_strip_dangerous_protocols(), a separate symbol for check_url(), and maybe others). Since protocol stripping only needs to happen for user-entered text, I'm not sure there's enough use-cases to warrant adding more complexity within t().

The reason I'm leaving [needs security review] in this issue title is that in addition to the decision reached in #56.4, this patch continues what #38 did in additionally changing some places that used to call check_url() to just calling check_plain() (as discussed in earlier comments). The idea being that protocol stripping is something to be done for user-entered text only, not generically whenever we have a URL. This makes sense to me, but I'd feel better knowing someone from the security team had a chance to approve this (@chx, @Damien: I don't know if you're officially on the security team, but if you feel comfortable RTBC'ing, please do so).

AttachmentSizeStatusTest resultOperations
drupal.check-url.59.patch18.35 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,936 pass(es).View details

#60

Re-roll of #59 because of failure in form.inc alterations. Looking into security concerns, so far I can't see any fault.

AttachmentSizeStatusTest resultOperations
drupal.check-url.60.patch18.24 KBIdlePASSED: [[SimpleTest]]: [MySQL] 21,242 pass(es).View details

#61

Coltrane doesn't see any security problems. How many security reviews do we need?

#62

How many security reviews do we need?

1, but it needs to be definitive, by the reviewer RTBC'ing the issue.

Confirming that #60 is a straight re-roll with no actual code changes. Therefore, @coltrane: feel free to RTBC if you're confident about this being ok from a security standpoint.

#63

For anyone joining this issue late, see last paragraph of #59 for the security review that's needed.

#64

Status:needs review» reviewed & tested by the community

Setting to RTBC

#65

Title:[needs security review] Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()» [beta blocker blocker] Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()

Thanks coltrane! Therefore, removing the [needs security review] issue title prefix.

But, this issue is a pre-req for #690980: Disabled form elements not properly rendered, which itself is a beta blocker, which makes this issue a "beta blocker blocker", so let's get it in. Thanks!

#66

I'm... honestly scratching my head how we got from the initial bug report (a module accidentally double-check-plained a URL in a string which could be solved in 8 seconds by switching @ to !) to a sweeping API change with security implications, several months and several alphas into our release cycle. I'm even more scratching my head why the behaviour of check_url(), filter_xss_protocol(), etc. that's been part of Drupal since time immemorial is suddenly a critical, beta blocking issue.

drupal_strip_dangerous_protocols() looks like a nice helper function when you don't want the HTML-escaping behaviour of the other functions. What I don't understand is why so many functions needed to change here, and of course they'll need to change in contrib as well.

I've read the issue, twice. I'm still scratching my head. Could someone maybe write the documentation that will need to go out to the devel list to explain what the hell happened and how they need to change all their modules and themes? That might help me understand.

I also tried running a URL through a bunch of functions:

original url: javascript://www.google.com/?q=blah&amp;something=something
check_url(): //www.google.com/?q=blah&amp;something=something
filter_xss_bad_protocol(): //www.google.com/?q=blah&amp;something=something
url(): /core/javascript%3A//www.google.com/%3Fq%3Dblah%26something%3Dsomething

It would be great to hear what of those outputs is undesirable, and what effect this patch has specifically.

#67

Status:reviewed & tested by the community» needs work

Some of this patch is a necessary fix to avoid double HTML escaping. That's why we need a new function. Aside from adding a new function, I can't see another API change. Correct me if I am wrong.

However, quite some hunks (like the more link itself which sparked this issue which can easily be fixed by changing a @ into a !) look like a performance optimization to me and makes this patch into the proverbial kitten killer monster. As far as I can see, changing check_url(url()) to check_plain(url()) saves an expensive protocol-stripping call but there is no change in the output whatsoever. These do not belong in this patch and make it very hard to read and evaluate.

Also:

-  $action = $element['#action'] ? 'action="' . check_url($element['#action']) . '" ' : '';
+  $action = $element['#action'] ? 'action="' . check_plain($element['#action']) . '" ' : '';

Why exactly we do not need a protocol stripping off actions??

#68

Also, why

+    // Not check_url(), because l() is not responsible for stripping bad
+    // protocols. Code that calls l() with a path that comes from user input
+    // needs to do that.

is this in the tests and not l() doxygen?

#69

Status:needs work» needs review

I'm... honestly scratching my head how we got from the initial bug report (a module accidentally double-check-plained a URL in a string which could be solved in 8 seconds by switching @ to !) to a sweeping API change with security implications, several months and several alphas into our release cycle. I'm even more scratching my head why the behaviour of check_url(), filter_xss_protocol(), etc. that's been part of Drupal since time immemorial is suddenly a critical, beta blocking issue.

Yes, the initial bug report was just about theme_more_link() where a change from @ to ! was all that was needed. Then we started fixing problems in theme_image() in #555830: Clean up theme_image() to use drupal_attributes() for all attributes and revisit defaults for "alt" and "title", and discovered that it was impossible to set $attributes['src'] = check_url(...), because drupal_attributes() does a check_plain(), and unlike t(), there's no special symbol for turning that off: i.e., there's nothing like $attributes['!src']. #10 summarizes why this is critical in D7, even though it's a legacy problem from D6. So most of this thread debated how to decouple protocol stripping from check_plain(): this is summarized in #56, and #56.4 was chosen.

Could someone maybe write the documentation that will need to go out to the devel list to explain what the hell happened and how they need to change all their modules and themes? That might help me understand.
...
It would be great to hear what of those outputs is undesirable, and what effect this patch has specifically.

This patch deals with 2 questions:

1) When should protocol stripping be done?
2) What is the API of check_url() and filter_xss_bad_protocol() and the new function drupal_strip_dangerous_protocols()?

#1 will be answered later, in context of chx's review.

As for #2, there is no API change in this patch with respect to check_url() and filter_xss_bad_protocol(). check_url() takes a plain-text string, strips protocols, and returns an HTML-encouded string. That's true in HEAD and is unaffected by this patch. filter_xss_bad_protocol() without the 2nd argument passed takes an HTML-encoded string, strips protocols, and returns an HTML-encouded string. That's true in HEAD and is unaffected by this patch. filter_xss_bad_protocol() with a 2nd argument of FALSE takes a plain-text string, strips protocols, and returns an HTML-encouded string. That's true in HEAD and is unaffected by this patch in order to retain BC, though comments are added in the patch to makes it clear that such a parameter in filter_xss_bad_protocol() is useless and should be removed in D8. This patch adds a new function, drupal_strip_dangerous_protocols(), for taking a plain-text string, stripping protocols, and returning a plain-text string, addressing the drupal_attributes() use-case.

However, quite some hunks look like a performance optimization to me and makes this patch into the proverbial kitten killer monster. As far as I can see, changing check_url(url()) to check_plain(url()) saves an expensive protocol-stripping call but there is no change in the output whatsoever. These do not belong in this patch and make it very hard to read and evaluate.
...
Why exactly we do not need a protocol stripping off actions??

Ok, back to #1 from above: it's not just a performance optimization: it's a question of when should protocols be stripped and when they should not be. The output is the same when the input string doesn't contain dangerous protocols, but the output is different when the input string does contain dangerous protocols. Why does l() in HEAD call check_plain(url()) and not check_url(url())? And why does theme_image() in D6 call check_url() and not check_plain()? (Note: theme_image() for D7 was changed in the previously mentioned issue). Which functions should be responsible for stripping protocols and which should not be? sun answers this question in #21: the only functions that should strip protocols are the ones that are dealing with strings that come from user input. l() doesn't strip protocols because it's a generic function: maybe a module wants to call it with $path = 'javascript:...' or $path = 'somecustomprotocol:...', and l() should not force that to be invalid. The same answer applies to theme_image(), which is why it's been changed already in that other issue. And the same answer applies to theme_form() with respect to #action. It's a mistake that HEAD calls check_url() in so many places, because protocols should only be stripped in the functions that deal with strings that come from user input.

Maybe this change to limiting where protocol stripping is done should have been a separate issue. But it seemed to be relevant to this one, since this is where we realized the need to decouple protocol stripping from html encoding. Anyway, it is this aspect that made it take so long to reach RTBC status, because someone with real, in-the-wild, Drupal XSS attack experience needed to review all these hunks that change check_url() to check_plain() and similar removals of protocol stripping. But look at the comments from #59-#64, and I think as long as we trust that coltrane performed an adequate security review of these hunks, then I feel good about them.

Re #68:
The patch does add this info to the PHPDoc of l(). The comment is also added in the test, because the test in HEAD is wrong (i.e., inconsistent with l()) and is being fixed.

CNR for chx to review this comment and decide whether to RTBC or give additional feedback.

#70

Issue tags:+API change

This issue needs the "API Change" tag, because of things like the change to theme_form(). In other words, with this patch, a module is responsible for stripping protocols when setting a form's #action to user-entered text. The module would no longer be able to rely on theme_form() stripping those protocols. This is applying the same pattern that is already the case with l() and theme_image() to the rest of core.

#71

Status:needs review» needs work

We can go that way but massive amounts of doxygen is needed. It needs to be crystal clear which function are such 'generics' that can take anything and it's the caller who needs to the protocol stripping via the new function. Also as l runs through url() it will not recognize custom protocols (say, skype: or magnet:) as an external URL due to the external URL checking happens via filter_xss_bad_protocol. So the documentation for l should mention that if it is indeed your wish to allow for skype: then use the filter_allowed_protocols $conf override.

Edit: also, although it looks like we took a long time to reach RTBC status, the patch / issue was essentially restarted in #56.

#72

Status:needs work» needs review
Issue tags:-API change, -Security Advisory follow-up, -Security improvements

OK, I'm splitting this up into 2 issues. This patch now only addresses #69.2, which is the title and original use-cases of this issue. I will open a new issue for #69.1, since in #71, chx says that will require more discussion and documentation.

Removing "API Change" tag, because this patch does not have an API change. It has an API addition of drupal_strip_dangerous_protocols() only.

Removing specific Security tags, because this patch contains no change with respect to where protocol stripping happens.

Leaving general "security" tag, because this patch touches on security issues, including an extra comment in url() and l() explaining the already existing behavior of those functions. Also, this patch includes a hunk that fixes the comment:homepage token.

This patch is still critical, and beta blocking, because it's a pre-req for #690980: Disabled form elements not properly rendered. Comment #89 on that issue shows the dependent hunks.

AttachmentSizeStatusTest resultOperations
drupal.check-url.72.patch16.18 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,017 pass(es).View details

#73

I will open a new issue for #69.1, since in #71, chx says that will require more discussion and documentation.

#845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols

#74

Status:needs review» reviewed & tested by the community

Little kitty saved! :)

#75

Status:reviewed & tested by the community» needs work

Nope. The new doxygen is incorrect. See #66 url() and #845000-3: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols for a detailed analysis.

#76

Which doxygen is incorrect?

#77

Status:needs work» needs review

Very interesting analysis in #75. Understanding and properly documenting url() and l() in that issue will be very important. In the meantime, simplifying this patch even further based on that feedback:

+++ includes/common.inc 4 Jul 2010 18:55:02 -0000
@@ -1942,6 +1983,10 @@ function format_username($account) {
+ *   - This function is not responsible for stripping bad protocols from URLs
+ *     that come from user input. Code that calls this function with a
+ *     user-entered path/URL is responsible for stripping harmful protocols from
+ *     this argument by calling drupal_strip_dangerous_protocols().

This patch removes this PHPDoc addition to url(). To be taken up in the other issue.

+++ includes/common.inc 4 Jul 2010 18:55:02 -0000
@@ -2174,10 +2219,14 @@ function drupal_attributes(array $attrib
- *   "http://example.com/foo". After the url() function is called to construct
- *   the URL from $path and $options, the resulting URL is passed through
- *   check_plain() before it is inserted into the HTML anchor tag, to ensure
- *   well-formed HTML. See url() for more information and notes.
+ *   "http://example.com/foo". This function is not responsible for stripping
+ *   bad protocols from URLs that come from user input. Code that calls this
+ *   function with a user-entered path/URL is responsible for stripping harmful
+ *   protocols from this argument by calling drupal_strip_dangerous_protocols().
+ *   After the url() function is called to construct the URL from $path and
+ *   $options, the resulting URL is passed through check_plain() before it is
+ *   inserted into the HTML anchor tag, to ensure well-formed HTML. See url()
+ *   for more information and notes.

This patch removes this PHPDoc change to l(). To be taken up in the other issue.

+++ includes/theme.inc 4 Jul 2010 18:55:04 -0000
@@ -1854,7 +1854,7 @@ function theme_item_list($variables) {
-  return '<div class="more-help-link">' . t('<a href="@link">More help</a>', array('@link' => check_url($variables['url']))) . '</div>';
+  return '<div class="more-help-link">' . l(t('More help'), drupal_strip_dangerous_protocols($variables['url'])) . '</div>';

This patch removes drupal_strip_dangerous_protocols() here. We're calling l(). l() itself is one of:
1) safe, as per chx's analysis in the other issue
2) not safe, and this is a problem to be solved within l() in the other issue
3) intentionally not safe to place the onus of protocol stripping on the caller, but this theme function is just an intermediary, so if that's the case, the onus should be on this function's caller, not on this function.
Regardless which of these 3 is the case, we should not be calling drupal_strip_dangerous_protocols() here, and we'll figure out and document the situation with l() in the other issue.

+++ includes/theme.inc 4 Jul 2010 18:55:04 -0000
@@ -1868,7 +1868,7 @@ function theme_more_help_link($variables
-    return '<a href="' . check_url($variables['url']) . '" title="' . $text . '" class="feed-icon">' . $image . '</a>';
+    return l($image, drupal_strip_dangerous_protocols($variables['url']), array('html' => TRUE, 'attributes' => array('class' => array('feed-icon'), 'title' => $text)));

same as above

+++ includes/theme.inc 4 Jul 2010 18:55:04 -0000
@@ -1918,7 +1918,7 @@ function theme_html_tag($variables) {
-  return '<div class="more-link">' . t('<a href="@link" title="@title">More</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
+  return '<div class="more-link">' . l(t('More'), drupal_strip_dangerous_protocols($variables['url']), array('attributes' => array('title' => $variables['title']))) . '</div>';

same as above

+++ modules/simpletest/tests/common.test 4 Jul 2010 18:55:07 -0000
@@ -82,7 +82,10 @@ class CommonURLUnitTest extends DrupalWe
-    $sanitized_path = check_url(url($path));
+    // Not check_url(), because l() is not responsible for stripping bad
+    // protocols. Code that calls l() with a path that comes from user input
+    // needs to do that.
+    $sanitized_path = check_plain(url($path));

This patch removes this hunk. To be addressed in the other issue, once we can properly document l().

Powered by Dreditor.

AttachmentSizeStatusTest resultOperations
drupal.check-url.77.patch13.57 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,027 pass(es).View details

#78

Status:needs review» reviewed & tested by the community

The smaller it gets the better.

#79

Status:reviewed & tested by the community» needs work
Issue tags:+API change

Thanks, this patch is much easier to follow now.

A couple of small things:

1. As far as I can tell, if we remove the $decode param of filter_xss_bad_protocol(), then this function is exactly the same as check_url(), so we should probably deprecate the entire function, and remove it in D8?

2. Let's change www.google.com to www.example.com in the test.

3. We still need explicit instructions for module/theme developers on how their code needs to change. I can almost piece it together from the patch, but would prefer a sentence or two from someone who worked on this patch.

#80

Status:needs work» reviewed & tested by the community

1. The $decode parameter defaults to TRUE. When we can remove it as a parameter, we will still have the decode_entities() call. That's the difference between filter_xss_bad_protocol() and check_url(): the former takes an already HTML string and the latter takes a plain-text string. For more info, see #56.

2. Done in this patch.

3. Module and theme developers don't *have* to do anything as a result of this patch. This patch makes a new function, drupal_strip_dangerous_protocols(), available to them, which they can use if they want to output 'src', 'href', 'action', and similar attributes using the drupal_attributes() function. This is the feature that #690980: Disabled form elements not properly rendered is waiting for. Anything that would *require* changes by modules/themes have been deferred to #845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols, and are still being debated there.

This patch only contains a minor test change as per #2, so straight to RTBC.

AttachmentSizeStatusTest resultOperations
drupal.check-url.80.patch13.57 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,102 pass(es).View details

#81

Issue tags:-API change

.

#82

Status:reviewed & tested by the community» fixed
Issue tags:+API change

Oh, d'oh! Thanks for pointing that out about decode defaulting to TRUE. /me clearly needs better glasses.

I guess I'm a bit dubious of the "opt-in"ness of this new function, because so many places in core are changing as a result of adding it. I can't imagine why contrib wouldn't want/need to make similar changes as well to keep step. But I think your description in #80.3 should hopefully be enough to inform developers of how they "can" react to the change, versus a "need to" react in #845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols, if that happens.

I'm restoring the API change tag though, since there was an API addition here, in any case. I'll let Randy make the call if it's worth an announcement to the devlist about it.

Committed to HEAD.

#83

I can't imagine why contrib wouldn't want/need to make similar changes as well to keep step.

Good point. I guess I would say that if a contrib module/theme has broken code that results in double check_plain(), similar to the broken HEAD code that started this issue, then the contrib module/theme now has a way to fix it. So if contrib code is passing the result of check_url() to something that will end up calling check_plain() (for example, an @ placeholder in t(), or an attribute value passed to drupal_attributes()), it should replace the call to check_url() with a call to drupal_strip_dangerous_protocols() instead.

#84

I'm only trying to announce BC breaks, not API additions, so as I read this there's no announcement needed. Let me know if I'm wrong.

#85

#86

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

#87

Just today, I learned that we could have solved this perhaps less elegantly, but once for all:

#882438: Globally prevent double encoding in check_plain() by raising minimum PHP to 5.2.3

nobody click here