Posted by sun on February 15, 2010 at 7:44pm
17 followers
| Project: | Drupal core |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | sun |
| Status: | closed (fixed) |
| Issue tags: | API change, security |
Issue Summary
http://api.drupal.org/api/function/theme_more_link/7
invokes http://api.drupal.org/api/function/check_url/7
which invokes http://api.drupal.org/api/function/filter_xss_bad_protocol/7
which invokes check_plain()
but http://api.drupal.org/api/function/theme_more_link/7 additionally uses @link to insert the URL:
<?php
return '<div class="more-link">' . t('<a href="@link" title="@title">more</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
?>which invokes check_plain() again.
The result is check_plain(check_plain($variables['url'])).
Two options to resolve this:
1) Use !-placeholder instead of @-placeholder.
2) Remove check_url(), keep @-placeholder.
or:
3) Do nothing and update the documentation of check_url() with proper usage explanation + example code snippets.
Comments
#1
Let's go for 1 + 3.
#2
looks great.
#3
How about a test?
#4
If you insist :)
Just a trivial test addition: straight to RTBC.
#5
More or less the same problem has been identified in #555830: Clean up theme_image() to use drupal_attributes() for all attributes and revisit defaults for "alt" and "title"
#6
Hm. Looking at the list of functions that invoke http://api.drupal.org/api/function/check_url/7, I see a couple of other candidates, for example:
http://api.drupal.org/api/function/template_preprocess_html/7 does
'href' => check_url($favicon),but the subsequently invoked http://api.drupal.org/api/function/drupal_add_html_head_link/7 does
check_plain($attributes['href'])once again.I'm starting to question why http://api.drupal.org/api/function/filter_xss_bad_protocol/7 invokes check_plain() at all.
#7
I think the question is whether one should assume that the result of a check_*() function or a filter_*() function can be safely printed. But then they're not chainable (i.e., you can't do check_plain(check_url($foo))). Or, if each one should only do its legitimate scope and be chainable, but then developers would need to know which ones can be safely printed, and which ones must go through another check_*() or filter_*() step.
I think either way is kind of a mess. Really not sure what a good answer is.
If we keep filter_xss_bad_protocol() calling check_plain(), then we have the problem in #555830: Clean up theme_image() to use drupal_attributes() for all attributes and revisit defaults for "alt" and "title". One solution to that problem is to have drupal_attributes() not always do a check_plain(), but instead do a check_url() for some attributes (href, src), and a check_plain() for others, but that somehow feels fragile. Or to make drupal_attributes() more like t(), where you can pass in odd characters like @ and ! to control what is check_plain()'d, but man, that kind of sucks too.
#8
.
#9
Subscribing. Curious about how you guys will fix this. :)
#10
I'm not sure either. At the very least, I'd say that the check_plain() actually belongs into check_url(), and not into filter_xss_bad_protocol(). But that doesn't solve the unpredictable chaining problem.
The problem is not really new, but gets more apparent due to D7's multi-layer rendering system. Previously, we simply said: "Everything that gets output in a theme function has to be sanitized." Since there was only one theme function, no problem. In D7, we have an entire stack of #pre_render, #theme, #theme_wrappers functions, from which every single may - or may not - be the last instance in the rendering process. Yuck.
Therefore, bumping to critical + adding security tags.
#11
this patch is RTBC,
always use
<a href="!url">notand enclose it in check_url() when dealing with variables:<a href="@url">t('My website: !url', array('!url' => check_url($website)));no need for check_url or anything else when using a string constant:
t('Drupal handbooks: !url', array('!url' => 'http://drupal.org/handbooks'));check_url() is the function to sanitize urls, so I do not understand why someone would like to use check_plain() or @url..
#596770: !url in t()
#12
@Pasqualle: so what's the right way to add a 'src' or 'href' to a $attributes array, as in #555830-19: Clean up theme_image() to use drupal_attributes() for all attributes and revisit defaults for "alt" and "title"? drupal_attributes() calls check_plain() on all attribute values.
#13
There is no point at all in check_url() encoding the URL. It is there to check it, nothing else. I would remove this behavior from check_url() (and as a consequence from filter_xss_protocol).
#14
Thanks for your input, Damien! We definitely need some more security team members on this issue.
While I agree that it's wonky that check_url() check_plain()s, I'm not sure whether it's a good idea to remove the check_plain() from it. Looking at the list of functions that call http://api.drupal.org/api/function/check_url/7, at least a couple of them want to use the returned value directly in XML/HTML output, so those would actually have to call:
check_plain(check_url($url));And we have the issue that all check_* functions are - as of now - supposed to return values that are safe for outputting.
To resolve the issue(s) at hand, how about moving the check_plain() into check_url(), and updating many of the current calls to check_url() to invoke filter_xss_bad_protocol() instead? (This partially means to add a check_plain() further down in the rendering chain, as in one of the given examples above.)
#15
If we move check_plain() into check_url(), then instead of code that wants to check_url() without check_plain()ing having to call filter_xss_bad_protocol(), what about adding a optional parameter to check_url()? ($sanitize=TRUE or $check_plain=TRUE)
#16
You can ignore this follow-up. Just mentioning, similar problem, weird solution: http://api.drupal.org/api/function/drupal_set_title/7
#17
The following patch:
#18
I'm very worried about themers, and would highly prefer to not remove check_url(), but move the check_plain() into it. The other changes of this patch look good.
However, didn't look up whether this patch actually solves all issues identified in this thread.
#19
This still needs work to take into account the rest of what this issue is about, but here's a start that goes in the direction that I think sun wants. Also, I'm not so sure about #17's removal of the $decode parameter from filter_xss_bad_protocol(). What about the needs of _filter_xss_attributes() and potentially similar functions? This just changes the default for it and couples the check_plain() at the end to it.
#20
CNR for bot, but still "needs work".
#21
Re-reading this, I guess it's true for module-generated URLs. But it's not valid for URLs coming from user input. The latter being the reason for additionally stripping bad protocols.
Considering this, I think the latest patch makes sense, but we want to ensure that we only invoke check_url()
#22
Like this?
Please review both the patch, and apply it and grep for remaining places where calls to check_url() were left in tact.
#23
+++ includes/install.core.inc 10 Apr 2010 15:44:17 -0000@@ -723,7 +723,7 @@ function install_verify_requirements(&$i
- $status_report .= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_url(request_uri())));
+ $status_report .= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_plain(request_uri())));
!url + check_plain() => @url
#24
+++ includes/common.inc 10 Apr 2010 15:52:16 -0000@@ -1422,25 +1432,28 @@ function _filter_xss_attributes($attr) {
-function filter_xss_bad_protocol($string, $decode = TRUE) {
+function filter_xss_bad_protocol($string, $html_encoded = FALSE) {
Perhaps we don't need to change the default of 2nd param from TRUE to FALSE? What do you think? Senseless kitten killing?
114 critical left. Go review some!
#25
With #24.
#26
+++ includes/theme.inc 13 Apr 2010 16:44:56 -0000
@@ -2174,9 +2174,9 @@ function template_preprocess_html(&$vari
// Add favicon.
if (theme_get_setting('toggle_favicon')) {
- $favicon = theme_get_setting('favicon');
+ $favicon = filter_xss_bad_protocol(theme_get_setting('favicon'), FALSE);
$type = theme_get_setting('favicon_mimetype');
- drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => check_url($favicon), 'type' => $type));
+ drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => $favicon, 'type' => $type));
}
@@ -2358,9 +2358,9 @@ function theme_get_suggestions($args, $b
function template_preprocess_maintenance_page(&$variables) {
// Add favicon
if (theme_get_setting('toggle_favicon')) {
- $favicon = theme_get_setting('favicon');
+ $favicon = filter_xss_bad_protocol(theme_get_setting('favicon'), FALSE);
$type = theme_get_setting('favicon_mimetype');
- drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => check_url($favicon), 'type' => $type));
+ drupal_add_html_head_link(array('rel' => 'shortcut icon', 'href' => $favicon, 'type' => $type));
}
I dislike a function like this needing to call filter_xss_bad_protocol() directly. This patch fixes that, enabling theme functions that add to a $attributes array to still use check_url().
This patch also adds a unit test.
#27
I reviewed and tested filter_xss_bad_protocol-fix-715142-26.patch and found it to work fine.
Is there a reason that the theme functions use the t() function instead of the l() function? If not, I'd be happy to reroll the patch using the l() function.
#28
As per #27, here's a patch that offers l() function usage in appropriate places.
Please review.
#29
Submitter reivewed 17k patch.
Submitter re-rolled to make changes from t() to l() function.
I reviewed with code him.
We ran automated tests.
We manually tested the theme functions.
Looks good to me. Ship it!
#30
+++ includes/theme.inc 22 Apr 2010 22:43:07 -0000@@ -1881,7 +1881,7 @@ function theme_more_help_link($variables
function theme_feed_icon($variables) {
...
- return '<a href="' . check_url($variables['url']) . '" title="' . $text . '" class="feed-icon">' . $image . '</a>';
+ return l($image, $variables['url'], array('html'=>TRUE, 'attributes' => array('class' => 'feed-icon', 'title' => $text)));
1) Missing spaces for 'html' key.
2) The 'class' attribute is always an array.
+++ includes/theme.inc 22 Apr 2010 22:43:07 -0000@@ -1931,7 +1931,7 @@ function theme_html_tag($variables) {
function theme_more_link($variables) {
- return '<div class="more-link">' . t('<a href="@link" title="@title">More</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
+ return '<div class="more-link">' . l('More', $variables['url'], array('attributes' => array('title' => $variables['title']))) . '</div>';
Missing t() for 'More'. Not sure whether this change is good for translatability (probably not).
Powered by Dreditor.
#31
Thanks.
Incorporated suggested changes.
Ready for another round of testing.
#32
Looks good to me.
+ // Not check_url(), because l() is not responsible for stripping bad+ // protocols. Code that calls l() with a path that comes from user input
+ // needs to do that.
Do we have this documented somewhere?
#33
Good thinking. That inline comment should stay there, but we additionally want to add a detailed note about this to the function docblock of l().
#34
Adding a docblock to subsequent patch. Should this go in a separate issue?
#35
No, this belongs into this patch. And we need to merge this addition into the existing patch. However, that note should be more prominent and not be buried into a @param description (i.e. above the function parameters instead).
#36
Rerolling.
#37
+++ update.php 23 Apr 2010 16:47:34 -0000
@@ -317,7 +317,7 @@ function update_check_requirements() {
- $status_report .= 'Check the error messages and <a href="' . check_url(request_uri()) . '">try again</a>.';
+ $status_report .= 'Check the error messages and <a href="' . check_plain(request_uri()) . '">try again</a>.';
+++ includes/install.core.inc 23 Apr 2010 16:47:36 -0000
@@ -723,7 +723,7 @@ function install_verify_requirements(&$i
- $status_report .= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_url(request_uri())));
+ $status_report .= st('Check the error messages and <a href="@url">proceed with the installation</a>.', array('@url' => request_uri()));
+++ includes/theme.maintenance.inc 23 Apr 2010 16:47:37 -0000
@@ -157,7 +157,7 @@ function theme_install_page($variables)
- $variables['content'] .= '<p>' . st('Check the error messages and <a href="!url">try again</a>.', array('!url' => check_url(request_uri()))) . '</p>';
+ $variables['content'] .= '<p>' . st('Check the error messages and <a href="@url">try again</a>.', array('@url' => request_uri())) . '</p>';
I think that wherever request_uri() is used, we need to use check_url(), since I think the URI can be easily spoofed.
+++ includes/common.inc 23 Apr 2010 16:47:35 -0000@@ -1422,25 +1433,36 @@ function _filter_xss_attributes($attr) {
* @param $string
- * The string with the attribute value.
...
+ * The string that contains one or more URLs within it. Can either be a plain
+ * text string or be encoded for HTML.
"one or more URLs within it" ?
Not sure whether we can drop the second sentence. (The $html_encoded @param directly follows and explains it better.)
+++ includes/common.inc 23 Apr 2010 16:47:35 -0000@@ -2136,6 +2163,10 @@ function drupal_attributes(array $attrib
+ * This function is not responsible for stripping bad protocols from URLs
+ * that come from user input. You should use check_url() function in that
+ * case.
+ *
* @param $text
* The link text for the anchor tag.
* @param $path
I'm not sure I understand how to use check_url() in combination with l(), considering not only $path, but also $options...
We want to add a @code example here.
+++ includes/theme.inc 23 Apr 2010 16:47:37 -0000
@@ -1867,7 +1867,7 @@ function theme_item_list($variables) {
function theme_more_help_link($variables) {
- return '<div class="more-help-link">' . t('<a href="@link">More help</a>', array('@link' => check_url($variables['url']))) . '</div>';
+ return '<div class="more-help-link">' . l(t('More help'), $variables['url']) . '</div>';
@@ -1881,7 +1881,7 @@ function theme_more_help_link($variables
function theme_feed_icon($variables) {
...
- return '<a href="' . check_url($variables['url']) . '" title="' . $text . '" class="feed-icon">' . $image . '</a>';
+ return l($image, $variables['url'], array('html' => TRUE, 'attributes' => array('class' => array('feed-icon'), 'title' => $text)));
@@ -1931,7 +1931,7 @@ function theme_html_tag($variables) {
function theme_more_link($variables) {
- return '<div class="more-link">' . t('<a href="@link" title="@title">More</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
+ return '<div class="more-link">' . l(t('More'), $variables['url'], array('attributes' => array('title' => $variables['title']))) . '</div>';
(mapping to the above) So where is $variables['url'] filtered?
Powered by Dreditor.
#38
Looks ready to fly for me.
#39
This looks good to me, but I'd like someone on the security team, rather than me, to RTBC it.
#40
This patch is a prerequisite for #690980: Disabled form elements not properly rendered
#41
I'm not really sure what the different "security" tags mean, but adding the base one, in case that's the one that people best able to review this issue have bookmarked.
#42
+++ includes/common.inc 29 Apr 2010 20:12:29 -0000@@ -1199,10 +1199,21 @@ function flood_is_allowed($name, $thresh
+ * Strip harmful protocols (e.g. 'javascript:') from a URL and optionally make it safe for output to HTML.
+ *
+ * @param $uri
+ * A plain-text URI that might contain harmful protocols.
+ * @param $check_plain
+ * (optional) Whether to also call check_plain() so that the result can be
+ * safely output to HTML. Defaults to TRUE, but can be set to FALSE when the
+ * result of this function will not be output to HTML (e.g. to a plain-text
+ * e-mail instead), or when it will be passed to another function that expects
+ * a plain-text string instead of an HTML-encoded string (e.g. l(), t(), or
+ * drupal_attributes()).
+ */
+function check_url($uri, $check_plain = TRUE) {
+ $uri = filter_xss_bad_protocol($uri, FALSE);
+ return $check_plain ? check_plain($uri) : $uri;
I don't like the fact that we added the $check_plain parameter. It is better not to clobber those IMO and to call things as follows:
check_plain(check_url($foo))
That is actually more transparent. Thoughts on that? I might be the only one that thinks that.
#43
@Dries: you're not the only one who thinks that. Damien agrees with you (#13). But sun is concerned about themers (#18) and I agree with him. I think it's too much to ask themers implementing theme functions or preprocess functions to remember that check_markup() and check_plain() return strings that are safe for printing, but that check_url() does not.
check_url() in HEAD, and in D5 and D6 returns a string that has already been check_plain()'d (via filter_xss_bad_protocol()). This patch is about keeping that a default, but making it possible to not check_plain() so that the result can be passed to t(), l(), drupal_attributes(), and similar functions. IMO, changing the default of what check_url() returns, especially to something less secure, seems unwise for D7.
#44
eff nailed it.
#45
sounds like eff is supporting the patch (and not suggesting it needs work) ... so back to rtbc to see what Dries says about #43... OH, wait. Needs retesting first, and a security review before rtbc, I think.
#46
#38: drupal.check-url.38.patch queued for re-testing.
#47
According to #40, not only is this issue critical, but it is a pre-req of another critical. So, changing the title in the hopes that it attracts attention from someone who can provide a sufficient security analysis of the patch, and if it looks good, RTBC it.
#48
.
#50
#38: drupal.check-url.38.patch queued for re-testing.
#51
needs a security review (others have said it's ok to rtbc after security review).
#52
Somebody needs to bring in the contexts from http://acko.net/blog/safe-string-theory-for-the-web and add to the issue at least what kind of escapes the different checking functions do. Only then can we determine whether the patch is correct. Changing check_url to check_plain makes me go hm, but I can't say for sure it's wrong.
#53
I stick with #17, we are mixing things that are way too different to be mixed:
javascript:to get throughCurrently,
check_url()is basically a wrapper aroundcheck_plain(filter_xss_bad_protocol($url)). This is bad, because nobody (including the colloquial "themer") understand what this means.Most of the time, when you are properly using t() or drupal_attributes(), you don't need to take care about the check_plain() yourself:
<?php$status_report .= st('Check the error messages and <a href="!url">proceed with the installation</a>.', array('!url' => check_url(request_uri()))));
?>
should be written as:
<?php$status_report .= st('Check the error messages and <a href="@url">proceed with the installation</a>.', array('@url' => filter_xss_bad_protocol(request_uri())));
?>
#54
#17: 715142-remove-check-url.patch queued for re-testing.
#55
patch in #17 was suggested in #53, seeing what the testing bot thinks of it.
also, sounds like there is no consensus on what approach to take... does that make this issue needs work? or more reviews?
#56
I think this should stay as "needs review" until there's consensus on what the next re-roll needs to look like. Then it can be "needs work".
From #17:
+++ modules/comment/comment.tokens.inc@@ -167,7 +167,7 @@ function comment_tokens($type, $tokens, array $data = array(), array $options =
- $replacements[$original] = $sanitize ? filter_xss_bad_protocol($comment->homepage) : $comment->homepage;
+ $replacements[$original] = $sanitize ? check_plain(filter_xss_bad_protocol($comment->homepage)) : $comment->homepage;
So, that patch changes filter_xss_bad_protocol() from returning something that can be printed (HEAD) to returning something that can't be printed (patch). Is this an acceptable thing to do this late in D7? Note that filter_xss() returns something that can be printed. This can be somewhat mitigated by retaining check_url() as a wrapper for check_plain(filter_xss_bad_protocol()), but it still means any contrib code currently calling filter_xss_bad_protocol() directly and expecting back a string that's safe for printing will need to be changed.
My reading of HEAD is that we currently have two patterns: filter_xss() and filter_xss_bad_protocol() take strings that are already HTML-encoded and return HTML-encoded strings. check_plain() and check_url() take plain-text strings and return HTML-encoded strings. What we need to add is something that takes a plain-text string and returns a plain-text string, but with bad protocols stripped. Some options:
Powered by Dreditor.
#57
Fair analysis. I'm happy with solution #4 above, if there are buy-ins from others.
#58
I would add a new placeholder to t() handling this, too. * perhaps?
Edit: and yes 4)
#59
This is a re-roll of #38, adjusted for #56.4.
I didn't do this in this patch, because it seems like it would introduce too many permutations needing special symbols (i.e., a symbol for drupal_strip_dangerous_protocols(), a separate symbol for check_url(), and maybe others). Since protocol stripping only needs to happen for user-entered text, I'm not sure there's enough use-cases to warrant adding more complexity within t().
The reason I'm leaving [needs security review] in this issue title is that in addition to the decision reached in #56.4, this patch continues what #38 did in additionally changing some places that used to call check_url() to just calling check_plain() (as discussed in earlier comments). The idea being that protocol stripping is something to be done for user-entered text only, not generically whenever we have a URL. This makes sense to me, but I'd feel better knowing someone from the security team had a chance to approve this (@chx, @Damien: I don't know if you're officially on the security team, but if you feel comfortable RTBC'ing, please do so).
#60
Re-roll of #59 because of failure in form.inc alterations. Looking into security concerns, so far I can't see any fault.
#61
Coltrane doesn't see any security problems. How many security reviews do we need?
#62
1, but it needs to be definitive, by the reviewer RTBC'ing the issue.
Confirming that #60 is a straight re-roll with no actual code changes. Therefore, @coltrane: feel free to RTBC if you're confident about this being ok from a security standpoint.
#63
For anyone joining this issue late, see last paragraph of #59 for the security review that's needed.
#64
Setting to RTBC
#65
Thanks coltrane! Therefore, removing the [needs security review] issue title prefix.
But, this issue is a pre-req for #690980: Disabled form elements not properly rendered, which itself is a beta blocker, which makes this issue a "beta blocker blocker", so let's get it in. Thanks!
#66
I'm... honestly scratching my head how we got from the initial bug report (a module accidentally double-check-plained a URL in a string which could be solved in 8 seconds by switching @ to !) to a sweeping API change with security implications, several months and several alphas into our release cycle. I'm even more scratching my head why the behaviour of check_url(), filter_xss_protocol(), etc. that's been part of Drupal since time immemorial is suddenly a critical, beta blocking issue.
drupal_strip_dangerous_protocols() looks like a nice helper function when you don't want the HTML-escaping behaviour of the other functions. What I don't understand is why so many functions needed to change here, and of course they'll need to change in contrib as well.
I've read the issue, twice. I'm still scratching my head. Could someone maybe write the documentation that will need to go out to the devel list to explain what the hell happened and how they need to change all their modules and themes? That might help me understand.
I also tried running a URL through a bunch of functions:
original url: javascript://www.google.com/?q=blah&something=somethingcheck_url(): //www.google.com/?q=blah&something=something
filter_xss_bad_protocol(): //www.google.com/?q=blah&something=something
url(): /core/javascript%3A//www.google.com/%3Fq%3Dblah%26something%3Dsomething
It would be great to hear what of those outputs is undesirable, and what effect this patch has specifically.
#67
Some of this patch is a necessary fix to avoid double HTML escaping. That's why we need a new function. Aside from adding a new function, I can't see another API change. Correct me if I am wrong.
However, quite some hunks (like the more link itself which sparked this issue which can easily be fixed by changing a @ into a !) look like a performance optimization to me and makes this patch into the proverbial kitten killer monster. As far as I can see, changing check_url(url()) to check_plain(url()) saves an expensive protocol-stripping call but there is no change in the output whatsoever. These do not belong in this patch and make it very hard to read and evaluate.
Also:
- $action = $element['#action'] ? 'action="' . check_url($element['#action']) . '" ' : '';+ $action = $element['#action'] ? 'action="' . check_plain($element['#action']) . '" ' : '';
Why exactly we do not need a protocol stripping off actions??
#68
Also, why
+ // Not check_url(), because l() is not responsible for stripping bad+ // protocols. Code that calls l() with a path that comes from user input
+ // needs to do that.
is this in the tests and not l() doxygen?
#69
Yes, the initial bug report was just about theme_more_link() where a change from @ to ! was all that was needed. Then we started fixing problems in theme_image() in #555830: Clean up theme_image() to use drupal_attributes() for all attributes and revisit defaults for "alt" and "title", and discovered that it was impossible to set
$attributes['src'] = check_url(...), because drupal_attributes() does a check_plain(), and unlike t(), there's no special symbol for turning that off: i.e., there's nothing like$attributes['!src']. #10 summarizes why this is critical in D7, even though it's a legacy problem from D6. So most of this thread debated how to decouple protocol stripping from check_plain(): this is summarized in #56, and #56.4 was chosen.This patch deals with 2 questions:
1) When should protocol stripping be done?
2) What is the API of check_url() and filter_xss_bad_protocol() and the new function drupal_strip_dangerous_protocols()?
#1 will be answered later, in context of chx's review.
As for #2, there is no API change in this patch with respect to check_url() and filter_xss_bad_protocol(). check_url() takes a plain-text string, strips protocols, and returns an HTML-encouded string. That's true in HEAD and is unaffected by this patch. filter_xss_bad_protocol() without the 2nd argument passed takes an HTML-encoded string, strips protocols, and returns an HTML-encouded string. That's true in HEAD and is unaffected by this patch. filter_xss_bad_protocol() with a 2nd argument of FALSE takes a plain-text string, strips protocols, and returns an HTML-encouded string. That's true in HEAD and is unaffected by this patch in order to retain BC, though comments are added in the patch to makes it clear that such a parameter in filter_xss_bad_protocol() is useless and should be removed in D8. This patch adds a new function, drupal_strip_dangerous_protocols(), for taking a plain-text string, stripping protocols, and returning a plain-text string, addressing the drupal_attributes() use-case.
Ok, back to #1 from above: it's not just a performance optimization: it's a question of when should protocols be stripped and when they should not be. The output is the same when the input string doesn't contain dangerous protocols, but the output is different when the input string does contain dangerous protocols. Why does l() in HEAD call check_plain(url()) and not check_url(url())? And why does theme_image() in D6 call check_url() and not check_plain()? (Note: theme_image() for D7 was changed in the previously mentioned issue). Which functions should be responsible for stripping protocols and which should not be? sun answers this question in #21: the only functions that should strip protocols are the ones that are dealing with strings that come from user input. l() doesn't strip protocols because it's a generic function: maybe a module wants to call it with $path = 'javascript:...' or $path = 'somecustomprotocol:...', and l() should not force that to be invalid. The same answer applies to theme_image(), which is why it's been changed already in that other issue. And the same answer applies to theme_form() with respect to #action. It's a mistake that HEAD calls check_url() in so many places, because protocols should only be stripped in the functions that deal with strings that come from user input.
Maybe this change to limiting where protocol stripping is done should have been a separate issue. But it seemed to be relevant to this one, since this is where we realized the need to decouple protocol stripping from html encoding. Anyway, it is this aspect that made it take so long to reach RTBC status, because someone with real, in-the-wild, Drupal XSS attack experience needed to review all these hunks that change check_url() to check_plain() and similar removals of protocol stripping. But look at the comments from #59-#64, and I think as long as we trust that coltrane performed an adequate security review of these hunks, then I feel good about them.
Re #68:
The patch does add this info to the PHPDoc of l(). The comment is also added in the test, because the test in HEAD is wrong (i.e., inconsistent with l()) and is being fixed.
CNR for chx to review this comment and decide whether to RTBC or give additional feedback.
#70
This issue needs the "API Change" tag, because of things like the change to theme_form(). In other words, with this patch, a module is responsible for stripping protocols when setting a form's #action to user-entered text. The module would no longer be able to rely on theme_form() stripping those protocols. This is applying the same pattern that is already the case with l() and theme_image() to the rest of core.
#71
We can go that way but massive amounts of doxygen is needed. It needs to be crystal clear which function are such 'generics' that can take anything and it's the caller who needs to the protocol stripping via the new function. Also as
lruns throughurl()it will not recognize custom protocols (say, skype: or magnet:) as an external URL due to the external URL checking happens viafilter_xss_bad_protocol. So the documentation for l should mention that if it is indeed your wish to allow for skype: then use the filter_allowed_protocols $conf override.Edit: also, although it looks like we took a long time to reach RTBC status, the patch / issue was essentially restarted in #56.
#72
OK, I'm splitting this up into 2 issues. This patch now only addresses #69.2, which is the title and original use-cases of this issue. I will open a new issue for #69.1, since in #71, chx says that will require more discussion and documentation.
Removing "API Change" tag, because this patch does not have an API change. It has an API addition of drupal_strip_dangerous_protocols() only.
Removing specific Security tags, because this patch contains no change with respect to where protocol stripping happens.
Leaving general "security" tag, because this patch touches on security issues, including an extra comment in url() and l() explaining the already existing behavior of those functions. Also, this patch includes a hunk that fixes the comment:homepage token.
This patch is still critical, and beta blocking, because it's a pre-req for #690980: Disabled form elements not properly rendered. Comment #89 on that issue shows the dependent hunks.
#73
#845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols
#74
Little kitty saved! :)
#75
Nope. The new doxygen is incorrect. See #66 url() and #845000-3: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols for a detailed analysis.
#76
Which doxygen is incorrect?
#77
Very interesting analysis in #75. Understanding and properly documenting url() and l() in that issue will be very important. In the meantime, simplifying this patch even further based on that feedback:
+++ includes/common.inc 4 Jul 2010 18:55:02 -0000@@ -1942,6 +1983,10 @@ function format_username($account) {
+ * - This function is not responsible for stripping bad protocols from URLs
+ * that come from user input. Code that calls this function with a
+ * user-entered path/URL is responsible for stripping harmful protocols from
+ * this argument by calling drupal_strip_dangerous_protocols().
This patch removes this PHPDoc addition to url(). To be taken up in the other issue.
+++ includes/common.inc 4 Jul 2010 18:55:02 -0000@@ -2174,10 +2219,14 @@ function drupal_attributes(array $attrib
- * "http://example.com/foo". After the url() function is called to construct
- * the URL from $path and $options, the resulting URL is passed through
- * check_plain() before it is inserted into the HTML anchor tag, to ensure
- * well-formed HTML. See url() for more information and notes.
+ * "http://example.com/foo". This function is not responsible for stripping
+ * bad protocols from URLs that come from user input. Code that calls this
+ * function with a user-entered path/URL is responsible for stripping harmful
+ * protocols from this argument by calling drupal_strip_dangerous_protocols().
+ * After the url() function is called to construct the URL from $path and
+ * $options, the resulting URL is passed through check_plain() before it is
+ * inserted into the HTML anchor tag, to ensure well-formed HTML. See url()
+ * for more information and notes.
This patch removes this PHPDoc change to l(). To be taken up in the other issue.
+++ includes/theme.inc 4 Jul 2010 18:55:04 -0000@@ -1854,7 +1854,7 @@ function theme_item_list($variables) {
- return '<div class="more-help-link">' . t('<a href="@link">More help</a>', array('@link' => check_url($variables['url']))) . '</div>';
+ return '<div class="more-help-link">' . l(t('More help'), drupal_strip_dangerous_protocols($variables['url'])) . '</div>';
This patch removes drupal_strip_dangerous_protocols() here. We're calling l(). l() itself is one of:
1) safe, as per chx's analysis in the other issue
2) not safe, and this is a problem to be solved within l() in the other issue
3) intentionally not safe to place the onus of protocol stripping on the caller, but this theme function is just an intermediary, so if that's the case, the onus should be on this function's caller, not on this function.
Regardless which of these 3 is the case, we should not be calling drupal_strip_dangerous_protocols() here, and we'll figure out and document the situation with l() in the other issue.
+++ includes/theme.inc 4 Jul 2010 18:55:04 -0000@@ -1868,7 +1868,7 @@ function theme_more_help_link($variables
- return '<a href="' . check_url($variables['url']) . '" title="' . $text . '" class="feed-icon">' . $image . '</a>';
+ return l($image, drupal_strip_dangerous_protocols($variables['url']), array('html' => TRUE, 'attributes' => array('class' => array('feed-icon'), 'title' => $text)));
same as above
+++ includes/theme.inc 4 Jul 2010 18:55:04 -0000@@ -1918,7 +1918,7 @@ function theme_html_tag($variables) {
- return '<div class="more-link">' . t('<a href="@link" title="@title">More</a>', array('@link' => check_url($variables['url']), '@title' => $variables['title'])) . '</div>';
+ return '<div class="more-link">' . l(t('More'), drupal_strip_dangerous_protocols($variables['url']), array('attributes' => array('title' => $variables['title']))) . '</div>';
same as above
+++ modules/simpletest/tests/common.test 4 Jul 2010 18:55:07 -0000@@ -82,7 +82,10 @@ class CommonURLUnitTest extends DrupalWe
- $sanitized_path = check_url(url($path));
+ // Not check_url(), because l() is not responsible for stripping bad
+ // protocols. Code that calls l() with a path that comes from user input
+ // needs to do that.
+ $sanitized_path = check_plain(url($path));
This patch removes this hunk. To be addressed in the other issue, once we can properly document l().
Powered by Dreditor.
#78
The smaller it gets the better.
#79
Thanks, this patch is much easier to follow now.
A couple of small things:
1. As far as I can tell, if we remove the $decode param of filter_xss_bad_protocol(), then this function is exactly the same as check_url(), so we should probably deprecate the entire function, and remove it in D8?
2. Let's change www.google.com to www.example.com in the test.
3. We still need explicit instructions for module/theme developers on how their code needs to change. I can almost piece it together from the patch, but would prefer a sentence or two from someone who worked on this patch.
#80
1. The $decode parameter defaults to TRUE. When we can remove it as a parameter, we will still have the decode_entities() call. That's the difference between filter_xss_bad_protocol() and check_url(): the former takes an already HTML string and the latter takes a plain-text string. For more info, see #56.
2. Done in this patch.
3. Module and theme developers don't *have* to do anything as a result of this patch. This patch makes a new function, drupal_strip_dangerous_protocols(), available to them, which they can use if they want to output 'src', 'href', 'action', and similar attributes using the drupal_attributes() function. This is the feature that #690980: Disabled form elements not properly rendered is waiting for. Anything that would *require* changes by modules/themes have been deferred to #845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols, and are still being debated there.
This patch only contains a minor test change as per #2, so straight to RTBC.
#81
.
#82
Oh, d'oh! Thanks for pointing that out about decode defaulting to TRUE. /me clearly needs better glasses.
I guess I'm a bit dubious of the "opt-in"ness of this new function, because so many places in core are changing as a result of adding it. I can't imagine why contrib wouldn't want/need to make similar changes as well to keep step. But I think your description in #80.3 should hopefully be enough to inform developers of how they "can" react to the change, versus a "need to" react in #845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols, if that happens.
I'm restoring the API change tag though, since there was an API addition here, in any case. I'll let Randy make the call if it's worth an announcement to the devlist about it.
Committed to HEAD.
#83
Good point. I guess I would say that if a contrib module/theme has broken code that results in double check_plain(), similar to the broken HEAD code that started this issue, then the contrib module/theme now has a way to fix it. So if contrib code is passing the result of check_url() to something that will end up calling check_plain() (for example, an @ placeholder in t(), or an attribute value passed to drupal_attributes()), it should replace the call to check_url() with a call to drupal_strip_dangerous_protocols() instead.
#84
I'm only trying to announce BC breaks, not API additions, so as I read this there's no announcement needed. Let me know if I'm wrong.
#85
FYI: #80 created a regression: #858874: The more link in the node module's "Recent content" block always links to http://admin/content, which I marked as a duplicate of #845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols, since the latter issue will have to encompass the former.
#86
Automatically closed -- issue fixed for 2 weeks with no activity.
#87
Just today, I learned that we could have solved this perhaps less elegantly, but once for all:
#882438: Globally prevent double encoding in check_plain() by raising minimum PHP to 5.2.3