Warn about potentially insecure filter configurations
catch - June 27, 2008 - 17:55
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | filter.module |
| Category: | task |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | needs review |
| Issue tags: | FilterSystemRevamp, Release blocker, Security improvements, wysiwyg |
Description
If an input format is configured with the php filter or withut the html filter, and is enabled for anonymous/authenticated users, we should have hook_requirements spit out a warning. Cwgordon noted it should only do this if registration is open, which might avoid it being overly annoying for sites where it's not likely to be a security risk

#1
#2
i would at least go one step further, to disallow any input format that contains the php filter or without the html filter for anonymous users...
+1 for warning when registration is public. (at least, mabye even disallow)
#3
Big -1 for absolutely disallowing users to do what they want with the filters. If this is on an intranet, there may be no need for such security. Of course, that's probably a small minority, but making that option impossible without hacking core is REALLY not the right way to do this.
I am, however, also against any warning to the user that is persistent. Anything that sits around and is red is incredibly annoying and hurts usability because of that. I would think something like a confirm form that further explains the reasons why this may be bad would be good, but we should be wary of adding warnings that aren't so much educational and just serve to annoy (just look at UAC in Vista).
#4
Patch implements the functionality described by catch:
For sites which allow open registration, an error status message is displayed when anonymous users have access to an input filter which does not filter HTML or the PHP input filter.
Like Susurrus I disagreed with comment #2's suggestion to disallow such configurations completely.
I think that using the error level of status problem is appropriate however. A site which allowed open and registration and did not have proper filtration but which would receive this message in vain would be incredibly rare. An example would be a site which was secured by htaccess or some other means, or a site with its own custom filtration mechanism. I think it's worth bothering such a rare site with a potentially unnecessary error message for the benefit of the vast majority of sites.
#5
patch looks visually solid, however I didn't test it yet.
I agree on error status, and on those edge cases, well... you shouldn't trust on a single line of defense ;)
maybe adding some tests?
#6
Yes, need a test.
#7
I wrote tests and improved the code a lot (writing the tests helped...)
I split the hook_requirements code into filter.install and php.install (as well as putting the tests separately.)
Filter functionality:
An error message is displayed on the status page when:
- The default format does not contain the HTML filter.
OR
- Anonymous users can access a format that does not contain the HTML filter.
OR
- Authenticated users can access a format that does not contain the HTML filter and user registration is open.
PHP Filter functionality:
An error message is displayed on the status page when:
- The default evaluates PHP.
OR
- Anonymous users can access a format that evaluates PHP.
OR
- Authenticated users can access a format that evaluates PHP and user registration is open.
The tests ensure error messages are displayed under these conditions.
#8
can't you call the database outside the loop?
#9
I looks like the database query could indeed be outside the loop.
Also I think we should warn about this on sites that have administrator approval for new users too - just because you restrict who can join your site doesn't mean they should be able to pwn it once their in.
#10
I'll work on those two changes.
#11
OK, new patch
Took the db query out of the loop and extended the error message to user registration set to open w/ approval. Updated tests.
#12
Patch applies fine for me so not sure why the testbed fails. Here's the same patch without the initial ? files.
#13
The testbed has been failing all weeks due to a memory issue on the server. I'll try to test this asap.
#14
The last submitted patch failed testing.
#15
This is stupid. What about input formats based on filters like Textile, Markdown or BBCode which escape all HTML? Forcing their input or output to be run through the HTML filter is undesirable. On the input side, you may have [code][/code] tags that are safe to use. On the output side, you may have advanced features (like a YouTube embed) that output tags whose general form is unsafe, but whose specific form isn't.
The proper solution is to let filters advertise themselves as "sanitizing", and have a standard testbed of XSS filters to verify this as much as possible. Or, to restructure the input format system so that every format must start with a special 'input filter' whose responsibility it is to output safe HTML (or advertise itself as unsafe), after which additional transformations can be made.
#16
hmm... based upon modular support... i think it's the filter order that is important for your problem here...
what if i want to use
[]and<>in my opinion... HTML filter should be the only one that filters html
(please give an example of html filtering that can't be done with the html filter)
#17
I totally agree with Steven. Additionally, HTML is just one language - and one of our goals is to support different output languages. See also http://groups.drupal.org/node/15996 and http://groups.drupal.org/node/15979
#18
they would run before the html-filter, giving escaped html (not
<a>but<a>) to the html filter which will apprear in the source code, displaying<a>on the screen.#19
@alexanderpas you totally missed the point. This example shows that it is already unnecessary to require the HTML filter in that case.
However, if you factor in advanced features (like a safe YouTube embed), this means that you are not just asking people to enable the HTML filter unnecessarily, but also to add unsafe tags like <object> and <embed> to the allowed tag list in order to make existing safe features work. And even more, in this situation, the HTML filter has to be placed later in the filter chain to not fuck things up, when in normal situations, the HTML filter should go first.
So you're requiring an unconventional and very specific configuration in situations that are already safe today. This is adding confusion to a part of Drupal that already confuses most people and in fact promotes unsafe behaviour.
Not to mention that this example demonstrates that the proposed patch fails to catch unsafe situations in the first place, like adding object and embed tags to the HTML filter.
#20
you're totally right about that...
now i'm wondering.... as a usecase... how to allow youtube
<object>tags, and disallow other<object>tags, allow bbcode including[code], disallow html except<code>I can only think of a three-step process....
CLAIM - SANITIZE - OUTPUT
CLAIM:
htmlfilter - claims
<code>tagsyoutube - claims
<object>tags that are for youtubebbcode - claims
[code]tagsSANITIZE:
youtube - ensures youtube objects follow specs.
htmlfilter - removes unwanted HTML (doesn't see youtube HTML and
[code], has already identified<code>blocks.)OUTPUT:
bbcode - modifies BBcode into HTML
in this model, htmlfilter won't see the youtube code, or the bbcode
[code], it only sees a placeholder or something....also, i think this fixes issues with filter-order.
#21
The following should be a good addition to this patch. Right now one can add script, iframe or object to the list of allowed tags and sleep happily -- and the filter's help string: "Allows you to restrict the HTML tags the user can use. It will also remove harmful content such as JavaScript events, JavaScript URLs and CSS styles from those tags that are not removed." doesn't mention that some tags should never be added. At least there should be a warning or a confirmation.
The HTML filter could actually refuse to run if given some tags -- I believe it doesn't make sense to filter if someone allows script, style, iframe, frame, meta, link, object, embed, applet, layer etc.
Maybe there could be also a confirmation/warning for anything with src or similar attribute (like img, bgsound, input), which is usually not exploitable by itself, but makes CSRF easier (and CSRF vulnerabilities are not so rare even in core: DRUPAL-SA-2006-025, DRUPAL-SA-2006-025, DRUPAL-SA-2007-017, DRUPAL-SA-2008-005, DRUPAL-SA-2008-044, DRUPAL-SA-2008-047). It's too late here, does that make sense? :)
#22
Well, too bad that Textile and Markdown do not generate safe HTML.
The Markdown filter doesn't even escape HTML. Used on it's own,
<script>whatever()</script>would do fine. Textile would need==<script>whatever()</script>==
For Textile after the HTML filter use:
!http://example.com(x"onerror="javascript:alert('xss')")!For Markdown after the HTML filter (src="javascript:" works fine in Opera 9.64) use:
)Barry Jaspan proposed a taint flag (or a set of taint flags) system.
Example filter orders:
Default start: sets taint flag
HTML Filter : unsets taint flag
-> Input format is safe
Default start: sets taint flag
HTML Filter : unsets taint flag
Textile : sets taint flag
-> Input format is unsafe
Default start: sets taint flag
Textile : sets taint flag
HTML Filter : unsets taint flag
-> Input format is safe
Default start: sets taint flag
Ideal Textile: unsets taint flag
-> Input format is safe
Default start : sets taint flag
HTML Filter : unsets taint flag
YouTube : -
-> Input format is safe
Obviously, we'd need more states to declare the PHP filter and other formats whose unsafeness doesn't hinge on HTML as Unsafe. It also relies on filter authors to properly mark their filters, but they are better suited to do that than end users.
#23
I'm marking this as a release blocker. There's not one week where a blog on Planet Drupal (!) doesn't have its filters misconfigured. If those people cannot figure it out, who can?
#24
alexanderpas:
There is currently at least one reason for making "Full HTML" the default; and that is blogapi module, which always uses the default format. Although perhaps that should be addressed elsewhere.
http://drupal.org/project/wysiwyg_filter does what the HTML filter cannot; it filters HTML attributes and style properties.
#25
#26
I hope this issue maintains its "release blocker" status -- as a minor API change but a major security improvement, it seems like a no-brainer to me.
The attached patch at least gets things started before code freeze, with a rough proof-of-concept implementation of the idea in #22. It successfully labels 'Full HTML' as unsafe and 'Filtered HTML' as safe, but switches 'Filtered HTML' to unsafe if you include a dangerous tag on the list of allowed tags for the HTML filter (currently just <object> or <embed>, but obviously that list can be expanded later), etc.
Obvious things that still need work:
#27
Yeah, I guess it's an honor that Steven came out of retirement for just long enough to call the patch stupid. Will review.
#28
Patch looks great.
I think we should expand it to handle the PHP filter, and do your UI 'option 1': Alert the admin through text on the regular permissions page. This combined with #11218 will bring a big increase in both security and convenience with input formats.
The other option with PHP filter (which I endorse 100% but not sure what's been said in past discussions) is to retire it from core. If people want to do crazy things with their site, they should at least have to go to contrib.
#29
Thanks for looking over the patch!
It seems to me that regardless of whether the PHP module is in core on contrib, we have to design something here that works with it. One possible option is in the attached patch. What I did there is make it so that after the filter module makes a determination about the security of the text format, it invokes a hook that gives other modules a chance to overturn that. The PHP module then labels any format insecure if it finds the PHP filter in it. That seems to work.
One thing I'm wondering a bit about this patch is where do we draw the line between secure and insecure? For example, is the <img> tag sufficient to label a text format insecure? The constants in the code refer to XSS only, but because it's currently implemented as a boolean, it's actually a broader security determination than that at the moment. I'm not sure we want to get into more complicated states than "secure" or "insecure", though - I do like the way the patch is pretty simple right now. Otherwise, it seems to me like this is getting close.
#30
Nifty.
* lets give more info to the poor admin who gets insecure msg
* filter_format_is_insecure() - can we do filter_format_is_secure() and try name things positively? lets only do this if it doesn't make the code awkward.
* needs a test
#31
The last submitted patch failed testing.
#32
Thank you for taking a first shot at this.
<script>is not a tag you would like to see in the allowed tags list (yes, I've seen cases where it was added).1. We need to get more information to the admin.
- Perhaps allow modules to add error messages to the flag: eg (don't pin me down on the texts).
$flags['html'][] = t('The tag %tag was added to the HTML filter. This tag, blah blah blah');
or
$flags['html'][] = t('The markdown filter does not ensure the output is safe. Add the HTML filter after this filter');
Then a properly configured HTML filter would do $flags['html'] = FALSE or $flags['html'] = array().
2. We need to cater to other filters as well, such as PHP
eg $flags['php'][] = t('The PHP filter allows users to execute PHP code and do X, Y, and Z to your site.);
#33
This should address #32 and most of #30. Also, added a check for $_POST to prevent the messages being set while the form is being built before submit (ugly, any better way to do this?).
Still needed: tests.
#34
Cool! I think the addition of the $errors array and separating out into 'html' and 'php' is definitely an improvement (it does mean that someone could write a filter that claims to "sanitize" already-executed PHP code, but hopefully they wouldn't do that).
I'll try to look at this a bit more later (and hopefully have a chance to write a test or two), but one thing that comes to mind is that I'm not sure we should be mixing the $errors array with the determination of whether or not the format is secure. For example, this code in the latest patch:
+ // If the output is already safe HTML, then allowing dangerous tags here+ // will do no harm. Otherwise we want to generate a useful warning.
+ if ($tags && !empty($errors['html'])) {
+ $errors['html'][] = t('Allowing these tags is dangerous: %tags', array('%tags' => implode(', ', $tags)));
+ }
It seems a little awkward, and difficult for filter module authors to understand how to use correctly. I think we want to make this as simple as possible for them, to make sure they get it right.
An alternative might be to still pass in the $errors array and let them add messages to it when they want to, but also have them return a simple statement, more like in my original patch. For example, something like:
<?phpreturn array('html' => FILTER_EXPLOIT_PREVENTS);
?>
or
<?phpreturn array('php' => FILTER_EXPLOIT_ALLOWS);
?>
The filter_format_is_secure() function could then be in charge of clearing data from the $errors array when appropriate, and module authors wouldn't have to worry about it.
Also, about the user interface, the only reason I added any kind of user interface at all to the original patch was so that there would be something simple to test it out with. However, once we have tests, we won't really need that anymore :) I actually think we should make this initial patch just about getting the API right (which can hopefully be finished before the code freeze), and then figure out exactly how and where to present this in the user interface as part of a followup. As mentioned above, I hope that #11218: Allow default text formats per role, and integrate text format permissions will be a part of that.
#35
Here's a new patch:
I think this might be getting pretty close.
We should maybe think about whether it is better to use a whitelist than a blacklist, when determining which tags are unsafe in the HTML filter...?
#36
This patch does not have to be ready today.
#37
Slightly improved patch - rather than working around a static caching bug in the tests, it's better to actually fix the bug :)
Also some small improvements to the API documentation.
#38
Definitely agreed.
But I figured it would be useful to at least have it at a pretty far along state.... especially because of its relevance for #11218: Allow default text formats per role, and integrate text format permissions.
#39
Sub.
#40
This test confused me at first:
<?php+ // Create a new format that is the same as the previous one, but escapes
+ // all HTML first; the text format should be secure.
+ unset($format->format);
+ $format->name = $this->randomName();
+ $format->roles = array();
+ $format->filters = array(
+ 'filter_html_escape' => 1,
+ 'filter_html' => 1,
+ );
?>
Now I think I understand. filter_html_escape sets html => PREVENTS_EXPLOITS. filter_html, even though configured to allow an unsafe tag, can assume that if someone previous asserts html => PREVENTS_EXPLOITS that the dangerous tag must not be present anymore, and therefore it does not have to say it allows exploits due to the dangerous tag. If this logic is not documented somewhere clearly, please add it. :-)
I think the tests would be more clear (and more safe against re-order bugs) if they set $format->settings explicitly for this case instead of inheriting the settings from the previous case. This is particularly true because I notice that you reset $format->roles after some of the cases, making me wonder if something is changing it behind the scenes that requires it to be reset. If that is possible, why is it not possible for $format->settings?
In fact, taking the previous paragraph to its logical conclusion, perhaps you should break testFormatSecurity() into separate functions for each case. Then there is no possibility of re-order bugs. The only downside is the tests will take longer to run, but that is what the testbot is for.
#41
Here is an updated patch:
Regarding the confusing test case, that is indeed inherently confusing, but it's really important we explain it well. The patch does already contain a fair amount of documentation, but any suggestions on improving it would be welcome.
The approach I took so far was to make it so that each filter is not at all responsible for thinking about what filters actually ran before them, but rather only imagining the worst case scenarios of content they might receive. The message we're really trying to communicate is something along the lines of the following: "Imagine completely sanitized text is passed in to your filter. If your filter is capable of creating a security flaw out of that, that is the only time you should return FILTER_EXPLOIT_ALLOWS."
Possible ways we can improve the message:
#42
+++ modules/filter/filter.api.php 10 Sep 2009 01:39:29 -0000@@ -122,6 +134,56 @@
+ * way. The warning messages should be keyed by the type of output in which
+ * the potential vulnerability exists. For example, use $warnings['html'][]
+ * for an exploit that occurs when content that passes through the text
+ * format is output as HTML (e.g., cross-site scripting exploits). This is
+ * the most common type of output, but others are possible; for example, the
+ * PHP module uses $warnings['php'][] to indicate that the PHP filter
+ * contains an inherent vulnerability due to the fact that content which
+ * passes through it is executed as PHP code.
We should be more precise about the possible keys here. Are html and php the only common? Would xss be valid, or would that be js? Do those keys refer to programming languages, maybe?
+++ modules/filter/filter.module 10 Sep 2009 01:39:29 -0000@@ -15,6 +15,18 @@
/**
+ * Status which indicates that a filter prevents security exploits. This should
+ * be used by filters that sanitize previously-unsafe content.
+ */
+define('FILTER_EXPLOIT_PREVENTS', 1);
+
+/**
+ * Status which indicates that a filter allows a new security exploit, even if
+ * the content passed in to the filter was previously sanitized.
+ */
+define('FILTER_EXPLOIT_ALLOWS', 2);
I don't see the point for those statics. Subtracting 1 and we get TRUE and FALSE.
If we really, really need defines, then we need better names. But I currently don't see the point in polluting defines with those.
+++ modules/filter/filter.module 10 Sep 2009 01:39:29 -0000@@ -188,8 +200,14 @@ function filter_format_save($format) {
foreach ($filters as $name => $status) {
$fields = array();
- // Add new filters to the bottom.
- $fields['weight'] = isset($current[$name]->weight) ? $current[$name]->weight : 10;
+ // If a specific weight was requested, use that.
+ if (isset($format->weights[$name])) {
+ $fields['weight'] = $format->weights[$name];
+ }
+ // Otherwise add new filters to the bottom.
+ else {
+ $fields['weight'] = isset($current[$name]->weight) ? $current[$name]->weight : 10;
+ }
$fields['status'] = $status;
$format->weights[] doesn't work out. Weight is stored per filter, so it needs to be retrieved/stored in $filter['weight'].
Apparently, we only have $status now, and turning that into $filter is done in #570930: Allow to retrieve all filters (including disabled) in a text format. Hence, we should get that patch in first.
+++ modules/filter/filter.module 10 Sep 2009 01:39:29 -0000@@ -516,6 +535,82 @@ function filter_list_format($format) {
/**
+ * Determines if a text format is configured securely.
+ *
+ * @param $format
+ * The ID of the format to check.
+ * @param $warnings
+ * An array which will be filled with warning messages if the format is
+ * insecure, explaining the reasons why it is insecure. The keys of the
+ * array represent types of output in which a security vulnerability exists,
+ * and the values are arrays of messages describing the vulnerability for
+ * that output type. Possible keys include 'html' (if content that passes
+ * through the text format will be insecure when output as HTML) and 'php'
+ * (if content that passes through the text format will be executed as PHP
+ * code).
+ * @return
+ * TRUE if the format is configured securely, FALSE otherwise.
+ */
+function filter_format_is_secure($format, &$warnings) {
+++ modules/filter/filter.test 10 Sep 2009 01:39:29 -0000
@@ -765,9 +796,83 @@ class FilterUnitTestCase extends DrupalW
+ $this->assertFalse(filter_format_is_secure($format->format, $warnings), t('A text format with no filters is labeled insecure.'));
The &$warnings argument doesn't make sense here. We only check one format at a time, and filters may only alter $warnings within the same format.
Because $warnings is not initialized in the test, the testbot will most likely fail with exceptions.
This review is powered by Dreditor.
#43
The last submitted patch failed testing.
#44
Here's a new patch updated for #570930: Allow to retrieve all filters (including disabled) in a text format (which also therefore addresses one of @sun's comments above).
Also:
I agree this is still confusing, and I'm starting to wonder if replacing 'html' with 'xss' might be worth thinking about...
Part of the confusion is that we only have these two examples, and both are somewhat different:
We might be able to improve this by coming up with other, specific examples that we can mention in the documentation... but I'm having trouble coming up with any good ones.
For example, I can certainly imagine other output languages besides HTML, but Drupal is used to build websites in HTML, so the others seem like they are not only rare, but also likely to lead to false positives if people start using them (i.e., I don't care if a format is insecure for a language besides HTML that I never intend to use it for).... so we'd have to start thinking about allowing a $types parameter to filter_format_is_secure() or something like that. Overall doesn't sound like a great road to go down.
For examples similar to PHP, it's possible to think of some (the format can run the text through whatever local program on the operating system it wants to, I suppose), but again, pretty rare.
Are there any good, useful examples that I'm overlooking?
I think we should leave them as defines, because
return array('html' => TRUE)seems to me like it would be easy to get backwards. This is already a little tricky for people to use correctly, so let's make it as clear as possible, IMO.Are you saying the &$warnings argument to filter_format_is_secure() doesn't make sense, or the &$warnings argument in the individual filter security callback functions?
I think the $warnings argument to filter_format_is_secure() is OK (and not sure I understand the comment above about one format at a time), but passing in an array by reference to the filter security callbacks does seem like a relic of an earlier version of the patch. Currently, we never pass anything but an empty array into those (since the idea is that each filter is not supposed to know, care, or be able to alter anything about the exact filters that ran before it).
I suppose we could get rid of that parameter and have the callback functions return something like this:
<?phpreturn array(
'status' => array('html' => FILTER_EXPLOIT_ALLOWS),
'warnings' => t('Text of the warning.'),
);
?>
Not sure if we should.
It doesn't, though :) I guess PHP is OK with passing undefined variables in by reference (which kind of makes some sense)...
#45
By the way, as mentioned above, we still need to deal with this line of the patch, which clearly needs some help:
<?php$dangerous_tags = array('iframe', 'object', 'embed', 'script', 'style'); // Obviously we can expand this later!
?>
If we continue to go with a blacklist, I discovered that the WYSIWYG Filter module seems to maintain one:
<?phpfunction wysiwyg_filter_get_elements_blacklist() {
return array(
'applet',
'area',
'base',
'basefont',
'body',
'button',
'embed',
'form',
'frame',
'frameset',
'head',
'html',
'iframe',
'input',
'isindex',
'label',
'link',
'map',
'meta',
'noframes',
'noscript',
'object',
'optgroup',
'option',
'param',
'script',
'select',
'style',
'textarea',
'title',
);
}
?>
I'm not sure if all of those are really necessary for security, but it might be a good list to start with. (Also, it seems that <img> is not on that list, which is kind of on the border of being considered a security risk, so not sure if we should include it.)
Obviously, whitelists are usually a better approach then blacklists, but the problem if we go with that is that we are then going to start getting false positive security warnings if someone configures the HTML filter to allow a tag that doesn't exist, or some obscure browser-specific tag that we didn't think of, etc.
So, I'm still not sure what to do about this issue either...
#46
The last submitted patch failed testing.
#47
To keep this moving, here's an updated patch to reflect the fact that #11218: Allow default text formats per role, and integrate text format permissions was committed.
Although I still think we should defer most user interface changes to a followup issue, this patch does make the one (obvious) change to user interface that we can now do as a result of the other commit.
See the attached screenshot of the permission page after the attached patch is applied - that looks very, very, very pretty, I think :)
#48
Adding a useful tag I just discovered...
#49
+++ modules/filter/filter.api.php 26 Sep 2009 21:25:34 -0000@@ -122,6 +134,56 @@
+ * warning messages in one of two cases: (1) If the filter opens up a new
+ * exploit in a previously safe text format, or (2) If the filter could have
(1)... (2)... This should be a
-list.+++ modules/filter/filter.api.php 26 Sep 2009 21:25:34 -0000@@ -122,6 +134,56 @@
+ * sanitized. For example, a filter that evaluates PHP code would return
+ * array('php' => FILTER_EXPLOIT_ALLOWS). If the security callback function
On a separate line with @code + return + @endcode tags, please.
+++ modules/filter/filter.module 26 Sep 2009 21:25:34 -0000@@ -7,6 +7,18 @@
+ * Status which indicates that a filter prevents security exploits. This should
+ * be used by filters that sanitize previously-unsafe content.
+ */
+define('FILTER_EXPLOIT_PREVENTS', 1);
PHPDoc for those constants...
"Filter security callback return value indicating possible security exploits.
This should..."
applying to both. (summary on one line)
+++ modules/filter/filter.module 26 Sep 2009 21:25:34 -0000@@ -648,6 +669,82 @@ function filter_list_format($format, $in
+function filter_format_is_secure($format, &$warnings = array()) {
+ // Start with a blank slate of warning messages.
+ $warnings = array();
1) $format should be a fully populated $format object, not a format id.
2) $warnings is still by reference, but taking $warnings by reference makes only sense for 'security callbacks', not for this function. If this is supposed to make any sense and I just don't get it by looking at this patch, then please update the PHPDoc description for this argument, explicitly stating that it is passed by reference and _why_. :)
And, if it should make any sense, then you also have to remove that re-initialization. ;)
I'm on crack. Are you, too?
#50
.
#51
<?phpforeach ($format_status as $status) {
if ($status != FILTER_EXPLOIT_PREVENTS) {
return FALSE;
}
}
return TRUE;
?>
Maybe
$return = array_flip(form_status); unset($return[FILTER_EXPLOIT_PREVENTS]); return !$return;this will only return TRUE if $return becomes empty -- ie if it only contained FILTER_EXPOLIT_PREVENTS values before we removed that. Just an idea.#52
re #47: The security warning on the permissions page may be pretty, but I almost couldn't see it despite knowing what I was looking for.
#53
Yes, it would be nice if all of these warnings were red or something (rather than themed as a normal placeholder), but we can't solve that issue here.
To enable that to easily happen, the patch to review is #248598-55: Label permissions which are warned about in the user interface.
#54
The last submitted patch failed testing.
#55
I made Sun's changes, except I didn't understand the part about PHPDoc for the constant definitions.
filter_format_is_secure seems correct in using filter ID (consistent with filter_list_format)
I'm also confused about $warnings in filter_format_is_secure. I can't see where these warnings are being used.
Regarding Chx's suggestion, that may be more efficient, but it's much less readable, and I don't think the tradeoff is worth it in this case.
#56
Cool! I've been meaning to get back to this (and hope to soon) but have been swamped with other things. I really hope I can get back to this in the next day or so. It would be very good to push this towards an initial commit by October 15 so we don't have to try to argue for a "code slush exception" or something...
I think I finally realize why people are confused about the optional $warnings parameter to filter_format_is_secure()... the fact is, it isn't used anywhere in this patch, so perhaps that's why it looks unnecessary :) But, the idea is that we are really going to need it at some point (see Heine's comment in #32). The goal here is to get this patch in with only the minimal UI changes that it currently contains, but then have until November 15 to really figure out what the best UI is - whatever it's going to be, though, somewhere we're going to want to present the site admin with a list saying "here are the reasons why your text format isn't configured securely". So, since we still want to have a boolean returned by filter_format_is_secure(), but occasionally need to get more info out of it, I did that by having an optional array that you pass in by reference. Suggestions for a better approach are welcome :)
Also, regarding @chx's suggestion, perhaps another thing we could do there is:
<?phpreturn array_search(FILTER_EXPLOIT_ALLOWS, $format_status) === FALSE;
?>
I think that would be OK, and would certainly be readable code. Maybe it protects a bit less against buggy hook implementations than the other approaches did, but not sure that's really a big concern.
#57
The last submitted patch failed testing.
#58
D7UX stuff ate up all my time prior to October 15, unfortunately, so I'm only getting back to this now. However, we weren't really on an October 15 deadline anyway :) This issue is a release blocker, and the number of modules that would be affected by this API change is small enough that they could (and in fact probably should) be contacted by hand if this patch goes in.
I think the attached patch addresses all remaining feedback, such as in #40, #42, #49, #56, etc. I also clarified and expanded a lot of the API documentation.
The remaining TODOs are:
Anyone up for a review and possibly RTBC?
#59
Hm, I hope I don't have to go around begging for patch reviews :) I think this is pretty close, and it would be good to get it in as soon as possible so as to leave time to work on a good user interface.
I've actually gone ahead and (preemptively) created a followup issue at #618902: Design and implement a user interface for warning about insecure text formats so anyone who is interested can start thinking about how best to communicate this security information in the user interface.
#60
+ return array_search(FILTER_EXPLOIT_ALLOWS, $format_security_status) === FALSE;should use in_array() I'd think. Also, you shoudl pass the 3rd param (strict comparisons) as TRUE.
Also, I'm not sure the logic is correct - for HTML a later filter can counter-act an earlier file. But a filter that executes PHP or perl code cannot really ever be counteracted, I'd imagine - at least not by a filter that runs after. I guess it's ok in that no one would be likely to define such a thing, but seems like exec() vs. markup are fundamenetally different?
#61
For the case of HTML, are 'allows' and 'prevents' accurate as categories? I seems like 'creates' might be more accurate - e.g. your hypothetical filter that transforms [script] into <script> creates new insecure content.
I find the signature of
+function filter_format_is_secure($format_id, &$warnings = array()) {a bit odd - why not rename it something like
function filter_format_warnings($format_id) {and have it return the array of warning, rather than the pass-by-reference. If the array of warnings come back empty... then you don't need to display any.
#62
should the warnings also be displayed in the overview table at ?q=admin/config/content/formats ?
#63
I think the approach could be simplified into 2 categories- 'exec; and 'html'.
Only for html-type filters can a later filter affect the easlier filter, so there is really no point in walking though the exec ones.
Also, it would probably be simpler in terms of the loop logic to just get all the items in a filter and then pop them off as long as they are ones that create/allow unsafe html.
#64
Thanks for the review.
Earlier patches in the issue addressed this by having the PHP filter use a different process for labeling a text format insecure - but ultimately it seems cleanest if all filters use the same API. I think, as you said, that it doesn't matter too much - ultimately, this entire mechanism relies on filter module authors labeling their filters properly, so any filter that tried to identify itself as "sanitizing" PHP would not be materially different from a filter that claimed to sanitize HTML when in fact it did not actually do that. Both are big mistakes.
I agree. At least from the point of view of the filter (rather than the format), "create" makes more sense - and that is the most important place to get the terminology right since it is module-facing. I suggested FILTER_EXPLOIT_CREATES at one point above but no one responded so I didn't change it. I'll count your comment as a vote of confidence :)
Hm, at first I thought this wouldn't work because from the point of view of the filter module author, the warnings should be optional - we need to make sure to always respect the return value of their security callback regardless of whether they add a warning or not.
However, I guess we have enough information to catch that case and use a default warning along the lines "The %filter filter lets people do bad things to your site", in which case it seems like this plan would work and would be a better API. We could still keep filter_format_is_secure() around as a simple wrapper function to filter_format_warnings(), of course.
Maybe yes, maybe no. The place to decide is #618902: Design and implement a user interface for warning about insecure text formats :)
I'll try to reroll the patch sometime later today with the above changes.
#65
Looks like #63 and #64 were cross-posted. I'd have to think about the 'exec' thing. Theoretically, a filter could have two different types of "execs" so ideally we'd want to continue collecting warning messages from all filters even in that case. It may make sense to put all such examples under the same 'exec' key though (rather than separate ones for 'php', 'perl', etc). On the other hand, that breaks the "rule" that the key is supposed to correspond to some kind of processing language.
Overall, we probably don't want to add too much special-case code for this, since it's a very rare use case. Almost everyone will be using 'html' here. (Are there any known examples besides the core PHP module that would even use something else?)
#66
here's an incomplete patch, but look at my changes to filter.module - I think that code can be greatly simplified to achieve the desired result.
#67
+ do {+ $security = array_pop($format_security_status['html']);
+ if (!empty($security['warning'])) {
+ $warnings[] = $security['warning'];
+ }
+ } while (!empty($format_security_status['html']) && $security['type'] != FILTER_EXPLOIT_PREVENTS);
This part won't work - we need to make sure that no warnings get returned except for a format that actually winds up as FILTER_EXPLOIT_CREATES in the end. (For example, the HTML filter does not actually create an exploit, but it still returns a warning in the case where it is configured to not sanitize completely. However, this warning is not relevant if a previous filter sanitized the text already.) We actually already have a test case for that in the code - see
testHTMLFilterUnsafeTagSanitizeBefore().Also, the "No filter has been added to sanitize raw HTML" message isn't supposed to show up unless it's the only one...
But in terms of the 'exec' stuff, it makes sense somewhat - the question is whether we want to make it so that 'html' and 'exec' are the only possible keys that people can use (is there anything we might be missing there?) vs the previous code which worked more generically for any keys that anyone might use, but did not enforce the fact that the PHP filter is impossible to sanitize.
#68
@David - well, we cold do some slightly more complex processing here - distinguishing between and allow and create.
#69
didn't fix up the tests yet, but I think the filter.module part works in this patch. Note revised logic in the array_pop() loop. Simplified the return to jsut be an array of warnings - really don't see why we need to separate out by type.
#70
ok, got the escape-all filter set right also.
setting to CNR is see where tests break.
#71
The last submitted patch failed testing.
#72
better - fixed some tests, remove debug code.
#73
The last submitted patch failed testing.
#74
For testHTMLFilterUnsafeTag(), we definitely want only one warning to be returned - so the fact that the test in the latest patch was changed to check for two warnings and passes means there is a bug in both the code and the test :)
I'm not really seeing the advantage of 'exec' at this point? I think it's starting to make the code more complex, and the API is now less generically useful; e.g., you could previously use these functions to get information about any "output type" you could dream up and it all worked the same, but now the function is assuming internally that it knows about all of them. As mentioned above, the fact that PHP is impossible to sanitize is not only an edge case but also not an actual problem with the previous approach -- it simply means that no filter should ever claim to sanitize it.
Some of the other changes look good, though, and we should keep those.
#75
Since we are not using the warnings right now - does it matter if it's 1 or 2? It's 2 now since we leave the default warning about unfiltered html.
I see more value in honing the code to the minimum feasible - why make it more complex than the 99% use case requires?
#76
consolidated the code even more - did remove the fixed html/exec categories. Code should again find any categories, but consolidates all warnings into a single return array.
#77
Yeah, this looks better - and I like the idea of having FILTER_EXPLOIT_ALLOWS and FILTER_EXPLOIT_CREATES as separate constants because it forces filter module authors to be explicit and think about the difference.
Will look more at this later - I think it's overall an improvement.
#78
I don't see why you'd ever need a boolean, since you're always just checking whether the returned array is empty or not as a boolean.
I'm really not sure of the use case for examining the warnings by category - other than unit testing. So, at most I think 2 functions.
I thought briefly about the issue of a filter that doesn't include a warning - though we could set some generic string as a warning in all cases.
#79
Here's a quick re-roll implmenting a single wrapper so you can get the warnings by category if desired.