Goal

Allow the default tags in filter_xss_admin to be modified.

Summary from @davereid in #96:

The arguments for making this a variable:

  1. We already have a pattern in core for allowing 'allowed' tags/html to be configured - as a variable that has no UI in core. filter_xss_bad_protocol() calls drupal_strip_dangerous_protocols() which has variable_get('filter_allowed_protocols', array('ftp', 'http', 'https', 'irc', 'mailto', 'news', 'nntp', 'rtsp', 'sftp', 'ssh', 'tel', 'telnet', 'webcal')).
  2. The approach used in #92 uses variable_get('filter_xss_admin_additional_tags', array())) which doesn't allow the existing list of allowed tags to be altered at all, as well as adds a call to array_merge() on each call to filter_xss_admin().
  3. With future versions of HTML standards down the road, site builders and administrators of Drupal 8 and/or 7 will no longer be 'locked' into a certain limit of tags where the only solution is to hack core.

Concerns for this approach:

  1. Performance: We are adding a variable_get() call where there was a static array before. We can use the same approach as drupal_strip_dangerous_protocols() and add a static variable to store the result. We should also do a quick performance test to show how much worse this is without a static variable or if we'll even need it.
  2. filter_xss_admin() is called prior to the database being available in the install process, as well as in error handlers. I don't think this is a huge concern if it may behavior slightly differently prior to variable_initialize() - since we do things like variable_get('lock_inc', 'includes/lock.inc'); prior the variables being initialized as well. This is nothing new.

We still need help deciding which tags should be added as well.

OP

The function only lists html4 elements as allowed ones, so html5 elements should be added.

Here's the function code:

function filter_xss_admin($string) {
  return filter_xss($string, array('a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var'));
}
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Heine’s picture

Category: bug » feature

Support for soon to be future standards are feature request in my vocabulary.

Sp.Shut’s picture

Category: feature » bug
Sp.Shut’s picture

Issue tags: +html5, +xss
Damien Tournoud’s picture

Version: 6.x-dev » 8.x-dev
Category: bug » feature

And feature requests can't apply to any frozen Drupal version. Bumping for consideration for Drupal 8.

Sp.Shut’s picture

Version: 8.x-dev » 6.x-dev
Category: feature » bug

I guess Drupal should support exporting html, right?

It's not Drupal's business whether html5 is a future or present standard, it should produce what it's asked to.

Damien Tournoud’s picture

Version: 6.x-dev » 8.x-dev
Category: bug » feature

Please stop changing those selectors.

Drupal 6 produces XHTML 1.0 Transitional, Drupal 7 produces XHTML+RDFa 1.0. HTML5 is still far from being a standard at this point, and even if the lack of HTML5 elements in filter_xss_admin() could (eventually) be considered as a bug, we fix bugs in the most current development version first, before (eventually) backporting the fixes.

sreynen’s picture

"Drupal 6 produces XHTML 1.0 Transitional"

Without weighing in on the wider question of which version this issue should target, the above statement is a common fallacy I'd like to see us all stop repeating, as it makes Drupal look less capable than it actually is.

Drupal itself does not output a DOCTYPE at all. Specific themes output DOCTYPEs, and most D6 themes are XHTML 1, but there are also HTML5 themes for both D6 and D7 that Drupal can handle just fine. We may not be ready to fully support this in core, but we also shouldn't be discouraging it by suggesting it's impossible in Drupal.

Dave Reid’s picture

Let's take the first step and make this list configurable like http://api.drupal.org/api/drupal/includes--common.inc/function/drupal_st....

$allowed_tags = variable_get('filter_allowed_tags', array(...));

This will allow modules and themes to override this to allow HTML5 tags in D7 but doesn't change how the function works.

jensimmons’s picture

subscribe

I haven't fully looked into this yet, but for far it sounds like without this change (one version or another) Drupal *can't* do HTML5. Or can't output HTML5 markup from certain parts, I should say. Like from Views. If this is true, this is a big deal. We don't want there to be anything about Drupal core that BLOCKS people from doing HTML5 markup.

ericduran’s picture

Title: Html5 not allowed in filter_xss_admin() » Change filter_xss_admin() to take a variable by default instead of a hardcoded list.

I'm changing the version to d7 on purpose :) and also changing the title of the issue. Sorry but the html5 in the tittle seems to throw people off on the real issue.

This issue really has very little to do with html5 and/or markup.

Also no reason why this can't go into a point release of d7.

Proposed change is below:

<?php
function filter_xss_admin($string) {
  return filter_xss($string, variable_get('filter_xss_variable', ('a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var')));
}
?>
ericduran’s picture

Version: 8.x-dev » 7.x-dev

forgot version change. Please read above as to why I'm changing version back to 7.

ericduran’s picture

Also we might want to do this in the filter_xss function as well.

adrinux’s picture

Status: Active » Needs work

Was planning on working this up as a patch but all git drupal repos currently time out and I can't face CVS. I'll do it tomorrow! Meanwhile here's what I have for filter_admin_xss, compared with eric's suggestion above - need to retain array, it's still and array :) and changed the variable name to filter_xss_admin_tags:

function filter_xss_admin($string) {
  return filter_xss($string, variable_get('filter_xss_admin_tags', array('a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var')));
}

Should the variable be called 'filter_xss_admin_allowed_tags' to be completely explicit? Or is that just too verbose?

Tested by adding a $conf['filter_xss_admin_tags'] = in settings.php and adding an HTML5 tag to the array - was then able to use the tag in a Views 3 'rewrite this field' textarea (which gets filtered though filter_xss_admin. So it works, and should allow us to add to the array in html5 tools.

Probably should do the same for filter_xss, although I don't currently see a use case it would at least be consistent.

I'm also wondering what the security team feel about this change.

adrinux’s picture

First patch. as described in previous comment. Still need to add filter_xss allowed tags as a variable.

jensimmons’s picture

Priority: Normal » Major

I'm attempting to build a site in semantic HTML5, and I can't. Drupal 7, Views 3. Can't do it. Drupal strips out <footer>and<time>.

This must get fixed. Otherwise anyone who wants to build a website using HTML5 markup cannot use Drupal.

jensimmons’s picture

Meanwhile I'm hacking core.

For D7 it's in includes/common.inc, line 1293:

function filter_xss_admin($string) {
  return filter_xss($string, array('a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var', 'article', 'aside', 'audio', 'canvas', 'command', 'datalist', 'details', 'figcaption', 'figure', 'footer', 'header', 'hgroup', 'keygen', 'link', 'mark', 'meter', 'nav', 'output', 'progress', 'rp', 'rt', 'ruby', 'section', 'source', 'summary', 'time', 'video', 'wbr'));
}

and line 1324:

function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd', 'article', 'aside', 'details', 'figcaption', 'figure', 'footer', 'header', 'hgroup', 'link', 'mark', 'meter', 'nav', 'rp', 'rt', 'ruby', 'section', 'source', 'summary', 'time', 'wbr')) {
jensimmons’s picture

FileSize
2.09 KB

Here's a patch that does what comment 16 describes.

I'm not saying this is a better way to fix Drupal 7 core for Drupal. I'm saying this is the patch you are going to need if you are building a site today and can't wait for Drupal to be changed.

Meanwhile..... let's continue the debate and fix Drupal 7 (and 6) soon, please, so those of us building sites today don't have to keep hacking core. (oh, the kittens!)

benschaaf’s picture

subscribing. Good conversation.

Also, amen @sreynen #7.

Drupal does not dictate the doctype, that is up to the theme. But, by stripping HTML5 elements it is certainly negatively impacting the movement.

adrinux’s picture

Status: Needs work » Needs review
FileSize
1.97 KB

@jensimmons shamed me into finishing this patch for D7, or at least working on it a bit more :)
This second patch expands on #13 adding a variable for the $allowed tags in filter_xss, as well as filter_xss_admin. Please review.

The logic here, for Drupal-7 at least, is that we can now add HTML5 tags via html5_tools module: http://drupal.org/node/1070780
Rather than patching Drupal-7 to accept every HTML5 tag under the sun as per #17.

Also note that this issue doesn't stop you using html5 tags with Views 3 as Jen says in #15, but it does force you to do it via field.tpl files (instead of using the views admin 'rewrite this field' option and tokens).

mfer’s picture

Status: Needs review » Needs work

subscribe

mfer’s picture

Status: Needs work » Needs review
adrinux’s picture

Version: 7.x-dev » 8.x-dev

moving to 8.x in the hope that test bot will test the patch.

adrinux’s picture

Renamed patch after checking that it also applies to D8. Since if will have to be applied there too. Currently this patch will apply to both D7 and D8 code bases.

jensimmons’s picture

Re: comment #18. Yes, there is already a highly-debated conversation that's happened regarding HTML5 support in Drupal 7. Here's my summary:

1) Drupal 7 is XHTML1.0 + RDFa1.0. The default markup that is actively outputted by Drupal core should always and forever be XHTML. We all agree on this. Nothing has been proposed for Drupal 7 (or 6) that would change the markup automatically coming from core.

2) There is debate about putting things in Drupal 7 that would add to APIs to allow contributed modules to actively output HTML5.

Some people think allowing contrib to 'convert' things to HTML5 might result in people building HTML5 websites on a XHTML1.0 doctype without knowing that's what they are doing, and create websites that technically fail the validators. (They will work just fine; they aren't broken — they just don't validate.) There is not agreement about this point, as other people think contrib should be allowed to do things that 'invalidate' websites that have the wrong doctype (and that contrib already does exactly that, see Embed Media Field or WYSIWYG module for two examples of many). We can take some time with this debate, because meanwhile, everything that people want to do can be done in contrib alone (make form elements using HTML5 types is what brought up this debate first). The question is really about deciding when it will be best to revise the APIs in Drupal: sooner to get everyone on the same page and prevent duplication of efforts in 17 different directions, vs later to maintain consistency, prevent dumb anti-validation mistakes… etc. Meanwhile we do all agree that any changes to any D7 API must be an addition only, not a change, and must maintain backwards compatibility for all Drupal 7 modules.

There's a longer debate about this is the forms API issue: #675348: META: Support HTML5 form input elements

but this issue doesn't fall into #2, it's a #3 issue:

3) If and when (now when) we find something in Drupal 7, that outright prevents someone from using HTML5, then we need to alter Drupal 7 to allow the use of HTML5. Not get Drupal 7 to do HTML5 by itself, but allow someone who is using tools to create an HTML5 website to do so. This issue is about changing the security filters so that they don't strip out the new elements.

This explanation might already be obvious to people :) I hope so. But in case it's new to other people, I wanted to try and explain where I think we are at regarding the high-level issues this brings up.

sun’s picture

My only concern is about the additional introduced variables. We have at least one global filter HTML tag list variable already, but it's only used in a few places.

sreynen’s picture

It has to be a variable of some sort to, well, vary. If the concern is making it a persistent variable, exposed in places where it will never be used, drupal_static() seems like a good alternative.

mfer’s picture

Status: Needs review » Needs work

The patch in #23 is bad. filter_xss_admin() makes a call to filter_xss in the form filter_xss($test, $allowed) but the filter_xss signatur is changed to be filter_xss($string). filter_xss() should take two arguments.

adrinux’s picture

Status: Needs work » Needs review
FileSize
1.12 KB

@mfer. damn. sorry. you're right.
I have to say for the specific problem I was trying to solve (allowing html5 tags to be used via the Views UI) my patch in #14 was enough. Just following ericduran's suggestion in #12 per filter_xss.

I've attached a rerolled, renamed #14.

@sun well, the problem we're trying to solve is allowing relevant html5 tags in D7 - would you be happier with hard coded tags as per jensimmons patch in #17 ? Though probably not exactly that array ('audio' and 'video' seem out of place in filter xss to me). It just seems more robust to have the tags array configurable, the way html is going there might well be new tags that need added in a year ;)
Or perhaps a mixed approach - add a variable for filter_xss_admin_tags as per attached patch, but also hard code some acceptable html5 tags into filter_xss?

laura s’s picture

Subscribe with +1 on #24 comments. If we force people to bypass or hack core for HTML5, we could end up hurting Drupal 7's usefulness (and security) for many projects. Re #6: HTML5 is happening now. In terms of allowing predefined html5 tags vs. filter_xss_admin overridable, since HTML5 is still evolving, we can expect the list of tags people will need will be changing quite quickly. A predefined list would potentially get new issues every few weeks.

jensimmons’s picture

Any movement on this?

mfer’s picture

The question that comes to my mind which is stopping movement is, what path forward are we taking in the short term? Are we going to add the additional elements in a hard coded manner (like comment #28) or as a variable which has been discussed?

coltrane’s picture

If there's enough concern about an additional variable then hard-coding the allowed tags is the shortest path. For the D7 though it seems acceptable to me to add an additional variable.

We should expose a UI for setting this since that will require string translation, but without an exposed UI field a user won't be able to revert to core's list of accepted tags. That would imply a mixed approach being best, something like (this untested code):

return filter_xss($string, array_merge(array('a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var'), variable_get('filter_xss_admin_tags', array())));

adrinux’s picture

@coltrane "We should expose a UI for setting this" I disagree, this array will likely be modified by contrib modules, the modules should be responsible for reverting it.

In some ways different approaches fit for D8 and D7.
For D8 hard coding is possible, since it will likely have an html5 doctype by default, it becomes more a question of which tags to add. Though as has already been pointed out, making this a variable future proofs it. Modules can add to the array as new tags get implemented by browsers.

But D7 is built as XHTML, which makes adding new HTML5 tags that are effectively invalid in XHTML problematic, to say the least.

The approach is pretty clear: make it a variable; allow contrib modules to adjust the variable. D7 gets to remain largely as is, whilst still allowing contrib modules like HTMl5_tools to introduce new tags.

Without this approach the only other approach is going to every contrib module that uses filter_xss and add HTML5 support :(

coltrane’s picture

Sorry, I mistyped, I meant "we could expose a UI ... but"

Damien Tournoud’s picture

Variables are definitely not for things like managing a list of tags. Let's not abuse the poor variable system, folks :)

sreynen’s picture

Status: Needs review » Needs work

Shouldn't this be using drupal_static() for the static variable?

sun’s picture

Status: Needs work » Needs review

Oy. Starred like 15 minutes at this patch. Not sure about this change in direction.

Missing, centrally defined defaults for variables are definitely a con against using variables. Dynamically re-generating these tag lists on every effin' request, and even more importantly, having no definite list in place (that's enforced and might be [version-]controlled), as well as the possibility of poor modules and themes altering these crucial lists without notice under the hood, worries me.

Dave Reid’s picture

We've had the filter_allowed_protocols variable for quite a while and I think it's worked out well, I'm not sure why it's suddenly not going to work here?

sun’s picture

Status: Needs review » Needs work
+  return filter_xss($string, filter_xss_allowed_tags('admin'));

I guess I'd be happier if we'd change that to:

+  return filter_xss($string, variable_get('filter_xss_allowed_tags_admin', filter_xss_allowed_tags('admin')));

Thoughts?

coderintherye’s picture

Should this be split off into two issues? Because I think a couple of us would really appreciate the short-term solution of adding the hard-coded tags in a near-term point release (hopefully 7.1) while the longer-term issue is fleshed out.

Today is the first day where I've had to hack core in a long time and it doesn't feel pleasant.

coltrane’s picture

@sun, I'm in favor of return filter_xss($string, variable_get('filter_xss_allowed_tags_admin', filter_xss_allowed_tags('admin')));

bleen’s picture

@nowarniglabel: I dont think that the hard-coded list would (or should IMO) ever get committed. We'd just have to revisit this again and again as the HTML5 spec continues to evolve as well as other specs out there. No reason why Drupal cant output MATHML

@sun: Wont that variable_get end up creating potential conflicts between some modules/themes using the drupal_alter to affect this list and some using the variable? I guess I'm missing the virtue of adding this list as a variable.

Also, I dont think we should forget about #36...

sun’s picture

I'm not biased into either direction. One potential issue that just crossed my mind: Isn't filter_xss*() supposed to work before the module system is available?

sreynen’s picture

Though it's mostly called from within modules, filter_xss is also called from a few places before modules are loaded. For example, it's called indirectly from st(), which runs in place of t() during install, and it's called from _drupal_error_handler(), which could be called during bootstrap before modules are loaded, and also before _drupal_bootstrap_variables() is called, causing variable_get() to return the default even if a different value has been set.

So making it a variable would still leave a few places where it couldn't be altered, but those seem to be strings that I wouldn't expect to include additional tags anyway, so that doesn't seem like a problem.

Lars Toomre’s picture

Yuck... Hearing that the filter_xss() function and a variable_get() solution might produce different results depending on where it is called in the boot process makes this solution a no go for me. Big -1. We need to have a solution that produces consistent results all of the time, especially since we are doing a simple task of checking if a string is in a list of strings.

This is another example of a configuration variable that needs to be set "outside" of the data base system. Until the D8 Configuration/Deployment team comes up with their solution for these type of initialization variables, I think this list of strings will "temporarily" need to be set in settings.php for consistent operation through the boot process and full mode if we are to pursue the proposed solution.

An alternative way perhaps of coding this is to use a 'fixed' defined list of acceptable filter terms until some defined point (like full bootstrap has been completed). Then, after that switch over an administrative variable retrieved via variable_get() and set in a static variable for the remainder of the page generation process. I am very curious about what variables are checked with filter_xss() before full boot is achieved.

bleen’s picture

Correct me if I am wrong but sun's comment about filter_xss_admin() being called before the module system is available only precludes us from using the drupal_alter solution proposed in #35. If we use the variable_get based solution in proposed in #28 then this list of acceptable tags could be easily overridden by another module or by using strongarm or even by editing settings.php. What am I missing?

sreynen’s picture

bleen18, the variable_get() solution in #28 allows modules to alter the list earlier in bootstrap, because the alterations are saved to the database, but there are still calls to filter_xss that happen before the database is loaded, both in early bootstrap and in install, so modules would not be able to alter the filter list in those instances.

It would be nice to see a specific use case for where this might be a problem. Since the database hasn't loaded yet, the strings involved are all either hard-coded or direct user input, and I haven't found an example of user input yet. It seems like we can be reasonably confident these strings don't need additional allowed tags. These pre-database strings are currently filtered with an even more limited tag list. Has anyone ever had a need to alter filters for pre-database strings? No one has mentioned that need in this thread yet. I'd hate to see us waste a bunch of time and complicate the code in an effort to solve an entirely hypothetical problem.

Lars Toomre’s picture

Is there any way of identifying the calls to filter_xss() prior to full boot strap is achieved? And what calls to filter_xss() are made during the install process? I also am now wondering if there also might be filter_xss() issues in update.php or drush before they get to full boot status as well.

bleen’s picture

sreynen, but couldnt I simply put this in mymodule.install:

  function mymodule_install(){
     variable_set('filter_xss_allowed_tags_admin', array('a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var', 'article', 'aside', 'audio', 'canvas', 'command', 'datalist', 'details', 'figcaption', 'figure', 'footer', 'header', 'hgroup', 'keygen', 'link', 'mark', 'meter', 'nav', 'output', 'progress', 'rp', 'rt', 'ruby', 'section', 'source', 'summary', 'time', 'video', 'wbr'));
  }

Now its "permanently" altered for all future bootstraps of drupal. Right?

sreynen’s picture

bleen18, that's permanent in the database, but the database value isn't always used. When variable_get() is called before the database is loaded, as it does during install and early bootstrap, it returns the passed in value rather than what's in the database.

Lars Toomre, every function page on api.drupal.org has a list of the functions that call it, so you can trace it all back to the source. But there's a ton of calls to filter_xss, so there's a ton of tracing to do.

bleen’s picture

sreynen: can we confirm that filter_xss_admin() is called before the db is available. I'm confident that sun is correct about it being used before the module system is up and running, but Im not so sure about the DB...

sreynen’s picture

filter_xss_admin() is called by _drupal_error_handler_real(), which is called by _drupal_error_handler(), which is defined as the error handlers in _drupal_bootstrap_configuration(), which is the first bootstrap phase, 2 before _drupal_bootstrap_database(), in which the database is loaded, and 3 before _drupal_bootstrap_variables(), where the variables are loaded. So any errors that happen during the first 3 bootstrap phases will call filter_xss_admin() using the passed filter list rather than the list saved to the database.

There are likely other cases, harder to trace. For example, filter_xss_admin() is also called form theme_form_element_label(), which seems likely to be called during install.

Lars Toomre’s picture

Thanks for that investigation @sreynen.

Here is what I think this function needs to do:

1. Define a fixed list of permissible HTML tags that are OK during install, partial bootstrap and perhaps other light-weight processes.
2. Allow the site administrator to override the permissible list as soon as available, through possibly:
2a. - Modules / Themes / Distributions
2b. - Settings.php (and whatever comes from the D8 variable initiative)
2c. - Variables DB
2d. - UI??

Some thoughts:
A. This needs to be efficient because xss filter is called often in a page generation.
B. Should there be two variables: Flag (use default list even if replacement available) & list of replacement terms?
C. How would a HTML5 theme add its new tags? Would it override what already was user defined?
D. Should this be implemented with its own CRUD API in D8 to do 'version control' as sun suggested above?
E. I suspect many sites will not want to touch this list except through possibly turning on a new theme. Since there is likely to be much experimentation around HTML5 themes in coming months/years, this incremental add/delete of the permissible list is going to bite administrators much more. Having incremental add/delete functions might be very useful for when themes are enabled/disabled.
F. Out of curiosity, do we identify with a variable anywhere that a theme for Drupal Dx uses HTML5 or HTML4? Probably should if not.
G. The video/media comment above about possible inclusion/exclusion suggests that a site administrator may want to alter the resulting list to exclude certain terms. This argues for a UI (which probably should be very buried).
H. As this list of permissible HTML tags gets longer and longer, perhaps it makes sense to attack the xss problem with a list of HTML tags to always exclude?

jensimmons’s picture

Should we just update the hardcoded list of elements? HTML5 is basically locked down (in their version of 'code freeze'). Development on HTML6 is what's happening now... HTML5 is a known, finite set of elements. We could do those without much worry that they will change.

I'm hoping that in three months we aren't still telling people they have to hack core to get Drupal to do HTML5.

coltrane’s picture

they have to hack core to get Drupal to do HTML5.

That's an overstatement :) but we don't have to rehash #9, #15 and #19

I have some concern over the tags we'd be allowing, specifically audio and video. filter_xss_admin() doesn't allow object for instance.

Has anyone tested the KSES filter and filter_xss against new HTML5 attributes? I need to spend some time looking at http://html5sec.org/

Update:
I tested a few of the items from http://html5sec.org/ but 1) most vulnerabilities require Javascript events or using javascript: and 2) some new attributes apply to form and input - elements we aren't allowing anyways.

jensimmons’s picture

I agree. That is an overstatement. A generalization. Core has to be hacked to get it to create (allow) HTML5 *markup* (which is only one part of what HTML5 has in store for us) from hand-coded text fields and from Views UI and from some other contrib modules and things. But not all things. Much of what might be considered HTML5 is not stopped by filter_xss. Yup. Yup.

Bevan’s picture

This should be hard-coded, not a variable (or at least included in the hard-coded default value for the variable). Unless there are advantages to having a variable? I can not think of any.

To back that up, HTML5 is implemented and in use in multiple high-usage browsers, and is widely accepted in the W3C standards group and related communities, including several published books on the topic—all of which include the assumption of the new HTML tags. There is no reason for the new HTML5 tags to be treated differently to "<div>" and "<p>" and the like.

Excluding them is already hurting Drupal, as it is harder to get HTML5 features working in Drupal than it needs to be, and thus Drupal is not keeping up with HTML5 developments as well as it could (and should, in order to be the best CMS there is).

bleen’s picture

Bevan: the big advantage to having a variable in some form or another is that as new tags are added to the HTML5 spec (and others for that matter) themers and developers do not need to wait for core updates - which can take a looooooong time - before taking advantage of the new tag(s)

sreynen’s picture

Two arguments against a static list:

1) HTML's tag list is not fixed. It changed as recently as this week, removing a tag in our proposed list, published books notwithstanding. Also, we're unlikely to ever have a stable list of tags in HTML now that HTML is a "living standard," no longer versioned.

2) There are use cases for which modules may want to expand the list beyond HTML itself changing. MathML and SVG come to mind.

It seems like the question of filter_xss() running before modules are loaded has derailed this discussion a bit. That's a non-issue. The only markup that goes through filter_xss() before the database is loaded is necessarily hardcoded. We're not changing any of that hardcoded markup here, so there's no reason the allowed tags need to change in that early process. The default list will still apply there just like it does now. This is not a real problem, so we should stop trying to solve it and focus on the real problem at hand: modules need to be able to alter the tag list as it's applied to non-hardcoded markup.

jensimmons’s picture

Isn't it a giant security vulnerability to allow modules to define this list? A malicious module could alter these security filters and cause havoc.

sreynen’s picture

A malicious module can already output insecure HTML directly, without passing it through filter_xss() at all. We already have security policies in place to deal with that risk (e.g. reviewing new contributors). This does not introduce a new vulnerability.

Dave Reid’s picture

Malicious modules can always do anything. That's never a real concern for something like this.

Bevan’s picture

Thanks bleen18 and sreynen; That clarifies a lot. Either way this patch is important for Drupal 7. I which there was an "almost critical" priority level for this issue. I can see how having it be a variable is smart. In addition to making it variable, should the default value for that variable include the most stable new HTML5 tags?

bleen’s picture

Bevan: In my opinion (and it seems the majority opinion) most (but perhaps not all - see #55) of the stable tags in html5 should be included in the default value.

However, that is less important as long as this list of tags is over-ridable by modules and/or themes. In that case adding more allowed tags will be trivial if anyone runs into an issue.

oh ... and the "major" priority is meant as "not-quite critical ... but close"

Lars Toomre’s picture

This issue has been hanging around for more six weeks without resolution. Based on reading through the above 64 comments, perhaps the following pseudo code will move things forward:

function filter_xss_admin($string) {
  return filter_xss($string, filter_xss_list());
}

function filter_xss_list() {
  $list = array();
  //  If in full boot mode
  if  (full_boot) {
  $list = variable_get('filter_xss_allowed_tags', array('a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var', 'article', 'aside', 'audio', 'canvas', 'command', 'datalist', 'details', 'figcaption', 'figure', 'footer', 'header', 'hgroup', 'keygen', 'link', 'mark', 'meter', 'nav', 'output', 'progress', 'rp', 'rt', 'ruby', 'section', 'source', 'summary', 'time', 'video', 'wbr'));
  }
  else {
    $list = // fixed list to use during bootstrap mode
  }
  return $list;
}
sreynen’s picture

Status: Needs work » Needs review

Lars Toomre, there's no need to check where the code is running in bootstrap, as variable_get() already reverts to the default list we pass in when it's called before the database variables have been loaded. Other than that, your pseudo code is basically the same as the actual code in #28.

The patch in #28 was marked as "needs review," but never directly reviewed, by my reading. Instead, we talked about sun's objection to "variables" in #25, which apparently was not an objection to the method in #28, as sun later suggested the same method in #39. Then in #35, still without a review of #28, we moved to a new patch. The #35 patch was marked as "needs work" in #39. After that, we got sidetracked by the bootstrap discussion.

Since then, two people have suggested the same method as #28. So let's review #28.

Lars Toomre’s picture

@sreynen: Maybe I am wrong, but I thought variable_get was not reliably available until we had reached full bootstrap state. Hence, I included in #65 the thought of a smaller fixed list while in bootstrap and a variable thereafter. Are you sure that variable_get() will always be available regardless of the boot state?

bleen’s picture

The variable_get function lives in bootstrap.inc ... it is therefore (by definition) available during all bootstrap states ... the actual variables on the other hand are not always available.

That said, sreynen's comments in #66 are correct.

Lars Toomre’s picture

Now understanding that variable_get() is always available, the patch in #28 looks fine. My only question is what the name of the variable should be. 'filter_xss_admin_tags' seems a bit obscure, and might be better named 'filter_xss_tags'?

Damien Tournoud’s picture

Again, collaborative lists of stuff like this should not be variables. So:

- either we finally say that there is no point in having this configurable, and we just extend the list
- or it has to be a info/alter hook

I'm not too concerned about the bootstrap issue. Worst case scenario, we default to the hardcoded list if called before DRUPAL_BOOTSTRAP_FULL.

sun’s picture

@Damien Tournoud: That's the second time you state the same proposal, but you didn't provide any reasoning for why you think that this list of tags should not be in a variable. I also don't quite get the "collaborative" part. Additionally, I've already counter-argued earlier that it would be completely inane to re-generate a perhaps configurable, but once configured, kinda static list of tags on every single request. I still don't think that makes sense, and seems like a poor workaround for not having a facility of global variable defaults in core yet. Care to elaborate on this?

sreynen’s picture

Damien Tournoud, it seems pretty clear now you and sun are using the word "variables" to refer to two different things, so it would help to be a little more explicit. Apparently sun wasn't objecting to system variables, as used in #28, because he suggested the same thing in #39. And apparently you aren't objecting to static variables, as you suggested using them in #35. So is your objection specifically to system variables? And if so, what exactly is the reason that system variables aren't an option? What about what Dave Reid said in #38 about filter_allowed_protocols using system variables to do essentially the same thing as #28?

Damien Tournoud’s picture

The reasoning is simple: this is typically things that several part of a Drupal site might have conflicting opinions about. The theme might want to allow some additional tags that it is using, a module might want to do the same for a different set of tags. In all the cases where everyone (as in "modules", "themes", "the site administrator") might have an opinion, system variables are not the way to go.

System variables are for each modules' own storage. Variables defined by a module should not ever be modified by another module... or things fall apart as soon as two modules want to modify the same variables.

adrinux’s picture

@Damien Tournoud "The theme might want to allow some additional tags that it is using" makes no sense to me. Theme's aren't going to be filtering content through filter_xss_admin.

@Lars Toomre the function is called filter_xss_admin(), there is another function called filter_xss(), thus the name of the variable: filter_xss_admin_tags, your suggestion of filter_xss_tags would cause confusion.

re: suggestions of adding the new tags instead of making this a variable.
For Drupal 8, it makes sense that core will be HTML5, in which case extending the allowed_tags with hard coded HTML5 tags make sense.

For Drupal 7 - it's XHTML. What are the ramifications of allowing HTML5 tags in filter_xss_admin() on an XHTML site? I had assumed this was *bad* but right now I can't think of a solid reason why. The only argument against, I think is 'never enable things you don't use or need' security ethos. I don't think it's the job of filter_xss_admin() to restrict markup to the default doctype. How could this be exploited?

I'm coming round to the idea of just adding the required HTML5 tags, instead of making this a variable. Previous patches above that suggested adding HTML5 tags were a bit loose I think. Will need to come up with a more restrictive patch.

Crell’s picture

Oh boy...

1) For Drupal 8, let's be honest, what we do here is going to change before release. A lot. The current plans for the Config Initiative involve essentially replacing the entire variable system with something that doesn't suck. So if it makes sense to do it that way in Drupal 7, we do it, and do the same in Drupal 8 for now knowing full well it will change in the next several months. That's OK. This is a Drupal 7 issue that relates to Drupal 8 solely because of Drupal's development procedures.

2) For Drupal 7, it is not XHTML. The core themes are XHTML. A contrib theme could be HTML 3.2 if you were so inclined, by simply overriding the html.tpl.php file. Everything else core outputs assumes HTML 4 with XML style rules, aka XHTML 1.0, which is really just HTML 4 with XML pedantry applied. But that same pedantry is fully compatible with HTML5, so making an HTML5-declared theme in Drupal 7 is absolutely possible and quite easy.

3) No browser with more than 5 users actually validates XHTML 1.0 against the spec, so throwing a section tag or article tag into a page with an XHTML doctype specified will break nothing in practice. The W3C validator will whine at you, but no browser in the world will do so.

The issue here is that Drupal's default filtering rules *for admin-supplied text* were written before the development of HTML5's new tags, and therefore do not include them. We are speaking strictly about admin-supplied text. This is not a fatal issue (div tags aren't going away), but it is an annoying one for anyone trying to do modern web development.

As noted in an earlier comment, we're dealing with admin-supplied text. Nothing will go through this function in non-full-bootstrap that isn't already hard-coded. I don't see any use cases for this change when not in full bootstrap.

That said... who cares? What I see in this issue is tons of over-engineering. This is a simple problem: We have a hard coded list of tags we know don't break stuff if entered by a sane, trusted admin. That has worked fine for a decade. The only catch is that there are new tags that now exist that also meet the standard of "don't break stuff if entered by a sane, trusted admin". So add them to that list. Done, move on, kthxbye. We don't need variables or hooks so that modules can change what tags uid 1 is allowed to enter into an admin screen. That's asking for mysterious things to break.

We're mostly engineers in this thread, and so we're trying to engineer complicated solutions. This doesn't need a complicated solution. The solution is already in core: A list of known tags that we know don't break shit if entered by a trusted admin. We just need to add a few more tags to it and call it a day. Everything else in this thread is, frankly, a waste of effort. Adding tags isn't even a BC break, since nothing will break that worked before; things will just start working that currently don't.

As for which tags we should add... I don't care. That should be decided on by Drupal's designers in cooperation with the security team, or better yet just one or two people representing those two groups. And the rest of us, me included, should shut up and let them do their job rather than bikeshedding them to death. I know bikeshedding minor issues like this to death and over-engineering them is the Drupal Way, but that doesn't mean it isn't harmful.

</rant>

sreynen’s picture

Title: Change filter_xss_admin() to take a variable by default instead of a hardcoded list. » Change filter_xss_admin() to allow HTML5 tags
FileSize
1.35 KB

I'm convinced. Let's take this back to the original request, get the expanded list done, and see if there's a real need for a variable after that. Attached patch adds nearly all of the new elements listed on:

http://dev.w3.org/html5/spec/index.html#elements-1

The exceptions are elements that are only relevant in <head> and form elements, as those were not previously in filter_xss_admin().

Status: Needs review » Needs work

The last submitted patch, expanded_tag_list_for_html5-732992-76.patch, failed testing.

coltrane’s picture

Further research into video and audio tag exploits has been inconclusive. Javascript events are still stripped, so the source is the only concern I have so far. Like embed, given an attacker with ability to use video or audio tags and the source set to a malicious file on attacker's domain, could that file execute a XSS attack or worse, and is this more dangerous than allowing the img tag?

sreynen’s picture

Status: Needs work » Needs review
adrinux’s picture

@sreynen Thanks for working up the patch :)

Patch in #76 That's too loose a list IMNSHO. I think you could remove the following tags: audio, canvas and video – all have the potential for misuse and are likely to be dealt with by modules which can define their own filter_xss function.

Keygen is, I think, effectively a form control and so probably shouldn't be there.

The rest seem to be structural and semantic tags and should be fine.

sreynen’s picture

Updated patch removes audio, video, canvas, and keygen from the list in #76.

I disagree on canvas being an XSS vulnerability, as that tag explicitly disallows loading off-site content. Also, it can only load even local content in combination with JavaScript, which is already excluded with filter_xss_admin(). Audio and video seem a little more questionable, as they can load offsite content, but that content is limited to specific data types, like img. On the other other hand, the data types are more complex than images, so that probably increases the opportunity for a codec bug to become an XSS exploit.

That said, I doubt we'll ever have much need for those specific tags in the contexts where filter_xss_admin() is used, and we can re-examine adding more tags in a future issue if needed.

sun’s picture

Status: Needs review » Reviewed & tested by the community
jensimmons’s picture

Status: Reviewed & tested by the community » Needs review

"potential for misuse" — well, that's true for a lot of elements that are allowed. Remember, this filter *blocks* all use of an element in a text area (even if a lenient input filter is on that text area), and prevents modules like Views and Semantic Fields/Display Suite from letting site builders specify elements to wrap around fields from the GUI.

IMG is currently allowed. If it were blocked, then you could never type <img src="foo"> into any text area on any content type, even if you are user 1.

There are two levels of security in Drupal right now — The We Block This All The Time Because It's Evil, and the We Allow This Sometimes, But Aren't Allowing It In the Filtered HTML Input Filter Because Might Dangerous And We Want To Help You Not Do Stupid Things.

To me, audio, video and the like should not be permanently banned from text areas and views/et al GUIs. They should be treated like image — site builders need to allow with caution, but Drupal won't stop you from allowing it. Let's trust the system we have in place now. Allow these elements in filter_xss, and disallow them in the default full html input filter.

jensimmons’s picture

crosspost

mfer’s picture

Status: Needs review » Reviewed & tested by the community

This seems to break down to two elements to tackle.

  1. filter_xss_admin is a hard coded list for admins. This list already includes img and is generally permissive. Users who can play around in fields filtered by filter_xss_admin can generally do harmful things for site users in other capacities. Expanding this admin list to include html5 bits is a no brainer. Even for audio and video tags. The only question worth debating is technically how to do this.
  2. The other issue is with filter_xss and what it provides by default. This can be a very closed list because it is an API function and just provides a default list. When one uses the html filter and sets allowed tags this list is what is passed into filter_xss and used for that case.

The first case is the one we really need to address because the second is configurable via the web ui. Unless there are objections to the technical methods for the update to filter_xss_admin then the patch set RTBC by sun should be ready to go.

mfer’s picture

Status: Reviewed & tested by the community » Needs work

Actually, going to set this to needs work. The canvas, audio, and video tags should be added to filter_xss_admin. This is filter_xss_admin for admins. It already allows images and is generally permissive. If we are really worried about external things doing naughty things here someone should create a ticket to remove the img tag ;).

If we aren't going to allow these tags then it would be useful to document the expected user role and experience for the cases that use this to come up with a better solution. That and remove the sub tags (like track) of the video tag.

jensimmons’s picture

Include:
audio
source
video

Consider:
canvas
datalist
kengen
link

There are a bunch of others that I think it's clear to not include. I'll look them up and list them later.

adrinux’s picture

Status: Needs work » Reviewed & tested by the community

img has been with us for a long time. At this point I'd like to think that most of the possible vulnerabilities have been discovered and removed, regardless of whether they exist in browsers or underlying graphics libraries.

That's probably also true of audio and video content - all that's really new is the tag, but I'll come back to these after…

Canvas. Which I am concerned about. Take the recent press about WebGL security risks[1] due to graphics cards not being built with security in mind, and note that WebGL is embeded in a canvas tag[2].

My thinking around removing audio, video and canvas was that they're likely to be added via contrib modules[3,4,5,6], as opposed to the tags being directly typed into text fields which get run though filter_admin_xss(). Such modules can have far more specific content sanitization in addition to providing a lot of additional desirable features.

So I'm still of the opinion that there is no need for those tags to be allowed in filter_xss_admin().

As for Jen's other suggestions in #87:
datalist: is a form related tag, support needs to be added via formapi (or elements, html5_tools etc for Drupal-7). So, no.
source: is nested inside audio and video, only if those tags are allowed is source needed.
keygen: another form control. For generating cryptographic keys. No, add to formapi.
link: Unless I'm mistaken, is added to the head as always. Not needed here.

I'm going to put this back to RTBC for the patch in #81 that sun set RTBC.

@mfer "Users who can play around in fields filtered by filter_xss_admin can generally do harmful things for site users in other capacities" - if you follow that logic it would make more sense to do away with filter_xss_admin() entirely ;)

  1. http://www.contextis.com/resources/blog/webgl/
  2. https://developer.mozilla.org/en/WebGL/Getting_started_with_WebGL
  3. http://drupal.org/project/video
  4. http://drupal.org/project/audio
  5. http://drupal.org/project/canvas_field
  6. http://drupal.org/project/media
sreynen’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
1.3 KB

Even with leaving out those tags, #81 still has a small issue mentioned in #86:

If we aren't going to allow these tags [...] remove the sub tags (like track)

This patch addresses that by removing track. I believe all other sub tags were already removed.

jensimmons’s picture

Status: Needs review » Needs work

The last several patches and much of the latest discussion has all been around the idea of adding more elements to the hardcoded list. Talking to webchick at the Twin Cities DrupalCamp yesterday, however, revealed her definitive preference for us to use a variable, and not add to the list. Taking this approach also means we do not have to agree on the list.

Can someone write up a patch to add a variable to this list (and not add elements)?

jensimmons’s picture

Title: Change filter_xss_admin() to allow HTML5 tags » Change filter_xss_admin() to have a variable alongside of the existing hardcoded list

changing title

marcingy’s picture

Status: Needs work » Needs review
FileSize
1.24 KB

Initial version of adding an additional list for xss_admin() not sure if we want to include tests?

sreynen’s picture

#28 seems a little simpler than #92. We never identified any problems with #28 other than general concerned about variables, as described in #73.

Dave Reid’s picture

Much, much prefer #28.

Jacine’s picture

Issue tags: +HTML5 Sprint: July 2011 - 2

A summary of where we are at with this would be good. Also, tagging for the next HMTL5 sprint.

Dave Reid’s picture

This is #28 re-uploaded for consideration.

The arguments for making this a variable:

  1. We already have a pattern in core for allowing 'allowed' tags/html to be configured - as a variable that has no UI in core. filter_xss_bad_protocol() calls drupal_strip_dangerous_protocols() which has variable_get('filter_allowed_protocols', array('ftp', 'http', 'https', 'irc', 'mailto', 'news', 'nntp', 'rtsp', 'sftp', 'ssh', 'tel', 'telnet', 'webcal')).
  2. The approach used in #92 uses variable_get('filter_xss_admin_additional_tags', array())) which doesn't allow the existing list of allowed tags to be altered at all, as well as adds a call to array_merge() on each call to filter_xss_admin().
  3. With future versions of HTML standards down the road, site builders and administrators of Drupal 8 and/or 7 will no longer be 'locked' into a certain limit of tags where the only solution is to hack core.

Concerns for this approach:

  1. Performance: We are adding a variable_get() call where there was a static array before. We can use the same approach as drupal_strip_dangerous_protocols() and add a static variable to store the result. We should also do a quick performance test to show how much worse this is without a static variable or if we'll even need it.
  2. filter_xss_admin() is called prior to the database being available in the install process, as well as in error handlers. I don't think this is a huge concern if it may behavior slightly differently prior to variable_initialize() - since we do things like variable_get('lock_inc', 'includes/lock.inc'); prior the variables being initialized as well. This is nothing new.

We still need help deciding which tags should be added as well.

TomSherlock’s picture

So with regards to the patch found at #96 if we want to add the video element are we talking about something like:

$filter_xss_admin_tags_array = variable_get('filter_xss_admin_tags');
$filter_xss_admin_tags_array[] = 'video';
variable_set('filter_xss_admin_tags', $filter_xss_admin_tags_array);

could a patch such as the one found #96 be tantamount to a hack if it has not yet been included in a minor release of Drupal 7.x.

I realize the patch is the product of a discussion with 96 comment amongst Drupal professionals familiar with the issue and context, and that it has passed the SimpleTest. Nevertheless. . . just wanted clarification.

David_Rothstein’s picture

If using a variable for this, the patch in #96 looks good as is. (It seems unlikely that performance would be a big concern compared to all the crazy regular expression matching that already happens every time filter_xss_admin() is run? Though it couldn't hurt to check.)

However, I have to say the arguments for using a variable really don't seem compelling at all... The only good argument I can think of for using a variable is if we decide we want to be cautious and not change the default behavior of this function during a stable D7 release? (I'm having trouble thinking of things that could break as a result of allowing HTML5 tags where they would have been stripped out before, but I guess it's theoretically possible.)

Otherwise, I agree with @Crell that making this configurable really seems like over-engineering. For simplicity and security it would be better to just add the relevant HTML5 tags to the existing hardcoded list. The function documentation for filter_xss_admin() is pretty clear about what it's supposed to do ("Allows all tags that can be used inside an HTML body, save for scripts and styles") which doesn't leave much room for the list to be configurable. And to the extent there are security questions about what tags to allow, it would be much better to make that determination centrally rather than leaving it up to whichever random code feels like setting a particular variable.

keithm’s picture

It isn't only (additional) HTML5 tags that are missing. #605664: Allow form tags in Xss::filterAdmin() (closed as a duplicate of the current issue) proposes some additional tags:

<button>
<fieldset>
<form>
<input>
<label>
<legend>
<optgroup>
<option>
<select>
<textarea>

The absence of some of these tags impact even casual site builders. For example, since views filters output with filter_xss_admin() and since fieldset and legend are not currently on the whitelist, it's not possible for views to output a collapsible fieldset.

Edit: As adrinux points out in #19, you can get views to output a collapsible fieldset via field.tpl files rather than the usual views admin 'rewrite this field' option and tokens.

David_Rothstein’s picture

Hm, I don't think that other issue should have been marked a duplicate. I'm going to reopen it now.

Also, I don't see how adding <fieldset and <legend> as allowed tags would allow you to add collapsible fieldsets in the general case; don't you still need to arrange for the correct JavaScript to be added to the page (in order to make them collapsible)? You can't do that with filter_xss_admin() alone.

David_Rothstein’s picture

Title: Change filter_xss_admin() to have a variable alongside of the existing hardcoded list » Allow filter_xss_admin() to accept HTML5 tags

Probably it was marked duplicate due to the issue title. So I'm putting the title here back to something that matches the actual goal of the issue (not tied to a particular implementation).

Using a variable_get() is one proposed way to achieve the goal, but not the only possible one - and it has several concerns noted above.

donquixote’s picture

You might want to look at htmlpurifier, or talk with the guy who made it..

Oh, wait..
http://stackoverflow.com/questions/4566301/htmlpurifier-with-an-html5-do...
> No, HTML Purifier does not currently support HTML 5.
> Are you planning to add support for HTML 5?
> Yes, but in the sense of "Uh, you will have to wait indefinitely long until I get around to doing it."

Jacine’s picture

For simplicity and security it would be better to just add the relevant HTML5 tags to the existing hardcoded list. The function documentation for filter_xss_admin() is pretty clear about what it's supposed to do ("Allows all tags that can be used inside an HTML body, save for scripts and styles") which doesn't leave much room for the list to be configurable. And to the extent there are security questions about what tags to allow, it would be much better to make that determination centrally rather than leaving it up to whichever random code feels like setting a particular variable.

This is a good point. The description is a little misleading though because the current version of filter_xss_admin() doesn't have all tags, except script and styles. There are quite a few elements missing (surely this is on purpose, but still). Some that I notice off the top of my head include: input, button, select, fieldset, legend, iframe, object, embed, etc.

I browsed through http://html5sec.org, but I'm no security expert. I'm not clear on what potential security issues there are, if any. I could definitely use some help coming up with a list of tags to whitelist here. Here's the list of the new elements to consider:

article, aside, audio, bdi, canvas, command, datalist, details, embed, figcaption, figure, footer, header, hgroup, keygen, mark, meter, nav, output, progress, rp, rt, ruby, section, source, summary, time, track, video, wbr

sreynen’s picture

I think http://html5sec.org/#56 makes a good case for leaving out media elements, since they apparently allow JavaScript execution and we're already filtering script. So that brings us back to the list in #89, with the open question of fixed vs. static vs. variable. To help resolve that question, I did a quick performance test:


function microtime_float() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

 function filter_xss_admin_fixed($string) {
   return filter_xss($string, array('a', 'abbr', 'acronym', 'address', 'article', 'aside', 'b', 'bdi', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'command', 'dd', 'del', 'details', 'dfn', 'div', 'dl', 'dt', 'em', 'figcaption', 'figure', 'footer', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'mark', 'menu', 'meter', 'nav', 'ol', 'output', 'p', 'pre', 'progress', 'q', 'rp', 'rt', 'ruby', 's', 'samp', 'section', 'small', 'span', 'strong', 'sub', 'summary', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'time', 'tr', 'tt', 'u', 'ul', 'var', 'wbr'));
 }

function filter_xss_admin_static($string) {
  $list = &drupal_static(__FUNCTION__);
  if (!isset($list)) {
    $list = array('a', 'abbr', 'acronym', 'address', 'article', 'aside', 'b', 'bdi', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'command', 'dd', 'del', 'details', 'dfn', 'div', 'dl', 'dt', 'em', 'figcaption', 'figure', 'footer', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'mark', 'menu', 'meter', 'nav', 'ol', 'output', 'p', 'pre', 'progress', 'q', 'rp', 'rt', 'ruby', 's', 'samp', 'section', 'small', 'span', 'strong', 'sub', 'summary', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'time', 'tr', 'tt', 'u', 'ul', 'var', 'wbr');
  } 
  return filter_xss($string, $list);
}

function filter_xss_admin_variable($string) {
   return filter_xss($string, variable_get('filter_xss_admin_variable', array('a', 'abbr', 'acronym', 'address', 'article', 'aside', 'b', 'bdi', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'command', 'dd', 'del', 'details', 'dfn', 'div', 'dl', 'dt', 'em', 'figcaption', 'figure', 'footer', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'mark', 'menu', 'meter', 'nav', 'ol', 'output', 'p', 'pre', 'progress', 'q', 'rp', 'rt', 'ruby', 's', 'samp', 'section', 'small', 'span', 'strong', 'sub', 'summary', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'time', 'tr', 'tt', 'u', 'ul', 'var', 'wbr')));
}

$start = microtime_float();

for ($a = 0; $a < 30000; $a++) {
  $b = filter_xss_admin_fixed('<div><video></video></div>');
}

$time = microtime_float() - $start;

dpm($time, 'filter_xss_admin_fixed');

$start = microtime_float();

for ($a = 0; $a < 30000; $a++) {
  $b = filter_xss_admin_static('<div><video></video></div>');
}

$time = microtime_float() - $start;

dpm($time, 'filter_xss_admin_static');

$start = microtime_float();

for ($a = 0; $a < 30000; $a++) {
  $b = filter_xss_admin_variable('<div><video></video></div>');
}

$time = microtime_float() - $start;

dpm($time, 'filter_xss_admin_variable');

This probably isn't the best test, but it's something. The results vary, but the general outcome is consistant: static is faster than fixed, both faster than variable. Sample results:

filter_xss_admin_fixed => 3.22498416901
filter_xss_admin_static => 3.0746049881
filter_xss_admin_variable => 3.46761918068

Since static gives us flexibility to alter the list without hacking core and also appears to have the best performance, I suggest we go with that. Attached is #89 updated to use static.

Status: Needs review » Needs work

The last submitted patch, expanded_tag_list_for_html5-732992-104.patch, failed testing.

sreynen’s picture

Status: Needs work » Needs review

I'm not sure, but I suspect those are test errors rather than problems with the patch.

Jacine’s picture

Issue tags: +HTML5 Sprint: August 2011 - 1

Tagging.

sreynen’s picture

Issue tags: -html5, -xss, -HTML5 Sprint: July 2011 - 2, -HTML5 Sprint: August 2011 - 1

Status: Needs review » Needs work
Issue tags: +html5, +xss, +HTML5 Sprint: July 2011 - 2, +HTML5 Sprint: August 2011 - 1

The last submitted patch, expanded_tag_list_for_html5-732992-104.patch, failed testing.

sreynen’s picture

Hmm. I have no idea why those tests would be failing. Any ideas?

David_Rothstein’s picture

Status: Needs work » Needs review
FileSize
1.42 KB

Yes. The patch was passing a variable to filter_xss() that did not exist :)

This one will probably pass.

David_Rothstein’s picture

I'm really surprised by the performance numbers in #104, by the way... I thought from previous performance testing (in other issues) drupal_static() was known to be a noticeable performance hit in frequently-called functions. That is why a lot of those functions use a pattern like this instead:

  // Use the advanced drupal_static() pattern, since this is called very often.
  static $drupal_static_fast;
  if (!isset($drupal_static_fast)) {
    $drupal_static_fast['variable'] = &drupal_static(__FUNCTION__);
  }
  $variable = &$drupal_static_fast['variable'];

But here we seem to have evidence that calling drupal_static() each time is faster than using native PHP array() calls to define a moderately-sized array each time??

Anyway, ignoring the performance issue, it seems like this approach is still combining two unrelated issues into one (add HTML5 tags and make the list alterable). One or the other should really be enough for each of D7/D8. However, if it's actually better performance to make it alterable, then I guess that changes things...

sun’s picture

Component: filter.module » base system
Crell’s picture

Please see #80. Webchick has already said she does not want to expand the list, she wants a variable, because she doesn't want to be responsible for adding a security hole to D7 via a "new and not understood tag".

Personally I disagree with that position, as I don't see a security hole here at all, but it's her call to make. So we need a patch that adds a variable for "Extra allowed tags", defaulting to empty, and then we let the HTML5_Tools module inject whatever tags it decides make sense there.

At least that's what we need for D7. For D8, who knows what the right solution is but for god's sake let's just get this done for D7 and move on. There is no reason for this thread to be at 113 comments and still not resolved. The bikeshedding is over, The webchick has spoken, just do it.

chx’s picture

Status: Needs review » Needs work

Under no circumstances we are adding any way be it drupal_static, variable or intervention-by-chicken to change this list. It's hardwired and it stays hardwired. Lest some aspiring site builder add object to it. Hardwire is good. If you have a freakin' edge case, see http://drupal4hu.com/node/293

chx’s picture

So I ported KSES to become filter_xss and I feel strongly about this being hardwired. Dave Reid asked why protocols are allowed to be changed. Consider what a site builder faces and what she wants to add to the list. If you want to add skype: or spotify: you are unlikely to blow a sechole but if you want to make youtube-embed copypastes to work and change the tags then you create a sechole. The average sitebuilder is extremely unlikely to face a javascript: link he legimitely wants to make work.

Also protocols come and go. spotify: didnt exist before 2008 fall. They are easy to add. Tags need browser support. That's harder to add.

We can add any tag that just decorates text and does not refer to various resources. The attribute filter will obliterate any attempts regardless of the attribute name to do anything nasty, it will run a filter_xss_bad_protocol on practically every attribute value (more complicated than that, but that's the end of it). Unless of course a tag comes with an attribute which runs JavaScript and the attribute is not on* but I trust HTML5 to not want such horrors -- they want modern JS, right?

webchick’s picture

If that approach is ok by chx, it's ok by me.

Dave Reid’s picture

Ok, I'm fully behind a non-variables solution for the sake of "let's get it done". It would at least be nice to add a simple static rather than re-declaring the array every time.

chx’s picture

That might be a good idea.

sreynen’s picture

What exactly is "a simple static"? #111? #112? Something new? If we're waiting on a new patch now, I'd like to write it, but I'm not clear what the agreement was.

David_Rothstein’s picture

I think a simple static would just be like this:

  static $allowed_list;
  if (!isset($allowed_list)) {
    $allowed_list = array(...);
  }

That should be the absolute fastest in terms of performance. We've certainly survived without it before, but since this issue (at least for D8) will expand the size of the array a fair amount, maybe now's the time to do it. No idea if it's a significant performance benefit in real use cases though.

ericduran’s picture

I have to be honest. I read through this issue and I'm not quiet sure what tags we're trying to add.

Do we honestly need to add a new tag? I can't think of any. It would be nice if this list was configurable which is why on #10 I changed the title to remove HTML5 from it, being that I had a weird suspicion that the title will make this a 100 comment thread. Lol I guess title really doesn't matter ;-)

Also honestly most html5 tags that you'll want to use probably shouldn't be running through filter_xss_admin. Being that you should be putting those tags on the page yourself (for the footer/header example) in the tpl. Or another module should be rendering those element for you in the video/audio tag example.

According to chx this list is hard coded on purpose and it'll never change which I guess is fine. I honestly haven't ran into an issue with this and I like to use new elements with Drupal ;-) So pretty much if your new markup is being striped out from filter_xss_admin then you're probably putting that markup somewhere it doesn't belong. Feel free to correct me if I'm wrong ;-) I just can't think of any.

/me hopes for an unsubscribe ;-)

adrinux’s picture

Well, the 'edge case' that led me to this issue is Drupal 7 Views running 're-written' fields through filter_xss_admin() and thus stripping out HTML5 tags. It forces you to customize Views fields via theme views field tpl files instead of being able to do it via the Views UI.

If we can't alter filter_xss_admin(), I suppose we need to tackle the issue in Views itself. BUT having checked the issue queue I found http://drupal.org/node/853880 with Views maintainers stating that filter_xss_admin is a 'must do' according to the security team.

So it seems we are stuck. Security requires filtering entered HTML through filter_xss_admin() and there is a refusal of core devs to allow altering filter_xss_admin(). Impasse.

I can understand not wanting to add a variable, but not adding to the list of hard coded tags effectively blocks Drupal 7 and 8 support for several HTML5 tags entered through the UI. It blocks half-decent HTML5 support.

I can't honestly believe that tags like section, aside and figure/figcaption represent a security risk. Adding *all* HTML5 tags would be stupid, adding a restricted list is common sense, as per patch in #89 above.

@chx if you can suggest an alternative solution to the HTML5 'edge case' I'm all ears.

David_Rothstein’s picture

Status: Needs work » Needs review
FileSize
1.3 KB

Generally speaking, if your code allows administrators to type in text that is expected to reasonably contain lots of complex HTML, you should use a proper text format instead of using filter_xss_admin(), and then you get all sorts of benefits: better performance, usability, flexibility (including whatever HTML5 tags the site administrator wants to allow), etc. Offhand that seems like the right solution for that Views issue also.

However, the reality is that filter_xss_admin() is (mis)used all over the place in both core and contrib, so this issue is still worth fixing.

I did not read @chx's comment as saying HTML5 tags should never be added, only as saying the list should remain hardcoded. I think the security argument there may be a bit overstated, personally; clueless site administrators can already add <object> tags to e.g. the Filtered HTML text format and we let them do that via the user interface, and that's a much easier way to make your site insecure than the extra backdoor for filter_xss_admin() proposed here. However, the bigger point is that I don't think anyone has explained why making the list flexible is an actual good solution for this issue. Someone is still going to need to go through and figure out which HTML5 tags are safe to add and which aren't; adding a variable just pushes the problem to someone else. There are good reasons for security decisions like that to be made in core.

The attached patch goes back to a hardcoded list, adding the same HTML5 tags as the previous patch (which still need to be vetted).

I have not added any static variable for now; unless someone can show that the patch introduces a noticeable performance regression without it, I don't think we need to go down that road in this issue. The filter_xss_admin() function is not called that often on a typical Drupal page load anyway, and there could always be a separate issue to improve performance by introducing a static. (Small performance improvements like that also cause small increases in Drupal's memory footprint, I think, so it's not totally cut and dried.)

ericduran’s picture

@adrinux that might still be an example where you're you shouldn't really be writing that html in a fieldset. Take a look at http://drupal.org/node/1205374
which makes using those new elements in views a snap ;-)

@David_Rothstein thaks for clearing that up for me, I was a bit confused. Now regarding the tags allowed. The progress tag should probably be removed the tag itself is not going to do much with out some companion JS. Not really a security concern just doesn't seem practical.

bleen’s picture

I don't think anyone has explained why making the list flexible is an actual good solution for this issue

I'm not arguing that this is compelling enough, but one reason is that by making this list flexible, we will not need to open another issue like this in 6 months when another HTML5 tag (or a MathML, or choose your favorite fringe markup language) surfaces as being important. In that case we will again need to go 12 rounds on wether we should make that hard-coded list longer or not rather than just letting site owners extend the list on their own.

Again, I'm not saying this is the be all & end all, but it is a legitimate reason and it has been brought up way up there somewhere ...

sun’s picture

Status: Needs review » Reviewed & tested by the community

The progress tag should probably be removed the tag itself is not going to do much with out some companion JS. Not really a security concern just doesn't seem practical.

The JS for that may live in a .js file loaded by your theme or a module.

ericduran’s picture

@sun good enough for me.

sreynen’s picture

Does this still need security team review or are we done?

webchick’s picture

Status: Reviewed & tested by the community » Needs review

I'd like to see final sign-off on this list of tags by a sec. team member (ideally chx or Heine). http://ha.ckers.org/xss.html is a comprehensive list of XSS holes in "normal" HTML tags, and want to make sure we're covered for these new fancypants ones.

chx’s picture

I would like to get Heine's OK before a commit but the issue is IMO ready. I have read the July 29 version http://www.whatwg.org/specs/web-apps/current-work and observed that the only attributes of the Function type are on* which are already killed. The danger as I tried to described above was an attribute with Function type which is not called on*. There is no such. To give you a counterexample, Internet Explorer allowed CSS values to be expression() and then any script inside expression() ran. This is the reason filter_xss kills the style element wholesale. It was enough to write a parser that understands HTML enough to make it safe without being unbearable slow. Thanks KSES for doing that. I am unaware of anyone doing that for CSS. The effort would be dubiously worth it, anywyas.

That aside, it very well might be that video and audio are both good. I do not see how can you run a script with those. You are only specifying a source and a type and nothing else. There is no way to run something as unbound as Flash. If you were to specify a source of javascript, be my guest, or more precisely, be drupal_strip_dangerous_protocols' guest. That function makes short work of any javascript: without caring what the attribute name is. I can't even find why canvas would be a problem when canvas only marks an area as canvas and then you need script running somewhere to do anything with it. However these can easily be a followup.

adrinux’s picture

re: canvas - we probably had in mind bugs like this: http://www.opera.com/support/kb/view/966/ and the hubub over WebGL (which is added via canvas) http://en.wikipedia.org/wiki/WebGL not sure how relevant those are though.

I still think audio, video and canvas are unlikely to be typed into a text area though, they're the sort of content Drupal handles via modules.

(As an aside and a potential new issue: currently filter_xss and filter_admin_xss don't tell you if they strip out markup, this can be very confusing to users. Would it be possible and a good idea to set a message if filter_xss removes something?)

adrinux’s picture

@ericduran re:#125 Nice to see html5 tools module expanding :) But I don't think that addresses the specific problem I was talking about - rewriting markup within views to include one field within another.

Heine’s picture

I've reviewed the new tags with http://dev.w3.org/html5/spec/Overview.html at hand, mostly paying attention to new attributes. I see no security issues in the proposed changes.

I do wonder why command is in the list; no major browser seems to have implemented the element. How the element is going to be used is rather vague as well.

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Heine: When they do start implementing it, we don't need to do anything more at that point. :-)

That's two leading Sec team members that have signed off here. I think we're done.

Jacine’s picture

Awesome! Thanks everyone and especially chx for taking the time to go over the spec with me and helping me understand how to look for potential security issues. :)

chx’s picture

Jacine’s picture

Issue tags: -HTML5 Sprint: August 2011 - 1 +HTML5 Sprint: August 2011 - 2

This is still RTBC, just updating the sprint tag to keep track of it while we wait.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

I committed this to 7.x and 8.x.

I believe committing this to 7.x is the correct thing to do but please re-open this issue if it is not.

Crell’s picture

That was absolutely the right thing to do, Dries. Thanks. :-)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Jacine’s picture

Issue tags: -HTML5 Sprint: July 2011 - 2, -HTML5 Sprint: August 2011 - 2

Cleaning up tags.

Jacine’s picture

Issue summary: View changes

Adding davereid's summary.