The HTML Filter does a great job of stripping malicious and malformed elements. However, for certain types of elements such as <script></script> and <style></style> users should have the option of stripping out text between these elements as well, since it makes no sense to display style code or scripts in most situations. I can only think of these two examples right now, but in general this might apply to all elements appearing in the document head...

Specifically, I am having a problem dealing with HTML code where style code and scripts are not already commented out between <!-- -->, which I'm assuming looks like a malformed tag to HTML Filter. Thus, an alternative solution might be to create a filter that just goes through and comments out text between these tags, and runs before HTML Filter, which will then remove the text.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Dane Powell’s picture

Is there a reason that this hasn't been commented on? It's a relatively large issue on our site. I am happy to try to put in the legwork on this, but I am not very familiar with the module, and could use some guidance.

Additionally, because I believe the filter is mistaking (albeit conveniently) HTML comments for invalid tags, I am changing this to a bug report.

Dane Powell’s picture

Category: feature » bug
Dane Powell’s picture

Version: 6.10 » 6.12

bump... this is still an issue with the latest version. Please let me know if there is a better place to post this, or if it's just not going to be fixed.

j0rd’s picture

Same issue. I would like to make sure these styles are stripped and not displayed.

I believe the people who are having issues with this are people who are using mailhandler, in which an entire HTML Email will get inputted into the body, which usually contains

and tags along with tags and need the output to look semi-decent.
apaderno’s picture

I agree it doesn't make sense to strip script tags but leave their content; the only reason I can think of to show the script code is to allow an administrator user to notice there is content that must be removed (if showing the JavaScript code mixed with normal content would really allow to notice it).

Dane Powell’s picture

As I mentioned in #75229: Compatibility with Outlook-generated HTML, I built a little input filter that can run before HTML Filter and will cause it to strip out this gunk. It's hackish, but if no one is going to try to patch the HTML Filter then mine might be the next best option.

Dane Powell’s picture

I have created a less-hackish patch for #75229: Compatibility with Outlook-generated HTML that comments out uncommented style definitions, thus allowing them to be stripped out by HTML filter. This patch could probably be easily adapted to apply to HTML filter directly, but I'm not sure how to do it.

Dane Powell’s picture

Version: 6.12 » 6.13

Come on maintainers, it's clear that a lot of people are affected by this issue, and we're having to resort to fixing it ourselves multiple times in individual modules. Obviously it should really be taken care of here. If you're not willing to fix this for some reason at least say so.

apaderno’s picture

I guess it is not considered a critical bug.

j0rd’s picture

I believe it's considered a critical bug by everyone in this thread.

I personally can't see any reason to leave the stuff in between the tags in the output, but I can see many reasons why people would want it removed.

Issue is probably that with a filter that's so widely used, the maintainers are unwilling to change something that might effect many already existing installs. In which case this patch should at least be rolled into something less stable like DRUPAL-7 and for Drupal 6 we can use the extra module provided by Dane Powell.

I would still like to see the maintainers ring in on this, so us people who are effected by thing, can provide a proper solution for ourselves.

apaderno’s picture

I meant that the issue seems to not be considered critical from the maintainers.
I am the first to think the content of the script tag should be removed together the script tag. To only remove the tag works for all the other tags, which have text content that can be shown; it is not the case for the script tag, which has content that is not text, but script code that should be executed by the browser.

If I would forget that the currently set input format does not allow me to use the script tags, I would like better if the script code would not be shown in the page.

lyricnz’s picture

Drupal's "input formats" are made up of an ordered list of "filters" which are applied (in order) to the text of a particular format. If you have a site-specific requirement to remove content from *inside* elements, you should use a module that provides a filter that does this - then add this filter to your input format.

Perhaps there's one already at http://drupal.org/taxonomy/term/63

For a good background on Filters and Formats see http://www.lullabot.com/articles/drupal_input_formats_and_filters

apaderno’s picture

Version: 7.x-dev » 6.13
Category: feature » bug

The problem here is not choosing an input filter to use, but a Drupal input filter that is not doing completely its task (or it is not doing what users expect from it).
If it makes sense to only strip the HTML tags when the tag is something like <h2>, <a> (compare "Test" with "Test"), it doesn't make sense to do the same with a tag like <script>, where the content of the tag is not text that should be shown.

Damien Tournoud’s picture

Version: 6.13 » 7.x-dev
Category: bug » feature

To all people in that thread that consider this issue as "critical": nothing in Drupal happens unless *someone* actually do something for it. Reassigning to Drupal 7, where it should first be considered, before an (eventual) back-port to Drupal 6.

Dane Powell’s picture

Version: 6.13 » 7.x-dev
Category: bug » feature

I think the first step in determining how best to fix this is to determine whether it is a bug or a feature request.

To me, it is quite clearly a bug - if I was the one designing this filter, I would certainly have it strip out text that would otherwise be hidden (such as scripts and styles) rather than dumping it (and there is sometimes a LOT of it) onto a page.

If you don't agree with this, please speak up. But if this is what everyone else would expect as well, then this issue quite clearly fits the definition of a bug and should be fixed in 6.x.

lyricnz’s picture

It seems like a feature request to me - the filter does what it says it does: remove *tags*. The fact that you want it to remove the *stuff between some of the tags*, is new/extra - the definition of a feature request. But, this distinction needn't effect whether it gets done.

Seriously, this would be pretty easy in a contrib module.

Dane Powell’s picture

There's clearly a discrepancy between the filter's prescribed and expected behavior, at least for many users. Given that, I'm not exactly sure what side of the line this issue falls on.

At this point I'm just interested in getting this patched up in 6.x, so I'll probably roll a module based on the patch in #75229: Compatibility with Outlook-generated HTML. If others are able to add an option to strip text between disallowed tags to core in 7.x, that would be great.

apaderno’s picture

I think that normally by "HTML tag", one means "the tag, and its content"; the content of a tag is not seen as an entity separated from the tag that contains it.

lyricnz’s picture

#18: so are you saying that <em>something important</em> should be removed entirely? What about <b> or <span> or <div> etc? (the actual selection of tags is up to the site admin). So clearly you don't mean "remove all contents from all removed tags".

Dane Powell’s picture

lyricnz, I don't think anyone is saying that this is an all-or-nothing affair. The header tags that we are talking about (style, script, meta, title) are functionally and categorically very different from content and styling tags (such as the ones that you mentioned).

sjtout’s picture

Hi Dane -- have you tried your regular expression with the Custom Filter module? -- I'm thinking about putting it in there -- it would be nice to have it as a separate filter, so it could be assigned to run before HTML Corrector, but HTML Corrector could run before HTML Filter...

http://drupal.org/project/customfilter

Dane Powell’s picture

I hadn't heard of that module before, but it looks promising; as I mentioned in the other thread I'll try that. It's the next-best solution to actually getting this patched in core.

Dane Powell’s picture

FYI, I could not get other filter modules to deal with this so I created my own. Check out the Office HTML filter if you want to "fix" this problem yourself.

sun’s picture

Title: HTML Filter should strip text between <style> and <script> elements » HTML Filter does not strip text between <style> and <script> elements
Category: feature » bug

subscribing

Anyone up for implementing this? Shouldn't be too hard.

Dane Powell’s picture

I have two ideas of how to do it but not the time at the moment to follow through - basically just adding a preg_replace or two to the filter_xss function:

1) Use a preg_replace like the following to remove offending tags and content (originally suggested by NancyDru):
$text = preg_replace('/<style.*?<\/style>/xmsi', '', $text);
However this sort of departs from the current architecture which only acts on individual tags (which is exactly what causes this problem)
2) Use the strategy of the office_html filter module to turn the offending content into a giant malformed tag that'll get stripped out in the normal process of the function:

$text = preg_replace("/<style>\n(?!<\!--)/", "<style>\r<!--", $text);
$text = preg_replace("/(?<!-->)\n<\/style>/", "-->\r\n</style>", $text);
Alan D.’s picture

#25-2 These will not match all tags. Office filter now uses this

  $text = preg_replace('/<style.*?<\/style>/xmsi', '', $text);
  $text = preg_replace('/<xml.*?<\/xml>/xmsi', '', $text);
  $text = preg_replace('/<script.*?<\/script>/xmsi', '', $text);
kscheirer’s picture

Version: 7.x-dev » 8.x-dev
Priority: Normal » Major
Issue summary: View changes

Verified the behavior described in the issue is still present in 8.0-alpha4.

sun (#24) and kiamlaluno (#13) agree this should be considered a bug - the default expectation is that text inside a script or style tag would be removed along with the tag itself.

This would also need a test.

star-szr’s picture

Title: HTML Filter does not strip text between <style> and <script> elements » HTML Filter does not strip text between 'style' and 'script' elements
Issue tags: +Needs tests

D7 d.o does funky things to HTML tags in issue titles so re-titling to fix.

Tagging for tests per the last comment.

Wim Leers’s picture

#28: issue titles are not filtered by text formats, but by check_plain()?

star-szr’s picture

@Wim Leers, no idea. All I know is that issue titles with HTML get messed up in certain issue views, can't remember exactly how atm but it might just strip out the tags.

cilefen’s picture

Status: Active » Needs review
FileSize
2.24 KB
cilefen’s picture

#31 is inefficient because it is serializing the html an additional time. It would maybe be preferable to add some kind of option to XSS::filter() to strip the inner tag contents.

Status: Needs review » Needs work

The last submitted patch, 31: html_filter_does_not-447684-31.patch, failed testing.

cilefen’s picture

Status: Needs work » Needs review
FileSize
1.1 KB
1.5 KB

Status: Needs review » Needs work

The last submitted patch, 34: html_filter_does_not-447684-34.patch, failed testing.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

apaderno’s picture

Version: 8.6.x-dev » 8.7.x-dev
kscheirer’s picture

This probably needs a test, but otherwise the patch in #34 seems solid.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

irsar’s picture

I am having the same issue in drupal 8.9.16. In one of the view we are loading the content which has script in it. And when we strip html it strips just the

tags and leaves the content inside the script tag. So we end up showing just the script content and not the actual html content of the page. I am not able to find "function _filter_html($text, $filter)" in "/core/modules/filter/filter.module" as mentioned in the patch #34. Any help would be greatly appreciated and thanks in advance.
Alan Delval’s picture

I need to do more testing before publishing

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

smustgrave’s picture

This came up as a daily bugsmash target.

Testing with 10.1 I added

<script>alert('hello');</script>

and confirmed alert still appears.

This ticket will still need a test case (or expand existing ones).

Plus an issue summary update using the default template

thanks.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

smustgrave’s picture

Status: Needs work » Postponed (maintainer needs more info)

So brought this up in #bugsmash with @larowlan and @longwave and consensus seems to be this is a feature we may want to keep. Shows that a user has done something they shouldn't.

Also the code that would probably have to be edited is Xss.php which is a nightmare, and would impact other tags as well.