Include support for <script> tags [#824314]

Comment	File	Size	Author
#8	forrester_filter.zip	1016 bytes	nancydru

Comment #1

danepowell commented 10 June 2010 at 22:24

Hi NancyDru

Can you please post the raw input you are having problems filtering as a text file?

FYI, this module only acts on style tags at the moment, because those are the only ones I've had problems with, but I'd be happy to expand it to include script tags as well if it's not too difficult.

Also note that this module doesn't "get rid" of anything on its own - it simply HTML-comments-out offending sections of code, which can then be stripped by the core HTML filter.
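To make the commenting-out behaviour concrete, here is a minimal PHP sketch. It assumes a regex-based pass over `<style>` blocks, and the function name `comment_out_style_blocks` is hypothetical; this is not the module's actual code. Each block is wrapped in `<!-- ... -->` so that a later filter can drop the comment:

```php
<?php
// Hypothetical sketch of the "comment out" approach (not the module's code):
// wrap every <style>...</style> block in an HTML comment so that a later
// filter, such as a tag-stripping input filter, can discard it.
function comment_out_style_blocks(string $text): string {
    return preg_replace_callback(
        // "<style" must be followed by whitespace or ">", so "<styles" is ignored;
        // the lazy .*? stops at the first closing tag; s = dot matches newlines,
        // i = case-insensitive.
        '/<style(?=[\s>]).*?<\/style\s*>/si',
        function (array $m): string {
            // "--" is not allowed inside an HTML comment, so insert a space
            // after every hyphen that is followed by another hyphen.
            $body = preg_replace('/-(?=-)/', '- ', $m[0]);
            return '<!-- ' . $body . ' -->';
        },
        $text
    );
}

echo comment_out_style_blocks('<p>Hello</p><style>p.MsoNormal { margin: 0 }</style>'), "\n";
```

Note that the hidden markup is still sent to the browser; it is only neutralised until a second filter strips the comment.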


Comment #2

nancydru

she/her/hers

English

Boston

commented 11 June 2010 at 11:53

Ah, so it doesn't take care of <script> tags, or the many crappy things Word does (like <o:p> or ”)? If that's the case, it doesn't help me at all. I really want all that Word stuff gone.


Comment #3

danepowell commented 11 June 2010 at 19:46

Ah, so it doesn't take care of script tags

As I mentioned, no, because I have yet to see them in Office-generated content. However, if you are having problems with them, send me a copy of your raw input so I can take a crack at adding support for them.

...the many crappy things Word does (like <o:p>...

Those should be taken care of using other filters such as the core HTML filter, either by whitelisting the tags you want or by blacklisting and stripping the offending ones. If that's not working for you (i.e., the core HTML filter is broken in yet another way), let me know.

I'm beginning to think this module was poorly named. It is not a turnkey solution for killing Office HTML gunk; it is simply meant to hide the content contained within header tags (style, script, meta, etc.), thus filling a gaping hole left by Drupal's core HTML filter (see #447684: HTML Filter does not strip text between 'style' and 'script' elements). In combination with that filter (or others), it is very easy to hide Office-generated gunk in a very general way. Perhaps I will update the description page to highlight that fact.
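As a rough illustration of the gap described above, the snippet below uses PHP's built-in `strip_tags()` as a stand-in for a whitelist tag filter; it is not Drupal's filter code. Tags that are not on the allowlist are removed, but the text between them is kept, so the CSS and JavaScript bodies leak through as visible text:

```php
<?php
// Illustration only: PHP's strip_tags() stands in for a whitelist-based tag
// filter. It removes tags that are not allowed but keeps the text between
// them, so the bodies of <style> and <script> elements survive.
$input = '<p>Report</p>'
       . '<style>p.MsoNormal { margin: 0 }</style>'
       . '<script>trackPageview();</script>';

// Only <p> is on the allowlist.
$filtered = strip_tags($input, '<p>');

// The CSS rule and the JavaScript call now appear as visible plain text.
echo $filtered, "\n";
```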

FWIW, if a turnkey solution is what you are looking for, I don't think such a module would be practical or represent best practice, as we'd constantly be chasing new variants of HTML crud as they are introduced by Microsoft, not to mention that on any production site, whitelisting (supported by the core HTML filter) should always be used over blacklisting.


Comment #4

danepowell commented 11 June 2010 at 19:46

Title:	Not doing anything	» Include support for <script> tags
Category:	support	» feature


Comment #5

danepowell commented 25 June 2010 at 13:19

Status:	Active	» Postponed (maintainer needs more info)


Comment #6

nancydru commented 25 June 2010 at 15:49

As far as I'm concerned you can close this. I have abandoned this module and created my own.


Comment #7

danepowell commented 25 June 2010 at 16:44

I hope I have not generated any antipathy here. I'd be interested to know more about the module you created and to discuss whether it would benefit the Drupal community for us to work together on this, or whether we are really trying to fill different niches.


Comment #8

nancydru commented 25 June 2010 at 18:40

Status	File	Size
new	forrester_filter.zip	1016 bytes

No, I just didn't understand this module before. I had hoped someone had already dealt with the crap I get when people copy and paste from Word and then add scripts. Here's what I have.


Comment #9

danepowell commented 25 June 2010 at 19:22

Okay, I think we should work together on this and incorporate the features you've added into this module, if that's alright with you. A few thoughts:

1) Your module deals with script tags as well as style tags. Awesome. I'd like to get rid of xml tags as well, to deal with #735496: Filter XML tags.
2) You take a much more direct approach, by stripping the tags and contents altogether instead of commenting them out. I like that, it's probably easier and has less overhead than relying on a second filter to strip the content.
3) You also decode a bunch of HTML entities that aren't "law-abiding", but they all look legitimate to me, and I can only imagine them causing trouble if you're trying to view something in plain text. But it seems like in that case you should enable a "plain text" filter to decode all HTML entities. What are your thoughts on this?
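A minimal sketch of point 2, stripping the elements together with their contents in one pass, assuming a regex-based approach. The function name `strip_elements_with_content` and its default tag list are hypothetical, chosen to match the tags discussed in this thread:

```php
<?php
// Hypothetical sketch of the "strip outright" approach: remove selected
// elements together with everything between their opening and closing tags.
function strip_elements_with_content(string $text, array $tags = ['style', 'script', 'xml']): string {
    foreach ($tags as $tag) {
        $t = preg_quote($tag, '/');
        // The tag name must be followed by whitespace or ">", so "<xml:namespace"
        // or "<scripts" are not matched; the lazy .*? stops at the first closing
        // tag; s = dot matches newlines, i = case-insensitive.
        $text = preg_replace('/<' . $t . '(?=[\s>]).*?<\/' . $t . '\s*>/si', '', $text);
    }
    return $text;
}

echo strip_elements_with_content('<p>Hi</p><style>p { margin: 0 }</style><script>track();</script>'), "\n";
```

A regex is not an HTML parser: an element with no closing tag is left untouched and nested elements of the same name are not handled, so this complements rather than replaces a whitelist filter.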


Comment #10

nancydru

nancydru commented 25 June 2010 at 21:26

Actually, the biggest problem with the entities is that when they get into a title, Drupal can go bonkers and include only part of the title or even none at all. So the other module that calls check_markup also uses that filter on the title.

Beyond that, maybe all browsers can handle them now, but that has not always been the case, nor do I know what other languages do with them.
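For the title problem, a targeted replacement may be safer than decoding every entity, since a blanket decode would also turn `&lt;` back into `<`. The sketch below assumes a hypothetical `normalize_word_entities()` helper and an illustrative list of Word typographic entities and characters; the attached module's actual list may differ:

```php
<?php
// Hypothetical sketch: replace typographic entities (and the raw characters)
// that Word pastes tend to produce with plain-ASCII equivalents, e.g. before
// using the text as a node title. The list below is an illustrative assumption.
function normalize_word_entities(string $text): string {
    static $map = [
        '&ldquo;' => '"', '&rdquo;' => '"', '&#8220;' => '"', '&#8221;' => '"',
        '&lsquo;' => "'", '&rsquo;' => "'", '&#8216;' => "'", '&#8217;' => "'",
        '&ndash;' => '-', '&mdash;' => '-', '&#8211;' => '-', '&#8212;' => '-',
        '&hellip;' => '...', '&#8230;' => '...',
        '&nbsp;' => ' ', '&#160;' => ' ',
        // The same characters pasted as raw UTF-8 rather than as entities.
        "\u{201C}" => '"', "\u{201D}" => '"', "\u{2018}" => "'", "\u{2019}" => "'",
        "\u{2013}" => '-', "\u{2014}" => '-', "\u{2026}" => '...', "\u{00A0}" => ' ',
    ];
    // strtr() replaces all keys in a single pass, so replaced text is never
    // rescanned; entities not in the map (e.g. &amp;, &lt;) are left alone.
    return strtr($text, $map);
}

echo normalize_word_entities('&ldquo;Q3 results&rdquo; &ndash; draft&hellip;'), "\n";
```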

My users don't save their content as HTML from Word and then paste the whole thing in, as appears to be the case in that other issue. They just copy the text straight from Word and paste it. Then they add some scripts, mostly Google Analytics tracking. But allowing any scripts is a security nightmare, hence my desire to get rid of them.

There might be faster ways to scan the content. I would think that the technique I use would work with XML as well.
$text = preg_replace('/<xml.*?<\/xml>/xmsi', '', $text);

I don't care for the commenting technique because I have sites that I have developed where many of the users are still using dial-up, so I don't send any more text than I have to.

Feel free to use any of my code. My module that calls the filter is easily reconfigurable, so I can test the result when it's ready.


Comment #11

danepowell commented 8 July 2010 at 19:54

Status:	Postponed (maintainer needs more info)	» Fixed

Okay, check out the latest dev release (it should roll in the next 12 hours). It uses your style/script/xml tag and HTML entity filters. Let me know if you have any more suggestions. If there are no complaints, I'll roll 6.x-1.1.


Comment #12

nancydru

nancydru commented 9 July 2010 at 03:18

It will take me a bit to test again, as I uninstalled it. We also just had a big installation on my customer's site that will probably need some fixing over the next few hours.


Comment #13

23 July 2010 at 03:20

Status:	Fixed	» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
