UI for filter_xss allowed tags default

NancyDru - June 20, 2008 - 16:05
Project:Drupal
Version:7.x-dev
Component:filter.module
Category:feature request
Priority:normal
Assigned:NancyDru
Status:won't fix
Description

In the wake of numerous SAs of late, I have been more liberally using filter_xss. One big problem with this is that the standard default allowed tags is quite limited and I keep having to add a long string of tags that I want to all my modules. This is just plain code bloat.

I feel that a much better solution is to allow the admins to specify which tags they want to allow (similar to "Filtered HTML"). Then module maintainers won't have to keep including those long strings.

This patch adds a new tab to the "Input formats" page to define the default allowed tags.

.
A marginally acceptable alternative would be to turn "_filter_html" into a public function. But that doesn't give the admin as much flexibility as this path does.

AttachmentSizeStatusTest resultOperations
xss.patch3.86 KBIgnoredNoneNone

#1

keith.smith - June 22, 2008 - 02:14
Status:active» needs work

There is a patch so changing status.

I'm against mentioning a function name in user-facing text:

+    '#description' => t('These HTML tags will not be removed by the filter_xss function. Be sure you trust all possible combinations and your users before including new tags.'),

#2

NancyDru - June 22, 2008 - 17:49
Status:needs work» needs review

This is an admin function. One would need "administer filters" permission to see it. This is also specific to that function, so I think that is important.

#3

Steven - June 22, 2008 - 20:06

Since when does http://drupal.org/node/28984 tell you to use filter_xss() ?

#4

NancyDru - June 22, 2008 - 20:31

Check_plain renders non-Latin characters as HTML entities, leaving non-English speakers with junk to look at when they enter their alphabet. You can even see this in other places like titles that end up with &#039 instead of single quotes.

#5

keith.smith - June 23, 2008 - 18:22
Status:needs review» needs work

And, as a followup, the sentence "Be sure you trust all possible combinations and your users before including new tags." is also confusing to me. Does that mean that you should trust all combinations and interactions between added tags? Or all combinations of new tags with your users, which wouldn't make sense.

Seems like something like "Be sure that you trust your users to make proper use of all allowed tags." would be simpler and possibly clearer, unless I completely misunderstand the meaning of the original.

#6

NancyDru - June 23, 2008 - 19:17
Status:needs work» needs review

Updated

AttachmentSizeStatusTest resultOperations
xss.patch3.85 KBIgnoredNoneNone

#7

Heine - June 25, 2008 - 09:38

If you need to mix user-supplied plaintext with module supplied HTML, use check_plain.

You can even see this in other places like titles that end up with &#039 instead of single quotes.

This simply indicates that a string containing a single quote has been escaped twice. Which is a coder error, not a problem of check_plain. If you suspect a bug in check_plain the natural response is to file a bugreport, not work around it in some weird, error prone way.

In what context do you use filter_xss? If you need to display user-supplied rich text, why not associate a format with the input and use check_markup instead?

#8

NancyDru - June 25, 2008 - 12:27

I am a module developer, not the site admin. I cannot create an input format for them. It's not just single quotes, that was just an example. And I see this problem in core implementations as well, such as titles, block names, and links. Check_plain simply does not allow rich text. In addition to showing plain HTML, it also turns many non-English characters into entities. My module users simply won't accept this (look at some of the past issues in Glossary, for example). Further, when displaying things like taxonomy term descriptions, there is no input format associated with the text.

The only way I have found so far that makes my international users happy is decode_entities(filter_xss($term->description, $allowed)) (in PHP 5, one may use htmlspecialchars_decode).

#9

catch - June 25, 2008 - 12:54

"Further, when displaying things like taxonomy term descriptions, there is no input format associated with the text."

Then we should add an input format to that field.

#10

NancyDru - June 25, 2008 - 13:30

I would accept that, but considering the imminent freeze, who's going to write that code? And I shudder to think of someone being allowed to insert PHP code into a term or vocabulary description.

#11

catch - June 25, 2008 - 13:41

The freeze isn't imminent, it's unspecified. Could be any time within the next 1-5 months.

PHP in a term or vocabulary description is down to the PHP module being installed and appropriate permission - it's no worse than putting php into a block IMO.

#12

moshe weitzman - June 25, 2008 - 14:13
Status:needs review» won't fix

I think sufficient alternatives have been proposed.

#13

NancyDru - June 25, 2008 - 15:18

Which would they be?

Input formats for taxonomy terms and vocabularies? That doesn't exist and I have doubts that Dries would buy it.

A bug report on check_plain? I don't think there is a bug - it works as designed and does what is needed in some cases. It simply is inadequate for non-English and rich-text as my users tell me over and over again.

Create another input format? That is pretty much what I am doing with this patch, but without the chances of PHP. [And I agree that it shouldn't be available.]

Turn "_filter_html" into a public function? As I suggested this is a marginally acceptable solution since it is largely check_markup($text, DEFAULT_FILTER_FORMAT) with slightly less overhead (but, IMHO, a performance bug of its own).

 
 

Drupal is a registered trademark of Dries Buytaert.