One of the things rasmus mentioned, was being paranoid about any input you get, as we've already seen with the need for the plain_text function on output of titles and the like.
This patch creates a new form property called '#strip_tags', that is set to true by default, that will run strip_tags() on the form value.
Theoretically, I would prefer if you set it to false on a per field basis, for when you expect html / xml / php input from a field, explicitly, but functionally, we couldn;t find a single textarea element in core that wasn't meant to recieve html input, hence I make textareas NOT strip tags by default.
| Comment | File | Size | Author |
|---|---|---|---|
| #3 | strip_tags_by_default_1.diff | 1.82 KB | chx |
| strip_tags_by_default_0.diff | 2.05 KB | adrian |
Comments
Comment #1
chx commentedBRILLIANT idea. I do not think anyone would swear on his life that there is no check_plain left out in his/her module.
Well, there are a few textareas that may not need a strip, for example taxonomy synonyms -- but then again, those are not stored or outputted.
I'll roll another security type patch, which adds automatically the filter_form to node bodies -- we need this for the legacy upgrade anyways -- and then we are all set. The textfields and much everything else is stripped by this patch and most textareas are node bodies in contrib which will get automaticlly a self-validating filter element. Future looks nice.
Get this in! :)
Comment #2
asimmonds commentedThere's a print_r() left in the patch
Comment #3
chx commentedBother! said Pooh and deleted the unncessary patch hunk.
Comment #4
Steven commentedChx wanted me to provide input, so here goes.
No, no, no, no, no no.
First, I really hate invisible transformations.
They confuse the user because you destroy their input without feedback. If someone wants to post a forum topic "Problem with <code> tag", then they should be able to. If I want to use a < sign in my title, I should be able to. By running strip_tags() on the form stuff by default, you're taking that away from the user, without giving anything back. They also give the developer a false sense of security, because you assume that none of the code which they execute can generate unsafe content.
Second, this kind of stripping propagates the false truth that there is a single target format/context. This is not true. A variable can go into a URL (?:/=&#% are special), an HTTP header (linebreaks), HTML text (<&), HTML tag attribute (<>"&), etc.
The situation is similar to magic quotes: the assumption there is that every variable will be sent directly to SQL, and so it is pre-escaped. Except of course, this assumption is false, and magic quotes caused ugly slashes to appear on sites across the board. And it didn't seem to do all that much against SQL injections regardless.
Each of the contexts I've mentioned requires its own escaping, and the different escapings are mutually incompatible. And they are all used in Drupal.
The 'solution' that Rasmus talked about is where they strip every possible special character (it was not clear to me which context he was talking about). If you want to do it properly across the board, then this is a hugely destructive action. Too destructive, IMO.
On the other hand, something as conservative as simply removing HTML tags will not do. In a stereotypical XSS problem of
<a href="$var">, an attacker would inject#" onclick="javascript:evil();, which would pass through strip_tags() unharmed.If we provide a URL profile field for a user, I could put in
javascript:evil();as my homepage, no problem. The exploit vector here is allowing arbitrary (well-formed!) URLs. There is no HTML anywhere.There is a perfectly understandable paradigm for problems like this that has existed in programming for ages: typed variables... the string format/context is the type. Concatenating a plain-text string with an html string is bad (type conflict). Check_plain() can be seen as a cast from plain-text string to html string: by escaping special characters, you ensure the data retains its meaning after the 'cast'. urlencode() is a cast from plain-text string to an url string. Check_url() is a cast from url string to html string. By respecting types, you get safe code.
On the other hand, Drupal developers have consistently shown that they don't want to think about security and just want a system where 'everything works'. That system is one where we go back to the Drupal 4.4 age where you have no clue what comes out when you enter a string. Enjoy.
Comment #5
dries commentedClearly, this needs more discussion. ;)
Comment #6
puregin commentedIs Adrian's patch a sufficient improvement over the status quo to adopt it while we figure out how to implement Steven's typed string approach for 4.8/5.0?
Djun
Comment #7
puregin commentedIs Adrian's patch a sufficient improvement over the status quo to adopt it while we figure out how to implement Steven's typed string approach for 4.8/5.0?
Djun
Comment #8
chx commentedit is an improvement but as Steven says, it's not a complete solution and there are 40 textareas in core, who will check each and every whether they need stripping or not?
Comment #9
drummI agree in principle with chx and Rasmus. But we need the origional values availiable too and we need to see if we can use some smarter function than strip_tags(). Plenty of testing will be needed if this does move forward.
And I'm guessing this doesn't apply anyway since the way to set form values changed.
Comment #10
chx commentedInput filtering is not going to happen, this is made clear by Steven. And I am pretty much convinced he is right.