Last updated May 31, 2011. Created by arcaneadam on May 26, 2008.
Edited by pfrenssen, drumm, scor, kbahey.
Many other web applications process or filter user input in the name of security. Historically, Drupal has preserved user input as is and filtered it on output only. This approach is occasionally debated within the Drupal community.
Steven Wittens' excellent article "Safe string theory for the web" provides a full technical explanation of why it is best to preserve the original user input. The type of filtering needed depends on the output context. Acting on input is problematic because you cannot know which characters are forbidden without knowing the context in which they will appear.
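As a rough illustration of why the output context decides the escaping, the sketch below takes one raw string and prepares it for three different contexts. It is written in Python rather than Drupal's PHP, uses only the Python standard library, and the sample input is made up for this example; it is not Drupal's API.

```python
import html
import json
from urllib.parse import quote

# Hypothetical raw user input, stored exactly as it was submitted.
raw = 'Tom & "Jerry" <b>'

# Each output context treats a different set of characters as special.
as_html_text = html.escape(raw, quote=False)  # HTML text: escape &, <, >
as_url_param = quote(raw, safe="")            # URL component: percent-encode
as_json = json.dumps(raw)                     # JSON string literal: quote and backslash-escape

print(as_html_text)  # Tom &amp; "Jerry" &lt;b&gt;
print(as_url_param)  # Tom%20%26%20%22Jerry%22%20%3Cb%3E
print(as_json)       # "Tom & \"Jerry\" <b>"
```

No single transformation applied at input time could produce all three results, which is why the raw string has to be kept.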
To make things even worse, a certain string could appear in more than one context. For example, the same string might be used as HTML text and as an HTML attribute:
<a title="$node->title">$node->title</a>
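The link above can be sketched with escaping applied at output time, once per context. This is a minimal Python illustration, not Drupal code: `render_link` and the sample title are hypothetical names invented for the example.

```python
import html

def render_link(title: str) -> str:
    """Render the stored, unmodified title into both contexts of the link."""
    attr = html.escape(title, quote=True)   # attribute value: quotes must be escaped too
    text = html.escape(title, quote=False)  # element text: &, <, > are enough
    return f'<a title="{attr}">{text}</a>'

# Hypothetical title, stored exactly as the user typed it.
stored_title = 'Say "hi" <now>'
link = render_link(stored_title)
print(link)
# <a title="Say &quot;hi&quot; &lt;now&gt;">Say "hi" &lt;now&gt;</a>
```

Because the title is stored raw, each context can receive exactly the escaping it needs from the same source value.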
If you try to strip every potentially dangerous character, you cripple your system, because so many characters have to go. If you try to encode on input instead, you cannot know which encoding to apply. Encoding has another problem: processing escaped text is very cumbersome (try extracting a teaser from an HTML-escaped node body).
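The teaser problem can be shown in a few lines. The following Python sketch assumes a deliberately naive teaser function that keeps the first few characters; `teaser` and the sample body are hypothetical and stand in for whatever summary logic a real system uses.

```python
import html

def teaser(text: str, length: int) -> str:
    """Naive teaser: keep the first `length` characters."""
    return text[:length]

# Hypothetical node body, as the user submitted it.
raw_body = "R&D <results> for Q3"

# Escape on input, then truncate: the cut lands inside an entity.
escaped_first = teaser(html.escape(raw_body), 4)

# Truncate the raw text, then escape on output: the result is well-formed.
escaped_last = html.escape(teaser(raw_body, 4))

print(repr(escaped_first))  # 'R&am'
print(repr(escaped_last))   # 'R&amp;D '
```

Working on the escaped form would require the teaser code to understand entities; working on the raw form keeps it a plain string operation.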
We have no choice but to store the user input unchanged and apply the proper escaping on output.