htmLawed: purify HTML for security, standards and admin-compliance

The htmLawed module lets you use the htmLawed (X)HTML filter/purifier PHP script as an input filter, with configurations specific to the input format, the content (node) type, and the body, comment or teaser.

Its speed, high configurability and coverage of the whole of HTML (including elements like script, form and embed, CDATA sections, HTML comments, etc.) set htmLawed apart from the built-in Drupal filter, as well as from many other filters that require external applications like HTML Tidy, or that use libraries that are incomplete, or large and resource-intensive (like HTMLPurifier).

The highly customizable htmLawed filter can be used to make text containing HTML more secure, and compliant with HTML standards and with the site administrator's policy. It can auto-correct and beautify HTML markup, and restrict the HTML elements (tags), attributes and URL protocols allowed in the input. It also balances tags and checks that HTML elements are properly nested. Furthermore, it can transform deprecated tags and attributes, check and convert character entities (e.g., from hexadecimal to decimal type), obfuscate email addresses as an anti-spam measure, etc.
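Two of the checks listed above, converting hexadecimal character entities to decimal ones and restricting URL protocols, can be illustrated with a minimal sketch. This is not htmLawed's code (htmLawed is a PHP script); it is a simplified Python illustration of the ideas, and the names ALLOWED_SCHEMES, hex_entities_to_decimal and is_allowed_url are hypothetical. It is not a substitute for a real HTML filter.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allow-list for illustration; not htmLawed's actual defaults.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Matches hexadecimal numeric entities such as &#x41; or &#XE9;
HEX_ENTITY = re.compile(r"&#[xX]([0-9a-fA-F]+);")


def hex_entities_to_decimal(text: str) -> str:
    """Rewrite hexadecimal numeric entities (&#x41;) as decimal ones (&#65;)."""
    return HEX_ENTITY.sub(lambda m: "&#%d;" % int(m.group(1), 16), text)


def is_allowed_url(url: str) -> bool:
    """Accept scheme-less (relative) URLs and URLs whose scheme is allow-listed."""
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme == "" or scheme in ALLOWED_SCHEMES
```

With this sketch, hex_entities_to_decimal("&#x41;") yields "&#65;", and is_allowed_url rejects "javascript:alert(1)" while accepting "https://example.com/" and a relative path such as "/node/1". A production filter has to handle many more cases (for example, schemes disguised with entities or whitespace), which is why a dedicated filter is used in practice.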

The GPL-licensed, single-file (45 KB) htmLawed has a basal peak memory usage of just ~0.5 MB and is well documented. The filter can be tested on this demo page.

The module allows the use of different htmLawed filter-settings for different node-types (content types, such as stories and pages). Specific filter settings can be used for teasers (including RSS newsfeed items), as well as comments and other types of input. The module also provides an option to filter submitted content before it is stored in the database.

By configuring the module appropriately, PHP coders can fine-tune the htmLawed configuration further (e.g., to filter input HTML differently for different users).

Drupal is a registered trademark of Dries Buytaert.