Office HTML filter

This filter cleans up HTML generated by Microsoft Office. It can remove header tags (<style>, <script>, etc.) together with their contents, and it can convert HTML entities to their plain-text equivalents. Used in conjunction with the core HTML filter, it filters out the excess HTML that Microsoft Office generates.

To deal with Office-generated HTML, you must strip not only the offending tags but also the content between them. The core HTML filter can easily strip the tags if you configure a whitelist such as <a> <i> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <img> <h1> <h2> <h3> <h4> <h5> <h6> <table> <tr> <td> <thead> <tbody> <tfoot> <br> <p> <b> and choose to strip disallowed tags. However, due to a bug/feature, it does not strip the content between the tags (#447684: HTML Filter does not strip text between 'style' and 'script' elements). This module fills that gap by stripping out the offending content. It also converts some HTML entities to their plain-text equivalents.
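The idea described above can be illustrated with a minimal Python sketch. This is not the module's actual code (the module is a Drupal filter). The function names, the default tag list, and the choice of which entities to leave untouched are all assumptions made for illustration. It removes <style> and <script> elements together with everything between them, then converts entities to plain text while leaving the markup-significant ones alone.

```python
import html
import re

# Hypothetical helper names; this is an illustration, not the module's code.

# Entities left encoded so that decoding cannot introduce new markup.
# Which entities the real module converts is not stated; this set is an assumption.
_KEEP_ENCODED = {"&lt;", "&gt;", "&amp;", "&quot;"}


def strip_elements_with_content(markup, tags=("style", "script")):
    """Remove the listed elements and everything between their tags."""
    pattern = re.compile(
        r"<(%s)\b[^>]*>.*?</\1\s*>" % "|".join(re.escape(t) for t in tags),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", markup)


def decode_entities(markup):
    """Convert HTML entities to plain text, except markup-significant ones."""
    def replace(match):
        entity = match.group(0)
        if entity.lower() in _KEEP_ENCODED:
            return entity
        return html.unescape(entity)

    return re.sub(r"&#?\w+;", replace, markup)


def clean_office_html(markup):
    """Strip style/script blocks first, then decode the remaining entities."""
    return decode_entities(strip_elements_with_content(markup))


if __name__ == "__main__":
    sample = (
        '<style type="text/css">p.MsoNormal { margin: 0; }</style>'
        "<p>It&rsquo;s &lt;b&gt; fine</p>"
        "<SCRIPT>alert(1)</SCRIPT>"
    )
    print(clean_office_html(sample))
```

A regular expression is enough for a sketch because <style> and <script> elements do not nest. The remaining tags would still be handled by a whitelist-based filter, as the core HTML filter does.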

Development / maintenance / issue queue policy

I have no immediate plans or funding for further development. However, I will happily accept patches that are RTBC (reviewed and tested by the community).

Project information

Unsupported
Not supported (i.e. abandoned), and no longer being developed.
No further development
No longer developed by its maintainers.
Module categories: Content Editing Experience
420 sites report using this module
Created by Dane Powell on 19 November 2009, updated 13 February 2024
Stable releases for this project are covered by the security advisory policy.