This filter cleans up HTML generated by Microsoft Office. It can remove header tags such as <style> and <script> along with their contents, and it can convert HTML entities to their plain-text equivalents. Used together with the core HTML filter, it strips out the excess markup that Microsoft Office generates.

To clean up Office-generated HTML, you must strip not only the offending tags but also the content between them. The core HTML filter handles the tags themselves: configure a whitelist such as <a> <i> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <img> <h1> <h2> <h3> <h4> <h5> <h6> <table> <tr> <td> <thead> <tbody> <tfoot> <br> <p> <b> and choose to strip disallowed tags. However, due to a known issue that some consider a bug and others a feature, the core filter does not strip the content between the tags (#447684: HTML Filter does not strip text between <style> and <script> elements). This module fills that gap by stripping out the offending content. It also converts some HTML entities to their plain-text equivalents.
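The two steps described above can be sketched in a few lines. This is only an illustration in Python, not the module's own code (the module is a Drupal filter); the function names, the list of dropped elements, and the entity map are all assumptions made for the example.

```python
import re

# Hypothetical list of elements that are removed together with their contents.
DROP_WITH_CONTENT = ("style", "script")

# Hypothetical entity map; the entities the module actually converts may differ.
ENTITY_MAP = {
    "&nbsp;": " ",
    "&ldquo;": '"',
    "&rdquo;": '"',
    "&lsquo;": "'",
    "&rsquo;": "'",
    "&ndash;": "-",
    "&mdash;": "--",
}


def strip_elements_with_content(markup, tags=DROP_WITH_CONTENT):
    """Remove each listed element and everything between its tags."""
    for tag in tags:
        pattern = re.compile(
            r"<{0}\b[^>]*>.*?</{0}\s*>".format(tag),
            re.IGNORECASE | re.DOTALL,
        )
        markup = pattern.sub("", markup)
    return markup


def convert_entities(markup, entity_map=ENTITY_MAP):
    """Replace the mapped HTML entities with plain-text equivalents."""
    for entity, replacement in entity_map.items():
        markup = markup.replace(entity, replacement)
    return markup


def clean_office_html(markup):
    """Drop <style>/<script> blocks, then convert the mapped entities."""
    return convert_entities(strip_elements_with_content(markup))


if __name__ == "__main__":
    sample = (
        '<style type="text/css">p.MsoNormal {margin:0}</style>'
        "<p>Hello&nbsp;&ldquo;world&rdquo;</p>"
        "<SCRIPT>alert(1)</SCRIPT>"
    )
    print(clean_office_html(sample))
```

A regular expression is enough here because <style> and <script> elements do not nest. The remaining tags would still be handled by a whitelist filter such as the core HTML filter.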

Development / maintenance / issue queue policy

I have no immediate plans or funding for further development. However, I will happily accept patches marked RTBC (reviewed and tested by the community).

Downloads

Recommended releases

Version         Downloads                           Date          Links
6.x-1.1         tar.gz (6.64 KB) | zip (7.12 KB)    2010-Jul-19   Notes

Other releases

Version         Downloads                           Date          Links
7.x-1.0-alpha1  tar.gz (6.65 KB) | zip (7.11 KB)    2011-Apr-17   Notes

Development releases

Version         Downloads                           Date          Links
7.x-1.x-dev     tar.gz (7.72 KB) | zip (8.18 KB)    2011-Dec-16   Notes
6.x-1.x-dev     tar.gz (7.7 KB) | zip (8.14 KB)     2011-Dec-16   Notes

Project Information


Maintainers for Office HTML filter

  • Dane Powell - 8 commits
    last: 8 weeks ago, first: 2 years ago

Issues for Office HTML filter

To avoid duplicates, please search before submitting a new issue.