HTML Tidy is rather difficult to install. The problem is not so much the tool itself as the horrible documentation on its SourceForge project page. There are at least half a dozen issues in your issue queue relating to the installation and usage of HTML Tidy.

I suggest a different approach: use the htmLawed library. It provides the same functionality as HTML Tidy, but as a PHP library. Its only drawback is that it is slower.

Best of all, it is available as a Drupal module: http://drupal.org/project/htmLawed. Your module would then only need a dependency on another Drupal module, instead of on something that requires server configuration.
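
As a rough sketch of what that dependency might look like, the line below would go in your module's .info file. The machine name "htmLawed" is an assumption taken from the project URL above and should be checked against the actual module before use.

```
; Hypothetical excerpt from this module's .info file.
; "htmLawed" is an assumed machine name, taken from the project URL.
dependencies[] = htmLawed
```

With a line like this in place, Drupal would refuse to enable the module unless the htmLawed module is present, so no separate binary would have to be installed on the server.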

I'd be willing to help you port it.

Comments

dman

You may be right. There is a history of htmltidy issues.
If that other one works well, it may smooth things out. Worth a look.

dman

Version: master » 7.x-1.x-dev
Status: Active » Needs work

Clearing the old 6.x issues from the issue queue as a cleanup.
But I'm putting this back on the radar for evaluation. htmLawed, or some other contender (QueryPath), may be an engine worth trying. Still, I'm most comfortable and familiar with htmltidy, which I feel is a proven bedrock as a parser.

coderintherye

I'm not sure QueryPath can tidy things up for you; however, it can safely ignore bad HTML.

Would it help if I made a patch for htmLawed so that you can give it a try?