Hi Edward,

We at the Acquia DrupalGardens team were recently thinking about how to solve the problem with style attributes being blacklisted by core's HTML filter. It seems to be a choice between wysiwyg_filter and HTMLPurifier. We'd prefer to use your code as it seems more robust, tested and applied elsewhere.

Indeed, I'd like to see it replace the filter we have in core. I think the fear is performance. Do you have any benchmarks on using HTMLPurifier? Do you feel like it is performant in varied situations (lots of text, many transformations per page load, etc)?

We'll probably need to start on a 7.x port ASAP if we can prove the performance will be good enough. Also, are you interested in doing any (paid) work on this? Please get in touch if so via my contact form or here on this issue.

Best,
Jacob

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

JacobSingh’s picture

Title: 7.x Port and benchmarks » 7.x Port

Sorry, benchmarks is really a separate issue and specific to HTMLPurifier (the library not the module).

ezyang’s picture

I have not made any official benchmarks, however, it appears to be anecdotally the case that most servers can't really handle ~100 transformations per page load. The Drupal filter does have an aggressive caching policy which should alleviate this.

David_Rothstein’s picture

Version: 6.x-2.1 » 6.x-2.3
Status: Active » Needs review
FileSize
23.47 KB

Here is an initial port of the module to Drupal 7.

Most things seem to behave themselves, but there are TODOs in the patch noting some areas that still need work (some might be existing bugs in D6 though). It also seems like integration with the non-Drupal form isn't working totally right in all cases yet.

Other notes:

  • This patch moves the HTML Purifier cache settings from the per-format configuration pages to their own dedicated settings page. Because of the way the filter administration form works in D7, it would require some real gymnastics to keep making these available where they were before, and wouldn't really make sense to anyway - if they affect all text formats on the site, then they shouldn't be attached to a single text format's configuration form because it is confusing to save one text format and have the settings automatically applied to other text formats also.
  • This currently requires a patch to D7 core in order to work correctly: #826380: Empty filter objects for new/unconfigured filters are incomplete
ezyang’s picture

Thanks for your great work.

Re moving the cache settings, it's a good change that probably should have been in D6, except I couldn't figure out how to make a global settings page, so into the per-filter it went. It may still be useful to provide a link to the global settings page.

I'll take the patch for a spin soon; it might not be too early to set up a branch either. David, would you at all be interested in commit access?

Cheers,
Edward

David_Rothstein’s picture

Thanks, Edward!

I'm not sure I'll have a lot of time to work on this module beyond the initial effort of porting it for (possible use in) Drupal Gardens, so you might want to hold off on giving me commit access... or, rather, you're certainly welcome to give me commit access, but I'm not sure how useful it will wind up being to you in the long run :) It's a very cool module and a neat library, though.

Either way, a branch sounds like it might be a good idea if you feel this is getting close - that would help other people more easily test it out with D7 and help fix the remaining issues.

David_Rothstein’s picture

Here is an updated patch that adds a link to the global settings page as described in #4.

It also fixes a bug with the default filter settings - I was previously using something that made no sense at all (a full HTMLPurifier_Config object when we actually want a settings array). For now, I've fixed this by just removing the 'htmlpurifier_config' default setting altogether, since the module itself works fine without it (see the attached "interdiff" file for the change).

However, I ideally want to figure out how to do it correctly because it's needed for anyone else who wants to write code that cleanly modifies the filter's default behavior. The requirement is that any element which exists in the 'default settings' array in hook_filter_info() should have the same exact structure as what is saved to the database when you submit that part of the filter settings form. So in particular, if you manually save the filter settings form, you wind up with an array of data in 'htmlpurifier_config' with elements that look like this:

    [URI.AllowedSchemes] => http
https
mailto
ftp
nntp
news

This is the format expected by $config->mergeArrayFromForm(), and that is what we want. The problem is I can't figure out how to use the library to recreate this array structure automatically in 'default settings'. The closest I could get was with code like this:

    'default settings' => array(
      'htmlpurifier_help' => TRUE,
      'htmlpurifier_config' => _htmlpurifier_get_config()->getAll(),
    ),

However, that still isn't quite right. That method produces an array of settings with elements that look like this:

[URI] => Array
        (
            [AllowedSchemes] => Array
                (
                    [http] => 1
                    [https] => 1
                    [mailto] => 1
                    [ftp] => 1
                    [nntp] => 1
                    [news] => 1
                )

so not quite the same as above.

Does the library provide a method that I am not finding which outputs the configuration in an array like the first one - i.e. in an format that is suitable to feed back in to $config->mergeArrayFromForm()? Just wondering if I'm missing something obvious :)

ezyang’s picture

Interesting question. mergeArrayFromForm is intended to deal with data supplied by an HTML form, so the closest thing you can get to that is the code that generates the HTML form, which is in HTMLPurifier_Printer_Config, but you still need a browser.

What you're probably actually looking for is HTMLPurifier_Config->loadArray(), which loads in the result of getAll, or perhaps even HTMLPurifier_Config::inherit($parent), which creates a config object that inherits the settings of another object, but can override them.

mamarley’s picture

subscribing

JacobSingh’s picture

Hi Edward,

Any chance we can get this patch committed? Now that D7 BETA2 is around the corner it would be nice to start rolling releases.

Thanks!
Jacob

ezyang’s picture

Status: Needs review » Needs work

Yep, I've branched out 7.x-1.x for dev. Patch as it stands doesn't apply cleanly to latest CVS HEAD.

ezyang’s picture

Status: Needs work » Active

Committed. There are some residual issues which I will be dealing with shortly.

ezyang’s picture

Current known issues with the 7.x port:

  • On filter config page comes up, we get the errors: Notice: Undefined property: stdClass::$name in _htmlpurifier_settings() (line 483 of htmlpurifier.module).
  • Selecting basic filter results in display advanced config
  • Styling of the config form is god awful; figure out why Drupal styles are overriding ours
ezyang’s picture

Version: 6.x-2.3 » 7.x-1.x-dev
Category: feature » task
Priority: Normal » Major

Reclassified.

sammyman’s picture

Bump and subscribing.

AdrianC-1’s picture

Subscribing

smls’s picture

subscribe

Ilya1st’s picture

subscribing!
good module for stupid users who want use html 3.0 ))

ezyang’s picture

Other discovered issues:

* We need to write the upgrade script from 6 to 7
* #993274: Remove double caching logic needs to be completed

ezyang’s picture

Version: 7.x-1.x-dev » 7.x-2.x-dev
ezyang’s picture

The stdclass error appears to have gone a way. I've fixed up the styling in my local copy. Displays seem fixed.

There's another bug where JS for the basic form doesn't work, because Drupal seems to put both the advanced config and the basic config forms on the same page, so the IDs are messed up.

ezyang’s picture

Status: Active » Fixed

Last bug fixed. Gonna roll a release.

ezyang’s picture

Status: Fixed » Active

Actually, no, we still need to write the upgrade script. Sigh.

ezyang’s picture

JacobSingh: I don't think an upgrade hook is necessary, based on reviewing the filter code you cite, and looking at some other ported modules. Shouldn't the filter upgrade hook handle all the cases we're interested in? I guess I should run an upgrade and find out.

catch’s picture

I'm pretty sure an upgrade is needed due to #685486: filter_update_7003() doesn't work for contrib modules.

ezyang’s picture

Ok, that sounds complicated. I'll investigate it soon but I'm not sure when I'll be able to figure it out.

heddn’s picture

heddn’s picture

FileSize
2.83 KB

Closed #1359418: Schema issue upon upgrade to D7 as duplicate and merging into here the patch for the D6 upgrade. One difference is it checks if the schema fields actually exists before droping it for those who have native installed the module in D7.

ezyang’s picture

Status: Active » Fixed

Fixed in bba8868

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.