Hi,

I was under the impression that HTML Purifier would correct my HTML issues (according to standards); however, it seems to remove almost everything, including images and Google Maps.

Is there any way to fix this, or is this just normal behaviour?

Files:
Comment  File           Size      Author
#43      html-one.png   24.35 KB  filmoreha
#20      iframes.patch  1.5 KB    devkinetic

Comments

Update:

Images are stripped from articles I've written as soon as I install HTML Purifier (without even enabling it).

Under input formats I can see that HTML Purifier is not enabled for all profiles, yet I still have no images on my site. As soon as I uninstall the module, my images come back.

Any help or suggestion would be hugely appreciated!

Hi!

External images are disabled by default, since Drupal's default HTML doesn't allow any images at all. You can turn them back on by setting "DisableExternalResources" to No.

As for Google Maps, it utilizes iframes, which are disallowed by HTML Purifier for obvious security reasons. There are some ways to work around this, but it will require writing a nominal amount of code. I can teach you how to do it, but it'll be kind of nontrivial if you've never written PHP before.

Cheers,
Edward

Status: Active » Postponed (maintainer needs more info)

ezyang,

Many thanks for your response.
I can confirm that turning DisableExternalResources off does indeed allow us to embed images from external sources.

With regard to Google Maps, can you please think of a workaround that would allow us to use them?
I'm afraid my knowledge of PHP is nil. I'm confident pasting some code into a file, but writing my own code...

Any alternative approach would be extremely appreciated!

Cheers

Would a SafeIframe functionality work for you? This would require you to explicitly whitelist domains that you'd want to allow iframes from.

I would like to second the SafeIframe whitelist idea. I think being explicit about who you trust is a perfect solution.

Sounds like an excellent idea.
Can you point me in the right direction? I need to see a few examples, maybe an article or a Drupal resource?

Many thanks!

Title: HTML Purifier removes Images and Google maps » SafeIframe configuration for images and google maps
Status: Postponed (maintainer needs more info) » Needs work

Renamed.

Status: Needs work » Postponed (maintainer needs more info)

We also need this because YouTube changed its embed code to use iframes. I need some UI advice from you guys: what kind of whitelisting mechanism do you want? Domains? Regexes? Arbitrary code? If we allow multiple whitelisting mechanisms, how do they interact with each other?

Domain whitelisting would work to solve issues for non-mainstream websites.

Example embed code, e.g. from www.democracynow.org:

<script type="text/javascript" src="http://www.democracynow.org/embed_show_v2/300/2011/1/25/story/do_you_know_the_full_story"></script>

I have some problems embedding Amazon banner code:

<iframe src="http://rcm-de.amazon.de/e/cm?t=xxxxxxxxxxxx&o=3&p=20&l=ur1&category=generic&banner=1VH46RJT28QKG4Q5HM02&f=ifr" width="120" height="90" scrolling="no" border="0" marginwidth="0" style="border:none;" frameborder="0"></iframe>

I think domain whitelisting would be great.

Any other ideas on how I could embed this code in a block with HTML Purifier turned on?

I had to write a Filter for HTML Purifier and tell the HTML Purifier module to add the filter in the config:

http://stackoverflow.com/questions/5144189/htmlpurifier-iframe-regex-iss...

Now I can embed Google maps and other iFrame content.

It would be nice to add a domain whitelist, so iframes would be allowed if the source were Google, YouTube, Vimeo, etc.
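To make the domain whitelist idea concrete, here is a minimal sketch in plain PHP. It is not part of HTML Purifier or the Drupal module: the function name iframe_host_is_whitelisted() and the host list are hypothetical, and it only illustrates comparing an iframe's host against an explicit list of trusted hosts.

```php
<?php
// Hypothetical helper, not an HTML Purifier or Drupal API.
// Returns true only when the host of $src is in the explicit whitelist.
function iframe_host_is_whitelisted(string $src, array $allowedHosts): bool
{
    // parse_url() also handles protocol-relative URLs such as //host/path.
    $host = parse_url($src, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host could be extracted, so reject
    }
    return in_array(strtolower($host), $allowedHosts, true);
}

// Example whitelist (an assumption for illustration).
$allowedHosts = ['maps.google.com', 'www.youtube.com', 'player.vimeo.com'];

$mapOk    = iframe_host_is_whitelisted('http://maps.google.com/maps?output=embed', $allowedHosts);
$bannerOk = iframe_host_is_whitelisted('http://rcm-de.amazon.de/e/cm?t=xxxxxxxxxxxx', $allowedHosts);

var_dump($mapOk);    // bool(true)
var_dump($bannerOk); // bool(false)
```

An exact host match is the strictest choice; a regex-based whitelist is more flexible but easier to get wrong.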

Ya, this is annoying. I had trouble embedding YouTube videos with this module enabled.

@Kevin Quillen:

Can you provide more specific details on how you managed this?
I did what you mention on Stack Overflow and also read the HTML Purifier forum, but it won't work :(

In the HTMLPurifier module you also have to add to _htmlpurifier_get_config():

$config->set('Filter.Custom', array( new HTMLPurifier_Filter_MyIframe() ));

I know I should not hack the module, unless there is a hook I simply did not see.

It might be best to have the _config function invoke a hook so other modules can set their own filters or other HTML Purifier settings through code.

In my case, I cannot enable Advanced mode with the iFrame plugin (PHP error, something about not being able to render it in the form). So I had to adjust the module. Is there any other way to change settings through code without editing the module? I could not get the format-specific config file to work.

Thanks a lot Kevin!!

I know that I shouldn't hack the module either, but my client can't wait :/

So I'll just add this to my list of hacked modules to watch out for on upgrades.

Kevin,

Can you provide a more detailed explanation?

I added: $config->set('Filter.Custom', array( new HTMLPurifier_Filter_MyIframe() )); to _htmlpurifier_get_config()

But I'm unsure where to add the snippet from http://stackoverflow.com/questions/5144189/htmlpurifier-iframe-regex-iss....

Thanks!

devkinetic

I added it to HTMLPurifier_DefinitionCache_Drupal.php and it works perfectly :)

Attached file (status: new, size: 1.5 KB)

UPDATE: The issue I was having was that the line break converter in Drupal was wrapping the iframe in a P tag. The code was working correctly, but because the block element was placed within the inline p tag, Purifier was stripping it out anyway because it was invalid HTML.

Here is a patch file that combines the suggestions in this thread.

Back to the main point though, a safe-list sounds like the best bet!

yeap +1 for domain whitelisting

Another +1 for whitelisting.

Cheers

El B

I was able to get this working in 7.x but only briefly. Returning to the Text Format config form for any format utilizing HTML Purifier results in the following PHP error:

Object of class HTMLPurifier_Filter_MyIframe could not be converted to string in HTMLPurifier_Printer_ConfigForm_default->render()

The page is not editable; it just shows a generic error message. It also points to line 266 of ConfigForm.php in the Printer library of HTMLPurifier:

<?php
case HTMLPurifier_VarParser::ALIST:
    $value = implode(PHP_EOL, $value);
    break;
?>

Commenting out the $value line makes the form show up.

What are some possible solutions to this problem? Is it the plugin code, or the way it is being interpreted? Casting the imploded value to (string) there also makes the form reappear, though I do not know what implications that has for the library.

Version: 6.x-2.1 » 7.x-2.x-dev

HTML Purifier 4.4.0 now supports SafeIframe.

Status: Postponed (maintainer needs more info) » Fixed

Fixed. You need HTML Purifier 4.4.0, and you need to open the "Advanced Settings" (these options are not shown in the basic settings). The configuration you need is: turn on HTML.SafeIframe, and fill in URI.SafeIframeRegexp with the necessary values. Here is an example that allows YouTube and Vimeo: %^http://(www.youtube.com/embed/|player.vimeo.com/video/)%

Don't forget to add iframe (and the necessary attributes) to your allowed elements list, if you are manually configuring this.

This still doesn't work. It gets stripped out.

If you have the RemoveEmpty option turned on, then yes, it will. To fix this, give the iframe a name attribute and HTML Purifier will skip its remove-empty rule for it. You might also have to refresh the page after saving; I've noticed I have to do this every time after newly saving a node (on 6.x, but it should still be the same).

It's so confusing. I turned those off and cleared the cache, but iframes did not show up until the 8th reload. Why is that?

I have tested this, and you need to change a few more settings.

1. Set RemoveEmpty: No

2. If you have RemoveEmpty.RemoveNbsp: Yes, then you need to add iframe to RemoveEmpty.RemoveNbsp.Exceptions

3. If you use HTML Allowed, you need to add this to it: iframe[frameborder|marginheight|marginwidth|scrolling|src]

4. Set SafeIframe: Yes. For SafeIframeRegexp I use: %^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%
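For anyone setting this up through code rather than the settings form, the four steps above might translate roughly into the sketch below. It assumes the standalone library is available at the include path shown (taken from elsewhere in this thread) and that the form fields map to the library directives AutoFormat.RemoveEmpty, AutoFormat.RemoveEmpty.RemoveNbsp, HTML.Allowed, HTML.SafeIframe and URI.SafeIframeRegexp. The extra p and a[href] entries in HTML.Allowed are only illustrative, and the td/th entries are what I believe the default exceptions to be.

```php
<?php
// Assumed include path; adjust to wherever the library lives on your site.
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Step 1: do not strip "empty" elements (an iframe has no content).
$config->set('AutoFormat.RemoveEmpty', false);

// Step 2: only relevant if RemoveNbsp is switched on; keep iframe as an exception.
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', false);
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp.Exceptions',
    array('td' => true, 'th' => true, 'iframe' => true));

// Step 3: if you restrict allowed HTML, iframe and its attributes must be listed.
// The p and a[href] entries are placeholders for your own allowed elements.
$config->set('HTML.Allowed',
    'p,a[href],iframe[frameborder|marginheight|marginwidth|scrolling|src]');

// Step 4: enable SafeIframe and whitelist the iframe sources.
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp',
    '%^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%');

$purifier = new HTMLPurifier($config);
```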

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Version: 7.x-2.x-dev » 7.x-1.0-rc1

This does not appear to work with 7.x-1.0-rc1 of HTML Purifier. I am using 4.4.0 of the HTMLPurifier library.

I wish to embed the following video.
http://www.youtube.com/embed/e3OthM-seJs?wmode=opaque

My Settings:
SafeIframe: Yes
SafeIframeRegexp: %^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%
RemoveEmpty: No
RemoveEmpty.RemoveNbsp: No
I have added the following to AllowedFrameTargets:
_blank
_self
_top
_parent
I am using the default allowed HTML.

The src attribute of the iframe is stripped from the output when I use HTML Purifier; however, with Full HTML allowed I can see that the src that is output is as follows: //www.youtube.com/embed/e3OthM-seJs?wmode=opaque

I am guessing the regex is incorrect but everything I have tried is not working. The src link is being created by the media module filter that runs before HTML Purifier.
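A quick way to test that guess outside Drupal is to run the pattern against both forms of the URL with preg_match(). This is only a sketch and assumes the src is compared to SafeIframeRegexp as a plain string; the variable names are made up for the example.

```php
<?php
// Pattern copied unchanged from the settings listed above.
$pattern = '%^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%';

// The src as output by the media filter (protocol-relative),
// and the same URL with an explicit http:// scheme.
$relative = '//www.youtube.com/embed/e3OthM-seJs?wmode=opaque';
$absolute = 'http://www.youtube.com/embed/e3OthM-seJs?wmode=opaque';

$relativeMatches = preg_match($pattern, $relative) === 1;
$absoluteMatches = preg_match($pattern, $absolute) === 1;

var_dump($relativeMatches); // bool(false): the pattern requires a leading "http://"
var_dump($absoluteMatches); // bool(true)
```

If the check really works this way, a src starting with "//" can never pass a pattern anchored on "http://".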

Any ideas as to why I cannot get a video to appear? If 7.x-1.0-rc1 does not support this where can I grab the 2.x-dev version?

Category: feature » bug
Status: Closed (fixed) » Active

I tried this in another environment and have the same results. No YouTube video is shown. The src attribute is stripped out upon save when using HTML purifier.

I tend to agree it's a regex thing. Can you confirm whether this is an upstream library issue? If so, I'll point you to http://htmlpurifier.org/phorum/list.php?3.

I have tried this with the standalone PHP library and it works great. The settings in the code block match what I have in Drupal.

<?php
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
// Allow iframes whose src matches the whitelist regexp.
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^http://(www.youtube.com/embed/|player.vimeo.com/video/)%');
$config->set('Attr.AllowedFrameTargets', '_blank, _self, _top, _parent');
$config->set('Attr.EnableID', true);
$config->set('AutoFormat.Linkify', true);
$purifier = new HTMLPurifier($config);
// Run the markup through the purifier so the output reflects the settings above.
echo $purifier->purify(
    '<h1>Show me my Movie</h1> <iframe width="560" height="315" src="http://www.youtube.com/embed/e3OthM-seJs" frameborder="0" allowfullscreen></iframe>'
);
?>

I followed this thread, http://htmlpurifier.org/phorum/read.php?3,6237,6237#msg-6237 to set up the standalone version.

Make sure that none of the other filters, including core's HTML filter, break what HTML Purifier is doing. Disable all the other filters and see whether it still fails...

Status: Active » Postponed (maintainer needs more info)

I apologize for the delay. I am only using videos on a few areas of this site. I have been using the default full html text format for the moment.

I turned off the two other filters, image resize and convert media tags to markup. I also tried each combination with one filter on and the other off, with no luck. However, perhaps the markup produced by the media tag conversion is interfering with HTML Purifier. I am converting the media tags first in the filter processing order and HTML Purifier is running last.

If I switch that order and have the media tag markup filtered last the videos are output correctly. I am guessing this just avoids the regex check applied by HTML Purifier.

If it helps in diagnosis, the media markup that is output if I do not convert the markup with the media module's filter is as follows:

Video 1:
[[{"type":"media","view_mode":"media_large","fid":"33","attributes":{"alt":"Intro.mov","class":"media-image","typeof":"foaf:Image"}}]]

Video 2:
[[{"type":"media","view_mode":"media_large","fid":"35","attributes":{"alt":"WWU Summer Commencement 2011","class":"media-image","typeof":"foaf:Image"}}]]

For now, I will keep the order swapped as it works and the media sources are currently vetted before being posted. Thank you for the help.

Working Filter Order on text format

  • Image Resize
  • HTML Purifier
  • Convert media tags to markup

Status: Postponed (maintainer needs more info) » Fixed

Glad that worked for you.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Have you noticed a problem where the embedded media is wrapped in paragraph tags and mucks up the markup when the source is viewed?

Title: SafeIframe configuration for images and google maps » SafeIframe configuration for images, google maps, and videos
Version: 7.x-1.0-rc1 » 7.x-1.0
Status: Closed (fixed) » Active

Going to reopen this because I'm having the same issue.
Yes, I understand the fix is to run the Convert media filter after HTML Purifier; however, that still isn't optimal, since you lose out on stripping the automatic paragraph tags that are wrapped around your embedded content.

To reproduce, use wysiwyg, media + media_youtube, and htmlpurifier. Remove all filters except Convert media and HTML Purifier. Run 1) Convert media before HTML Purifier, then 2) HTML Purifier before Convert media.

Create some content and insert a YouTube video with media. You'll notice that in the first case the iframe renders but the src and embedded markup do not exist, so you get a blank square; in addition, there are no P tags around the iframe's container div if you view source. In the second case, the iframe and video are rendered correctly, but viewing source you see a pair of empty P tags above and below the iframe container div.

Any ideas?

Attached file (status: new, size: 24.35 KB)

Here are two images to illustrate my previous post.

[image: one] [image: two]

Thank you oriol_e9g - this works for me!

#30 worked for me too (Thanks, oriol_e9g!) but note that I had to clear caches after making the configuration changes. Probably just clearing the HTML Purifier cache at admin/config/content/htmlpurifier should suffice.

Not recommended, but you can allow content from all sources by using %^.*% in the SafeIframeRegexp field.

#30 doesn't work for me: I lose images and YouTube and Vimeo videos. I'm using the HTML Purifier module 7.x-1.0 and the HTML Purifier library v4.5.0. Should I change to HTML Purifier 7.x-2.x-dev?

Core is 7.23.

Thanks a lot!

I've had some success with #30's steps, plus the following regex for the SafeIframeRegexp:
%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%
This way it accommodates a src that starts with "//", as well as http or https. I got it from the HTMLPurifier documentation.
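As a sanity check of the pattern above, here is a small plain-PHP sketch that runs it against a few sample URLs. The helper src_is_allowed() and the sample URLs (including the video IDs and the example.com hosts) are made up for illustration; it assumes the src is matched against the pattern as a plain string.

```php
<?php
// Pattern copied from the post above.
$pattern = '%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%';

// Hypothetical helper: true when the src matches the whitelist pattern.
function src_is_allowed(string $pattern, string $src): bool
{
    return preg_match($pattern, $src) === 1;
}

// Sample URLs (IDs and the "evil" hosts are invented for the example).
$results = [
    'http'       => src_is_allowed($pattern, 'http://www.youtube.com/embed/e3OthM-seJs'),
    'https'      => src_is_allowed($pattern, 'https://www.youtube-nocookie.com/embed/e3OthM-seJs'),
    'relative'   => src_is_allowed($pattern, '//player.vimeo.com/video/12345'),
    'other_host' => src_is_allowed($pattern, 'http://evil.example.com/embed/e3OthM-seJs'),
    'lookalike'  => src_is_allowed($pattern, 'http://www.youtube.com.evil.example/embed/x'),
];

var_dump($results);
// http, https and relative are true; other_host and lookalike are false.
```

The escaped dots and the full "/embed/" and "/video/" path prefixes are what keep lookalike hosts from slipping through.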

Now, I said "some" success. I'm using the Media embed toolbar button. I can get the video to embed, and it shows when I view the node, but if I go to edit it again, all the iframe stuff gets stripped from the field. Has anyone else had this experience?