Closed (works as designed)
Project:
HTML Purifier
Version:
7.x-2.x-dev
Component:
Miscellaneous
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
15 May 2012 at 19:23 UTC
Updated:
25 May 2012 at 06:45 UTC
Hello there....
I have users who copy/paste from their respective Joomla sites into my filtered text format,
and the output contains some XML content from their webpages, like this:
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:TrackMoves/>
<w:TrackFormatting/>
<w:HyphenationZone>21</w:HyphenationZone>
<w:PunctuationKerning/>
<w:ValidateAgainstSchemas/>
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:DoNotPromoteQF/>
<w:LidThemeOther>DA</w:LidThemeOther>
<w:LidThemeAsian>X-NONE</w:LidThemeAsian>
<w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
</w:LatentStyles>
</xml><![endif]-->
How can I configure HTML Purifier to strip that whole XML block?
Comments
Comment #1
ezyang commented
HTML Purifier should already strip that out.
Comment #2
acoustika commented
Well, it doesn't... Well, yes, it is not printed like that on screen, and I haven't pasted all of it.
But some weird things are printed out, and the block above is what I see when I disable rich text in the editor.
Comment #3
ezyang commented
What gets output when you input the text that you showed in this post?
It's possible that HTML Purifier is not getting something, but you might be attributing it to a different kind of markup.
Comment #4
acoustika commented
Here goes a long post :-)
The full copy/paste I see when I disable rich text is like this
And the weird output on the page is like this
I have spent hours trying to put things in the disallowed tags and attributes fields...
I tried the advanced purifier and a lot of different things, but I can't get it to strip it...
I'm sure I have the purifier enabled on the right text format and at the bottom of the filter processing order...
I also have TinyMCE installed through the Wysiwyg module and use that for that field...
And also Invisimail...
My filter processing order is
Comment #5
heddn commented
Can you disable pasting from MS Word? I think that is an option in Wysiwyg...
Comment #6
acoustika commented
In TinyMCE, under cleanup, I have these:
Verify HTML ON
If enabled, potentially malicious code like tags will be removed from HTML contents.
Preformatted OFF
If enabled, the editor will insert TAB characters on tab and preserve other whitespace characters just like a PRE element in HTML does.
Convert tags to styles ON
If enabled, HTML tags declaring the font size, font family, font color and font background color will be replaced by inline CSS styles.
Remove linebreaks ON
If enabled, the editor will remove most linebreaks from contents. Disabling this option could avoid conflicts with other input filters.
Apply source formatting
If enabled, the editor will re-format the HTML source code. Disabling this option could avoid conflicts with other input filters.
Force cleanup on standard paste OFF
If enabled, the default paste function (CTRL-V or SHIFT-INS) behaves like the "paste from word" plugin function.
So actually that one is off
Comment #7
ezyang commented
If you copy and paste that HTML into the HTML Purifier demo (http://htmlpurifier.org/demo.php?post), you can see that it all gets cleaned up. Did you enable the HTML Purifier filter?
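To make "cleaned up" concrete: the pasted block is one conditional comment, running from `<!--[if gte mso 9]>` to `<![endif]-->`, so removing it means removing that whole span. The sketch below is only a standalone Python illustration of that idea using a regular expression. It is not how HTML Purifier works internally (HTML Purifier is a PHP library that parses the markup), and the function name and sample string are made up for the example.

```python
import re

# Matches a conditional comment such as
# <!--[if gte mso 9]> ... <![endif]-->, including everything inside it.
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_conditional_comments(html: str) -> str:
    """Return html with every conditional-comment block removed."""
    return CONDITIONAL_COMMENT.sub("", html)


# Shortened, hypothetical sample in the shape of the block from the report.
sample = (
    "<p>Before</p>"
    "<!--[if gte mso 9]><xml>\n"
    "<w:WordDocument>\n"
    "<w:View>Normal</w:View>\n"
    "</w:WordDocument>\n"
    "</xml><![endif]-->"
    "<p>After</p>"
)

cleaned = strip_conditional_comments(sample)
print(cleaned)  # <p>Before</p><p>After</p>
```

A regular expression is enough for a quick check of pasted input, but it is not a substitute for a real HTML filter on untrusted content.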
Comment #8
acoustika commented
Yes, it is.
But I just tried moving the filters around a bit, and figured out that if I drag "Limit allowed HTML tags" down beneath HTML Purifier, it strips the block out of the display. It is still there in the post if I edit the post and disable rich text. I don't know if it's supposed to be there or to get truly stripped away from the post, but at least with the "Limit allowed HTML tags" filter beneath HTML Purifier it doesn't display anything weird.
I now have my filters like this:
Correct faulty and chopped off HTML
Convert line breaks into HTML
Convert URLs into links
HTML Purifier (advanced)
Limit allowed HTML tags
But something like making HTML Purifier add rel="nofollow" to all links doesn't work that way, although I can make the last filter do that...
Encode email addresses: Javascript-wrapped HTML entities
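On the block still being visible when editing: as far as I know, Drupal text formats run their filters when the text is displayed and leave the stored text as it was entered, so seeing the raw markup in the edit form would be expected. Below is a minimal Python sketch of that model, with filters applied in list order. The filter functions, the `render` helper and the sample body are all hypothetical stand-ins, not Drupal or HTML Purifier code.

```python
import re
from typing import Callable, List

Filter = Callable[[str], str]


def strip_comments(text: str) -> str:
    """Hypothetical stand-in for a filter that removes HTML comments."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def add_nofollow(text: str) -> str:
    """Hypothetical stand-in: add rel="nofollow" to links that have no rel."""
    return re.sub(r"<a (?![^>]*\brel=)", '<a rel="nofollow" ', text)


def render(stored: str, filters: List[Filter]) -> str:
    """Apply each filter in order to a copy of the stored text."""
    out = stored
    for apply_filter in filters:
        out = apply_filter(out)
    return out


# What the user saved; this is what the edit form would show again.
stored_body = (
    "<!--[if gte mso 9]><xml></xml><![endif]-->"
    '<p>See <a href="http://example.com/">this</a></p>'
)

displayed = render(stored_body, [strip_comments, add_nofollow])
print(displayed)
# <p>See <a rel="nofollow" href="http://example.com/">this</a></p>
```

Because each filter only sees the output of the one before it, reordering the list can change what ends up on the page, while `stored_body` stays untouched either way.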
Comment #9
ezyang commented
I think all of the extra filters you have are unnecessary, and can be done by HTML Purifier. In particular, enable auto paragraphing and URLification.