Compatibility with Outlook-generated HTML
mike stewart - July 24, 2006 - 10:08
| Project: | Mailhandler |
| Version: | 6.x-1.5 |
| Component: | Code |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | fixed |
Description
The problem seems obvious. Some posts have touched on the subject, but I'm sure anyone using this module would like users to be able to send posts using HTML and MS Outlook Express (possibly even Outlook).
Maybe I'm missing something? I can't figure out how to set up Drupal to parse Outlook HTML email posts so that they retain usable HTML while the rest of the gobbledygook tags that come with embedded Outlook CSS get filtered out.
I've played with filters. It doesn't seem to work or I haven't found the right combination.
Related posts I've found (sans-solution):
http://drupal.org/node/39172#comment-97220
http://drupal.org/node/57603
Bueller?

#1
Am I the only one? Or am I missing the obvious on how to post HTML-formatted (rich text) email using Outlook?
#2
I'd love to know the solution to this, too, for what it's worth.
#3
I had to go with plain text for the MIME type, then use filters to handle any mailto or URL links.
Otherwise, no one has ever responded to me. Bummer. A module with such cool potential, especially for users who are inclined NOT to learn a new system of posting content.
#4
As a corporate Outlook user, and part-time site maintainer, I wanted to check this out. I can't really comment on how to **fix** the issue, but I wanted to at least be better able to describe what works and doesn't.
So, I sent some emails to my site. I'm using version 5.x-1.1
Overall, it allowed me to use Outlook without the post looking messed up, but with the exception of bullet points it didn't actually carry over the special formatting.
Maybe not ideal, but acceptable for my purposes.
#5
Thanks for the input, but I think the original title of this post could have been better.
What I was originally getting at is that it'd be nice to take advantage of the HTML that Outlook generates, because it's a very common email client with a built-in Rich Text editor. However, when you create an Input Filter for incoming mail that attempts to allow the Microsoft HTML tags, you end up with a bunch of MS tags in addition to any valid HTML. This results in said gobbledygook. Many web-based email clients seem to work fine with an HTML filter.
So I agree that the content appears 'ok' when using Outlook, Mailhandler, and a default (Drupal 5.x) Input filter... but in fact it isn't. (Give it a try.)
NOTE: As I recall, images aren't supported by Mailhandler and are a separate issue altogether. I've seen some patches in the past that allow for image support. In its current state, Mailhandler's lack of image support seems to be a real limitation now that so many people have camera- and email-enabled cell phones.
#6
This is very much still an issue, unfortunately. I think it's really a matter of getting a good input filter. I currently have "Filtered HTML" as the default Input Filter and that works great for everything EXCEPT email from Outlook. If an email is sent from Outlook, tons of HTML for some reason gets escaped as text and included in the body of the post, making it unreadable (gobbledygook). Interestingly, if I then edit the post and re-save (without changing the input format), all of the goop gets stripped out and it looks fine. Thus, I KNOW that the solution to this is possible and theoretically easy. I am just not familiar enough with Drupal to implement it myself.
#7
I have been looking further into this and have had good success stripping most tags with the HTML Filter and HTML Corrector filters. The only problem is that certain style definitions (those that aren't commented out via HTML) show up as plain text. I am working on a solution to this. If you are interested, you might want to follow #447684: HTML Filter should strip text between <style> and <script> elements.
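To illustrate why the style definitions leak through, here is a minimal Python sketch (the thread's own code is PHP; this is not Drupal's HTML Filter). It assumes a naive tag stripper that removes only the tags themselves, and shows how dropping whole <style>/<script> elements first avoids the leftover CSS text. The function names are hypothetical.

```python
import re

# Hypothetical helpers for illustration only; not part of Drupal or Mailhandler.
_BLOCK_RE = re.compile(
    r"<(style|script)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)
_TAG_RE = re.compile(r"<[^>]+>")


def drop_style_and_script(html: str) -> str:
    """Remove <style> and <script> elements together with their contents."""
    return _BLOCK_RE.sub("", html)


def naive_strip_tags(html: str) -> str:
    """Remove tags but keep whatever text sat between them."""
    return _TAG_RE.sub("", html)


sample = "<style>p.MsoNormal {margin:0in;}</style><p>Hello</p>"

# Stripping tags alone leaves the CSS rules behind as visible text.
leaky = naive_strip_tags(sample)

# Dropping the whole element first leaves only the message text.
clean = naive_strip_tags(drop_style_and_script(sample))
```

In this sketch `leaky` still contains the `p.MsoNormal` rule, while `clean` contains only the message text.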
#8
I believe that this can be solved using the right combination of input filters, and that any remaining problems are bugs with those filters, so I will close this issue and encourage everyone to take a look at HTML Filter and HTML Corrector filters and bug #447684: HTML Filter should strip text between <style> and <script> elements, or consider creating a custom "Outlook Filter."
#9
I have been unable to get any response re: the HTML Filter. Even if I developed a patch for it I doubt it would get picked up.
I have instead written a small input filter module that runs before HTML Filter and can clear out all of that Outlook-generated gunk.
I can roll this into a standalone module, or it can be distributed with Mailhandler. What would be the better solution?
#10
Another benefit of this module is that it could drop the character count leading up to the actual message body considerably, so that the message actually appears in teasers.
#11
@Dane: Please propose a solution in the form of a patch, an include, or a module, whichever you feel more comfortable with, and let us examine it.
I see no reason not to include it, if it helps.
#12
Okay- most Outlook HTML can be filtered out using the "HTML filter" input filter. The only problem is style definitions that are not HTML-commented out by Outlook. Thus, this patch comments out those definitions. The HTML filter will then strip them out.
In short, after you apply this patch, you can simply use the "HTML filter" input filter to keep Outlook gunk from displaying.
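As a rough illustration of the idea (not the patch itself, which is PHP and is quoted later in this thread), the following Python sketch wraps the contents of a <style> block in an HTML comment when Outlook has not already done so. The exact patterns, the `\r\n` line endings, and the function names are assumptions made for the example.

```python
import re


def comment_out_styles(body: str) -> str:
    """Wrap uncommented <style> contents in an HTML comment (illustrative only)."""
    # Open a comment right after <style> unless one is already there.
    body = re.sub(r"<style>\r\n(?!<!--)", "<style>\r\n<!--", body)
    # Close the comment right before </style> unless it is already closed.
    body = re.sub(r"(?<!-->)\r\n</style>", "-->\r\n</style>", body)
    return body


def strip_comments_and_tags(body: str) -> str:
    """Stand-in for a tag-stripping filter that also drops HTML comments."""
    body = re.sub(r"<!--.*?-->", "", body, flags=re.DOTALL)
    return re.sub(r"<[^>]+>", "", body)


raw = "<style>\r\np.MsoNormal {margin:0in;}\r\n</style><p>Hi</p>"
wrapped = comment_out_styles(raw)
visible = strip_comments_and_tags(wrapped).strip()
```

Once the style rules sit inside a comment, a filter that discards comments and disallowed tags no longer leaves them behind as body text. A block that is already commented out is left unchanged.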
(FYI, on my site I configure the filter to strip disallowed tags, and allow the following tags:
<a> <i> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <img> <h1> <h2> <h3> <h4> <h5> <h6> <table> <tr> <td> <thead> <tbody> <tfoot> <br> <p> <b>)
#13
So there's really nobody out there who receives email from Outlook who's willing to review this? :)
#14
I'm trying to get mailhandler/listhandler working in an Outlook environment. I'm a relative noob here, but will try the patch and let you know what happens. Would like to be more helpful, but not familiar enough with community process here ...
Thank you for working on this.
***
RESULT
I tried to apply the patch -- but may not have done it correctly -- I just
1) inserted these lines in mailhandler.include.inc (they became lines 331-333):
+ // Comment out Outlook-generated HTML
+ $node->body = preg_replace("/\r\n(?!<\!--)/", "\rbody);
+ $node->body = preg_replace("/(?)\r\n<\/style>/", "-->\r\n", $node->body);
2) Restarted services (Apache, PHP, MySQL),
3) set Listhandler's 'HTML to Text Converter' to 'Fancy'
4) set mailbox settings in mailhandler to MIME type of HTML, and Input format 'Filtered HTML'
Perhaps I'm missing/misunderstanding something?
The result was forum posts with junk similar to what had been appearing -- e.g., a post that starts with the following, created by an email from Outlook (some of the stuff that's showing in the post on my site is in this comment, but won't appear in the 'read' view on this site--it's only visible in 'edit'):
v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);}
***
#15
@sjtout: Sorry I didn't notice that you posted your results until now (edits don't show up as new posts).
<code>tags.)
#16
Hi Dane -- thanks for the tip about editing v. adding a new comment.
I tried again. Unfortunately, in the interim, I'd tried to update to the latest release of Mailhandler (6.x-1.8 2009-Jul-21), so things may be affected by that.
I disabled Listhandler this time, so that shouldn't be an issue. I hadn't noticed that the Drupal.org filter had removed part of the patch code from my comment (I had removed the '+' signs last time as well). I did place all of the patch code in the comment -- these lines:
// Comment out Outlook-generated HTML - From patch at http://drupal.org/node/75229 by Dane Powell
$node->body = preg_replace("/<style>\r\n(?!<\!--)/", "<style>\r<!--", $node->body);
$node->body = preg_replace("/(?<!-->)\r\n<\/style>/", "-->\r\n</style>", $node->body);
became lines 332-334 (there's a blank line above and below).
There's still Outlook gunk in the post -- it appears in the forum post that results when I send email created in Exchange to the mailbox that Drupal fetches from:
<!--[if !mso]>
<!--v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);}-->
<![endif]-->
<!-- /* Font Definitions */ @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} @font-face {font-family:Verdana; panose-1:2 11 6 4 3 5 4 4 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:11.0pt; font-family:"Calibri","sans-serif";} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {mso-style-priority:99; color:purple; text-decoration:underline;} p.MsoAcetate, li.MsoAcetate, div.MsoAcetate {mso-style-priority:99; mso-style-link:"Balloon Text Char"; margin:0in; margin-bottom:.0001pt; font-size:8.0pt; font-family:"Tahoma","sans-serif";} span.EmailStyle17 {mso-style-type:personal-compose; font-family:"Calibri","sans-serif"; color:windowtext;} span.BalloonTextChar {mso-style-name:"Balloon Text Char"; mso-style-priority:99; mso-style-link:"Balloon Text"; font-family:"Tahoma","sans-serif";} .MsoChpDefault {mso-style-type:export-only;} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.0in 1.0in 1.0in;} div.Section1 {page:Section1;} -->
<!--[if gte mso 9]>
<![endif]--><!--[if gte mso 9]>
<![endif]-->
***
Let me know if I need to do something differently, or if you'd like to try something else. Thanks for looking at this.
***
One other small note -- for the purposes of this test I made my FilteredHTML input filter allow the same tags you had -- just copied and pasted your tags into my filter definition.
#17
It's working now...
After the last test, it occurred to me to try turning off all of the components of the HTMLFilter input filter, except the filtering itself (there are some path handling components and something called 'HTMLCorrector'). Maybe that cleared the cache -- don't know, but as soon as I turned them off it worked -- the gunk was gone, and now I've turned them all back on and it still works... so... dunno... But am very happy, thank you!
#18
#19
Ok, a little more info--
It was the HTMLCorrector filter that seemed to break your patch. It has to be ordered after the HTMLFilter.
This is my current order of filters:
1. URL filter
2. Pathologic
3. HTML filter
4. HTML corrector
(I turned off the Line Corrector, as it seemed to split ordered lists into two lines ... tangentially, do you know if it can be used without doing that?)
###ok, short follow-up to that parenthetical-- it doesn't have trouble with ordered lists, only the pseudo 'ordered lists' that don't use the OL tag and that Outlook uses instead of standard OLs... not sure how to address this additional Outlook curveball.
Thank you thank you.
#20
Would it be difficult to include this in a filter, rather than a patch? It seems like filter functionality -- the direction Dane started on initially. Could it be a separate filter, so that it could be arranged in the best possible place in the filter precedence list?
I think I'd prefer to have this as a filter than actually editing the content -- e.g., using something like this module: http://drupal.org/project/customfilter
#21
My plan was to eventually roll this into a new filter rather than just making a patch here. However, the customfilter module looks promising - I'll see if I can get this working in there (it'll be a day or two, I'm swamped at work). If anyone's feeling antsy they could probably look at what's in the patch and do it themselves pretty quickly.
#22
I made a quick attempt, but wasn't successful -- if you do get it working, would you post the custom filter rules here?
And, just so it's said -- this capability is really great to have and will make a big difference on my site, which will have a lot of content from Outlook. I'm sure it will help plenty of others as well. Thanks again.
Over the next month or so I'll try to find some way to address the incompatibility between Outlook "HTML" and the line break converter... if you have any thoughts about that one I'd love to hear them.
#23
Just saw this too -- looks like it may have more options than custom filter... http://drupal.org/project/flexifilter
#24
Over the next two days I will also try to include your patch, this may be exactly what I needed (trying to replace OTRS with Drupal Case Tracker + Mailhandler...btw. OTRS seems to do a wonderful job converting Outlook email, it displays it perfectly, except for images of course).
If I can get it to work, I will have a look at customfilter/flexifilter integration if time permits. Thanks for your work!
#25
As a side note on the custom filter/flexi-filter stuff -- I've ended up using Custom Filter to also remove the trailing part of the thread -- that is, the part that's not new -- before posting a new email as a comment to a forum. It's working reasonably well so far.
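For anyone curious what such a rule might look like, here is a minimal Python sketch of trimming the quoted part of a reply. It is not the Custom Filter configuration described above; the marker patterns (an "Original Message" separator line and an "On ... wrote:" line) and the function name are assumptions, and real mail will likely need more patterns.

```python
import re

# Hypothetical reply markers; adjust to the mail clients you actually see.
REPLY_MARKERS = [
    r"^-{2,}\s*Original Message\s*-{2,}\s*$",
    r"^On .+ wrote:\s*$",
]
_MARKER_RE = re.compile("|".join(REPLY_MARKERS), re.MULTILINE | re.IGNORECASE)


def strip_quoted_thread(text: str) -> str:
    """Keep only the text that precedes the first reply marker."""
    match = _MARKER_RE.search(text)
    if match is None:
        return text  # No marker found: leave the message untouched.
    return text[: match.start()].rstrip()


reply = (
    "Thanks, that fixed it.\n"
    "\n"
    "-----Original Message-----\n"
    "From: someone\n"
    "Earlier text that should not be reposted.\n"
)
new_part = strip_quoted_thread(reply)
```

Cutting at the first marker is deliberately conservative: if no marker matches, the message is posted unchanged rather than risk dropping new content.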
#26
subscribe
#27
I think a filter would be a good idea instead of a patch.
Regards,
Rahim.
#28
I am in the process of rolling a new module based on this patch. Does anyone know if this same sort of HTML is generated throughout Microsoft Office, or only by Outlook? If it's all of Office, then this filter might have much broader applicability (and it might be better to name it officehtml_filter instead of outlook_filter).
#29
Okay, go ahead and check out the Office HTML filter that I rolled. As a bonus, it should help in other situations where Office-generated HTML finds its way onto a site through user input. I'm going to go ahead and mark this as fixed.
#30
Thank you Dane!