The actions patch is trying to add a drupal_html_to_text() function to core for use with email bodies, but IMO that function is subpar. It just does some minor regular expression replacements and fails to exploit the possibilities that plain-text e-mail offers. It is a variation of a piece of code that lives in contrib in various incarnations (project, mimemail, simplenews, og_mail, ...) but has never really gotten the attention it deserves.

For example, every e-mail client supports standard quoting of replies (a perfect <blockquote> match), and plain text e-mail has supported soft-breaks with format=flowed for quite some time now (RFC 2646/3676). The way this works is that it has no (visible) effect on clients that do not support it, as the resulting text looks hard-wrapped to normal text editors (summary: soft break = " \n", hard break = "\n").
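
To make the soft-break convention concrete, here is a minimal sketch in PHP. It is not the code from the patch: example_flowed_paragraph() is a hypothetical helper, and the default width of 77 is an assumption, chosen so that a line plus its trailing space stays within the 78 characters RFC 3676 recommends.

```php
<?php
/**
 * Hypothetical helper (not part of the patch): soft-wrap one paragraph.
 *
 * Every break inserted here is " \n" (space + newline), which a
 * format=flowed client treats as a soft break that it may reflow.
 */
function example_flowed_paragraph($paragraph, $width = 77) {
  // wordwrap() replaces the space at each wrap point with the break string,
  // so wrapped lines end in " \n" instead of a bare "\n".
  return wordwrap($paragraph, $width, " \n");
}

// A narrow width is used here only to keep the example short.
$first = example_flowed_paragraph('one two three four five six', 10);
$second = example_flowed_paragraph('seven eight', 10);

// Paragraphs are joined with bare "\n" characters: those are hard breaks.
$body = $first . "\n\n" . $second;
```

A client only reflows the soft breaks when the message is labelled format=flowed in its Content-Type header; any other client simply shows the text as hard-wrapped.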

I do support introducing a core function for generating mail text from HTML (since there is demand for it), but it should at least support the conventions that email clients use today.

With more and more tools for processing HTML in core (xss filter, html corrector, html indexer and other filters), this function can be applied in many situations, and its input and output can be tied to other operations. So it's more like "anything the filter system supports"_to_text rather than just html_to_text.

  • The attached patch introduces an alternate drupal_html_to_text() that not only produces much better-looking output thanks to consistent indentation, but is also RFC 2646/3676 compliant, so the text will soft-wrap when sent as email and read in a client that supports it (sample conversion). I think it's complete enough to be reused by all the contrib modules that currently roll their own code.

    I've tried both Thunderbird and Apple Mail, and both display the text soft-wrapped, as intended. Note however that due to limitations in format=flowed, indented items like lists will still be hard-wrapped.

  • Aside from drupal_html_to_text(), you can also use drupal_wrap_mail() directly to wrap normal plain-text with format=flowed soft breaks. Until now, user.module was in fact incorrectly not wrapping any of its mails, while contact.module was using wordwrap(). However, wordwrap() only works on unbroken lines of text, so requires an explode/foreach:wordwrap/implode loop to apply to arbitrary messages. drupal_wrap_mail() can take any string, linebreaks or not.

    By using drupal_wrap_mail() everywhere in core, this patch sends standards-compliant wrapped mails that look the same as the unwrapped ones (in modern clients).

    drupal_mail() sends the right headers already (no explanation as to why that was done), but we can't do the wrapping in there automatically, because we still need to support passing in other data (e.g. mime multipart, but also pre-wrapped text from drupal_html_to_text), and there is no easy way to distinguish the two.

    I toyed with the idea of wrapping the body automatically if the Content-Type is not set explicitly, but this would mean that anyone who passes in pre-wrapped text (e.g. from drupal_html_to_text) would need to specify a redundant, complicated Content-Type string to override this behaviour, which is not good.

    Manually calling drupal_wrap_mail() seems preferable, and it's better than before where you were supposed to do the explode/foreach:wordwrap/implode dance yourself.

  • It also seemed appropriate to create a mail.inc, since we now have several useful functions for dealing with mail.
  • Finally, drupal_mail() contained the following comment, which had been living in the function's various incarnations since the dawn of time:
    -    // Note: if you are having problems with sending mail, or mails look wrong
    -    // when they are received you may have to modify the str_replace to suit
    -    // your systems.
    -    //  - \r\n will work under dos and windows.
    -    //  - \n will work for linux, unix and BSDs.
    -    //  - \r will work for macs.
    -    //
    -    // According to RFC 2646, it's quite rude to not wrap your e-mails:
    -    //
    -    // "The Text/Plain media type is the lowest common denominator of
    -    // Internet e-mail, with lines of no more than 997 characters (by
    -    // convention usually no more than 80), and where the CRLF sequence
    -    // represents a line break [MIME-IMT]."
    -    //
    -    // CRLF === \r\n
    -    //
    -    // http://www.rfc-editor.org/rfc/rfc2646.txt
    

    In all my time developing PHP on Windows, and now for 1.5 years on Mac, passing "\n" line-endings to mail() has always worked. In fact I specifically remember that "\r\n" (the technically correct ones) did not work on Unices many years ago. Given that we've been using "\n" line endings for ages and I haven't seen a line ending bug report in ages, I simply changed the comment to say that this is what PHP requires, and that the actual mail that will be sent will be correctly converted to "\r\n" line endings.
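
The manual loop from the second point and the "\n" convention from the last point can be sketched together as follows. This is an illustration only, not the code from contact.module or from the patch; example_hard_wrap() and its default width of 72 are assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper: the explode/foreach:wordwrap/implode dance.
 */
function example_hard_wrap($body, $width = 72) {
  // Normalize "\r\n" and bare "\r" to the "\n" endings passed to mail().
  $body = preg_replace('/\r\n|\r/', "\n", $body);
  // Wrap each existing line on its own so existing breaks are preserved.
  $lines = explode("\n", $body);
  foreach ($lines as $i => $line) {
    $lines[$i] = wordwrap($line, $width, "\n");
  }
  return implode("\n", $lines);
}

// A narrow width is used here only to keep the example short.
$wrapped = example_hard_wrap("Hello there,\r\n\r\none two three four five six", 10);
```

With drupal_wrap_mail() this loop is no longer the caller's job, since it takes any string, with or without line breaks.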

Comment  File                Size      Author
         html_to_text.patch  20.98 KB  Steven

Comments

moshe weitzman

Status: Needs review » Reviewed & tested by the community

I did some testing and no plain-text emails broke. I didn't test complex HTML -> plain conversion, but I think this is RTBC. We can fix any bugs during the beta and RC periods.

gábor hojtsy

Status: Reviewed & tested by the community » Fixed

The examples look fantastic, and these great changes block the actions patch and the hopefully soon-to-be-reborn multilanguage user mails patch, so it was time to commit this. I looked through the code, and it has beautiful code comments that helped me understand all parts of it. Mail.inc will probably grow even more with multilanguage emails, so it will have even more right to live. Thanks!

drewish

High five! WTG Steven.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

ax

There is a bug in the way drupal_html_to_text() extracts the URL from <a href="...">s. It would be nice if the patch over there could be reviewed and applied. Thanks!