During the development of the "send a mail" actions of the rules module I noted that drupal_html_to_text() removes the line endings of the mail text.

It seems that function is only used by drupal's system mail action - so I was able to reproduce the problem by sending mail with that action too.

I noted that the preg_replace('/\s+/', ' ', $text) call is causing the problem, as \s seems to match the line endings too.
Possible fixes are:
* remove this replacement of whitespaces as indenting with whitespaces might be useful when no html is available.
* just replace multiple real spaces ' ' by one

A patch for both fixes is attached for HEAD. Both apply to drupal 6 too.

Comments

fago’s picture

Title: block user action creates a garbled log message » drupal_html_to_text() removes line endings

arg, fixing title.

dries’s picture

I think you're right about this, fago. I would recommend that we write some simpletests for this.

I think I've seen this problem at Acquia. I've pointed Acquia's engineers to this issue; maybe they can help test this fix.

dave reid’s picture

One thing to note...if you call drupal_html_to_text before calling drupal_mail, it will strip line endings. If you just call drupal_mail (which has a call to drupal_html_to_text in the function), it won't strip them. Found this in #299548: Line breaks not showing up in email. Basically there are problems calling this function twice, just like from #212126: drupal_html_to_text() can only run once.

dave reid’s picture

Ignore my previous post. I was wrong about drupal_mail.

This is very much a bug in drupal_html_to_text. I did a little investigative work into all the hook_mail calls to see which ones ran the mail body through the drupal_html_to_text function:
- contact_mail: Does not call drupal_html_to_text (Contact form e-mail I get are always formatted correctly)
- system_mail: Does call drupal_html_to_text (Email sent with the action 'send email' does not format correctly)
- update_mail: Does not call drupal_html_to_text
- user_mail: Does not call drupal_html_to_text (User registration emails are formatted correctly)

So, anything that calls drupal_mail('system', ....) or drupal_html_to_text will not be formatted correctly. This includes:
system_send_email_action
Comment_notify module: #299548: Line breaks not showing up in email

This can be confirmed by adding an action 'Send email' (admin/settings/actions) and typing a body that has multiple line breaks in it. Assign that action a trigger (admin/build/triggers), and make sure it fires. The e-mail is run all together with no line breaks, except for where drupal_wrap_mail has forced it to wrap. Also I'm wondering why system_mail the only one calling drupal_html_to_text? Should the other mail_hooks be calling it as well but they aren't currently because of this bug?

frankcarey’s picture

So, wouldn't it just make sense to bring system_mail in line with the rest of the mail functions? I'm wondering if this was an oversight, or if there was a reason for using drupal_html_to_text() in this context.

frankcarey’s picture

StatusFileSize
new740 bytes

I'd like to have this fixed in 6.x first so I'm temporarily changing the version an uploading a patch that removes drupal_html_to_text.

Looks like it is unnecessary, at least in the send mail cation context as my email went through with no html.

Looking at the sister function user_mail and it's helper _user_mail_text... the only text conversion happens here:
>> strtr($admin_setting, $variables);
with no html_to_text conversion.

Should be the same patch for 7? I'll have to check it out , make changes and I'll post the patch in a few min.

fago’s picture

Version: 7.x-dev » 6.x-dev

hm, I don't think it's clever to *not* use drupal_html_to_text() any more instead of fixing drupal_html_to_text().

Yes, right the system action is the only mail implementation in core that uses it. But this doesn't change the problem.

frankcarey’s picture

Seems to me that these mail functions should just output the full body (As most of them do).

Then a hook_mail_alter(); function should modify the email as appropriate, doing this work in in the mail function completely limits the ability to use html, even if you were to use mime mail or the like (assumed not tested).

Brining this one function in line with the other functions seems the important first step, and then the writing of an appropriate _mail_alter would be the second, though not immediately necessary.

Thoughts?

frankcarey’s picture

Version: 6.x-dev » 7.x-dev
StatusFileSize
new726 bytes

drupal 7 version of patch

frankcarey’s picture

fago,

I want to make clear that I still think html_to_text() needs to be changed, and it looks like you've made the appropriate patches above. I just think it is a two part problem:

1) drupal_html_to_text() is over aggressive by taking out line breaks.

2)system_mail breaks from the others by using drupal_html_to_text() in the first place and it should be removed, with the possibility of having plaintext vs. html emails handled in a more flexible sytematic way.

fago’s picture

point 2) is completely unrelated to this issue, point 1), so please open a separate issue for that. thanks.

heine’s picture

Status: Needs review » Needs work

We need to step back a little and consider the role of drupal_html_to_text. As the name implies, it serves to convert HTML to an acceptable plaintext representation.

Relevant from http://www.w3.org/TR/REC-html40/struct/text.html#whitespace:

For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.
[...]
This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space ( ),
[...]
In particular, user agents should collapse input white space sequences when producing output inter-word space.

In light of the above, it makes perfect sense for drupal_html_to_text to collapse whitespace.

The issue here stems from confusion surrounding the type of string passed to drupal_html_to_text. One should simply not pass plaintext to the function.

To add to this confusion, many think $node->body is HTML, whereas in reality it is Rich Text (RT) and needs to be converted to HTML by calling check_markup with the specified format (in $node->format). This HTML can then be converted to plaintext by drupal_html_to_text.

The question that needs to be answered to resolve this properly is then: what type of string is the message?

dave reid’s picture

For most calls to drupal_mail I can think of, it's an admin-entered textarea string with line breaks only. If a function is e-mailing HTML content, it should be up to that code to make sure to use drupal_html_to_text, but the system_mail function should not automatically call drupal_html_to_text.

It seems to me the drupal_html_to_text call should actually be in the following part of system_mail and not automatically applied to the subject:

  if (isset($params['node'])) {
    $node = $params['node'];
    $variables += array(
      '%uid' => $node->uid,
      '%node_url' => url('node/'. $node->nid, array('absolute' => TRUE)),
      '%node_type' => node_get_types('name', $node),
      '%title' => $node->title,
      '%teaser' => $node->teaser,
      '%body' => $node->body,
    );
  }
fago’s picture

hm, I agree. If HTML is passed to the function, it doesn't matter whether line endings are removed. So it's just wrongly used by the action.
However it might still be useful if it doesn't strip line endings - as so one could also pass plain text or mixed strings (plain text enriched with some html tags) through it.

dww’s picture

I ran into the same problem over at #356968-5: Line feeds stripped from confirmation email. It does seem like Heine's hit the fundamental point here -- are the email templates considered plain text to be sent as-is, or RT to be converted to HTML as appropriate, and then back to ASCII via drupal_html_to_text()? If anyone's interested, I've got examples over at #356968: Line feeds stripped from confirmation email demonstrating the pros and cons of each approach in terms of how things behave in my contrib module given the current core API. Ultimately, we need to be clear about each email template setting, and is it plain or rich text...

nadavoid’s picture

I would like to see the patch attached to #6 get applied to drupal core 6. Without it, email actions that are defined in the core actions module become very jumbled very quickly. Should I open a separate issue for that?

nadavoid’s picture

I may have spoken too soon. I think I like using one of the originally attached patches: "just replace real spaces" or "don't replace whitespaces". This is because other modules trying to send plain-text emails will get their line-endings stripped, unless one of these patches is applied. I am still thinking mainly about Drupal 6, so let me know if I should open a separate issue about this.

fuerst’s picture

As long as there is no fix for this problem in Drupal 6 you have to enter <br> tags as line break. Looks ugly in Action's "Edit Message" textarea but it is a work around.

Once the function drupal_html_to_text() got fixed your messages probably will have 2 line breaks where you expect only one. So you instantly will know this bug is fixed :)

heine’s picture

There's nothing to be fixed in drupal_html_to_text. You need to pass HTML, not plaintext.

frankcarey’s picture

long overdue, but finally added an issue for #10 - 2 , to have the drupal_html_to_text removed from system_mail

http://drupal.org/node/407452

heine’s picture

Status: Needs work » Closed (won't fix)
dww’s picture

Status: Closed (won't fix) » Closed (works as designed)

I think "by design" is more appropriate here. There's no bug in drupal_html_to_text() itself -- it's doing what it's supposed to. The problems are places that call it without sending in HTML in the first place.

liquidcms’s picture

subscribing.. and voicing my general concern.. i love Drupal!! but damn.. mail has to be about the worst handled thing in Drupal.. at Drupal 6 now and still not much improvement over Dr4 - every site i do i have to piece together a mail system to send something as simple as a text (but in html) email with a couple links in it.

oh how i long for the day when core will let me send smtp authenticated html email without having to add extra class files, 2 or 3 modules and then still hack a few things.. :)

was pretty sure that in Dr5 actions emails worked better than they do now.. scared to think our email solutions are going backwards..

sorry.. just frustrated.. sent about 50 test emails tonight trying couple different modules and still can't get simple email action to look correct..

try again tomorrow.

dww’s picture

@liquidcms: I totally agree. Mail handling in Drupal sucks. Especially hook_mail() in D6 and beyond -- the developer experience is terrible. And core actions is definitely broken in D6 (and D7). But it's not the fault of this function, per se, it's the fact that it's being used incorrectly. See #407452: Remove use of drupal_html_to_text() from system mails for fixing that bug. Cheers.

WeaveGeek’s picture

I'm doing non-coding administration through the Actions UI. When I add only <br> tags (or <br/> tags) to the message body in the /admin/settings/actions/configure/[action-id] form, it doesn't result in whitespace in the resulting email. I'm still getting one-paragraph emails.

If I wrap the paragraphs in <p></p>, and put the <br/> tags inside the <p></p> tags, whitespace is preserved in the email as expected. Even the <br/> tags started working properly. It was strange, and took me a bit to figure out what was going on, so I thought I'd mention it and save the next guy a bit of leg work.

ggevalt’s picture

WeaveGeek,

I've been hunting around these forums for months trying to find a fix for this problem and, until now, I saw no easy solution. Thank you so, so much. Such a simple solution it is amazing.

God bless the Drupal World, but so often little problems become these big coding, patch, commit flurries that make little sense and, sometimes, don't even work. So it is a real pleasure to run into a tiny html solution to a tiny little problem that makes such a big difference.

cheerio

geoff

stefan81’s picture

hi, would love to see line breaks working too :)

jruz’s picture

thanks @WeaveGeek that solved the problem :)

Shai’s picture

@weavegeek, another big fan of you and the wonder you work in comment #25! Thanks for saving me so much time and frustration.