drupal_html_to_text() removes line endings

fago - August 22, 2008 - 16:39
Project:Drupal
Version:7.x-dev
Component:system.module
Category:bug report
Priority:normal
Assigned:Unassigned
Status:by design
Description

During the development of the "send a mail" actions of the rules module I noted that drupal_html_to_text() removes the line endings of the mail text.

It seems that function is only used by drupal's system mail action - so I was able to reproduce the problem by sending mail with that action too.

I noted that the

<?php
preg_replace
('/\s+/', ' ', $text)
?>
call is causing the problem, as \s seems to match the line endings too.
Possible fixes are:
* remove this replacement of whitespaces as indenting with whitespaces might be useful when no html is available.
* just replace multiple real spaces ' ' by one

A patch for both fixes is attached for HEAD. Both apply to drupal 6 too.

AttachmentSize
don't replace whitespaces603 bytes
just replace real spaces630 bytes

#1

fago - August 22, 2008 - 16:40
Title:block user action creates a garbled log message » drupal_html_to_text() removes line endings

arg, fixing title.

#2

Dries - August 23, 2008 - 07:32

I think you're right about this, fago. I would recommend that we write some simpletests for this.

I think I've seen this problem at Acquia. I've pointed Acquia's engineers to this issue; maybe they can help test this fix.

#3

Dave Reid - September 11, 2008 - 21:24

One thing to note...if you call drupal_html_to_text before calling drupal_mail, it will strip line endings. If you just call drupal_mail (which has a call to drupal_html_to_text in the function), it won't strip them. Found this in #299548: Line breaks not showing up in email. Basically there are problems calling this function twice, just like from #212126: drupal_html_to_text() can only run once.

#4

Dave Reid - September 11, 2008 - 22:29

Ignore my previous post. I was wrong about drupal_mail.

This is very much a bug in drupal_html_to_text. I did a little investigative work into all the hook_mail calls to see which ones ran the mail body through the drupal_html_to_text function:
- contact_mail: Does not call drupal_html_to_text (Contact form e-mail I get are always formatted correctly)
- system_mail: Does call drupal_html_to_text (Email sent with the action 'send email' does not format correctly)
- update_mail: Does not call drupal_html_to_text
- user_mail: Does not call drupal_html_to_text (User registration emails are formatted correctly)

So, anything that calls drupal_mail('system', ....) or drupal_html_to_text will not be formatted correctly. This includes:
system_send_email_action
Comment_notify module: #299548: Line breaks not showing up in email

This can be confirmed by adding an action 'Send email' (admin/settings/actions) and typing a body that has multiple line breaks in it. Assign that action a trigger (admin/build/triggers), and make sure it fires. The e-mail is run all together with no line breaks, except for where drupal_wrap_mail has forced it to wrap. Also I'm wondering why system_mail the only one calling drupal_html_to_text? Should the other mail_hooks be calling it as well but they aren't currently because of this bug?

#5

frankcarey - September 29, 2008 - 15:05

So, wouldn't it just make sense to bring system_mail in line with the rest of the mail functions? I'm wondering if this was an oversight, or if there was a reason for using drupal_html_to_text() in this context.

#6

frankcarey - September 29, 2008 - 16:04
Version:7.x-dev» 6.x-dev

I'd like to have this fixed in 6.x first so I'm temporarily changing the version an uploading a patch that removes drupal_html_to_text.

Looks like it is unnecessary, at least in the send mail cation context as my email went through with no html.

Looking at the sister function user_mail and it's helper _user_mail_text... the only text conversion happens here:
>> strtr($admin_setting, $variables);
with no html_to_text conversion.

Should be the same patch for 7? I'll have to check it out , make changes and I'll post the patch in a few min.

AttachmentSize
remove_drupal_html_to_text.patch 740 bytes

#7

fago - September 29, 2008 - 15:56

hm, I don't think it's clever to *not* use drupal_html_to_text() any more instead of fixing drupal_html_to_text().

Yes, right the system action is the only mail implementation in core that uses it. But this doesn't change the problem.

#8

frankcarey - September 29, 2008 - 16:17

Seems to me that these mail functions should just output the full body (As most of them do).

Then a hook_mail_alter(); function should modify the email as appropriate, doing this work in in the mail function completely limits the ability to use html, even if you were to use mime mail or the like (assumed not tested).

Brining this one function in line with the other functions seems the important first step, and then the writing of an appropriate _mail_alter would be the second, though not immediately necessary.

Thoughts?

#9

frankcarey - September 29, 2008 - 16:34
Version:6.x-dev» 7.x-dev

drupal 7 version of patch

AttachmentSize
remove_http_to_text_d7.patch 726 bytes

#10

frankcarey - September 29, 2008 - 17:06

fago,

I want to make clear that I still think html_to_text() needs to be changed, and it looks like you've made the appropriate patches above. I just think it is a two part problem:

1) drupal_html_to_text() is over aggressive by taking out line breaks.

2)system_mail breaks from the others by using drupal_html_to_text() in the first place and it should be removed, with the possibility of having plaintext vs. html emails handled in a more flexible sytematic way.

#11

fago - September 29, 2008 - 19:31

point 2) is completely unrelated to this issue, point 1), so please open a separate issue for that. thanks.

#12

Heine - September 30, 2008 - 08:04
Status:needs review» needs work

We need to step back a little and consider the role of drupal_html_to_text. As the name implies, it serves to convert HTML to an acceptable plaintext representation.

Relevant from http://www.w3.org/TR/REC-html40/struct/text.html#whitespace:

For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.
[...]
This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space (&#x0020;),
[...]
In particular, user agents should collapse input white space sequences when producing output inter-word space.

In light of the above, it makes perfect sense for drupal_html_to_text to collapse whitespace.

The issue here stems from confusion surrounding the type of string passed to drupal_html_to_text. One should simply not pass plaintext to the function.

To add to this confusion, many think $node->body is HTML, whereas in reality it is Rich Text (RT) and needs to be converted to HTML by calling check_markup with the specified format (in $node->format). This HTML can then be converted to plaintext by drupal_html_to_text.

The question that needs to be answered to resolve this properly is then: what type of string is the message?

#13

Dave Reid - September 30, 2008 - 08:19

For most calls to drupal_mail I can think of, it's an admin-entered textarea string with line breaks only. If a function is e-mailing HTML content, it should be up to that code to make sure to use drupal_html_to_text, but the system_mail function should not automatically call drupal_html_to_text.

It seems to me the drupal_html_to_text call should actually be in the following part of system_mail and not automatically applied to the subject:

  if (isset($params['node'])) {
    $node = $params['node'];
    $variables += array(
      '%uid' => $node->uid,
      '%node_url' => url('node/'. $node->nid, array('absolute' => TRUE)),
      '%node_type' => node_get_types('name', $node),
      '%title' => $node->title,
      '%teaser' => $node->teaser,
      '%body' => $node->body,
    );
  }

#14

fago - September 30, 2008 - 09:54

hm, I agree. If HTML is passed to the function, it doesn't matter whether line endings are removed. So it's just wrongly used by the action.
However it might still be useful if it doesn't strip line endings - as so one could also pass plain text or mixed strings (plain text enriched with some html tags) through it.

#15

dww - January 24, 2009 - 01:44

I ran into the same problem over at #356968-5: Line feeds stripped from confirmation email. It does seem like Heine's hit the fundamental point here -- are the email templates considered plain text to be sent as-is, or RT to be converted to HTML as appropriate, and then back to ASCII via drupal_html_to_text()? If anyone's interested, I've got examples over at #356968: Line feeds stripped from confirmation email demonstrating the pros and cons of each approach in terms of how things behave in my contrib module given the current core API. Ultimately, we need to be clear about each email template setting, and is it plain or rich text...

#16

nadavoid - February 14, 2009 - 01:01

I would like to see the patch attached to #6 get applied to drupal core 6. Without it, email actions that are defined in the core actions module become very jumbled very quickly. Should I open a separate issue for that?

#17

nadavoid - February 14, 2009 - 01:24

I may have spoken too soon. I think I like using one of the originally attached patches: "just replace real spaces" or "don't replace whitespaces". This is because other modules trying to send plain-text emails will get their line-endings stripped, unless one of these patches is applied. I am still thinking mainly about Drupal 6, so let me know if I should open a separate issue about this.

#18

fuerst - March 19, 2009 - 16:05

As long as there is no fix for this problem in Drupal 6 you have to enter <br> tags as line break. Looks ugly in Action's "Edit Message" textarea but it is a work around.

Once the function drupal_html_to_text() got fixed your messages probably will have 2 line breaks where you expect only one. So you instantly will know this bug is fixed :)

#19

Heine - March 19, 2009 - 16:13

There's nothing to be fixed in drupal_html_to_text. You need to pass HTML, not plaintext.

#20

frankcarey - March 19, 2009 - 18:09

long overdue, but finally added an issue for #10 - 2 , to have the drupal_html_to_text removed from system_mail

http://drupal.org/node/407452

#21

Heine - March 19, 2009 - 20:12
Status:needs work» won't fix

Continued in #407452: Remove use of drupal_html_to_text() from system mails.

#22

dww - March 19, 2009 - 20:50
Status:won't fix» by design

I think "by design" is more appropriate here. There's no bug in drupal_html_to_text() itself -- it's doing what it's supposed to. The problems are places that call it without sending in HTML in the first place.

#23

liquidcms - March 20, 2009 - 05:56

subscribing.. and voicing my general concern.. i love Drupal!! but damn.. mail has to be about the worst handled thing in Drupal.. at Drupal 6 now and still not much improvement over Dr4 - every site i do i have to piece together a mail system to send something as simple as a text (but in html) email with a couple links in it.

oh how i long for the day when core will let me send smtp authenticated html email without having to add extra class files, 2 or 3 modules and then still hack a few things.. :)

was pretty sure that in Dr5 actions emails worked better than they do now.. scared to think our email solutions are going backwards..

sorry.. just frustrated.. sent about 50 test emails tonight trying couple different modules and still can't get simple email action to look correct..

try again tomorrow.

#24

dww - March 20, 2009 - 08:05

@liquidcms: I totally agree. Mail handling in Drupal sucks. Especially hook_mail() in D6 and beyond -- the developer experience is terrible. And core actions is definitely broken in D6 (and D7). But it's not the fault of this function, per se, it's the fact that it's being used incorrectly. See #407452: Remove use of drupal_html_to_text() from system mails for fixing that bug. Cheers.

 
 

Drupal is a registered trademark of Dries Buytaert.