Disclaimer: no idea if this is the right place to report, it involves simplenews, mimemail and drupal 6 core

When I send a newsletter in HTML, the HTML gets messy. The only workaround I have for the moment is patching drupal_wrap_mail() so that it just returns the input:

function drupal_wrap_mail($text, $indent = '') {
  // Convert CRLF into LF.
  $text = str_replace("\r", '', $text);
  return $text;
}

If I don't do this, I get the following in my email body:

... <img <br />src="... 

Comments

Sutharsan’s picture

Project: Simplenews » Drupal core
Version: 6.x-1.x-dev » 6.x-dev
Component: Code » base system
Status: Active » Postponed (maintainer needs more info)

It looks like drupal_wrap_mail() is called when it shouldn't be. Please try to find out which module causes this.

Sutharsan’s picture

Project: Drupal core » Simplenews
Version: 6.x-dev » 6.x-1.x-dev
Component: base system » Code

Oops.

attiks’s picture

drupal_wrap_mail() is called by drupal_html_to_text(), which is called by mimemail_html_body() when a text-only alternative isn't specified. So is this still a simplenews issue, or is this mimemail?

attiks’s picture

Project: Simplenews » Mime Mail
Status: Postponed (maintainer needs more info) » Active

According to http://drupal.org/node/320989 this is a mimemail problem.

michaeldhart’s picture

Confirming this problem...
Email sent by Simplenews to a test address had broken <img> tags, although not consistently. I tried varying the code but couldn't figure out what caused it to go right or wrong. Finally, when I sent the email to the mailing list, everyone received "n/a" as the message body.

zmove’s picture

I created an issue for this problem too, I didn't see your issue before posting :

see #336778: Mimemail add automatic <br /> That break my HTML

sgabe’s picture

Status: Active » Closed (fixed)

I am closing this issue since there has been no activity for over a year now. The issue probably no longer applies to the latest version. If it does, feel free to reopen.

LUTi’s picture

With the new 1.0-alpha2 and a patch from HTML emails are text-only in Hotmail - post #79 I am still getting those "\n " "extras", so I wouldn't just close it...

Applying the patch:

--- mimemail/mimemail.inc.OK1-372710_02-patched		2010-03-25 10:19:19.000000000 +0100
+++ mimemail/mimemail.inc				2010-03-25 10:51:26.000000000 +0100
@@ -89,8 +89,8 @@ function mimemail_extract_files($html) {
 
   $document = array(array(
     'Content-Type' => "text/html; charset=utf-8",
-    'Content-Transfer-Encoding' => '8bit',
-    'content' => $html,
+    'Content-Transfer-Encoding' => 'base64',
+    'content' => chunk_split(base64_encode($html)),
   ));
 
   $files = _mimemail_file();

seems to resolve the issue for me.

What are the other possible consequences, I don't know (everything seems to be OK for me in M$ Outlook Express...).
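To illustrate why the base64 variant in the patch above avoids the problem: chunk_split() with its default arguments emits lines of at most 76 characters, each ending in CRLF, so no line comes near the SMTP limit and any rewrapping cannot land inside an HTML tag. This is only a minimal sketch; the sample markup and variable names are made up for the example.

```php
<?php
// Hypothetical HTML body: one very long line with no line breaks at all.
$html = '<p>' . str_repeat('<img src="http://example.com/a b.jpg" /> ', 50) . '</p>';

// Same transformation as in the patch above.
$encoded = chunk_split(base64_encode($html));

// Measure the longest encoded line (CRLF terminators excluded).
$lines = explode("\r\n", rtrim($encoded, "\r\n"));
$longest = max(array_map('strlen', $lines));

// Decoding skips the inserted line breaks and restores the original bytes.
$decoded = base64_decode($encoded);
```

The mail client decodes the part before rendering, so the line breaks added by chunk_split() never reach the HTML itself.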

LUTi’s picture

Status: Closed (fixed) » Needs work

sgabe’s picture

Title: drupal_wrap_mail » HTML messages are broken by line breaks
Status: Needs work » Active

Changing the title to a descriptive one, and the status too, since we don't have a patch or anything that needs work.

LUTi’s picture

Probably this issue is related to the following:

Content-Transfer-Encoding: 8bit – up to 998 octets per line with CR and LF (codes 13 and 10 respectively) only allowed to appear as part of a CRLF line ending.

RFC-2045 - 2.8. 8bit Data: "8bit data" refers to data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences [RFC-821]), but octets with decimal values greater than 127 may be used. As with "7bit data" CR and LF octets only occur as part of CRLF line separation sequences and no NULs are allowed.

So, it seems that if we are not lucky (that CR/LF would fall to some harmless location, where it doesn't break HTML code) we get those CR/LFs inserted. It is however not easy to notice those CR/LFs, as they are usually represented just as some blank space, sometimes even between words. I've noticed it since it was at such a position to break the link...
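A body can be checked against the 998-octet limit before it is sent as 8bit. The helper below is a hypothetical sketch (the function name is not part of Mime Mail or Drupal) and assumes the limit applies to the bytes between line breaks, as in the RFC text quoted above.

```php
<?php
/**
 * Returns TRUE if every line in $body is at most $limit octets long.
 *
 * Hypothetical helper for illustration; not part of Mime Mail.
 */
function example_lines_fit_8bit($body, $limit = 998) {
  // Split on CRLF or bare LF. strlen() counts bytes, not UTF-8 characters,
  // which is what matters for the transfer limit.
  foreach (preg_split('/\r?\n/', $body) as $line) {
    if (strlen($line) > $limit) {
      return FALSE;
    }
  }
  return TRUE;
}
```

Note that 500 two-byte UTF-8 characters on one line already exceed the limit, even though the line is only 500 characters long.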

For me, the solution above (#8) works. But since base64 was replaced by 8bit, certainly for some reason, simply going back is probably not the best solution; maybe it could be resolved in a better way (more efficient in terms of e-mail message size). Probably with some function which would split the HTML code into chunks smaller than those 998 bytes so that nothing is broken (so, not just in the middle of some text). Therefore the category was "needs work"...

LUTi’s picture

Related to my previous post,

I've created a function to check and split HTML code into chunks of max. length 248 characters (if needed). It is a very basic function, where we insert CR/LF after the last space character in the chunk which would otherwise be longer than 248 characters. Probably this function could be improved, but it seems to do the job for me as it is. If the HTML code cannot be split, it is base64 encoded instead.
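The approach described above could look roughly like the following. This is a hypothetical sketch written for this thread, not the function from the attached patch; the function names and the choice to count bytes with strlen() are assumptions.

```php
<?php
/**
 * Breaks over-long lines after the last space within the first $max bytes.
 *
 * Returns the re-wrapped string, or FALSE if some run has no space to break
 * at. Hypothetical sketch; not the function from the attached patch.
 */
function example_split_long_lines($html, $max = 248) {
  $out = array();
  foreach (explode("\n", str_replace("\r\n", "\n", $html)) as $line) {
    while (strlen($line) > $max) {
      // Find the last space within the first $max bytes.
      $pos = strrpos(substr($line, 0, $max), ' ');
      if ($pos === FALSE) {
        return FALSE;
      }
      // Keep the space at the end of the chunk so no character is lost.
      $out[] = substr($line, 0, $pos + 1);
      $line = substr($line, $pos + 1);
    }
    $out[] = $line;
  }
  return implode("\r\n", $out);
}

/**
 * Picks 8bit when the body can be split, base64 otherwise.
 *
 * Returns array(encoding name, encoded content).
 */
function example_encode_body($html) {
  $split = example_split_long_lines($html);
  if ($split !== FALSE) {
    return array('8bit', $split);
  }
  return array('base64', chunk_split(base64_encode($html)));
}
```

A sketch this simple breaks at any space, including one inside a quoted attribute value, so it does not by itself protect URLs or filenames that contain spaces.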

Please note that my patch provided here is an aggregate patch for 3 issues (this one - #321026: HTML messages are broken by line breaks, #634210: Images missing from repeated mail message elements and #372710: HTML emails are text-only in Hotmail). To resolve just this issue, remove everything between the lines 153/164 and 593/603 from the patch provided first, or manually patch mimemail.inc with just the code:
1. between @@ -87,11 +87,22 @@ and -153,6 +164,7 @@
and
2. after @@ -593,3 +603,62 @@

Another question is whether it makes sense to transfer HTML as 8bit. In terms of performance (efficiency), I've discovered my mails are between 1 and 2 kB smaller if 8bit encoded (compared to base64), so the benefit is probably not worth the risk that something will go wrong during splitting / transfer. If there are issues with some clients (to display the content right), it probably makes sense to try...

In any case, I think this issue should be discussed a bit more.

sgabe’s picture

@LUTi: Please provide a separate patch just for this issue. Aggregated patches are much harder to review if they contain irrelevant changes according to the issue in question. We shall not lose focus.

LUTi’s picture

@sgabe,

as I've written, it is very simple - just remove a couple of lines from the patch provided (or, manually copy 2 sections from the patch into your version...).

The problem is I've patched for other issues first, and this one at the end - so, the line numbers in patch wouldn't match any of published versions in any case...).

sgabe’s picture

Version: 6.x-1.x-dev » 6.x-1.0-alpha2
Status: Active » Needs review
FileSize
3.47 KB

@LUTi: I know it's simple (for me or you), but IMHO we shall provide patches which include just the necessary changes for just the issue in question. However, to create patches you should use the project root from CVS, otherwise there will be problems.

I am attaching a rerolled patch of #12. It applies to 6.x-1.0-alpha2 so I'm changing the version.

Sutharsan’s picture

6.x-1.x-dev is _usually_ the correct version to make patches for. But Mime Mail's alpha releases are not made on the DRUPAL-6--1 branch as usual, but made on Trunk. So patches should be made against HEAD (which is correct in #15).

sgabe’s picture

@Sutharsan: I didn't think of the version, but this...

--- mimemail/mimemail.inc.ORIG 2010-03-24 22:36:41.000000000 +0100

I think this is wrong.

LUTi’s picture

That's why I didn't bother to undo all other patches (I need, and were already applied at my site...) first just to get some "clean" version, which probably in any case wouldn't be the right one... ;-)

In any case, my simple solution (which should probably be improved / optimized) should be very easy to implement on any version, as it is just an added function (at the end of the file) and a few extra lines where we try to use 8bit instead of base64 encoding, but only if we can be 100% sure it will go through SMTP transfer safely. An improved test function could however reduce the number of "false positives" (making 8bit be chosen in more cases / theoretically everywhere it could actually be used - but in this case we should count octets, not just utf-8 characters...).

But, as said already - I can hardly wait for us to get at least some beta or RC version soon, and it looks quite promising, thanks to the great job sgabe is doing. After that, I expect the number of currently opened / pending issues will be significantly reduced, and probably it will also be easier to follow the same version.

irozner’s picture

Having the same problem with Hebrew characters. I applied the above patch, with no results.
Still getting line breaks after each word.
Any ideas how to overcome this?

LUTi’s picture

After each word?!

This seems to be another issue, according to #11...

Renee S’s picture

This patch worked great for me.

Dave Cohen’s picture

Status: Needs review » Needs work

I have a problem, just tried this patch and the patch does not fix this.

My mail includes an image, which unfortunately has a space in the filename. Drupal displays the image fine in nodes. But when sending an email, a newline gets inserted in the filename. So the markup of the email looks like this.

<p><span class="inline inline-left"><img  
src="http://teaserstallion.com/work/sunflower/htdocs/sites/localwork/files/images/2002  
003_5.jpg" alt="2002 003.jpg" title="2002 003.jpg"  class="image  
image-preview " width="185" height="154" /><span class="caption"  
style="width: 183px;"><strong>2002 003.jpg</strong></span></span></p>

Note how multiple newlines are inserted into the <img> tag, but the one that causes the problem is right in the middle of the filename, which is 2002 003_5.jpg.

Is there some way to tell mimemail not to introduce any newlines at all, ever, into markup? I don't understand why it wants to do this.

sgabe’s picture

The best solution to that problem is to use filenames without spaces. Believe me, spaces in filenames will cause you nothing but problems.

Dave Cohen’s picture

I'll pass that on to the 70,000 users of the site. :)

Only a handful of those have permission to write a newsletter, but it's hard to explain to them why some images will work and others won't.

sgabe’s picture

There are some methods to handle this problem, see the FileField Paths module.

Deciphered’s picture

@sgabe,

While you are technically correct that FileField Paths could help Dave Cohen with his particular issue, and it's best practice not to have spaces in URLs, fixing this issue should still be a requirement. There should be a check to make sure the break point is outside of an HTML tag. And I say that as the developer of FFP and a user of Mime Mail with this same issue.

LUTi’s picture

@Dave Cohen,

if you would read the whole thread (in particular my posts #11 & #12), you should probably understand, why linebreaks... ;-)

In short, you don't have to, but there is quite a chance (probably bigger than the probability that your users wouldn't understand/respect the requirement not to have spaces in filenames or paths...) that something will be broken in received mails, as they would be introduced on the way...

If you don't need a solution for broken mails in Hotmail, you can simply resolve your issue using a much more simple solution from my post #8 (more details there).

@Deciphered,

maybe a requirement not to have linebreaks within HTML tags is too restrictive (more chances that everything within such a tag is too long). I would say it should be enough to ensure that anything between single or double quotes (and only when within an HTML tag) is not broken (by linebreaks).

So, we should adapt the function chunk_split_for_8bit_encoding_xfer($string) so that, whenever we check if we can insert a linebreak, we keep track of opened tags and the number of quotes (if the count of both up to a potential breakpoint is divisible by 2 it should be OK, otherwise we have to go back before the problematic quote; test this only when within an opened tag...).

But, I'm quite short of time at the moment, so I cannot do that very soon. Can any of you guys take care of such a patch?
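The quote check described above could be sketched as follows. This is a hypothetical helper, not the adapted chunk_split_for_8bit_encoding_xfer(); instead of counting quotes it remembers which quote character is currently open, which also copes with an apostrophe inside a double-quoted value. It assumes reasonably well-formed markup and ignores comments and script content.

```php
<?php
/**
 * Returns TRUE if a line break may be inserted before byte offset $pos,
 * i.e. the offset is not inside a quoted attribute value of an open tag.
 *
 * Hypothetical sketch; assumes well-formed markup.
 */
function example_is_safe_breakpoint($html, $pos) {
  $in_tag = FALSE;
  // The quote character currently open inside a tag, or NULL.
  $quote = NULL;
  $end = min($pos, strlen($html));
  for ($i = 0; $i < $end; $i++) {
    $c = $html[$i];
    if ($quote !== NULL) {
      // Inside a quoted value: only the matching quote closes it.
      if ($c === $quote) {
        $quote = NULL;
      }
    }
    elseif ($in_tag) {
      if ($c === '"' || $c === "'") {
        $quote = $c;
      }
      elseif ($c === '>') {
        $in_tag = FALSE;
      }
    }
    elseif ($c === '<') {
      $in_tag = TRUE;
    }
  }
  return $quote === NULL;
}
```

A splitting function would call this for each candidate space and step back to an earlier space whenever it returns FALSE.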

Dave Cohen’s picture

Thanks for the helpful references. I've found another issue, #348327: Soft-wrapping in drupal_wrap_mail() breaks URLs with white-spaces, which I think more accurately describes the bug I'm experiencing. I've tried to understand this thread, but to be honest I don't fully.

And I want to point out, I don't consider FileField Paths a solution. The way I see it... If Drupal email cannot include any valid URL, Drupal is broken. And drupal email cannot include valid URLs that happen to have spaces in them. What if I want an email that links out to some site that I have no control over?

aaron’s picture

it might be possible to s/ /+
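A substitution along these lines could be applied to the URL attributes before the message is wrapped. The sketch below is hypothetical and uses %20 rather than +, on the assumption that the spaces are in the path part of the URL, where + is not treated as a space. The function name and the restriction to src and href attributes are choices made for the example.

```php
<?php
/**
 * Percent-encodes spaces inside quoted src and href attribute values.
 *
 * Hypothetical sketch; other attributes (alt, title, ...) are left alone.
 */
function example_encode_spaces_in_urls($html) {
  return preg_replace_callback(
    // Group 1: attribute name, group 2: quote character, group 3: value.
    '/\b(src|href)=("|\')(.*?)\2/i',
    function ($m) {
      return $m[1] . '=' . $m[2] . str_replace(' ', '%20', $m[3]) . $m[2];
    },
    $html
  );
}
```

With no spaces left in the URL, soft-wrapping at whitespace can no longer split the filename, although it can still break elsewhere in the tag.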

LUTi’s picture

Version: 6.x-1.0-alpha2 » 6.x-1.0-alpha7
Category: bug » feature
FileSize
2.89 KB

Obviously, there are some changes in alpha7 which seem to more or less correct this issue. However, chunks of more than 991 bytes (without certain separators; I've tested it with a string of 1040 continuous x-es...) still seem to result in a broken line.

To be on a safe side in such cases, I've introduced a test (the patch attached). If message body is utf-8 transfer encoding safe (no "continuous" chunks over 990 bytes), we transfer that way. If not, I prefer to use base64 transfer encoding.

This ensures that such lines are displayed properly (tested with Outlook Express). However, I've found out that this still doesn't prevent potential issues 100%, as lines are still broken at certain characters (I've found '!' and '?', but there are probably some others as well...). I find a questionmark ('?') particularly problematic, as it can be a part of the URL - meaning such link can be broken in mail. But, it is another issue (body text processing somewhere else...).

If my proposal (transfer of messages with "problematic" long chunks of characters without "safe" separators in base64 instead of utf-8 encoding) were accepted, body text processing should be changed to prevent lines being broken at question marks.

If there are any reasons that base64 transfer encoding should be absolutely avoided in certain cases, we may also introduce an admin option - to choose among optional base64 transfer encoding (and longer chunks possible) and strict utf-8 transfer encoding (and a possibility that too long chunks are broken).

m.stenta’s picture

I was experiencing something similar to this, where links in my emails were coming out like:

href=[url]">[link]

I traced it to the "Line break converter" setting in the Input Filters. I was able to fix mine by adding an Input Filter called MIME Mail, and making sure that the "Line break converter" filter was unchecked. Then assign that Input Filter to MIME Mail messages in the MIME Mail settings and voila: no more broken links!

Hope that helps!

LUTi’s picture

I am not using a "line break converter" filter for any of my posts (I have a customized input format, as I prefer to have full control over some things...), so I guess your issue is something totally different.

Thanks anyway. ;-)

pillarsdotnet’s picture

Once the following issue gets resolved, I expect this problem will also go away:

#299138: Improve \Drupal\Core\Utility\Mail::htmlToText()

baisong’s picture

Had a similar issue, #31's workaround solved my problem:

1) made a new input format
2) removed the linebreak & html filters
3) set mimemail's input to the new one
4) ...now my links don't all get linebreaks in them!

reallyordinary’s picture

I'm having the same kind of problem. I'm using 1.0-beta2. Pressflow 6.22.

Lines break erratically, regardless of what input filter I use. I tried setting up a new filter with no line break convert or html stuff on it... no dice. Lines are still breaking randomly.

Also any links I try to put on text within the email get pushed to the bottom of the message. So if I try to put a link on a sentence, it comes out looking like:

This is the text that's supposed to be linked [1].

And then at the bottom of the message there's a [1] with the actual link next to it.

Really frustrating.

pillarsdotnet’s picture

> Also any links I try to put on text within the email get pushed to the bottom of the message. So if I try to put a link on a sentence, it comes out looking like:
>
> This is the text that's supposed to be linked [1].
>
> And then at the bottom of the message there's a [1] with the actual link next to it.

This is a deliberate design feature of the drupal_html_to_text() function. If you find it that offensive, you should open up a bug report / feature request against Drupal core.

GaëlG’s picture

The workaround #35 worked for me.

youri27’s picture

Subscribing.

sgabe’s picture

Version: 6.x-1.0-alpha7 » 7.x-1.x-dev
Category: feature » bug
Priority: Normal » Major

Changing issue properties.

sgabe’s picture

Status: Needs work » Needs review
FileSize
1.02 KB

Patch attached.

Dave Cohen’s picture

Why not base64 encode all the time? Why complicate things with a loop and test?

sgabe’s picture

I think we should use the lowest encoding possible. Base64-encoded messages will be about 30% bigger (or even more for short messages), even when that encoding is unnecessary.
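The size cost can be measured directly. The helper below is a hypothetical sketch, assuming the body is encoded with chunk_split(base64_encode()) as in the earlier patch: base64 turns every 3 bytes into 4 characters, and chunk_split() adds a CRLF after every 76-character line, so the measured figure comes out somewhat above 30%.

```php
<?php
/**
 * Relative size increase of a body after base64 encoding with CRLF breaks.
 *
 * Hypothetical helper for measurement only; returns e.g. 0.37 for 37%.
 */
function example_base64_overhead($body) {
  $original = strlen($body);
  if ($original === 0) {
    return 0.0;
  }
  $encoded = strlen(chunk_split(base64_encode($body)));
  return ($encoded - $original) / $original;
}

// A 3000-byte body becomes 4000 base64 characters plus 53 CRLF pairs.
printf("%.1f%%\n", 100 * example_base64_overhead(str_repeat('x', 3000)));
// A 1-byte body becomes 4 characters plus one CRLF.
printf("%.0f%%\n", 100 * example_base64_overhead('x'));
```

For the 3000-byte body this works out to 1106 extra bytes, or about 37%; for the 1-byte body the encoded form is six times the original size.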

Dave Cohen’s picture

Makes sense. Thanks.

rp7’s picture

Patch in #42 works for me. Thanks!

sgabe’s picture

Status: Needs review » Fixed

Committed to both branches.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

plach’s picture

For people having base64 encoding issues since this has been committed, here is a possible fix: #1741082-16: 8bit encoding for base64 encoded message body.