drupal_mail_send with long UTF-8 subject puts \n to subject line, the mail() function won't send it then. [#300387]

Note (when closing issue):

The issue report below is no longer valid: the quoted restriction was removed from the PHP mail() documentation in November 2009 or earlier; see #84883-85: Unicode::mimeHeaderEncode() doesn't correctly follow RFC 2047. This issue therefore seems outdated and is closed. The last report (for one particular FreeBSD host) was in August 2013. If anyone still sees problems, this issue can be reopened with specific details about the PHP version, mailer, and OS.

Original report

When using drupal_mail_send() with Unicode text, the mime_header_encode() function splits the text into chunks separated by \n. However, the PHP mail() documentation says:

$subject - Subject of the email to be sent.
Caution: This must not contain any newline characters, or the mail may not be sent properly.

This is what happens to me: I am sending messages with non-ASCII characters that get split into several chunks (they are longer than 47 characters). mime_header_encode() then joins the chunks with \n, and the mail() function returns true, although the mail doesn't get sent on my host (their phpinfo: http://phpinfo.webdum.com/ ).

      $output .= ' =?UTF-8?B?'. base64_encode($chunk) ."?=\n";

When I replace

    return mail(
      $message['to'],
      mime_header_encode($message['subject']),
      str_replace("\r", '', $message['body']),
      join("\n", $mimeheaders)
    );

with

    return mail(
      $message['to'],
      str_replace("\n", ' ', mime_header_encode($message['subject'])), //\n deleted here
      str_replace("\r", '', $message['body']),
      join("\n", $mimeheaders)
    );

mails get sent.

However, for really long subjects this would violate the maximum line length in the email protocol, so we should limit the length as well (RFC2822 says that the subject field can be split with CRLF into several lines, but apparently PHP doesn't support it, so the only other option is trimming - and logging it so that the admin knows about it :( )

Comment	File	Size	Author
#18	300387-18-mime-encode-clutter.patch	810 bytes	KevinVanRansbeeck
#3	300387-mime-encode-clutter.patch	940 bytes	Damien Tournoud

Comments

Comment #1

keff commented 27 August 2008 at 00:49

PS: I marked it as critical because in my case, no one could even create an account on a server - the confirmation mail didn't get sent and there was no error in watchdog. This can be *very* confusing to a lot of users, as I assume that almost all non-English languages use some national characters, and most subjects that include a user name and site name will be over 47 characters.

Comment #2

Anonymous (not verified) commented 29 August 2008 at 22:45

Another option for the subject would be to trim and then put the original in the body of the email.

Comment #3

Damien Tournoud commented 29 August 2008 at 23:11

File	Size
300387-mime-encode-clutter.patch	940 bytes

I don't believe there was ever a maximum line length for mail headers.

RFC822 (the very old one, published in 82) says:

For readability, the field-body portion of long header fields may be "folded" onto multiple lines of the actual field.

Which means nothing at all these days because we don't have 72-character terminals anymore.

Please just drop the trimming in mime_header_encode(). It's buggy, badly programmed (have you seen the floor((75 - strlen("=?UTF-8?B??=")) * 0.75)? It looks like a heuristic...), and unnecessarily clutters the code.

Patch attached (yeah, less buggy code!).

Comment #4

Damien Tournoud commented 29 August 2008 at 23:12

Version:	6.4	» 7.x-dev
Status:	Active	» Needs review

Forgot to bump to 7.x.

Comment #5

Steven commented 30 August 2008 at 08:04

Please just drop the trimming in mime_header_encode(). It's buggy, badly programmed (have you seen the floor((75 - strlen("=?UTF-8?B??=")) * 0.75)? It looks like a heuristic...), and unnecessarily clutters the code.

This 'buggy, badly programmed code' calculates how many UTF-8 bytes we can fit in an encoded word of 75 bytes. The * 0.75 is to account for the fixed 8-to-6 bits conversion in base64. The prefix/suffix is also taken into account. Hardly a random heuristic: just simply fixed ratios.

The 75 limit comes straight from RFC 2047:

An 'encoded-word' may not be more than 75 characters long, including
'charset', 'encoding', 'encoded-text', and delimiters. If it is
desirable to encode more text than will fit in an 'encoded-word' of
75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
be used.

http://www.ietf.org/rfc/rfc2047.txt
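
To make the quoted rule concrete, here is a minimal sketch of splitting a UTF-8 subject into base64 encoded-words that each stay within 75 characters. It is not the Drupal implementation: the function name example_mime_header_encode() and its $separator parameter are hypothetical, and the per-chunk budget is computed in whole base64 groups (4 output characters per 3 input bytes), which gives 45 bytes rather than the 47 produced by the * 0.75 formula.

```php
<?php
/**
 * Hypothetical helper for illustration only (not Drupal's function).
 * Encodes a UTF-8 header value as RFC 2047 base64 encoded-words.
 */
function example_mime_header_encode(string $text, string $separator = "\r\n "): string {
  // Printable ASCII is returned unchanged in this sketch.
  if (!preg_match('/[^\x20-\x7E]/', $text)) {
    return $text;
  }

  // 75 minus the 12 delimiter characters leaves 63 characters, i.e.
  // 15 complete base64 groups of 4 characters, each carrying 3 bytes.
  $max_bytes = intdiv(75 - strlen('=?UTF-8?B??='), 4) * 3;

  // Split into whole UTF-8 characters so no multi-byte sequence is cut.
  $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
  if ($chars === false) {
    throw new InvalidArgumentException('Input is not valid UTF-8.');
  }

  $chunks = [];
  $current = '';
  foreach ($chars as $char) {
    // Start a new chunk when this character would exceed the byte budget.
    if ($current !== '' && strlen($current) + strlen($char) > $max_bytes) {
      $chunks[] = $current;
      $current = '';
    }
    $current .= $char;
  }
  if ($current !== '') {
    $chunks[] = $current;
  }

  $words = [];
  foreach ($chunks as $chunk) {
    $words[] = '=?UTF-8?B?' . base64_encode($chunk) . '?=';
  }
  return implode($separator, $words);
}
```

The separator is left as a parameter only because the original report describes a host that drops mail when the subject contains a line break; calling example_mime_header_encode($subject, ' ') would join the encoded-words with a single space instead of CRLF SPACE.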

Comment #6

Damien Tournoud commented 30 August 2008 at 08:28

Status:	Needs review	» Needs work

Thanks for the explanation, Steven. That computation could have been documented better :)

So RFC 2047 requires encoded words to be separated by CRLF SPACE, and PHP doesn't support subject lines containing LF in some setups.

Trimming the subject line to 47 characters (!!) is obviously not a solution, and we can't forcibly put the title line at the beginning of the body.

Could we send emails in 8bit, and skip the encoding completely?

Comment #7

alexanderpas commented 30 August 2008 at 16:50

reported at http://bugs.php.net/bug.php?id=45955

Comment #8

Steven commented 30 August 2008 at 23:12

The function's doxygen already points to RFC 2047. Yet nobody in this issue bothered to actually go and read it. Great.

Comment #9

alexanderpas commented 31 August 2008 at 00:39

not as far as i can see... http://api.drupal.org/api/function/drupal_mail_send/7

Comment #10

scottrigby commented 10 April 2009 at 16:38

Could applying the patch in #3 cause any other problems? Testing so far seems to solve our issue - but this one is slightly outside my level of knowledge, so not sure if my testing is helpful -- or if I'm missing something important. Thx in advance for clarifying & will try to help test.

Comment #11

Anatk commented 27 August 2009 at 07:57

Slightly outside my level of knowledge, too, but the patch worked for me.
Had a problem with the encoding of the subject in Hebrew. Now it works fine.

Comment #12

catch commented 24 January 2010 at 00:01

Priority:	Critical	» Normal

Downgrading, this is just a normal bug.

Comment #13

stewart.adam commented 2 August 2013 at 16:57

I just encountered this issue on a FreeBSD server (8.3-RELEASE-p8) with both PHP 5.2.17 and 5.3.19. Was very difficult to track down, as the very same code works fine on a Linux-based server with PHP 5.2.10 or 5.3.25.

Just as described in the issue though, it seems that the FreeBSD server's PHP doesn't like sending mail with any \n characters in the subject, even if those are to separate the encoded chunks. I have applied the patch in #84883: Unicode::mimeHeaderEncode() doesn't correctly follow RFC 2047 (#106 for D6) to no avail. I have resorted instead to setting the chunk size to 46 and using ' ' (a single space) as the separator, and now it seems to be working.

Comment #14

stewart.adam commented 2 August 2013 at 16:57

After troubleshooting this with my host, it appears that this may be specific to HSphere and not FreeBSD as I originally thought.

Comment #15

roderik as a volunteer commented 8 August 2015 at 09:14

@stewart.adam / #13: it's kinda strange that using a single space (rather than multiple spaces) would change things. The definition in RFC 2047 (section 2) excludes spaces from the 'encoded-word' (which has a max length of 75).

Just noting this as part of #84883: Unicode::mimeHeaderEncode() doesn't correctly follow RFC 2047 for full info. (It's slightly scary that the current patch in there removes the comments about \n not always working, but there may still be systems out there that don't actually work with \n...)

Comment #16

roderik as a volunteer commented 28 November 2015 at 15:07

Issue summary:	View changes
Status:	Needs work	» Closed (cannot reproduce)

Comment #17

roderik as a volunteer commented 28 November 2015 at 15:41

#5 / #6:

Thanks for the explanation, Steven. That computation could have been documented better :)

Mmmyeah, especially because it's buggy :p
It should be
floor((75 - strlen("=?UTF-8?B??=")) / 4) * 3 == 45 instead of
floor((75 - strlen("=?UTF-8?B??=")) * 0.75) == 47.

Fixing in #84883.
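
The difference between the two formulas can be checked with a few lines of PHP. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the filler byte 'x' stands in for arbitrary UTF-8 data, since only the byte count affects the base64 length.

```php
<?php
// Fixed delimiters of a base64 encoded-word: "=?UTF-8?B?" plus "?=".
$overhead = strlen('=?UTF-8?B??=');              // 12 characters
$available = 75 - $overhead;                     // 63 characters left for base64

// Original formula: treats base64 as a plain 3/4 ratio.
$old_bytes = (int) floor($available * 0.75);     // 47

// Corrected formula: base64 output comes in whole groups of 4 characters.
$new_bytes = intdiv($available, 4) * 3;          // 45

// Total encoded-word length for a chunk of the given number of bytes.
$word_length = function (int $bytes) use ($overhead): int {
  return $overhead + strlen(base64_encode(str_repeat('x', $bytes)));
};

printf("%d bytes -> %d characters\n", $old_bytes, $word_length($old_bytes));
printf("%d bytes -> %d characters\n", $new_bytes, $word_length($new_bytes));
```

With 47 bytes the base64 text is padded to 64 characters, so the encoded-word comes to 76 characters and exceeds the RFC 2047 limit of 75; with 45 bytes it is 60 characters, for a total of 72.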