Note (when closing issue):

The issue report below is no longer valid: the quoted restriction was removed from the PHP mail() specification in November 2009 or earlier; see #84883-85: Unicode::mimeHeaderEncode() doesn't correctly follow RFC 2047. This issue therefore seems outdated and is closed. The last report (for one particular FreeBSD hoster) was in August 2013. If anyone still sees problems, this issue can be reopened with specific details about the PHP version, mailer, and OS.

Original report

When using drupal_mail_send() with Unicode text, the mime_header_encode() function splits the text into chunks separated by \n. However, the PHP mail() documentation says:

$subject - Subject of the email to be sent.
Caution: This must not contain any newline characters, or the mail may not be sent properly.

This is what happens to me: I am sending messages with non-ASCII characters that get split into several chunks (they are longer than 47 characters). mime_header_encode() then joins the chunks with \n, and the mail() function returns true, although the mail doesn't get sent on my host (their phpinfo: http://phpinfo.webdum.com/ ).

      $output .= ' =?UTF-8?B?'. base64_encode($chunk) ."?=\n";

When I replace

    return mail(
      $message['to'],
      mime_header_encode($message['subject']),
      str_replace("\r", '', $message['body']),
      join("\n", $mimeheaders)
    );

with

    return mail(
      $message['to'],
      str_replace("\n", ' ', mime_header_encode($message['subject'])), //\n deleted here
      str_replace("\r", '', $message['body']),
      join("\n", $mimeheaders)
    );

mails get sent.

However, for really long subjects this would violate the maximum line length in the email protocol, so we should limit the length as well (RFC 2822 says that the subject field can be split with CRLF into several lines, but apparently PHP doesn't support it, so the only other option is trimming - and logging it so that the admin knows about it :( )

Comments

keff’s picture

PS: I marked it as critical because in my case, no one could even create an account on a server - the confirmation mail didn't get sent and there was no error in watchdog. This can be *very* confusing to a lot of users, as I assume that almost all non-English languages use some national characters, and most subjects that include a user name and site name will be over 47 characters.

Anonymous’s picture

Another option for the subject would be to trim and then put the original in the body of the email.

Damien Tournoud’s picture

I don't believe there was ever a maximum line length for mail headers.

RFC 822 (the very old one, published in 1982) says:

For readability, the field-body portion of long header fields may be "folded" onto multiple lines of the actual field.

Which means nothing at all these days because we don't have 72-character terminals anymore.

Please just drop the trimming in mime_header_encode(). It's buggy, badly programmed (have you seen the floor((75 - strlen("=?UTF-8?B??=")) * 0.75)? It looks like a heuristic...), and unnecessarily clutters the code.

Patch attached (yeah, less buggy code!).

Damien Tournoud’s picture

Version: 6.4 » 7.x-dev
Status: Active » Needs review

Forgot to bump to 7.x.

Steven’s picture

Please just drop the trimming in mime_header_encode(). It's buggy, badly programmed (have you seen the floor((75 - strlen("=?UTF-8?B??=")) * 0.75)? It looks like a heuristic...), and unnecessarily clutters the code.

This 'buggy, badly programmed code' calculates how many UTF-8 bytes we can fit in an encoded word of 75 bytes. The * 0.75 is to account for the fixed 8-to-6 bits conversion in base64. The prefix/suffix is also taken into account. Hardly a random heuristic: just simply fixed ratios.

The 75 limit comes straight from RFC 2047:

An 'encoded-word' may not be more than 75 characters long, including
'charset', 'encoding', 'encoded-text', and delimiters. If it is
desirable to encode more text than will fit in an 'encoded-word' of
75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
be used.

http://www.ietf.org/rfc/rfc2047.txt
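
To make the quoted rule concrete, here is a minimal sketch of an encoder that keeps every encoded-word within 75 characters and joins the words with CRLF SPACE. The function name sketch_mime_header_encode() is hypothetical and this is not Drupal's mime_header_encode(). It assumes PHP 7+ and valid UTF-8 input, and it uses the stricter 45-byte chunk size that is proposed later in this thread rather than the 47 discussed above.

```php
<?php
// Hypothetical helper for illustration; not Drupal's mime_header_encode().
// Assumes PHP 7+ and that $text is valid UTF-8.
function sketch_mime_header_encode(string $text, string $separator = "\r\n "): string {
  // Printable ASCII needs no encoding.
  if (!preg_match('/[^\x20-\x7E]/', $text)) {
    return $text;
  }
  $prefix = '=?UTF-8?B?';
  $suffix = '?=';
  // 75 - 12 leaves 63 characters of base64, i.e. 15 full groups of
  // 4 characters, each carrying 3 bytes: at most 45 bytes per word.
  $max_bytes = intdiv(75 - strlen($prefix . $suffix), 4) * 3;

  $words = [];
  $chunk = '';
  // Walk whole UTF-8 characters so a multibyte sequence is never split.
  foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
    if ($chunk !== '' && strlen($chunk) + strlen($char) > $max_bytes) {
      $words[] = $prefix . base64_encode($chunk) . $suffix;
      $chunk = '';
    }
    $chunk .= $char;
  }
  if ($chunk !== '') {
    $words[] = $prefix . base64_encode($chunk) . $suffix;
  }
  return implode($separator, $words);
}
```

The separator is a parameter only so that the sketch can be tried with other values on hosts that reject line breaks in the subject; the default follows the RFC text quoted above.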

Damien Tournoud’s picture

Status: Needs review » Needs work

Thanks for the explanation, Steven. That computation could have been documented better :)

So RFC 2047 requires encoded words to be separated by CRLF SPACE, and PHP doesn't support subject lines containing LF in some setups.

Trimming the subject line to 47 characters (!!) is obviously not a solution, and we can't forcibly put the title line at the beginning of the body.

Could we send emails in 8bit, and skip the encoding completely?

alexanderpas’s picture

Steven’s picture

The function's doxygen already points to RFC 2047. Yet nobody in this issue bothered to actually go and read it. Great.

alexanderpas’s picture

scottrigby’s picture

Could applying the patch in #3 cause any other problems? Testing so far seems to solve our issue - but this one is slightly outside my level of knowledge, so not sure if my testing is helpful -- or if I'm missing something important. Thx in advance for clarifying & will try to help test.

Anatk’s picture

Slightly outside my level of knowledge, too, but the patch worked for me.
Had a problem with the encoding of the subject in Hebrew. Now it works fine.

catch’s picture

Priority: Critical » Normal

Downgrading, this is just a normal bug.

stewart.adam’s picture

I just encountered this issue on a FreeBSD server (8.3-RELEASE-p8) with both PHP 5.2.17 and 5.3.19. Was very difficult to track down, as the very same code works fine on a Linux-based server with PHP 5.2.10 or 5.3.25.

Just as described in the issue though, it seems that the FreeBSD server's PHP doesn't like sending mail with any \n characters in the subject, even if those are to separate the encoded chunks. I have applied the patch in #84883: Unicode::mimeHeaderEncode() doesn't correctly follow RFC 2047 (#106 for D6) to no avail. I have resorted instead to setting the chunk size to 46 and using ' ' (a single space) as the separator, and now it seems to be working.
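
A minimal sketch of the single-space idea, for hosts that reject line breaks in the subject: collapse each line break and the whitespace around it into one space before calling mail(). The helper name sketch_unfold_header() and the sample chunks are hypothetical. RFC 2047 tells decoders to ignore whitespace between adjacent encoded-words, so the decoded subject should be unchanged, but a very long subject then ends up on one long header line.

```php
<?php
// Hypothetical helper for illustration; not part of Drupal.
function sketch_unfold_header(string $value): string {
  // Collapse each line break plus surrounding spaces/tabs into one space,
  // then drop the leading and trailing whitespace that is left over.
  return trim(preg_replace('/[ \t]*\r?\n[ \t]*/', ' ', $value));
}

// Same shape as the line quoted in the original report: every chunk
// is prefixed with a space and followed by "\n".
$encoded = '';
foreach (['first chunk', 'second chunk'] as $chunk) {
  $encoded .= ' =?UTF-8?B?' . base64_encode($chunk) . "?=\n";
}

// The result has no line breaks and exactly one space between words.
$subject = sketch_unfold_header($encoded);
```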

stewart.adam’s picture

After troubleshooting this with my host, it appears that this may be specific to HSphere and not FreeBSD as I originally thought.

roderik’s picture

@stewart.adam / #13: it's kinda strange that using a single space (rather than multiple spaces) would change things. The definition in RFC 2047 (section 2) excludes spaces from the 'encoded-word' (which has a max length of 75).

Just noting this as part of #84883: Unicode::mimeHeaderEncode() doesn't correctly follow RFC 2047 for full info. (It's slightly scary that the current patch in there removes the comments about \n not always working, but there may still be systems out there that don't actually work with \n...)

roderik’s picture

Issue summary: View changes
Status: Needs work » Closed (cannot reproduce)

roderik’s picture

#5 / #6:

Thanks for the explanation, Steven. That computation could have been documented better :)

Mmmyeah, especially because it's buggy :p
It should be
floor((75 - strlen("=?UTF-8?B??=")) / 4) * 3 == 45 instead of
floor((75 - strlen("=?UTF-8?B??=")) * 0.75) == 47.

Fixing in #84883.
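
The difference between the two formulas can be checked with a few lines of PHP. This is only a worked example of the arithmetic; the helper name sketch_encoded_word_length() is hypothetical. Base64 emits 4 characters for every started group of 3 bytes, so a 47-byte chunk is padded up to 64 characters and the resulting encoded-word is one character over the limit.

```php
<?php
$overhead = strlen('=?UTF-8?B??=');         // 12 characters of prefix and suffix
$budget = 75 - $overhead;                   // 63 characters left for base64 text

$old_chunk = (int) floor($budget * 0.75);   // 47 bytes
$new_chunk = (int) floor($budget / 4) * 3;  // 45 bytes

// Length of the encoded-word produced for a chunk of $bytes bytes.
function sketch_encoded_word_length(int $bytes): int {
  return strlen('=?UTF-8?B?' . base64_encode(str_repeat('x', $bytes)) . '?=');
}

echo sketch_encoded_word_length($old_chunk), "\n";  // 76, over the 75 limit
echo sketch_encoded_word_length($new_chunk), "\n";  // 72, within the limit
```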

KevinVanRansbeeck’s picture

Re-rolled patch for Drupal 7.56