Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jhodgdon’s picture

Status: Active » Postponed (maintainer needs more info)
Issue tags: +Needs backport to D6, +Needs backport to D7

Hmmm... The page you linked to says this is a "draft standard", so it doesn't look like it's adopted yet?

If it has been adopted, we should make the change, but not until then. Also, if we do make this change, the text saying "RFC 5322" should also be made into a link.

kingandy’s picture

I don't know, their page for 2822 ( http://tools.ietf.org/html/rfc2822) also says "draft standard", I think that's just how it gets released. It seems to have been pretty thoroughly adopted. If nothing else, current PHP's FILTER_VALIDATE_EMAIL seems to respect it (as reported in the comments, it rejects emails with a period at the end of the local part, which I don't believe is explicitly barred under 2822).

jhodgdon’s picture

Status: Postponed (maintainer needs more info) » Active

Ah, OK. Sounds like a good thing to fix then. Thanks!

Aron Novak’s picture

See php-5.3.14/ext/filter/logical_filters.c (where this functionality is implemented inside PHP) . It does not refer to any of the above mentioned RFCs, only 5321 and 2821 .
So in short: PHP's source code refers to the RFC of the SMTP protocol, the above linked Drupal function refers to the RFC of "Internet Message Format", just to know.
Also: "Empty e-mail addresses are allowed.", it seems to be an obsolete sentence, it's not the case (anymore?)

Aron Novak’s picture

Status: Active » Needs review
yurtboy’s picture

Got a warning when applying the patch

* 
Checking patch core/includes/common.inc...
Applied patch core/includes/common.inc cleanly.
warning: 1 line adds whitespace errors

I removed the "white space"

../1629994_valid_email_address_doc-4.patch:11: trailing whitespace.
 * 
jhodgdon’s picture

Status: Needs review » Needs work

That @link should be moved up to the previous line. It will all fit within 80 characters.

But... I am not convinced that the SMTP specification (http://tools.ietf.org/html/rfc5321) makes any kind of sense here.

Someone needs to track down the actual section of some RFC that specifies what a valid email address is... At first glance, I don't see it in RFC 5321? And php.net is not giving us much information about what it means by a "valid email address" either, at least in the documentation. Sigh.

Aron Novak’s picture

I looked up the source code of PHP, it seems to be kind of precise source, not? They refer to this RFC where they define the regexp for this email checker.
The 4.1.2., 4.1.3. , 4.5.3. and 2.3.5. sections of the SMTP specs talk verbosely about this topic.
In my opinion, the SMTP RFC specifies it more precisely.
That's what wikipedia says about it:
"The formal definitions are in RFC 5322 (sections 3.2.3 and 3.4.1) and RFC 5321"
Then should we link both? I don't think so.

Aron Novak’s picture

Status: Needs work » Needs review
jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Oh sorry, I missed your reference above to the PHP source code. Too bad they are not as good about precise documentation as we are trying to be (would have been nice to have the RFC link in their docs too). :)

In that case, this looks reasonable. Thanks!

Aron Novak’s picture

updated to ensure clean apply

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 8.x. Thanks!

David_Rothstein’s picture

Version: 8.x-dev » 7.x-dev
Status: Fixed » Patch (to be ported)
Aron Novak’s picture

Status: Patch (to be ported) » Reviewed & tested by the community
FileSize
460 bytes

here it is for the 7.x

jhodgdon’s picture

Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Thanks, committed to 7.x. Needs backport to 6.x.

Aron Novak’s picture

Status: Patch (to be ported) » Reviewed & tested by the community
FileSize
452 bytes
jhodgdon’s picture

It's not usually good practice to mark your own patch (even a port) as Reviewed and Tested by the Community. (Sometimes, an extra space or something creeps into a patch port, so it's always good to have someone review it, so we usually set the status to "needs review" when uploading a new patch.)

In this case, though, I concur that this patch is fine. Thanks!

kingandy’s picture

Is it worth mentioning here that drupal_mail() references the same RFC, or should I make a separate issue for that?

jhodgdon’s picture

Title: valid_email_address references obsolete RFC » mail functions reference obsolete RFC
Version: 6.x-dev » 8.x-dev
Status: Reviewed & tested by the community » Needs work

Oh gracious. Yes, we should really go back to 8.x. Can you grep through the code for that RFC number and make sure it's not somewhere else too?

oadaeh’s picture

Title: mail functions reference obsolete RFC » Mail functions reference obsolete RFC
Status: Needs work » Needs review
FileSize
3.56 KB

Umm, RFC 5321 is the wrong RFC to refer to. The title of the document clearly states what the RFC is for: RFC 5321 is for Simple Mail Transfer Protocol and RFC 5322 is for Internet Message Format.

Also, at the top of each RFC is an Abstract which gives more information about what the RFC is for:

From http://tools.ietf.org/html/rfc5321:

This document is a specification of the basic protocol for Internet
electronic mail transport. It consolidates, updates, and clarifies
several previous documents, making all or parts of most of them
obsolete. It covers the SMTP extension mechanisms and best practices
for the contemporary Internet, but does not provide details about
particular extensions. Although SMTP was designed as a mail
transport and delivery protocol, this specification also contains
information that is important to its use as a "mail submission"
protocol for "split-UA" (User Agent) mail reading systems and mobile
environments.

From http://tools.ietf.org/html/rfc5322:

This document specifies the Internet Message Format (IMF), a syntax
for text messages that are sent between computer users, within the
framework of "electronic mail" messages. This specification is a
revision of Request For Comments (RFC) 2822, which itself superseded
Request For Comments (RFC) 822, "Standard for the Format of ARPA
Internet Text Messages", updating it to reflect current practice and
incorporating incremental changes that were specified in other RFCs.

Further, the third paragraph of section 1.1 of RFC 5322 states this:

In the context of electronic mail, messages are viewed as having an
envelope and contents. The envelope contains whatever information is
needed to accomplish transmission and delivery. (See [RFC5321] for a
discussion of the envelope.) The contents comprise the object to be
delivered to the recipient. This specification applies only to the
format and some of the semantics of message contents. It contains no
specification of the information in the envelope.

It refers to RFC 5321 as describing the envelope (which is the mechinism for transport and delivery) and refers to RFC 5322 as describing the content.

Another clue is that RFC 5322 obsoletes RFC 2822, which is the RFC we're updating. RFC 5321 obsoletes RFC 2821, not RFC 2822.

Section 2.3.5 of RFC 5321 describes only Domain Names, not entire email addresses, in the context of routability.
Section 4.1.2 of RFC 5321 describes the Command Argument Syntax, which has nothing to do with the email address format.
Section 4.1.3 of RFC 5321 describes Address Literals, which again, is not about the email address, but the Internet address.
Section 4.5.3 of RFC 5321 describes the Sizes and Timeouts of the transmission of the message.
It's clear based on the headings alone that RFC 5321 has nothing to do with the definition of a correctly formatted email address.

While the code in the referenced PHP code (php-5.3.14/ext/filter/logical_filters.c) is deailing with the entire email address, the comment is describing and dealing only with the domain name part, and what it is referring to is whether a domain name is routable or not (see section 2.3.5 above).

Just because Wikipedia says it, doesn't mean it's right. (I'll leave the Internet searching to you.) Based on the RFCs themselves, there is at least one instance in the referenced Wikipedia document that is clearly incorrect.

These messages in Drupal are referring to the format of an email message, which is correctly and completely defined in section 3.4 of RFC 5322, and not in any section of RFC 5321.

The attached patch corrects all references I could locate (three) for both RFC 2822 and RFC 5321. The patch also corrects a spacing issue in the same comment in core/lib/Drupal/Core/Mail/MailInterface.php.

I attempted to use the git format-patch method of creating the patch (as referenced here: http://drupal.org/node/1146430). If that was wrong, let me know and I will recreate it the "normal" way.

kingandy’s picture

Judging by the discussion above (#4, #7, #8), RFC 5321 was chosen because this is the one the PHP source code declares it is using as the basis of its validation.

It's entirely possible the source comments are in error. Annoyingly, there doesn't seem to be any reference to an RFC in the online documentation...

oadaeh’s picture

@kingandy: I addressed that in my post:

While the code in the referenced PHP code (php-5.3.14/ext/filter/logical_filters.c) is deailing with the entire email address, the comment is describing and dealing only with the domain name part, and what it is referring to is whether a domain name is routable or not (see section 2.3.5 above).

Here is the referenced comment, in its entirty:

/*
* The regex below is based on a regex by Michael Rushton.
* However, it is not identical. I changed it to only consider routeable
* addresses as valid. Michael's regex considers a@b a valid address
* which conflicts with section 2.3.5 of RFC 5321 which states that:
*
* Only resolvable, fully-qualified domain names (FQDNs) are permitted
* when domain names are used in SMTP. In other words, names that can
* be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
* in Section 5) are permitted, as are CNAME RRs whose targets can be
* resolved, in turn, to MX or address RRs. Local nicknames or
* unqualified names MUST NOT be used.
*
* This regex does not handle comments and folding whitespace. While
* this is technically valid in an email address, these parts aren't
* actually part of the address itself.
*
* Michael's regex carries this copyright:
*
* Copyright © Michael Rushton 2009-10
* http://squiloople.com/
* Feel free to use and redistribute this code. But please keep this copyright notice.
*
*/

kingandy’s picture

Aha, that makes sense. Unfortunately that leaves us with no clear indication as to what standards PHP is actually using to validate email addresses, if any. The comment notes that "This regex does not handle comments and folding whitespace", which is "technically valid" in RFC5322 (the CFWS token is present in the angle-addr element defined in section 3.4), so it's not technically a 100% faithful implementation of RFC5322. It may be more appropriate to avoid mention of any RFC and just say we're using PHP's email address validation.

That said, I'm happy to reference RFC5322, as that's what I suggested in the first place...

oadaeh’s picture

Yes, I did notice the author appeared to be picking and choosing which standards to adhere to.

jhodgdon’s picture

Given the above, let's remove the reference to any RFC completely and just say it's a wrapper around PHP's filter_var() function that uses the FILTER_VALIDATE_EMAIL validation filter, and give a reference to
http://php.net/manual/filter.filters.validate.php

jhodgdon’s picture

Status: Needs review » Needs work

In which case, the last patch there needs reworking.

oadaeh’s picture

That PHP page doesn't actually have any details. This is all that is there:

FILTER_VALIDATE_EMAIL 	"validate_email" 	  	  	Validates value as e-mail.

Do we still want to link there? For the sake of not wasting more cycles, I went ahead and linked to it anyway, but I changed to wording a bit.

Do we want to reference the .c file where that validation is being done, even though we can't link to it?

oadaeh’s picture

Status: Needs work » Needs review

Status update.

jhodgdon’s picture

Status: Needs review » Needs work

I don't think we want to reference the .c file. If someone is really that concerned about the details, they can dig. So I think the wording in the patch in #27 is fine.

The only problem is that some of the lines are way over 80 characters. If an @link...@endlink pushes a line over 80 characters, move the @link...@endlink to its own line with nothing else on that line. Thanks!

oadaeh’s picture

Status: Needs work » Needs review
FileSize
3.4 KB

Sorry about the line length. I was couting characters after the link conversion, which of course, doesn't make sense.

So, I changed the wording slightly to pull the text out of the link and keep the lines <= 80 characters.

jhodgdon’s picture

If there is no link text, then you don't need @link at all.
http://drupal.org/node/1354#links

I also think if you are going to omit link text, then you should change the text to say something like:

The formatting of this string will be validated with the PHP e-mail validation filter. See http://php.net/manual/filter.filters.validate.php for more information.

But the previous version with link text is more concise (at least when it is on api.drupal.org it will be):

The formatting of this string will be validated with the PHP e-mail validation filter.

jhodgdon’s picture

Status: Needs review » Needs work
oadaeh’s picture

Status: Needs work » Needs review
FileSize
3.35 KB

I preferred the wording of the previous patch, but I felt like to comply, I needed to change it. I see now that I was confused, and that may because there are several examples within core that do not follow the document you referenced.

Hopefully, this one will work, which is more like #27, but with more lines.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Looks good, thanks!

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 8.x. Thanks.

jhodgdon’s picture

Version: 8.x-dev » 7.x-dev
Status: Fixed » Patch (to be ported)
oadaeh’s picture

Assigned: Unassigned » oadaeh
Status: Patch (to be ported) » Needs review
FileSize
4.82 KB

Here's my first (and hopefully only) stab at it.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Looks good, thanks!

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed to 7.x. Thanks!

jhodgdon’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)
oadaeh’s picture

Status: Patch (to be ported) » Needs review
FileSize
2.95 KB

Drupal 6 doesn't use the PHP mail filter function, so I went back to changing 2822 to 5322. I only found it in includes/common.inc and includes/mail.inc.

jhodgdon’s picture

Status: Needs review » Needs work

Thanks! That sounds like the right thing to do for D6. One of your changes went to 5321 instead of 5322 though. Also, the list formatting updates do not bring the list section into compliance with our documentation standards:
http://drupal.org/node/1354#lists

oadaeh’s picture

Status: Needs work » Needs review
FileSize
3.15 KB

How about this? I'm not sure about the list within the list that doesn't really comply with Doxygen, so I made it part of the body, rather than the pseudo-list it was.

jhodgdon’s picture

Status: Needs review » Needs work

Much better formatting, thanks! This section should also be made into a sub-list:

+ *     Some examples are:
+ *     user@example.com
+ *     user@example.com, anotheruser@example.com
+ *     User <user@example.com>
+ *     User <user@example.com>, Another User <anotheruser@example.com>

Other than that, looks good... We could use a blank line between the @param and @return for extra credit. :)

oadaeh’s picture

That section was what I was referring to in my previous, cryptic comment.
Based on this part of the documentation, I though it best to not make them list items:

  • Each list item starts with the key, followed by a colon, followed by a space, followed by the key description. The key description starts with a capital letter and ends with a period.
  • If the list has no keys, start each list item with a capital letter and end with a period.

Following those guidelines would not work well for the email examples.

The attached patch adds the hyphens to both lists of example email addresses and the extra line between the @param and @return sections of all three functions where they are missing.

jhodgdon’s picture

Status: Needs work » Reviewed & tested by the community

Perfect, thanks! I'll get this committed soon, assuming the test bot doesn't choke on the patch.

jhodgdon’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 6.x. Thanks all!

Automatically closed -- issue fixed for 2 weeks with no activity.

oadaeh’s picture

Assigned: oadaeh » Unassigned
Issue summary: View changes