valid_email_address uses a regex to detect if an email address is valid. In PHP 5.2 we can use

  filter_var('example@example.com', FILTER_VALIDATE_EMAIL);

to do the same thing. This is a built in function to PHP and should be faster.

Before I write a patch I wonder if we should leave valid_email_address and change the logic to use the filter_var function or replace each of the valid_email_address calls with the filter_var call. Thoughts?

For more details and example usage check out http://www.w3schools.com/PHP/filter_validate_email.asp.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mfer’s picture

Assigned: Unassigned » mfer
Status: Active » Needs review
FileSize
846 bytes

This patch replaces the guts of the valid_email_address function.

Dave Reid’s picture

Very neat use of the new PHP 5.2 requirement! Patch applies cleanly with a little offset. Testing is working great so far! Using this patch it actually caught the invalid e-mail of 'test@e' that core did not catch before. It even catches the two commented-out invalid e-mails from the contact.test that apparently needed to be commented out since they weren't caught by the old valid_email_address function!
$invalid_recipients = array('invalid', 'invalid@', /*'invalid@site', 'invalid@site.',*/ '@site.', '@site.com');

Maybe a couple more reviews and RTBC!

Dave Reid’s picture

I'd have to look and see if there was more we could replace with filter_var. One likely suspect is valid_url. Not sure how it would work with local drupal paths however.

function valid_url($url, $absolute = FALSE) {
  $allowed_characters = '[a-z0-9\/:_\-_\.\?\$,;~=#&%\+]';
  if ($absolute) {
    return (bool)filter_var($url, FILTER_VALIDATE_URL);
  }
  else {
    return (bool)preg_match("/^" . $allowed_characters . "+$/i", $url);
  }
}
mfer’s picture

Status: Needs review » Needs work

@Dave Reid thanks for pointing out the tests. I'll update for the offset and the tests and post a new patch.

I looked at valid_url as well. We will need something more like:

function valid_url($url, $absolute = FALSE) {
  $allowed_characters = '[a-z0-9\/:_\-_\.\?\$,;~=#&%\+]';
  if ($absolute) {
    return (bool)filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
  }
  else {
    return (bool)preg_match("/^" . $allowed_characters . "+$/i", $url);
  }
}

I didn't include this in the patch because I'm not entirely sure of the behavior of FILTER_VALIDATE_URL with it's different options. Doing some testing. Filters don't seem to be documented very well. I found some detail here - http://devolio.com/blog/archives/413-Data-Filtering-Using-PHPs-Filter-Fu...

mfer’s picture

Status: Needs work » Needs review
FileSize
1.86 KB

The attached patch has the same change as the first but the contact form test has been updated to as described in #2.

$invalid_recipients = array('invalid', 'invalid@', /*'invalid@site', 'invalid@site.',*/ '@site.', '@site.com');

was updated to

$invalid_recipients = array('invalid', 'invalid@', 'invalid@site', 'invalid@site.', '@site.', '@site.com');

The tests passed.

I separated out a change to valid_url into a separate issue because using filter_var would change the response. Details at http://drupal.org/node/308977.

Dave Reid’s picture

Status: Needs review » Reviewed & tested by the community

Looks great since this will narrow down invalid e-mail addresses! Patch applied cleanly and ran the contact & user tests with 100% pass. I believe it is RTBC!

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Looks like an improvement to me, and the tests pass. Committed. Thanks.

drewish’s picture

Status: Fixed » Active

I think this is a bit of a step backwards. Previously you could use 'drewish@localhost' as a valid email this is not longer allowed.

Dave Reid’s picture

I really don't think that should be a valid e-mail. Suppose you send a user an e-mail through their personal contact form. When that user replies to your e-mail, it's going to send it to 'drewish@localhost' from their computer or web mail, which would be invalid. If someone wants to use localhost as an e-mail domain, they should be using somekind of externall- accessable domain like 'mylocalcomputer.dyndns.org'.

drewish’s picture

looks like there are problems with the function: #43402 FILTER_VALIDATE_EMAIL is not RFC2822 compliant

drewish’s picture

dave reid, the function is called valid_email_address(), and 'drewish@localhost' is a valid email address. you could make an argument that we should have a function valid_globally_email() or some such but as the function is currently named it's a step backwards that could invalidate existing user data.

mfer’s picture

From what I have read localhost fails RFC 2822 because localhost is an alias and not a fully qualified domain or IP. I'm digging into this but it still doesn't solve our problem here.

Do we roll back, go to a different regex that allows for aliased domain names, or do we stick to RFC 2822 on this?

drewish’s picture

Title: valid_email_address doesn't accept @localhost addresses » Replace valid_email_address with filter_var and FILTER_VALIDATE_EMAIL
Category: bug » task
Priority: Critical » Normal

mfer, i think the relevant part of RFC 2822 is page 17 (emphasis is mine):

addr-spec = local-part "@" domain

local-part = dot-atom / quoted-string / obs-local-part

domain = dot-atom / domain-literal / obs-domain

domain-literal = [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]

dcontent = dtext / quoted-pair

dtext = NO-WS-CTL / ; Non white space controls

%d33-90 / ; The rest of the US-ASCII
%d94-126 ; characters not including "[",
; "]", or "\"

The domain portion identifies the point to which the mail is
delivered. In the dot-atom form, this is interpreted as an Internet
domain name (either a host name or a mail exchanger name)
as
described in [STD3, STD13, STD14]. In the domain-literal form, the
domain is interpreted as the literal Internet address of the
particular host. In both cases, how addressing is used and how
messages are transported to a particular host is covered in the mail
transport document [RFC2821]. These mechanisms are outside of the
scope of this document.

That leads me to believe that user@hostname is a valid email address.

Update: I forgot to mention that I'm in favor of rolling back to the regex.

mfer’s picture

user@hostname is not valid under RFC 2822. I've done a bit of reading and this is the case. You can even find detail of this on the email regular expressions page at http://www.regular-expressions.info/email.html

For the RFC 2822 regex it says

The part after the @ also has two alternatives. It can either be a fully qualified domain name (e.g. regular-expressions.info), or it can be a literal Internet address between square brackets. The literal Internet address can either be an IP address, or a domain-specific routing address.

There is a difference between a domain name like localhost and a fully qualified domain name. Fully qualified domain names are unique.

In either case someone from above my drupal pay grade needs to decide our path forward. If we aren't following RFC 2822 I'd suggest a comment regarding where we didn't follow 2822.

mfer’s picture

Category: task » bug
Priority: Normal » Critical
mfer’s picture

Title: Replace valid_email_address with filter_var and FILTER_VALIDATE_EMAIL » valid_email_address doesn't accept @localhost addresses

Updating the title to describe the issue as it stands.

Dave Reid’s picture

Title: Replace valid_email_address with filter_var and FILTER_VALIDATE_EMAIL » valid_email_address doesn't accept @localhost addresses
Category: task » bug
Priority: Normal » Critical

Is there a case where using @localhost.localdomain wouldn't work? It passes valid_email_address and looks to be a generally more accepted form over "localhost". I just think there is no real use case for @localhost besides testing. I cannot e-mail user@localhost so it should not be a "valid" email.

drewish’s picture

Dave Reid, .localdomain isn't a reserved TLD and Apple uses .local and Microsoft suggests using it. if you just want something that'll validate you can use foo@127.0.0.1.

mfer, we need to be reading RFC 2822 rather than regular-expressions.info's commentary on it (on that page he discusses using happily using a regex that blocks .museum addresses).

i already quoted the part of the RFC with the grammar for an address token and you can see there that one of the productions for the domain token is the dot-atom token. this is in turn defined as:

atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"

atom = [CFWS] 1*atext [CFWS]

dot-atom = [CFWS] dot-atom-text [CFWS]

dot-atom-text = 1*atext *("." 1*atext)

after brushing up on ABNF by reading RFC 2234, i read the 1*atext production to be one or more atext tokens then optionally followed by any number of period-one-or-more-atext tokens. please correct me if i'm reading that wrong but it sure looks like we're not following the standard the current implementation.

my big problem with this change is that it invalidates existing data. i could have foo@localhost on my site and next time i go to submit that form it'll fail.

mfer’s picture

@drewish - you have now made me think more than I would have liked to today :)

After digging into RFC 2234 I believe you have the correct interpretation of dot-atom-text. For anyone interested see the * operator in section 3.6 of RFC 2234.

First, what ever direction we need to go I'll craft up a patch.

Next, This raises 2 questions for me:

1. In the test for this in contact.test it reads:

  $invalid_recipients = array('invalid', 'invalid@', 'invalid@site', 'invalid@site.', '@site.', '@site.com'); 

We have a test for 'invalid@site'. This is equivalent to example@localhost which is the issue at hand. Is there a reason for this?

Note: If this should be valid we need to remove it as an invalid test. If we go to a our own regex we should use one that finds 'invalid@site.' to be invalid.

2. Is there something that overrides RFC 2822 on the dot-atom? The interpretations and implementations I've read all over reject @localhost being valid. Or, am I just missing something?

mfer’s picture

Here's the node with the initial contact.test which has the wrong invalid test case. http://drupal.org/node/232665

mfer’s picture

After talking with drewish in IRC I'll craft up a patch that uses an RFC2822 compliant regex and update the tests to be correct.

And I thought this was going to be a simple patch.

mfer’s picture

This get's a little more interesting. RFC 2821 (see 4.1.2) doesn't allow user@localhost. It requires "Domain = (sub-domain 1*("." sub-domain)) / address-literal" This one requires and least one . to be present in the domain.

I love it when standards differ with each other. The function filter_var($mail, FILTER_VALIDATE_EMAIL) is RFC 2821 compliant. I've added 5 more invalid cases to contact.test and all the tests pass.

So, do we go with RFC 2822 or RFC 2821?

drewish’s picture

Good points, though I think you meant section 4.1.3 Address Literals.

mfer’s picture

The more I think about this I'm not sure I like the idea of visitors coming to a site, filling out the contact form with an address like user@example and having it pass validation. If we go with RFC 2822 that's what we will have. The same thing would happen with email addresses on user pages. We could get email addresses that don't have domains we can route to.

I understand the intranet use case. But, the larger use case is where we want it to fail if it's not a domain as RFC 2821 calls out because that's the SMTP standard and that deals with the routing.

Basically, if my users forget the .com, .org, or what ever their extension is I want it to tell them.

FYI, I have a regex that works for RFC 2822 waiting in the wings if needed.

mfer’s picture

We have two distinct use cases here. Personally, I don't want to allow user@localhost on my sites but I understand there are a lot of sites (intranets in particular) that might want to allow it. So, my suggestion is that we add an argument to the function, a variable via variable_get, and the option to go either way.

Some psuedo code:

function valid_email_address($mail, $enforce_rfc2821 = NULL) {
  if (!isset($enforce_rfc2821)) = variable_get('enforce_rfc2821', TRUE);

  if ($enforce_rfc2821) return filter_var .....
    // do the more likely case first

  else // run a regex that parses for rfc 2822
}

Thoughts?

This is definitely more complicated and I'm not sure I'd expose a UI to change this setting but it supports both cases.

drewish’s picture

i've no objection to using the rfc 2821 so long as we document that that's what valid_email_address is using. there should probably be a changelog message too.

asimmonds’s picture

Using filter_var() could possibly introduce a security issue on PHP 5.2.0 and 5.2.1:
PHP ext/filter Email Validation Vulnerability, http://www.php-security.org/MOPB/PMOPB-45-2007.html

BioALIEN’s picture

I'm with RFC 2821 as will the majority of admins that deploy Drupal I am sure. For this, I would rather see us validate against user@localhost type of emails.

I hope the powers above will use common sense when deciding this. Drupal is largely used on the internet rather than intranet.

Dave Reid’s picture

RFC 2821 should absolutely be the default validation, since it is only a few cases where it is not preferred.

An option I've been brainstorming recently: allowing e-mails like user@machinename should be something fit for an intranet install profile that will somehow work to override email address verification to only allow e-mails of a certain domain, etc. The trouble would be how exactly to implement 'other' validation in valid_email_address. Maybe some kind of hook_validate?

mfer’s picture

If we go with #25 we can have drupal be configurable for both ways.

@Dave Reid what would a use case be for hook_validate to alter valid_email_address? Who would use it and how often would the case come up?

alexanderpas’s picture

RFC 2821 is probaly soon obsolete! RFC 5321 is here: http://tools.ietf.org/html/rfc5321
RFC 2822 is also probaly soon obsolete! RFC 5322 is here: http://tools.ietf.org/html/rfc5322

mfer’s picture

@alexanderpas Thanks for the update. I was planning on adding in the 2822 functionality this week. Looks like I have another fun filled spec to read... unless someone else wants to fill in the basics.

mfer’s picture

So, do we go against RFC 5322 or RFC 5321?

drewish’s picture

alexanderpas, the 532? standards are still drafts, I don't think it'd be wise to adopt those just yet. In any case it doesn't answer the question of which spec do we follow.

alexanderpas’s picture

is the e-mail address user@[127.0.0.1] valid? afaik yes.

in my opinion, we should only accept localhost e-mail addresses after user confirmation.

Notice: you have used a localhost e-mail address, mails send to this address will never be send to another computer, and you can only recieve mail send to this address from this computer.

Pancho’s picture

RFC 5321 and RFC 5322 have been released on October 2, 2008, and supersede the old rfcs.
Here is a commented comparison of the (few and subtle) changes.

This here is a crazy long regex for validating email addresses - never seen such a regex before. But I don't think we need to go that far. We need to do a good job, not a perfect one.

However, I also noticed that neither valid_url() nor valid_email_address() accept IDNs like www.übercart.com, which are perfectly valid. This is a real showstopper, especially for languages with a non-latin alphabet. IPv6 is not accepted either in the domain part.

So we definitely need to study the RFCs and rewrite the regexes correctly. Or we can search for the best premade one we can find and do a lot of testing. I suppose there will be acceptable ones in some other GPLed projects.

Heine’s picture

Title: valid_email_address doesn't accept @localhost addresses » valid_email_address doesn't accept IDNs or @localhost

The more I think about this I'm not sure I like the idea of visitors coming to a site, filling out the contact form with an address like user@example and having it pass validation.

A quick reminder to all participants in the discussion; the email address validation should only check for compliance to the relevant RFC, not whether the address is an existing and / or reachable address.

In the case of the quote, there's not much difference to user@example and user@a-non-existing-domain.com or even user@example.com.

Pancho’s picture

alexanderpas’s picture

don't forget punycode ;) http://www.xn--bercart-m2a.com/

mfer’s picture

mfer’s picture

2 Things.

First, international characters are not valid per the spec which says,

ALPHA = %x41-5A / %x61-7A ; A-Z / a-z

See appendix B of http://tools.ietf.org/html/rfc5234.

Also, note that RFC says, "Printable US-ASCII" in reference to an ALPHA character.

So, characters like ü are invalid per the spec.

Now, I'm not one to always follow spec.

Second, We send emails based on the validations of emails. If we accept IDNs we need a mechanism to convert them to a DNS routable name. For an example of this see http://en.wikipedia.org/wiki/Internationalized_domain_name#Example_of_ID...

drewish’s picture

alexanderpas’s picture

Issue tags: +IDN
c960657’s picture

alexanderpas’s picture

Title: valid_email_address doesn't accept IDNs or @localhost » valid_email_address doesn't accept IDNs

okay... that seems to make this issue completely fixable...

now we just need the following:
IDN to ASCII encoded domain. (using Nameprep (RFC 3491) and Punycode (RFC 3492) encoding)
ASCII encoded domain to IDN. (using Punycode (RFC 3492) decoding)

I'm thinking that this encoding only should happen if:
1) FILTER_VALIDATE_EMAIL determines it is not a valid e-mail address (i'm using the Pareto principle here)
2) the domain contains non-ASCII characters.
it just should just redo FILTER_VALIDATE_EMAIL after encoding.

alexanderpas’s picture

Status: Active » Postponed
marcvangend’s picture

Version: 7.x-dev » 8.x-dev

Moving this to D8 because #389278: Create IDN encoding and decoding functions has been moved to D8 as well.

Bojhan’s picture

Priority: Critical » Major
mr.baileys’s picture

klaus66’s picture

It maybe that an email address without a domain-ending is a valid email adress.

But for the normal practice i think it is not the right way to use this function for user registration.

On all big platforms like amazon or ebay an email address without a domain-ending will not accepted.

sun’s picture

Title: valid_email_address doesn't accept IDNs » Make valid_email_address() support IDNs
Category: bug » feature

This is a feature, not a bug.

killes@www.drop.org’s picture

Category: feature » bug
Status: Postponed » Needs review

This is most certainly a bug which should urgently be fixed in the stable versions.

Status: Needs review » Needs work

The last submitted patch, valid_email_address-308138-5.patch, failed testing.

Gábor Hojtsy’s picture

Anybody interested in fixing this beyond using it for issue ping-pong games?

catch’s picture

Category: bug » task
Priority: Major » Normal
Status: Needs work » Closed (fixed)

8.x now requires PHP 5.3, this means that foo@localhost e-mail addresses are supported. For Drupal 7 there is an easy workaround of upgrading to PHP 5.2.9+

Some people clearly don't like this - but that's not a major bug, so since that discussion has barely anything to do with the original patch that was committed, I'm marking this as 'closed (fixed)' as it should have been over a year ago. If you want to revert the use of FILTER_VALIDATE_EMAIL then please open a new issue to discuss that for 8.x.

It looks like killes has found a valid issue in 6.x that was fixed by this patch - we should probably have a separate 6.x issue for that since it can't use a fix requiring PHP 5.2.x anyway.

Mike Wacker’s picture

Strange, I have a D8 installation running on Ubuntu 10.10, and it doesn't want to accept mike@localhost as a valid email address. Is there something special I need to do here, or should I re-open the bug?

Mixologic’s picture