Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
valid_email_address uses a regex to detect if an email address is valid. In PHP 5.2 we can use
filter_var('example@example.com', FILTER_VALIDATE_EMAIL);
to do the same thing. This is a built in function to PHP and should be faster.
Before I write a patch I wonder if we should leave valid_email_address and change the logic to use the filter_var function or replace each of the valid_email_address calls with the filter_var call. Thoughts?
For more details and example usage check out http://www.w3schools.com/PHP/filter_validate_email.asp.
Comment | File | Size | Author |
---|---|---|---|
#5 | valid_email_address-308138-5.patch | 1.86 KB | mfer |
#1 | valid_email_address-308138-1.patch | 846 bytes | mfer |
Comments
Comment #1
mfer CreditAttribution: mfer commentedThis patch replaces the guts of the valid_email_address function.
Comment #2
Dave ReidVery neat use of the new PHP 5.2 requirement! Patch applies cleanly with a little offset. Testing is working great so far! Using this patch it actually caught the invalid e-mail of 'test@e' that core did not catch before. It even catches the two commented-out invalid e-mails from the contact.test that apparently needed to be commented out since they weren't caught by the old valid_email_address function!
$invalid_recipients = array('invalid', 'invalid@', /*'invalid@site', 'invalid@site.',*/ '@site.', '@site.com');
Maybe a couple more reviews and RTBC!
Comment #3
Dave ReidI'd have to look and see if there was more we could replace with filter_var. One likely suspect is valid_url. Not sure how it would work with local drupal paths however.
Comment #4
mfer CreditAttribution: mfer commented@Dave Reid thanks for pointing out the tests. I'll update for the offset and the tests and post a new patch.
I looked at valid_url as well. We will need something more like:
I didn't include this in the patch because I'm not entirely sure of the behavior of FILTER_VALIDATE_URL with it's different options. Doing some testing. Filters don't seem to be documented very well. I found some detail here - http://devolio.com/blog/archives/413-Data-Filtering-Using-PHPs-Filter-Fu...
Comment #5
mfer CreditAttribution: mfer commentedThe attached patch has the same change as the first but the contact form test has been updated to as described in #2.
$invalid_recipients = array('invalid', 'invalid@', /*'invalid@site', 'invalid@site.',*/ '@site.', '@site.com');
was updated to
$invalid_recipients = array('invalid', 'invalid@', 'invalid@site', 'invalid@site.', '@site.', '@site.com');
The tests passed.
I separated out a change to valid_url into a separate issue because using filter_var would change the response. Details at http://drupal.org/node/308977.
Comment #6
Dave ReidLooks great since this will narrow down invalid e-mail addresses! Patch applied cleanly and ran the contact & user tests with 100% pass. I believe it is RTBC!
Comment #7
Dries CreditAttribution: Dries commentedLooks like an improvement to me, and the tests pass. Committed. Thanks.
Comment #8
drewish CreditAttribution: drewish commentedI think this is a bit of a step backwards. Previously you could use 'drewish@localhost' as a valid email this is not longer allowed.
Comment #9
Dave ReidI really don't think that should be a valid e-mail. Suppose you send a user an e-mail through their personal contact form. When that user replies to your e-mail, it's going to send it to 'drewish@localhost' from their computer or web mail, which would be invalid. If someone wants to use localhost as an e-mail domain, they should be using somekind of externall- accessable domain like 'mylocalcomputer.dyndns.org'.
Comment #10
drewish CreditAttribution: drewish commentedlooks like there are problems with the function: #43402 FILTER_VALIDATE_EMAIL is not RFC2822 compliant
Comment #11
drewish CreditAttribution: drewish commenteddave reid, the function is called valid_email_address(), and 'drewish@localhost' is a valid email address. you could make an argument that we should have a function valid_globally_email() or some such but as the function is currently named it's a step backwards that could invalidate existing user data.
Comment #12
mfer CreditAttribution: mfer commentedFrom what I have read localhost fails RFC 2822 because localhost is an alias and not a fully qualified domain or IP. I'm digging into this but it still doesn't solve our problem here.
Do we roll back, go to a different regex that allows for aliased domain names, or do we stick to RFC 2822 on this?
Comment #13
drewish CreditAttribution: drewish commentedmfer, i think the relevant part of RFC 2822 is page 17 (emphasis is mine):
That leads me to believe that user@hostname is a valid email address.
Update: I forgot to mention that I'm in favor of rolling back to the regex.
Comment #14
mfer CreditAttribution: mfer commenteduser@hostname is not valid under RFC 2822. I've done a bit of reading and this is the case. You can even find detail of this on the email regular expressions page at http://www.regular-expressions.info/email.html
For the RFC 2822 regex it says
There is a difference between a domain name like localhost and a fully qualified domain name. Fully qualified domain names are unique.
In either case someone from above my drupal pay grade needs to decide our path forward. If we aren't following RFC 2822 I'd suggest a comment regarding where we didn't follow 2822.
Comment #15
mfer CreditAttribution: mfer commentedComment #16
mfer CreditAttribution: mfer commentedUpdating the title to describe the issue as it stands.
Comment #17
Dave ReidIs there a case where using @localhost.localdomain wouldn't work? It passes valid_email_address and looks to be a generally more accepted form over "localhost". I just think there is no real use case for @localhost besides testing. I cannot e-mail user@localhost so it should not be a "valid" email.
Comment #18
drewish CreditAttribution: drewish commentedDave Reid,
.localdomain
isn't a reserved TLD and Apple uses .local and Microsoft suggests using it. if you just want something that'll validate you can use foo@127.0.0.1.mfer, we need to be reading RFC 2822 rather than regular-expressions.info's commentary on it (on that page he discusses using happily using a regex that blocks .museum addresses).
i already quoted the part of the RFC with the grammar for an
address
token and you can see there that one of the productions for thedomain
token is thedot-atom
token. this is in turn defined as:after brushing up on ABNF by reading RFC 2234, i read the
1*atext
production to be one or moreatext
tokens then optionally followed by any number of period-one-or-more-atext tokens. please correct me if i'm reading that wrong but it sure looks like we're not following the standard the current implementation.my big problem with this change is that it invalidates existing data. i could have foo@localhost on my site and next time i go to submit that form it'll fail.
Comment #19
mfer CreditAttribution: mfer commented@drewish - you have now made me think more than I would have liked to today :)
After digging into RFC 2234 I believe you have the correct interpretation of dot-atom-text. For anyone interested see the * operator in section 3.6 of RFC 2234.
First, what ever direction we need to go I'll craft up a patch.
Next, This raises 2 questions for me:
1. In the test for this in contact.test it reads:
We have a test for 'invalid@site'. This is equivalent to example@localhost which is the issue at hand. Is there a reason for this?
Note: If this should be valid we need to remove it as an invalid test. If we go to a our own regex we should use one that finds 'invalid@site.' to be invalid.
2. Is there something that overrides RFC 2822 on the dot-atom? The interpretations and implementations I've read all over reject @localhost being valid. Or, am I just missing something?
Comment #20
mfer CreditAttribution: mfer commentedHere's the node with the initial contact.test which has the wrong invalid test case. http://drupal.org/node/232665
Comment #21
mfer CreditAttribution: mfer commentedAfter talking with drewish in IRC I'll craft up a patch that uses an RFC2822 compliant regex and update the tests to be correct.
And I thought this was going to be a simple patch.
Comment #22
mfer CreditAttribution: mfer commentedThis get's a little more interesting. RFC 2821 (see 4.1.2) doesn't allow user@localhost. It requires "Domain = (sub-domain 1*("." sub-domain)) / address-literal" This one requires and least one . to be present in the domain.
I love it when standards differ with each other. The function filter_var($mail, FILTER_VALIDATE_EMAIL) is RFC 2821 compliant. I've added 5 more invalid cases to contact.test and all the tests pass.
So, do we go with RFC 2822 or RFC 2821?
Comment #23
drewish CreditAttribution: drewish commentedGood points, though I think you meant section 4.1.3 Address Literals.
Comment #24
mfer CreditAttribution: mfer commentedThe more I think about this I'm not sure I like the idea of visitors coming to a site, filling out the contact form with an address like user@example and having it pass validation. If we go with RFC 2822 that's what we will have. The same thing would happen with email addresses on user pages. We could get email addresses that don't have domains we can route to.
I understand the intranet use case. But, the larger use case is where we want it to fail if it's not a domain as RFC 2821 calls out because that's the SMTP standard and that deals with the routing.
Basically, if my users forget the .com, .org, or what ever their extension is I want it to tell them.
FYI, I have a regex that works for RFC 2822 waiting in the wings if needed.
Comment #25
mfer CreditAttribution: mfer commentedWe have two distinct use cases here. Personally, I don't want to allow user@localhost on my sites but I understand there are a lot of sites (intranets in particular) that might want to allow it. So, my suggestion is that we add an argument to the function, a variable via variable_get, and the option to go either way.
Some psuedo code:
Thoughts?
This is definitely more complicated and I'm not sure I'd expose a UI to change this setting but it supports both cases.
Comment #26
drewish CreditAttribution: drewish commentedi've no objection to using the rfc 2821 so long as we document that that's what valid_email_address is using. there should probably be a changelog message too.
Comment #27
asimmonds CreditAttribution: asimmonds commentedUsing filter_var() could possibly introduce a security issue on PHP 5.2.0 and 5.2.1:
PHP ext/filter Email Validation Vulnerability, http://www.php-security.org/MOPB/PMOPB-45-2007.html
Comment #28
BioALIEN CreditAttribution: BioALIEN commentedI'm with RFC 2821 as will the majority of admins that deploy Drupal I am sure. For this, I would rather see us validate against
user@localhost
type of emails.I hope the powers above will use common sense when deciding this. Drupal is largely used on the internet rather than intranet.
Comment #29
Dave ReidRFC 2821 should absolutely be the default validation, since it is only a few cases where it is not preferred.
An option I've been brainstorming recently: allowing e-mails like user@machinename should be something fit for an intranet install profile that will somehow work to override email address verification to only allow e-mails of a certain domain, etc. The trouble would be how exactly to implement 'other' validation in valid_email_address. Maybe some kind of hook_validate?
Comment #30
mfer CreditAttribution: mfer commentedIf we go with #25 we can have drupal be configurable for both ways.
@Dave Reid what would a use case be for hook_validate to alter valid_email_address? Who would use it and how often would the case come up?
Comment #31
alexanderpas CreditAttribution: alexanderpas commentedRFC 2821 is probaly soon obsolete! RFC 5321 is here: http://tools.ietf.org/html/rfc5321
RFC 2822 is also probaly soon obsolete! RFC 5322 is here: http://tools.ietf.org/html/rfc5322
Comment #32
mfer CreditAttribution: mfer commented@alexanderpas Thanks for the update. I was planning on adding in the 2822 functionality this week. Looks like I have another fun filled spec to read... unless someone else wants to fill in the basics.
Comment #33
mfer CreditAttribution: mfer commentedSo, do we go against RFC 5322 or RFC 5321?
Comment #34
drewish CreditAttribution: drewish commentedalexanderpas, the 532? standards are still drafts, I don't think it'd be wise to adopt those just yet. In any case it doesn't answer the question of which spec do we follow.
Comment #35
alexanderpas CreditAttribution: alexanderpas commentedis the e-mail address user@[127.0.0.1] valid? afaik yes.
in my opinion, we should only accept localhost e-mail addresses after user confirmation.
Notice: you have used a localhost e-mail address, mails send to this address will never be send to another computer, and you can only recieve mail send to this address from this computer.
Comment #36
PanchoRFC 5321 and RFC 5322 have been released on October 2, 2008, and supersede the old rfcs.
Here is a commented comparison of the (few and subtle) changes.
This here is a crazy long regex for validating email addresses - never seen such a regex before. But I don't think we need to go that far. We need to do a good job, not a perfect one.
However, I also noticed that neither valid_url() nor valid_email_address() accept IDNs like www.übercart.com, which are perfectly valid. This is a real showstopper, especially for languages with a non-latin alphabet. IPv6 is not accepted either in the domain part.
So we definitely need to study the RFCs and rewrite the regexes correctly. Or we can search for the best premade one we can find and do a lot of testing. I suppose there will be acceptable ones in some other GPLed projects.
Comment #37
Heine CreditAttribution: Heine commentedA quick reminder to all participants in the discussion; the email address validation should only check for compliance to the relevant RFC, not whether the address is an existing and / or reachable address.
In the case of the quote, there's not much difference to user@example and user@a-non-existing-domain.com or even user@example.com.
Comment #38
PanchoPlease take notice of #209672: Use site name in From: header for system e-mails and #214114: Modify valid email core function to support RFC2822 display name. as well.
Comment #39
alexanderpas CreditAttribution: alexanderpas commenteddon't forget punycode ;) http://www.xn--bercart-m2a.com/
Comment #40
mfer CreditAttribution: mfer commentedFor valid_url modifications see #124492: valid_url() does not support all valid URL characters.
Comment #41
mfer CreditAttribution: mfer commented2 Things.
First, international characters are not valid per the spec which says,
See appendix B of http://tools.ietf.org/html/rfc5234.
Also, note that RFC says, "Printable US-ASCII" in reference to an ALPHA character.
So, characters like ü are invalid per the spec.
Now, I'm not one to always follow spec.
Second, We send emails based on the validations of emails. If we accept IDNs we need a mechanism to convert them to a DNS routable name. For an example of this see http://en.wikipedia.org/wiki/Internationalized_domain_name#Example_of_ID...
Comment #42
drewish CreditAttribution: drewish commentedmarked #250022: User was able to register with email address without ".com" on the end as a duplicate.
Comment #44
alexanderpas CreditAttribution: alexanderpas commentedComment #45
c960657 CreditAttribution: c960657 commentedFWIW, FILTER_VALIDATE_EMAIL accepts "foo@localhost" as of PHP 5.2.9 that was released today:
http://bugs.php.net/bug.php?id=47282
http://cvs.php.net/viewvc.cgi/php-src/ext/filter/logical_filters.c?r1=1....
http://cvs.php.net/viewvc.cgi/php-src/ext/filter/tests/016.phpt?r1=1.4.2...
Comment #46
alexanderpas CreditAttribution: alexanderpas commentedokay... that seems to make this issue completely fixable...
now we just need the following:
IDN to ASCII encoded domain. (using Nameprep (RFC 3491) and Punycode (RFC 3492) encoding)
ASCII encoded domain to IDN. (using Punycode (RFC 3492) decoding)
I'm thinking that this encoding only should happen if:
1) FILTER_VALIDATE_EMAIL determines it is not a valid e-mail address (i'm using the Pareto principle here)
2) the domain contains non-ASCII characters.
it just should just redo FILTER_VALIDATE_EMAIL after encoding.
Comment #47
alexanderpas CreditAttribution: alexanderpas commentedpostponed until #389278: Create IDN encoding and decoding functions is in.
Comment #48
marcvangendMoving this to D8 because #389278: Create IDN encoding and decoding functions has been moved to D8 as well.
Comment #49
Bojhan CreditAttribution: Bojhan commentedComment #50
mr.baileys#984568: valid email not passing validation (someuser@localhost) marked as duplicate.
Comment #51
klaus66 CreditAttribution: klaus66 commentedIt maybe that an email address without a domain-ending is a valid email adress.
But for the normal practice i think it is not the right way to use this function for user registration.
On all big platforms like amazon or ebay an email address without a domain-ending will not accepted.
Comment #52
sunThis is a feature, not a bug.
Comment #53
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedThis is most certainly a bug which should urgently be fixed in the stable versions.
Comment #55
Gábor HojtsyAnybody interested in fixing this beyond using it for issue ping-pong games?
Comment #56
catch8.x now requires PHP 5.3, this means that foo@localhost e-mail addresses are supported. For Drupal 7 there is an easy workaround of upgrading to PHP 5.2.9+
Some people clearly don't like this - but that's not a major bug, so since that discussion has barely anything to do with the original patch that was committed, I'm marking this as 'closed (fixed)' as it should have been over a year ago. If you want to revert the use of FILTER_VALIDATE_EMAIL then please open a new issue to discuss that for 8.x.
It looks like killes has found a valid issue in 6.x that was fixed by this patch - we should probably have a separate 6.x issue for that since it can't use a fix requiring PHP 5.2.x anyway.
Comment #57
Mike Wacker CreditAttribution: Mike Wacker commentedStrange, I have a D8 installation running on Ubuntu 10.10, and it doesn't want to accept mike@localhost as a valid email address. Is there something special I need to do here, or should I re-open the bug?
Comment #58
Mixologic