valid_url() doesn't accept URLs with @ in it [#41651]

Comment	File	Size	Author
#18	valid_url_7.patch	1.41 KB	edmund.kwok
#16	valid_url_6.patch	1.4 KB	edmund.kwok
#15	valid_url_5.patch	1.37 KB	edmund.kwok
#13	valid_url_4.patch	1.39 KB	edmund.kwok
#10	valid_url_3.patch	1.37 KB	edmund.kwok
#8	valid_url_2.patch	1.21 KB	edmund.kwok
#7	valid_url.patch	1.11 KB	chx
#2	common_14.patch	448 bytes	paddy_deburca

Comment #1

killes@www.drop.org commented 3 March 2006 at 13:37

Title:	Homepage field doesn't accept URLs with @ in it	» valid_url() doesn't accept URLs with @ in it
Version:	4.6.5	» x.y.z

can verify this.

Log in or register to post comments

Status	File	Size
new	common_14.patch	448 bytes

Comment #3

markus_petrux commented 3 March 2006 at 19:22

Just wanted to point that adding the atsign, as it is the current regexp used for URL validation, might be dangerous, or at least opens up something that it was not possible.

The atsign, if entered just after the protocol (ie. part of the domain) is used to specify username:password. Sometimes this is used to fool people with crafted URLs such as:

http://www.example.com\use-a-long-enough-string-here@www.badguy.com/malware.php

Log in or register to post comments

Comment #4

paddy_deburca commented 4 March 2006 at 09:17

Priority:

Normal

» Minor

I am changing the priority to Minor.

According to http://www.ietf.org/rfc/rfc1738.txt the atsign is a reserved character.

To embed the atsign in a url, a percent sign followed by the hexadecimal code should be used. Infact this URL coding standard should be enforced/required for all special characters.

So

http://www.flickr.com/photos/16637594@N00/
http://www.example.com\use-a-long-enough-string-here@www.badguy.com/malware.php

could be replaced by

http://www.flickr.com/photos/16637594%40N00/
http://www.example.com\use-a-long-enough-string-here%40www.badguy.com/malware.php

In my opinion the code, as is _without_ patch works by design and should _not_ be patched to resolve this issue.

However, the same RFC points out that there is more than a couple of potocols. The list includes

ftp: File Transfer protocol
http: Hypertext Transfer Protocol
gopher: The Gopher protocol
mailto: Electronic mail address
news: USENET news
nntp: USENET news using NNTP access
telnet: Reference to interactive sessions
wais: Wide Area Information Servers
file: Host-specific file names
prospero: Prospero Directory Service

and that these protocols normally have the following schemes

//<user>:<password>@<host>:<port>/<url-path>

To be more correct the valid_url should be compatible with this scheme. As a practical example adding mailto support would allow the weblink module to maintain e-mail addresses.

Paddy.

Log in or register to post comments

Comment #5

paddy_deburca commented 5 March 2006 at 09:43

I have developed on the valid_url to be more compliant with the IETF RFC (http://www.ietf.org/rfc/rfc1738.txt).

The code is as follows

function valid_url($url, $absolute = FALSE) {
  $allowed_characters = '[a-z0-9\/:_\-_\.\?\$,~=#&%\+\@]';

  $lowalpha = 'a-z';
  $hialpha  = 'A-Z';
  $alpha    = $lowalpha . $hialpha;
  $digit    = '0-9';
  $alphadigit = $alpha . $digit;
  $safe     = '\$\-\_\.\+';
  $extra    = '\!\*\'\(\)\,';
  $national = '\{\}\|\\\^\~\[\]\`';
  $punctuation = '\<\>\#\%\"';

  $reserved = '\;\/\?\:\@\&\=';
  $hex      = $digit . 'A-Fa-f';
  $escape   = '\%[' . $hex . ']{2}';
  
  $unreserved = $alpha . $digit . $safe . $extra;
  $uchar    = $unreserved . $escape;
  $xchar    = $unreserved . $reserved . $escape;
  
  $user     = '([' . $unreserved . '\;\?\&\=]|('. $escape . '))+';
  $password = '([' . $unreserved . '\;\?\&\=]|('. $escape . '))+';
  $login    = '(' . $user . '(\:' . $password . ')?\@)?';

  $toplabel = '([' . $alpha . ']|([' . $alpha . '][' . $alphadigit . '\-]*[' . $alphadigit .']))';
  $domainlabel = '([' . $alphadigit . ']|([' . $alphadigit . ']([' . $alphadigit . '\-])*[' . $alphadigit . ']))';
  $hostname = '(' . $domainlabel . '\.)*' . $toplabel;
  $port     = '[' . $digit . ']+';
  $hostnumber = '[' . $digit . ']{1,3}' . '(\.[' . $digit . ']{1,3}){3}';
  $host     = '(' . $hostname . '|' . $hostnumber . ')';
  $hostport = $host . '(\:' . $port . ')?';
  
  $search   = '([' . $unreserved . '\;\:\@\&\=]|('. $escape . '))*';
  $hsegment = '([' . $unreserved . '\;\:\@\&\=]|('. $escape . '))*';
  $hpath    = $hsegment . '(\/' . $hsegment . ')*' . '(\?' . $search . ')?';
  $proper_url = $login . $hostport . '(\/' . $hpath . ')?';

  $protocols = '(http|https|ftp):\/\/';

  if ($absolute) {
    watchdog('php', preg_match("/^(http|https|ftp):\/\/". $proper_url ."$/i", $url) ? 'Matched' : 'Oops', 0);
    return preg_match('/^' . $protocols . $proper_url. '$/i', $url);
  }
  else {
    return preg_match('/^'. $hpath .'$/i', $url);
  }
}

The resultant regular expression is

(([a-zA-Z0-9\$\-\_\.\+\!\*'\(\)\,\;\?\&\=]|(\%[0-9A-Fa-f]{2}))+(\:([a-zA-Z0-9\$\-\_\.\+\!\*'\(\)\,\;\?\&\=]|(\%[0-9A-Fa-f]{2}))+)?\@)?((([a-zA-Z0-9]|([a-zA-Z0-9]([a-zA-Z0-9\-])*[a-zA-Z0-9]))\.)*([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9]))|[0-9]{1,3}(\.[0-9]{1,3}){3})(\:[0-9]+)?(\/([a-zA-Z0-9\$\-\_\.\+\!\*'\(\)\,\;\:\@\&\=]|(\%[0-9A-Fa-f]{2}))*(\/([a-zA-Z0-9\$\-\_\.\+\!\*'\(\)\,\;\:\@\&\=]|(\%[0-9A-Fa-f]{2}))*)*(\?([a-zA-Z0-9\$\-\_\.\+\!\*'\(\)\,\;\:\@\&\=]|(\%[0-9A-Fa-f]{2}))+)?)?

so I would be loath to simplify the code.

It works for the test cases

http://drupal.org OK
http://drupal.org/ OK
http://123.145.167.189 OK
http://123.145.167.189:80 OK
http://www.flickr.com/photos/16637594@N00/ NOT OK
http://www.flickr.com/photos/16637594%40N00/ OK
http://www.example.com\use-a-long-enough-string-here%40www.badguy.com/malware.php NOT OK
http://paddy:paddy@drupal.org OK
http://paddy:paddy@drupal.org:80 OK

I will continue to test and debug.

I will provide a patch when thouroughly tested and working as expected.

Paddy.

Log in or register to post comments

Comment #6

paddy_deburca commented 5 March 2006 at 09:45

Sorry - please remove the call to watchdog().

Log in or register to post comments

Comment #7

chx commented 29 April 2006 at 16:04

Status:

Active

» Needs review

Status	File	Size
new	valid_url.patch	1.11 KB

But we have the code for this already... no need to reinvent the wheel.

Log in or register to post comments

Comment #8

edmund.kwok commented 5 September 2006 at 02:04

Status	File	Size
new	valid_url_2.patch	1.21 KB

Changed $path to $url in valid_url and patch works againts HEAD.

Both:

http://www.flickr.com/photos/16637594@N00/
and
http://www.flickr.com/photos/16637594%40N00/

are accepted as valid_url now.

Log in or register to post comments

Comment #9

chx commented 5 September 2006 at 08:40

Status:

Needs review

» Reviewed & tested by the community

then it's RTBC

Log in or register to post comments

Comment #10

edmund.kwok commented 5 September 2006 at 09:33

Status	File	Size
new	valid_url_3.patch	1.37 KB

Previous patch was not rolled againts head. This should be final.

Log in or register to post comments

Comment #11

dries commented 5 September 2006 at 10:58

Looking at this patch; don't we declare $allowed_characters twice?

Log in or register to post comments

Comment #12

chx commented 5 September 2006 at 11:13

Status:

Reviewed & tested by the community

» Needs work

every patch but the latest contained this line:
- $allowed_characters = '[a-z0-9\/:_\-_\.\?\$,~=#&%\+]';
so yes, now we define it twice but it should not be so.

Log in or register to post comments

Comment #13

edmund.kwok commented 5 September 2006 at 11:25

Status:

Needs work

» Reviewed & tested by the community

Status	File	Size
new	valid_url_4.patch	1.39 KB

My bad, missed out one line when rerolled. This hopefully is the final final..

Log in or register to post comments

Comment #14

drumm

he/him

NY, US

commented 6 September 2006 at 06:33

Status:

Reviewed & tested by the community

» Needs work

This doesn't seem to fix the actual issue, @ not being allowed.

The relevant RFC, http://www.ietf.org/rfc/rfc2396.txt ("revises and replaces" 1738) states @ is allowed in path segments.

Log in or register to post comments

Comment #15

edmund.kwok commented 6 September 2006 at 08:14

Status:

Needs work

» Needs review

Status	File	Size
new	valid_url_5.patch	1.37 KB

Changed $path to $url in valid_url.

Log in or register to post comments

Comment #16

edmund.kwok commented 5 October 2006 at 04:03

Status	File	Size
new	valid_url_6.patch	1.4 KB

Rerolled from root to latest HEAD.

Log in or register to post comments

Comment #17

drumm

he/him

NY, US

commented 27 December 2006 at 13:30

Version:	x.y.z	» 5.x-dev
Status:	Needs review	» Needs work

patching file includes/common.inc
Hunk #1 FAILED at 689.
Hunk #2 succeeded at 1141 with fuzz 2 (offset 121 lines).
1 out of 2 hunks FAILED -- saving rejects to file includes/common.inc.rej

I'm confused about why this isn't simply adding @ to the allowed characters list. What do all of the code changes do?

Log in or register to post comments

Status	File	Size
new	valid_url_7.patch	1.41 KB

From #3, just adding @ to the allowed character list might open up a security loophole. And for the whole purpose of the patch, I don't get the whole idea either; just fixing syntax errors with what chx came up with.

Log in or register to post comments

Comment #19

Steven commented 15 January 2007 at 22:14

From #3, just adding @ to the allowed character list might open up a security loophole. And for the whole purpose of the patch, I don't get the whole idea either; just fixing syntax errors with what chx came up with.

Phishing is not a reason to disallow valid URLs blindly. Flickr is a perfect example of legit sites using @.

Also, filter_xss_bad_protocol() does not guarantee anything other than the protocol. We should still at least validate the characters in absolute URLs to ensure at least a high degree of URL validity.

Log in or register to post comments

Comment #20

drumm

he/him

NY, US

commented 16 July 2007 at 00:26

Version:

5.x-dev

» 6.x-dev

I'd like to see this solved on the development version first.

Log in or register to post comments

Comment #21

catch

he/him

English

commented 30 October 2007 at 23:09

Status:

Needs review

» Needs work

One out of two hunks failed.

Log in or register to post comments

Comment #22

mfer commented 1 December 2008 at 23:22

Version:

6.x-dev

» 7.x-dev

I just ran into this. We should solve in in D7 and then backport the fix.

Log in or register to post comments

Comment #23

dave reid

he/him

English

Nebraska USA

commented 1 December 2008 at 23:55

I wonder if the '@' character would pass the PHP-5 function that we used to replace the internals of valid_email_address:

filter_var($url, FILTER_VALIDATE_URL);

Log in or register to post comments

Comment #24

mfer commented 2 December 2008 at 02:27

@Dave Reid I believe you mean the issue #308977: Replace valid_url with filter_var and FILTER_VALIDATE_URL

I'm wondering the same thing. What level of url checking do we expect out of valid_url()?

Log in or register to post comments

Comment #25

mfer commented 26 January 2009 at 02:04

Status:

Needs work

» Fixed

This was fixed when #308977: Replace valid_url with filter_var and FILTER_VALIDATE_URL went in.

Log in or register to post comments

Comment #26

dave reid

he/him

English

Nebraska USA

commented 26 January 2009 at 02:10

Status:

Fixed

» Needs work

@mfer That didn't actually get committed, it was won't fix. :)

Log in or register to post comments

Comment #27

mfer commented 26 January 2009 at 02:14

Status:

Needs work

» Fixed

oops, I meant #124492: valid_url() does not support all valid URL characters fixed this issue.

Log in or register to post comments

Comment #28

9 February 2009 at 02:20

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Log in or register to post comments

Comment #29

thesurger commented 14 October 2011 at 19:13

Why won't this url work????
Figured out that it is the "=\" in the following url
http://wirelessweek.com/rss.aspx?type=cat&path=\carriers

Confused with what to use from the above comments...

Can someone please help me out?
Thank you.

Using Drupal 7 - Aggregator (Core module)
Found this in "common.inc"

function valid_url($url, $absolute = FALSE) {
  if ($absolute) {
    return (bool)preg_match("
      /^                                                      # Start at the beginning of the text
      (?:ftp|https?|feed):\/\/                                # Look for ftp, http, https or feed schemes
      (?:                                                     # Userinfo (optional) which is typically
        (?:(?:[\w\.\-\+!$&'\(\)*\+,;=]|%[0-9a-f]{2})+:)*      # a username or a username and password
        (?:[\w\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})+@          # combination
      )?
      (?:
        (?:[a-z0-9\-\.]|%[0-9a-f]{2})+                        # A domain name or a IPv4 address
        |(?:\[(?:[0-9a-f]{0,4}:)*(?:[0-9a-f]{0,4})\])         # or a well formed IPv6 address
      )
      (?::[0-9]+)?                                            # Server port number (optional)
      (?:[\/|\?]
        (?:[\w#!:\.\?\+=&@$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2})   # The path and query (optional)
      *)?
    $/xi", $url);
  }
  else {
    return (bool)preg_match("/^(?:[\w#!:\.\?\+=&@$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2})+$/i", $url);
  }
}

Log in or register to post comments

Comment #30

thesurger commented 14 October 2011 at 20:13

Category:	bug	» support
Status:	Closed (fixed)	» Needs work

Help, tried all of these, cannot get the "=\" to work in the query portion of the url

What I need=
http://wirelessweek.com?type=cat&path=\carriers

What I tried
---------------
http://wirelessweek.com?type=cat&path=%92carriers --> ascii
http://wirelessweek.com?type=cat&path=&92;carriers --> html
http://wirelessweek.com?type=cat&path=0x5Ccarriers --> hex

None of these will pull a feed from the above source.
Willing to make a change to the common.inc file, but don't understand all the short hand used in the drupal 7 version of this file.

Log in or register to post comments

Comment #31

thesurger commented 14 October 2011 at 20:49

Priority:

Minor

» Normal

BTW: just edited common.inc and set value $url="http://example.com" and was able to pass validation.

HOWEVER, now when I run the aggregator update it now gives a message saying "Mismatched tag" on line 143.
Module I am using is Core Drupal 7 - Aggregator.

Note: this same link worked in Drupal 6.22
What do I have to get it to work now that I upgraded to Drupal 7?

Log in or register to post comments

Comment #32

pasqualle

🇭🇺 Budapest

commented 10 October 2013 at 12:11

Status:

Needs work

» Closed (fixed)

the original issue was fixed, for further discussion see #295021: Expand/augment UrlHelper::isValid() to improve correctness and handle more URL types

Log in or register to post comments

valid_url() doesn't accept URLs with @ in it

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

Comment #17

Comment #18

Comment #19

Comment #20

Comment #21

Comment #22

Comment #23

Comment #24

Comment #25

Comment #26

Comment #27

Comment #28

Comment #29

Comment #30

Comment #31

Comment #32

News items

Our community

Documentation

Drupal code base

Governance of community