valid_url() only does some very simple syntex checks to verify whether the URL is valid and thus accepts some malformed URLs. Still it does not allow all valid URLs (see e.g. #124492: valid_url() does not support all valid URL characters).

Only certain URL schemes are accepted (http, https and ftp), though other schemes may be relevant in various use cases (like this: #214516: Add RTSP to default list of allowed protocols).

I tried to expand the function with a number of checks controlled by options.

What do you think - is this totally overkill, or is it worth exploring? The code is still work in progress.

CommentFileSizeAuthor
#2 testValidUrl.2.patch3.05 KBgrndlvl
#1 testValidUrl.patch2.98 KBgrndlvl
url.txt5.74 KBc960657

Comments

grndlvl’s picture

StatusFileSize
new2.98 KB

created test for valid_url() used your proposed function for testing, function has some errors.

Here are my test results and the patch for the test is included.

grndlvl’s picture

StatusFileSize
new3.05 KB

test patch will not work with this function, made alterations after creating the patch to work with mentioned function, will roll different patch that works with the following function tomorrow.

** Note this message does not affect the test results show.

nancydru’s picture

Category: feature » bug
Priority: Minor » Normal

Invalid URL checking is not a feature - it's a bug. My users have reported that carets (^) do not work. Indeed they are not checked for in 5.x, 6.x, or 7.x. I have trouble reading "preg" patterns so I can't tell if your proposed patch fixes this.

Since this is causing "valid_url()" to fail, I'm changing this to a bug and hope it gets not only committed to 7, but all the way back to 5.

Here's one to add to your test:

http://thoth.lib.ucalgary.ca/uhtbin/cgisirsi/0/X/0/57/5/3?searchdata1=^C2671933&searchfield1=GENERAL^SUBJECT^GENERAL^^&user_id=WEBSERVER 
nancydru’s picture

I just checked, your patch does not include the caret.

  if (preg_match('`[^a-z0-9\-._~%^!$&\'()*+,;=/?:@[\]]`i', $$part, $reg)) {
nancydru’s picture

The patch (with the caret fix) is now incorporated into the Web Links module and seems to work fine.

May I suggest some way of taking a boolean also for the second param so that it doesn't change the current API?

damien tournoud’s picture

@NancyDru: RFC1738 tells us that:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

The caret symbol is not one of them and shouldn't appear in an URL unencoded.

nancydru’s picture

The example I gave above is from an actual issue in the Web Links queue. It is created from a large and active web site. Do you wish to inform them that their web site is invalid?

BTW, valid_url, in its present form, also fails when then caret is encoded.

nancydru’s picture

So, I went back and removed the caret that I added and tested, and this patch appears to have a debugging lone left in, because it showed above the heading a var_dump type line: array(1) { [0]=> string(1) "^" }

When I add a drupal_urlencode() before valid_url, I get

http%3A/%252Fthoth.lib.ucalgary.ca/uhtbin/cgisirsi/0/X/0/57/5/3%3Fsearchdata1%3D%5EC2671933%2526searchfield1%3DGENERAL%5ESUBJECT%5EGENERAL%5E%5E%2526user_id%3DWEBSERVER

which still fails to validate.

nancydru’s picture

Just to let you know that function valid_url($url, array $options = array()) { causes a WSOD in PHP 5.2.6.

flickerfly’s picture

For those who see the bug side of this, you may want to watch: http://drupal.org/node/124492 . mfer has come up with a fix that is getting worked through the system towards D7, but also with an eye to back-port to D5 and D6.

For here, other urls to test and extend for: tel:// (for phone numbers) and xmpp:// (for jabber IM)
These may be useful in a social networking scenario, among others. Also, moving towards the mobile browsers that are also phones, these could become more desired.

The XMPP URI stuff is something I had a very small part in a long time ago, but you can find more information here: http://wiki.jabber.org/index.php/XMPP_URIs and the RFC here: http://tools.ietf.org/html/rfc4622.

The tel URI is explained in http://www.ietf.org/rfc/rfc3966.txt and http://www.ietf.org/rfc/rfc2806.txt

Here are some examples of URL's to test against provided by http://msdn.microsoft.com/en-us/library/ms709071(VS.85).aspx and http://xmpp.org/extensions/xep-0147.html.

callto:192.168.103.77+type=ipcallto:someone@example.com+type=directory
callto:msils/someone@example.com+type=directory
callto:msils:1002/someone@example.com+type=directory
callto:12345+type=phone
callto:12345+gateway=fusion+type=phone
callto:someone@example.com
callto:12345+type=phone

xmpp:romeo@montague.net?message
xmpp:romeo@montague.net?message;subject=Test%20Message;body=Here%27s%20a%20test%20message

mfer’s picture

I propose a 4 part proposal for moving forward with better validation. Instead of expanding valid_url to do everything we add some additional validation functions.

  • We expand valid_url for additional schemes. This would allow us to have urls that start with ical:// and others.
  • We expand valid_url for to allow international characters and international domains. Then we could have urls like http://例え.テスト/メインページ. These really aren't urls so much as they are irls. The url provided here works and is routable.
  • Provide a more generic IRI validation function that can handle the cases in #10.
  • Make field validation configurable. So, you can choose which validation function to use. (This may already be happening as part of the fields in core effort).

The proposed spec for IRIs is at http://tools.ietf.org/html/rfc3987. Thoughts?

flickerfly’s picture

I like the sound of mfer's solution. Configurability means flexibility without compromise to security. So would we provide for custom validation of the URL, meaning the user puts in their own regex? I'm thinking something like the date/time config that allows custom time config.

My guts says "really bad idea", but might as well toss it on the table so that can get slaughtered logically instead of just by my gut reaction. :-)

mfer’s picture

@flickerfly for url/uri validation I don't think we should allow the users to input their own regex. As part of the fields in core effort there are validators for fields. Developers can assign those. I want to see less configuration options. Maybe an option to set the allow schemes that has now web interface.

hass’s picture

hass’s picture

About extending the function like discussed here, I'd like to please you to make sure module developers are able to define what protocols are valid via params. For e.g. I'd like to reuse the function in my modules, but make sure only http/https is allowed. I need this because I' like to validate the URL prior to using it much later in drupal_http_request().

alexanderpas’s picture

Issue tags: +IDN
alexanderpas’s picture

dropcube’s picture

Subscribe

sun’s picture

Title: Extend valid_url() » valid_url() accepts malformed URLs and rejects not all valid URLs
Version: 7.x-dev » 8.x-dev
Issue tags: +Needs backport to D7

Let's start with a re-roll of this patch against D8. Any takers?

neclimdul’s picture

In Drupal 8 we should probably be using filter_var($url, FILTER_VALIDATE_URL) as a more robust solution.

neclimdul’s picture

actually, we can do that in 7 too, is there a reason we aren't? Anyone know some history here?

alexanderpas’s picture

In Drupal 8 we should probably be using filter_var($url, FILTER_VALIDATE_URL) as a more robust solution.

we would still need to check the protocol, and use idn_to_ascii() to handle IDN domain names

pasqualle’s picture

http://mathiasbynens.be/demo/url-regex
this seems like a good list of urls to test against

mfer’s picture

@alexanderpas FILTER_VALIDATE_URL isn't a real validation option. It passes the work to parse_url which notes it can't be used for validation. URLs that are invalid pass it. If you look at the implementation it doesn't work.

Also, idn_to_ascii() is part of an add on for PHP and not part of PHP proper. Using that would create a dependency on functionality not there is many environments. Is the idea to add another dependency to Drupal?

@Pasqualle I had no idea about that page but it's great. Y'all should consider implementing the one from diegoperini if it passes all the tests.

I would love to see idn support and there are some other tickets for that floating around.

Mixologic’s picture

idn_to_ascii() is merely an implementation of the punycode. Urls could be converted to their punycode equivalents to check for validity.

More info here:
https://drupal.org/node/389278

dave reid’s picture

Title: valid_url() accepts malformed URLs and rejects not all valid URLs » filter_var() with FILTER_VALIDATE_URL accepts malformed URLs and rejects not all valid URLs

Agreed, we have the same problem with filter_var(). I would suspect we need to look into Symfony's UrlValidator class as it uses a regex approach, and may work better than our current standard. Or should be improved via pull request to add better support. I'd rather us re-use an existing code like that rather than implement something custom.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

avpaderno’s picture

Version: 8.9.x-dev » 9.1.x-dev

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

catch’s picture

Title: filter_var() with FILTER_VALIDATE_URL accepts malformed URLs and rejects not all valid URLs » Expand/augment UrlHelper::isValid() to improve correctness and handle more URL types
Category: Bug report » Task
Issue tags: +Needs tests, +Needs issue summary update

Core uses a (modified since this issue was opened) regular expression, not filter_var(), so the issue title here is wrong.

The two bugs in the issue summary were fixed in the meantime, I think we need an updated issue summary with which valid URLs are not allowed by UrlHelper::isValid() and which valid URLs are rejected. These could then be moved into test coverage and we could go from there. The idea in #24 seems fine but is an API addition rather than a bugfix.

catch’s picture

Issue tags: +Bug Smash Initiative

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.