Extend valid_url()
c960657 - August 13, 2008 - 21:53
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | IDN |
Description
valid_url() only does some very simple syntex checks to verify whether the URL is valid and thus accepts some malformed URLs. Still it does not allow all valid URLs (see e.g. #124492: valid_url() does not support all valid URL characters).
Only certain URL schemes are accepted (http, https and ftp), though other schemes may be relevant in various use cases (like this: #214516: Add RTSP to default list of allowed protocols).
I tried to expand the function with a number of checks controlled by options.
What do you think - is this totally overkill, or is it worth exploring? The code is still work in progress.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| url.txt | 5.74 KB | Ignored | None | None |

#1
created test for valid_url() used your proposed function for testing, function has some errors.
Here are my test results and the patch for the test is included.
#2
test patch will not work with this function, made alterations after creating the patch to work with mentioned function, will roll different patch that works with the following function tomorrow.** Note this message does not affect the test results show.
#3
Invalid URL checking is not a feature - it's a bug. My users have reported that carets (^) do not work. Indeed they are not checked for in 5.x, 6.x, or 7.x. I have trouble reading "preg" patterns so I can't tell if your proposed patch fixes this.
Since this is causing "valid_url()" to fail, I'm changing this to a bug and hope it gets not only committed to 7, but all the way back to 5.
Here's one to add to your test:
http://thoth.lib.ucalgary.ca/uhtbin/cgisirsi/0/X/0/57/5/3?searchdata1=^C2671933&searchfield1=GENERAL^SUBJECT^GENERAL^^&user_id=WEBSERVER#4
I just checked, your patch does not include the caret.
if (preg_match('`[^a-z0-9\-._~%^!$&\'()*+,;=/?:@[\]]`i', $$part, $reg)) {#5
The patch (with the caret fix) is now incorporated into the Web Links module and seems to work fine.
May I suggest some way of taking a boolean also for the second param so that it doesn't change the current API?
#6
@NancyDru: RFC1738 tells us that:
The caret symbol is not one of them and shouldn't appear in an URL unencoded.
#7
The example I gave above is from an actual issue in the Web Links queue. It is created from a large and active web site. Do you wish to inform them that their web site is invalid?
BTW, valid_url, in its present form, also fails when then caret is encoded.
#8
So, I went back and removed the caret that I added and tested, and this patch appears to have a debugging lone left in, because it showed above the heading a var_dump type line:
array(1) { [0]=> string(1) "^" }When I add a drupal_urlencode() before valid_url, I get
http%3A/%252Fthoth.lib.ucalgary.ca/uhtbin/cgisirsi/0/X/0/57/5/3%3Fsearchdata1%3D%5EC2671933%2526searchfield1%3DGENERAL%5ESUBJECT%5EGENERAL%5E%5E%2526user_id%3DWEBSERVERwhich still fails to validate.
#9
Just to let you know that
function valid_url($url, array $options = array()) {causes a WSOD in PHP 5.2.6.#10
For those who see the bug side of this, you may want to watch: http://drupal.org/node/124492 . mfer has come up with a fix that is getting worked through the system towards D7, but also with an eye to back-port to D5 and D6.
For here, other urls to test and extend for: tel:// (for phone numbers) and xmpp:// (for jabber IM)
These may be useful in a social networking scenario, among others. Also, moving towards the mobile browsers that are also phones, these could become more desired.
The XMPP URI stuff is something I had a very small part in a long time ago, but you can find more information here: http://wiki.jabber.org/index.php/XMPP_URIs and the RFC here: http://tools.ietf.org/html/rfc4622.
The tel URI is explained in http://www.ietf.org/rfc/rfc3966.txt and http://www.ietf.org/rfc/rfc2806.txt
Here are some examples of URL's to test against provided by http://msdn.microsoft.com/en-us/library/ms709071(VS.85).aspx and http://xmpp.org/extensions/xep-0147.html.
callto:192.168.103.77+type=ipcallto:someone@example.com+type=directory
callto:msils/someone@example.com+type=directory
callto:msils:1002/someone@example.com+type=directory
callto:12345+type=phone
callto:12345+gateway=fusion+type=phone
callto:someone@example.com
callto:12345+type=phone
xmpp:romeo@montague.net?message
xmpp:romeo@montague.net?message;subject=Test%20Message;body=Here%27s%20a%20test%20message
#11
I propose a 4 part proposal for moving forward with better validation. Instead of expanding valid_url to do everything we add some additional validation functions.
The proposed spec for IRIs is at http://tools.ietf.org/html/rfc3987. Thoughts?
#12
I like the sound of mfer's solution. Configurability means flexibility without compromise to security. So would we provide for custom validation of the URL, meaning the user puts in their own regex? I'm thinking something like the date/time config that allows custom time config.
My guts says "really bad idea", but might as well toss it on the table so that can get slaughtered logically instead of just by my gut reaction. :-)
#13
@flickerfly for url/uri validation I don't think we should allow the users to input their own regex. As part of the fields in core effort there are validators for fields. Developers can assign those. I want to see less configuration options. Maybe an option to set the allow schemes that has now web interface.
#14
Only to notify you all: #368472: valid_url() marks correct IDN domains as invalid
#15
About extending the function like discussed here, I'd like to please you to make sure module developers are able to define what protocols are valid via params. For e.g. I'd like to reuse the function in my modules, but make sure only
http/httpsis allowed. I need this because I' like to validate the URL prior to using it much later indrupal_http_request().#16
#17
Only to notify you all: #389278: Create IDN encoding and decoding functions
#18
Subscribe