By robertdouglass on
I want to write a regexp that validates web links. It's going pretty well so far, but I've found a couple of cases where the regexp fails to catch a bad URL. Here's the code:
if (!preg_match(
// The protocols: http://
'/^((https|http|ftp|news):\/\/)?'.
// domains
'(([a-z]([a-z0-9\-_]*\.)+)'.
'(aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|[a-z]{2})'.
'(\/[a-z0-9_\-\.~]+)*'.
'(\/([a-z0-9_\-\.]*)(\?[a-z0-9+_\-\.\/%=&]*)?)?)'.
// OR ip addresses
'|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'.
// port number
'(:([0-9]{1,4}))'.
// forward slash 0 or 1 times
'((\/)|(\/(.*)))?'.
// end of the expression, case insensitive
'$/i', $text, $m)) {
return false;
}
And here are two examples that should be rejected but get through:
$text = 'drupal.org:';
$text = 'http://www.yahoo.com:80abc';
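A quick way to see why both strings get through is to print what the pattern actually matched. The sketch below runs the regexp above unchanged and echoes `$m[0]`; the loop and output format are only for illustration. The top-level `|` splits the pattern into two alternatives: the domain branch starts with `^` but has no `$`, and the IP branch ends with `$` but has no `^`. So the domain branch only needs to match a prefix of the string.

```php
<?php
// The pattern from the post, unchanged.
$pattern = '/^((https|http|ftp|news):\/\/)?'
    . '(([a-z]([a-z0-9\-_]*\.)+)'
    . '(aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|[a-z]{2})'
    . '(\/[a-z0-9_\-\.~]+)*'
    . '(\/([a-z0-9_\-\.]*)(\?[a-z0-9+_\-\.\/%=&]*)?)?)'
    . '|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    . '(:([0-9]{1,4}))'
    . '((\/)|(\/(.*)))?'
    . '$/i';

foreach (['drupal.org:', 'http://www.yahoo.com:80abc'] as $text) {
    preg_match($pattern, $text, $m);
    // $m[0] is the part of $text that the pattern actually matched.
    echo $text, ' => matched "', $m[0], "\"\n";
}
```

Both strings report a prefix match ("drupal.org" and "http://www.yahoo.com"), so the trailing ":" and ":80abc" are never looked at.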
Thanks for any suggestions!
Comments
This one fails to raise an error as well
- Robert Douglass
-----
My Drupal book: Building Online Communities with Drupal, phpBB and WordPress
Difficult
Try adding parentheses around the OR block so that it looks like
( domain | ipaddress )
The port should be optional for both the IP and the domain.
What is the part '(\/([a-z0-9_\-\.]*)(\?[a-z0-9+_\-\.\/%=&]*)?)?)' used for? It seems redundant at first sight.
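Following that suggestion, here is a minimal sketch of the pattern rearranged so that `^` and `$` apply to both host types and the port is optional for either. `validate_link` is a hypothetical helper name, and this is not a complete URL validator (for example, it still accepts 999.999.999.999). Other changes from the original, all assumptions on my part: the port allows up to 5 digits because TCP ports go up to 65535, and the port comes before the path. As far as I can tell, the group asked about above is what allowed a trailing "/" and a query string, so the sketch folds it into a single optional query-string group after the path.

```php
<?php
// Hypothetical helper: returns true if $text looks like a web link.
function validate_link($text) {
    $pattern = '/^((https|http|ftp|news):\/\/)?'
        // Host: a domain name OR an IPv4 address, grouped so ^ and $ cover both.
        . '('
        . '([a-z]([a-z0-9\-_]*\.)+(aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|[a-z]{2}))'
        . '|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
        . ')'
        // Optional port for either kind of host (ports go up to 65535).
        . '(:[0-9]{1,5})?'
        // Optional path; each segment may be empty, so a trailing "/" is fine.
        . '(\/[a-z0-9_\-\.~]*)*'
        // Optional query string.
        . '(\?[a-z0-9+_\-\.\/%=&]*)?'
        . '$/i';
    return preg_match($pattern, $text) === 1;
}
```

With this version, validate_link('drupal.org:') and validate_link('http://www.yahoo.com:80abc') both return false, while 'http://www.yahoo.com:80/' and '127.0.0.1:8080/foo' are accepted.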
I often use this site as a starting point:
http://regexlib.com/Search.aspx?k=url
You might peruse the regexps on that page. Each regex lists what it does and doesn't match.
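Those match/non-match lists also make a handy test harness. Below is a small PHP sketch of the idea; `regex_failures` is a hypothetical helper, and the example pattern is a deliberately simple one chosen for illustration. It reports every string that behaved unexpectedly.

```php
<?php
// Hypothetical helper: runs $pattern against strings it should and should
// not match, and returns a description of each unexpected result.
function regex_failures($pattern, array $should_match, array $should_not_match) {
    $failures = [];
    foreach ($should_match as $s) {
        if (preg_match($pattern, $s) !== 1) {
            $failures[] = "expected match: $s";
        }
    }
    foreach ($should_not_match as $s) {
        if (preg_match($pattern, $s) === 1) {
            $failures[] = "expected no match: $s";
        }
    }
    return $failures;
}

// Illustrative pattern with no $ anchor, so a bad URL slips through.
$pattern = '/^(https?:\/\/)?[a-z0-9\-]+(\.[a-z0-9\-]+)+/i';
$failures = regex_failures(
    $pattern,
    ['drupal.org', 'http://drupal.org'],
    ['drupal', 'drupal.org:']
);
print_r($failures);
```

Here `$failures` holds the single entry "expected no match: drupal.org:", the same missing-anchor problem as in the original post.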
I like to use The Regex Coach for testing and developing my regular expressions.
dado