Here's another yet another code cleanup for valid_url() this time, as found in common.inc.

I copied the regex for the domain from my patch for valid_email_address() and made sure the function returns a boolean. I also added [] to the allowed characters and relative paths cannot begin with a scheme (i.e. http://) anymore.

Since there is still no common.test available* to diff to, I'll post the test for it here (again):

class CommonValidURLTestCase extends DrupalWebTestCase {

  /**
   * Implementation of getInfo().
   */
  function getInfo() {
    return array(
      'name' => t('Common URL validator test'),
      'description' => t('Check predefined URI\'s for validity.'),
      'group' => t('System')
    );
  }

  /**
   * Implementation of setUp().
   */
  function setUp() {
    $this->uris = array( // 'uri' => array((bool)<absolute>, (bool)<expected result>);
      'http://example.com'    => array('absolute' => TRUE, 'test' => 'assertTrue'),
      'http://example.com:81' => array('absolute' => TRUE, 'test' => 'assertTrue'),
      'http://example.com/a'  => array('absolute' => TRUE, 'test' => 'assertTrue'),
      'unknown://example.com' => array('absolute' => TRUE, 'test' => 'assertTrue'),
      'ed2k://example.com'    => array('absolute' => TRUE, 'test' => 'assertTrue'),
      'ed-2k://example.com'   => array('absolute' => TRUE, 'test' => 'assertFalse'),
      'example-foo'           => array('absolute' => TRUE, 'test' => 'assertFalse'),
      'example.com/'          => array('absolute' => FALSE, 'test' => 'assertTrue'),
      'http://ex-ample.com'   => array('absolute' => FALSE, 'test' => 'assertFalse'),
      'example-bar'           => array('absolute' => FALSE, 'test' => 'assertFalse'),
      'http://example.com:81/~user?var=value&var2[]=value#id' => array('absolute' => TRUE, 'test' => 'assertTrue'),
      'example.com:81/~user?var=value&var2[]=value#id' => array('absolute' => FALSE, 'test' => 'assertTrue'),
    );
    parent::setUp();
  }
  
  function testCommonValidUrl() {
    foreach ($this->uris as $uri => $attributes) {
      $absolute = ($attributes['absolute']) ? 'absolute' : '';
      $this->$attributes['test'](
        valid_url($uri, $attributes['absolute']),
        'Testing ' . $absolute. ' url: \'' . $uri . '\' for validity. %s'
      );
    }
  }
}

* http://drupal.org/node/265563, http://drupal.org/node/151902 and http://drupal.org/node/265548, http://drupal.org/node/267883

CommentFileSizeAuthor
#3 valid_url.patch3.52 KBmrharolda
valid_url.patch1.12 KBmrharolda

Comments

mrharolda’s picture

Title: Replacement code and tests for valid_url() (+tests) » Replacement code and tests for valid_url()

typo in title...

catch’s picture

Status: Needs review » Needs work

Now that common.test exists, this needs a reroll so it can be reviewed as a single patch.

mrharolda’s picture

Status: Needs work » Needs review
StatusFileSize
new3.52 KB

Your wish is my command... ;)

I also added a few more test cases and cleaned the regex up a little...

catch’s picture

Status: Needs review » Needs work

All the tests pass.

We normally use double quotes for escaping "'" instead of single quotes and \', especially inside t() for the benefit of translators. Otherwise this looks good to me although I'm no regexp expert.

As a side note - if we're going to be adding tests for lots of individual functions should we name them after the function? So instead of Common URL validator test just valid_url(). These tests are really for developers so I think a function name is more helpful. There's a discussion about standards in groups, so probably better there than in this issue: http://groups.drupal.org/node/11020#comment-39531

mrharolda’s picture

Status: Needs work » Needs review

Changed the single quotes for better readability and translations...

As for the side note: As soon as Drupal gets an official standard concerning simpletests, I'll update these tests to reflect them.

c960657’s picture

Status: Needs review » Needs work

+ 'example-bar' => array('absolute' => FALSE, 'test' => 'assertFalse'),
Why is example-bar not a valid relative URL?

http://foo@example.net is a valid absolute URL, but the function claims it is not.
"" (the empty string) is a valid absolute URL, but the function claims it is not.
http://example.org:80:80 is not a valid URL, but the function claims it is.
foo*bar is a valid relative URL, but the function claims it is not.
http://foo is a valid absolute URL, but the function claims it is not (but perhaps this is intentional - in that case it should probably be mentioned in the documentation).

I think you need to take a look at RFC 3986. Appendix B contains some information about parsing URLs using regular expressions. The individual parts can be validated using regular expressions based on the ABNF rules in the RFC. Perhaps it is not worth the effort checking all rules mentioned in the RFC, but valid_url() should at least accept all valid URLs (or, if there are exceptions, these should be mentioned explicitly in the comments).

The documentation of the $absolute parameter is not clear about what should happen, if an absolute is supplied and $absolute == FALSE. Currently, when $absolute == TRUE, only absolute URLs are accepted, and when $absolute == FALSE, both relative and absolute URLs are accepted. With your patch, $absolute == FALSE no longer accepts absolute URLs. I don't know whether this is intentional or not, but in either case the documentation should probably be elaborated a bit. If the behaviour is kept unchanged, perhaps $absolute could be renamed to $require_absolute for clarity.

c960657’s picture

Status: Needs work » Closed (duplicate)

The regexp in valid_url() was changed and some tests were added in #124492: valid_url() does not support all valid URL characters.

Closing as duplicate.