valid_url() marks correct IDN domains as invalid

Project: Drupal core
Version: 7.x-dev
Component: base system
Category: bug report
Priority: critical
Assigned: Unassigned
Status: closed (works as designed)
Issue tags: IDN

Issue Summary

The code in http://api.drupal.org/api/function/valid_url/7 needs to support IDNs (internationalized domain names). The line that needs to be fixed is:

(?:[a-z0-9\-\.]|%[0-9a-f]{2})+                        # A domain name or a IPv4 address

I have no ready regex for this validation... maybe someone else?
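To illustrate the report, here is a minimal sketch in Python (not Drupal's PHP) that applies only the host fragment quoted above. The helper name host_matches is hypothetical, and case-insensitive matching is an assumption about how the full pattern is applied.

```python
import re

# Host fragment copied from the line quoted above; the rest of the
# valid_url() pattern is left out. Case-insensitive, ASCII-only matching
# is an assumption made for this sketch.
HOST_RE = re.compile(r"(?:[a-z0-9\-\.]|%[0-9a-f]{2})+", re.IGNORECASE | re.ASCII)


def host_matches(host: str) -> bool:
    """Return True if the whole host string fits the quoted character class."""
    return HOST_RE.fullmatch(host) is not None


print(host_matches("example.com"))             # True
print(host_matches("例え.テスト"))               # False: non-ASCII letters are outside the class
print(host_matches("xn--r8jz45g.xn--zckzah"))  # True: an ASCII-only (Punycode-style) host name
```

The fragment only ever sees ASCII letters, digits, hyphens, dots and percent escapes, so a host written in its Unicode form cannot match.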

Comments

#1

Yes, we need to fix this. There are two parts to this...

First, would the name valid_url still be correct? URLs are a subset of URIs, and a URI does not allow international characters (only ASCII). Instead of a URI we would be using an IRI, and the corresponding subset of that is an IRL. This may be a matter of semantics, but I'm still asking.

The current implementation is based on RFC 3986, which covers URIs. The IRI spec is RFC 3987, which is still only a Proposed Standard. That being said, internationalized domain names are out in the wild, so this is a must.

We need to update the domain name and the path. What about the scheme portion (the http part)?

I think we need to replace \w with \pL_, a-z with \pL and 0-9 with \pN.

If someone writes up some tests for this I'll update the regex (unless someone else wants to).

#2

IDN support would only require an update to the domain name/hostname validation... the other parts don't need to change. I would also need http://drupal.org/node/295021#comment-1235860.

#3

@hass - well, if we are going to go international, should we limit it to IDN or flat out allow routable URLs like http://例え.テスト/メインページ (ICANN site)?

If we are going to allow international characters, and we should, we should allow them everywhere they can be used in a URL, IRL, or whatever.
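For readers weighing "host only" against "everywhere", here is a rough Python sketch of mapping a full IRI to an ASCII-only URI: the host goes through IDNA and the remaining parts are percent-encoded as UTF-8. The function name iri_to_uri is hypothetical, userinfo is dropped for brevity, and Python's built-in idna codec implements the older IDNA 2003 rules, so treat this as an illustration rather than a complete RFC 3987 mapping.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified sketch, no userinfo)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII-compatible (Punycode) form.
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"

    # Other components: percent-encode non-ASCII characters as UTF-8,
    # leaving existing percent escapes and common delimiters alone.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://例え.テスト/メインページ"))
```

After this mapping, every component is plain ASCII, so an ASCII-only validator can be applied to the result.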

#4

How should we ever check this with a regex? :-)

#5

#6

Status: active » postponed

Postponed until #389278: Create IDN encoding and decoding functions is in.

#7

Subscribe

#8

#389278: Create IDN encoding and decoding functions has been moved to D8 with priority 'normal'. What to do with this issue?

#9

Status: postponed » closed (works as designed)

@marcvangend I'm marking this issue 'by design'. The intent of valid_url is to validate URLs. We are now talking about the IRI space and not the URI space.

So, the current behavior is by design. The path forward, encoding/decoding along with validation to handle IDNs, is in that other issue. We can work from there.
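The encode-then-validate approach mentioned here could look roughly like the following Python sketch. It is not the API from the other issue: to_ascii_host and valid_idn_host are hypothetical names, the host pattern is the fragment quoted in the issue summary, and Python's idna codec (IDNA 2003) stands in for whatever encoding function Drupal ends up with.

```python
import re

# ASCII-only host fragment from the issue summary; the flags are an assumption.
HOST_RE = re.compile(r"(?:[a-z0-9\-\.]|%[0-9a-f]{2})+", re.IGNORECASE | re.ASCII)


def to_ascii_host(host: str) -> str:
    """Convert a Unicode host name to its ASCII-compatible form."""
    return host.encode("idna").decode("ascii")


def valid_idn_host(host: str) -> bool:
    """Encode first, then run the unchanged ASCII-only check."""
    try:
        ascii_host = to_ascii_host(host)
    except UnicodeError:
        # The codec rejected the name (for example, a label that is too long).
        return False
    return HOST_RE.fullmatch(ascii_host) is not None


print(valid_idn_host("例え.テスト"))
print(valid_idn_host("exa mple.com"))
```

This keeps the existing pattern untouched: the validator still sees only ASCII, and the internationalized handling lives entirely in the encoding step.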