The code in http://api.drupal.org/api/function/valid_url/7 needs to support IDNs (internationalized domain names). The line that needs to be fixed is:

(?:[a-z0-9\-\.]|%[0-9a-f]{2})+                        # A domain name or a IPv4 address

I have no ready regex for this validation... maybe someone else?

Comments

mfer’s picture

Yes, we need to fix this. There are two parts to this...

First, would the name valid_url still be correct? A URL is a subset of a URI, and a URI does not allow international characters (only ASCII). Instead of a URI we would use an IRI, and the corresponding subset of that is an IRL. This may be a matter of semantics, but I'm still asking.

The current implementation is based on RFC 3986, which covers URIs. The IRI spec is RFC 3987 and is still only a proposed standard. That being said, international domain names are out in the wild, so this is a must.

We need to update the domain name and the path. What about the scheme portion (the http part)?

I think we need to replace \w with \pL_, a-z with \pL and 0-9 with \pN.
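To make the idea concrete, here is a rough sketch in Python rather than PHP/PCRE. It is only an illustration under stated assumptions: Python's re module has no \pL or \pN, so [^\W_] (any Unicode letter or digit) stands in for them, and the names ASCII_HOST, UNICODE_HOST and host_matches are made up for this example, not anything in Drupal.

```python
import re

# ASCII-only host pattern, adapted from the line quoted in the issue.
ASCII_HOST = re.compile(r"(?:[a-z0-9\-.]|%[0-9a-f]{2})+", re.IGNORECASE)

# Hypothetical Unicode-aware variant. Python's re has no \pL or \pN, so
# [^\W_] (any Unicode letter or digit) is used as a stand-in here.
UNICODE_HOST = re.compile(r"(?:[^\W_]|[\-.]|%[0-9a-f]{2})+", re.IGNORECASE)


def host_matches(pattern, host):
    """Return True if the whole host string matches the given pattern."""
    return pattern.fullmatch(host) is not None


if __name__ == "__main__":
    for host in ("example.com", "例え.テスト"):
        print(host, host_matches(ASCII_HOST, host), host_matches(UNICODE_HOST, host))
```

The ASCII pattern rejects the Japanese test domain while the widened class accepts it. This only checks which characters appear, not whether the host is a well-formed IDN.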

If someone writes up some tests for this I'll update the regex (unless someone else wants to).

hass’s picture

IDN support would only require an update to the domain name/hostname validation... the other parts don't need to change. I would also need http://drupal.org/node/295021#comment-1235860.

mfer’s picture

@hass - well, if we are going to go international, should we limit it to IDN or flat out allow routable URLs like http://例え.テスト/メインページ (ICANN site)?

If we are going to allow international characters, and we should, we should allow them everywhere they will be used in a URL, IRL, or whatever.

hass’s picture

How should we ever check this with a regex? :-)

alexanderpas’s picture

Issue tags: +IDN
alexanderpas’s picture

Status: Active » Postponed
dropcube’s picture

Subscribe

marcvangend’s picture

#389278: Create IDN encoding and decoding functions has been moved to D8 with priority 'normal'. What to do with this issue?

mfer’s picture

Status: Postponed » Closed (works as designed)

@marcvangend I'm marking this issue 'by design'. The intent of valid_url is to validate URLs. We are now talking about the IRI space and not the URI space.

So, the current setup is by design. The path forward, encoding/decoding along with validation to handle IDNs, is in that other issue. We can work from there.
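For anyone following along, the encode-then-validate approach could look roughly like the sketch below. It is written in Python and assumes that the encoding step converts the host to its ASCII (Punycode) form before the existing ASCII-only check runs. It uses Python's built-in idna codec, which implements the older IDNA 2003 rules. The function name idn_host_is_valid is hypothetical, and none of this is the actual code from the other issue.

```python
import re

# ASCII-only host pattern, adapted from the line quoted in the issue.
ASCII_HOST = re.compile(r"(?:[a-z0-9\-.]|%[0-9a-f]{2})+", re.IGNORECASE)


def idn_host_is_valid(host):
    """Encode an IDN host to its ASCII form, then apply the ASCII check."""
    try:
        # Python's stdlib "idna" codec (IDNA 2003) yields xn-- labels
        # for non-ASCII input and passes ASCII labels through.
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        # The codec rejects hosts it cannot encode.
        return False
    return ASCII_HOST.fullmatch(ascii_host) is not None


if __name__ == "__main__":
    print(idn_host_is_valid("example.com"))
    print(idn_host_is_valid("例え.テスト"))
```

With this split, the validation regex itself stays within the URI character set, and the international part is handled entirely by the encoding step.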