Posted by hass on February 3, 2009 at 11:56am
| Project: | Drupal core |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | closed (works as designed) |
| Issue tags: | IDN |
Issue Summary
The code in http://api.drupal.org/api/function/valid_url/7 needs to support IDNs (internationalized domain names). The line that needs to be fixed is:
(?:[a-z0-9\-\.]|%[0-9a-f]{2})+ # A domain name or a IPv4 address
I have no ready regex for this validation... maybe someone else?
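To illustrate the problem, here is a minimal Python sketch that applies only the host character class quoted above (not the full valid_url pattern) to a few host names. The helper name host_matches and the case-insensitive flag are assumptions made for this illustration.

```python
import re

# Host character class copied from the line quoted above; the rest of the
# valid_url pattern is omitted. Case-insensitive matching is an assumption.
HOST_RE = re.compile(r"(?:[a-z0-9\-\.]|%[0-9a-f]{2})+", re.IGNORECASE)


def host_matches(host: str) -> bool:
    """Return True if the whole host string fits the ASCII-only class."""
    return HOST_RE.fullmatch(host) is not None


if __name__ == "__main__":
    # ASCII host names and IPv4 addresses pass; hosts containing
    # non-ASCII characters are rejected by the character class.
    for host in ["drupal.org", "192.0.2.1", "例え.テスト", "bücher.example"]:
        print(host, host_matches(host))
```

Any host containing a character outside a-z, 0-9, hyphen, dot, or a percent escape fails, which is why IDNs in their Unicode form do not validate.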
Comments
#1
Yes, we need to fix this. There are two parts to this...
First, would the name valid_url be correct? URLs are a subset of URIs, and URIs do not allow international characters (only ASCII). Instead of a URI we would use an IRI, and the corresponding subset of that is an IRL. This may be a matter of semantics, but I'm still asking.
The current implementation is based on RFC 3986, which covers URIs. The IRI spec is RFC 3987, which is still only a proposed standard. That being said, international domain names are out in the wild, so this is a must.
We need to update the domain name and the path. What about the scheme portion (the http part)?
I think we need to replace \w with \pL_, a-z with \pL and 0-9 with \pN.
If someone writes up some tests for this I'll update the regex (unless someone else wants to).
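The character-class substitution suggested above can be sketched in Python. The standard re module has no \pL or \pN property classes, so this sketch checks Unicode general categories with unicodedata instead. The function names are hypothetical, and the rule shown (letters, numbers, hyphen, and dot) is only an approximation of the idea, not a full IDNA validity check.

```python
import unicodedata


def is_host_char(ch: str) -> bool:
    """Accept any Unicode letter or number, plus '-' and '.'."""
    if ch in "-.":
        return True
    # General categories starting with "L" are letters, "N" are numbers.
    return unicodedata.category(ch)[0] in ("L", "N")


def looks_like_unicode_host(host: str) -> bool:
    """Return True if host is non-empty and every character is allowed."""
    return bool(host) and all(is_host_char(ch) for ch in host)


if __name__ == "__main__":
    for host in ["drupal.org", "例え.テスト", "bad host", "under_score.example"]:
        print(host, looks_like_unicode_host(host))
```

Cases like these (plain ASCII hosts, hosts in other scripts, and hosts with spaces or underscores) would be reasonable starting points for the tests requested above.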
#2
IDN support would only require an update to the domain name/hostname validation... the other parts don't need to change. I would also need http://drupal.org/node/295021#comment-1235860.
#3
@hass - well, if we are going to go international should we limit it to IDN or flat out allow routable urls like http://例え.テスト/メインページ (ICANN site)?
If we are going to allow international characters, and we should, we should allow them everywhere they will be used in a URL, IRL, or whatever.
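As a rough illustration of what allowing international characters everywhere would involve, the Python sketch below maps an address like the one above to an ASCII-only form: the host goes through Python's built-in idna codec and the path, query, and fragment are percent-encoded as UTF-8. The function name iri_to_uri is hypothetical, the sets of characters left unescaped are simplified, and the sketch assumes there is no userinfo part in the address.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an internationalized address to an ASCII-only one (simplified)."""
    parts = urlsplit(iri)
    # Host: encode with the built-in "idna" codec (IDNA 2003 rules).
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Other components: percent-encode non-ASCII characters as UTF-8.
    # "%" is kept as safe so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://例え.テスト/メインページ"))
```

The result contains only ASCII characters, so it could then be checked by an ASCII-only validator.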
#4
How should we ever check this with a regex? :-)
#5
#6
Postponed until #389278: Create IDN encoding and decoding functions is in.
#7
Subscribe
#8
#389278: Create IDN encoding and decoding functions has been moved to D8 with priority 'normal'. What to do with this issue?
#9
@marcvangend I'm marking this issue 'by design'. The intent of valid_url is to validate URLs. We are now talking about the IRI space, not the URI space.
So, the current setup is by design. The path forward, encoding/decoding along with validation to handle IDNs, is in that other issue. We can work from there.
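The approach described here, encoding the host first and then validating it with the existing ASCII rule, might look roughly like the following in Python. It uses Python's built-in idna codec (which implements the older IDNA 2003 rules) as a stand-in for the functions proposed in #389278. The names host_to_ascii and is_valid_host are hypothetical, and the regex is just the host character class quoted in the issue summary.

```python
import re

# ASCII-only host character class from the issue summary (assumed to be
# matched case-insensitively).
ASCII_HOST_RE = re.compile(r"(?:[a-z0-9\-\.]|%[0-9a-f]{2})+", re.IGNORECASE)


def host_to_ascii(host: str) -> str:
    """Convert a possibly internationalized host name to its ASCII form."""
    return host.encode("idna").decode("ascii")


def is_valid_host(host: str) -> bool:
    """Encode the host to ASCII, then apply the unchanged ASCII check."""
    try:
        ascii_host = host_to_ascii(host)
    except UnicodeError:
        # The codec rejects some inputs, e.g. empty or overlong labels.
        return False
    return ASCII_HOST_RE.fullmatch(ascii_host) is not None


if __name__ == "__main__":
    for host in ["drupal.org", "例え.テスト", "bad host"]:
        print(host, is_valid_host(host))
```

With this split, the validation regex itself stays ASCII-only and all IDN handling lives in the encoding step.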