Remove restrictions on path aliases (support IRIs)
greggles - November 18, 2006 - 04:23
| Project: | Drupal |
| Version: | 5.x-dev |
| Component: | path.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
In my brief testing it's impossible to create a URL alias that includes characters which should be allowed.
In IRC UnConeD also pointed out that "core is broken" in this regard.

#1
Things to know:
This means that the URLs resulting from user-defined menu paths and aliases will always be valid, even for menu paths that use punctuation like "#" or "!" or even random Unicode characters.
e.g.
Path/Alias = blog/Bunnies are made of people!?
Resulting URI = http://example.com/base-path/?q=blog/Bunnies+are+made+of+people%21%3F
Path/Alias = blog/My résumé
Resulting URI = http://example.com/base-path/?q=blog/My+r%C3%A9sum%C3%A9
Path/Alias = blog/アニメ
Resulting URI = http://example.com/base-path/?q=blog/%E3%82%A2%E3%83%8B%E3%83%A1
In spite of this, path.module requires that path aliases contain only characters valid in relative URLs. This makes no sense. The attached patch removes this restriction.
This is a necessary step towards allowing e.g. pathauto to support arbitrary languages. The current practice of transliterating letters to ASCII and removing accents is a hack: it produces 'prettier' URLs, but they are less meaningful to search engines. It is also useless for languages which do not use the Latin script.
Note that the 'odd' escapes for the Unicode characters above are perfectly normal. This is the standard used for IRIs (the i18n'd form of URIs, see RFC 3987) and supported by all the major browsers and search engines.
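For anyone who wants to check the escapes by hand: each one is just the UTF-8 bytes of the character, percent-encoded. A minimal sketch in Python (not Drupal's own code; alias_to_uri is a hypothetical helper, and the base URL is the one from the examples above) that reproduces the three URIs and shows they decode back to the original alias:

```python
from urllib.parse import quote_plus, unquote_plus

# Base URL borrowed from the examples in this comment.
BASE = "http://example.com/base-path/?q="

def alias_to_uri(alias: str) -> str:
    """Hypothetical helper: percent-encode an alias as UTF-8.

    '/' is kept literal so path segments stay readable; spaces become '+'.
    """
    return BASE + quote_plus(alias, safe="/")

for alias in ("blog/Bunnies are made of people!?", "blog/My résumé", "blog/アニメ"):
    uri = alias_to_uri(alias)
    # Decoding the query value recovers the alias exactly, so nothing is lost.
    assert unquote_plus(uri[len(BASE):]) == alias
    print(uri)
```

The point is that the encoding is reversible, so the alias itself can hold any Unicode text and the escaping only has to happen when the URL is generated.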
However, because of phishing abuse, some browsers will not show the Unicode characters in some or all IRIs in the address bar and/or status bar. e.g. Japanese Wikipedia on Google.
#2
Lovely patch. Less restrictions, more features, less code, more comments.
#3
Committed to CVS HEAD! :)
#4