Closed (fixed)
Project:
Drupal core
Version:
5.x-dev
Component:
path.module
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
18 Nov 2006 at 04:23 UTC
Updated:
5 Dec 2006 at 19:45 UTC
Comments
Comment #1
Steven commented
Things to know:
This means that the URLs resulting from user-defined menu paths and aliases will always be valid, even for paths that use punctuation like "#" or "!" or even random Unicode characters.
e.g.
Path/Alias = blog/Bunnies are made of people!?
Resulting URI = http://example.com/base-path/?q=blog/Bunnies+are+made+of+people%21%3F
Path/Alias = blog/My résumé
Resulting URI = http://example.com/base-path/?q=blog/My+r%C3%A9sum%C3%A9
Path/Alias = blog/アニメ
Resulting URI = http://example.com/base-path/?q=blog/%E3%82%A2%E3%83%8B%E3%83%A1
In spite of this, path.module requires that path aliases contain only characters valid in relative URLs. This makes no sense. The attached patch removes this restriction.
This is a necessary step towards allowing e.g. pathauto to support arbitrary languages. The current practice of transliterating letters to ASCII and removing accents is a hack: it produces 'prettier' URLs, but ones that are less meaningful to search engines. It is also useless for languages that do not use the Latin script.
Note that the 'odd' escapes for the Unicode characters above are perfectly normal. This is the standard used for IRIs (the i18n'd form of URIs, see RFC 3987) and is supported by all the major browsers and search engines.
However, because of phishing abuse, some browsers will not show the Unicode characters in some or all IRIs in the address bar and/or status bar. e.g. Japanese Wikipedia on Google.
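For anyone who wants to check the escapes in the examples above, here is a minimal sketch in Python (not Drupal's own PHP code). It assumes the encoding is simply: take the alias as UTF-8, percent-encode every byte outside the unreserved set, keep "/" as the path separator, and write spaces as "+". The helper name alias_to_uri and the BASE constant are made up for illustration.

```python
from urllib.parse import quote_plus, unquote_plus

# Base URL copied from the examples above; purely illustrative.
BASE = "http://example.com/base-path/?q="

def alias_to_uri(alias: str) -> str:
    """Percent-encode the UTF-8 bytes of an alias, leaving '/' intact."""
    return BASE + quote_plus(alias, safe="/", encoding="utf-8")

for alias in ("blog/Bunnies are made of people!?",
              "blog/My résumé",
              "blog/アニメ"):
    uri = alias_to_uri(alias)
    # Decoding the query value gives back the original alias unchanged.
    assert unquote_plus(uri[len(BASE):]) == alias
    print(alias, "->", uri)
```

Each non-ASCII character becomes one %XX escape per UTF-8 byte, which is why "é" turns into two escapes and each kana into three.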
Comment #2
chx commented
Lovely patch. Less restrictions, more features, less code, more comments.
Comment #3
dries commented
Committed to CVS HEAD! :)