Drupal has a problem when clean urls are enabled.
Because mod_rewrite decodes the url query, anything in the query needs to be double encoded. The old code, and the code in the CVS patch you showed, only double encodes '%2f', '%26' and '%23' (most commonly used in URLs). However, it should double encode ALL %.
This problem can be seen with an attached file with a '+' in the name (eg, an attachment called 'apples+oranges.png'). This turns into '%2b' but the percent sign is not double encoded like it should be, and the link is broken with clean urls enabled.
This is corrected by the attached patch.
Comment | File | Size | Author |
---|---|---|---|
#160 | clean-url-284899-02.patch | 562 bytes | mattconnolly |
#155 | file_create_url-D6-1.patch | 3.43 KB | c960657 |
#147 | file_create_url-61.patch | 27.89 KB | c960657 |
#146 | file_create_url-60.patch | 27.89 KB | c960657 |
#143 | file_create_url-59.patch | 27.91 KB | c960657 |
Comments
Comment #1
gpk CreditAttribution: gpk commented>the code in the CVS patch you showed
Are you referring to a patch here http://drupal.org/node/191116?
There is also a NE (noescape) flag to the RewriteRule directive (http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewriterule) which I use when I use mod_rewrite to force a leading www. on the domain name. (This prevents extra % encoding getting into the URL.) Is this any use here? Possibly not, in that 1) the issue is talking about mod_rewrite decoding not encoding, and 2) adding the NE flag to Drupal's main rewrite rule would probably cause all sorts of compatibility problems on site upgrade ...
(As you can see I've only got my head partially round this issue so apologies if I'm just muddying the water ..)
Comment #2
fonant CreditAttribution: fonant commentedThis bug seems to be reproducible quite easily using Drupal's standard search facility:
Without the above patch, searching for:
"Fred & Ginger" works
"Fred # Ginger" works
"Fred / Ginger" works
"Fred // Ginger" works
"Fred /// Ginger" works
"Fred + Ginger" fails, returning with "Fred Ginger"
"Fred %2b Ginger" fails, returning with "Fred + Ginger"
"Fred %65 Ginger" fails, returning with "Fred e Ginger"
After applying the above patch, all the above searches work.
Running with clean URLs via mod_rewrite under Apache 2.2.6.
Comment #3
c960657 CreditAttribution: c960657 commentedMarking as duplicate of #238299: file_create_url should return valid URLs that has a patch as well as a test case. The patch passes all the Fred & Ginger examples in comment #2.
I don't think the NE mod_rewrite flag is of any use here. The B flag looks like it does the trick (I haven't tried), but that is only available as of Apache 2.2:
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
Comment #4
c960657 CreditAttribution: c960657 commentedThe B flag is actually only available as of Apache 2.2.8 (see "Add option to suppress URL unescaping" on http://www.apache.org/dist/httpd/CHANGES_2.2). But it may allow us to support even cleaner URLs.
Comment #5
mattconnolly CreditAttribution: mattconnolly commentedc960657: Although there is some significant overlap, I disagree that this is a duplicate.
The patch in #238299 improves the situation by explicitly fixing the '+' and '#'. The rest of the code is specifically to do with naming files, which is fine (and another issue).
My patch simplifies the drupal_urlencode function to correctly double encode any '%xx' symbol when using clean urls.
This will solve the clean url problem for any module that passes strings by means of a URL, not just naming attached files. For example, this improves doing searches on a drupal site.
Comment #6
c960657 CreditAttribution: c960657 commentedThe patch in #238299 also modifies drupal_urlencode(). Doesn't this fix the problems reported in this issue?
Comment #7
mattconnolly CreditAttribution: mattconnolly commentedThe code in the other patch has to do with file names and may fix the problem specifically for file attachments. It does include a change to the drupal_urlencode function which means that a '+' and '#' are double encoded. This is an improvement, however, the drupal_urlencode function only correctly double encodes '+' and '#'.
My patch ensures that all symbols are correctly double encoded when using clean urls for any function or module that uses the drupal_urlencode function. For example, this affects search query strings.
It is my opinion that both patches should be applied.
Comment #8
drewish CreditAttribution: drewish commentedwould need to go into HEAD and then be backported. i'd like to see better commenting on what's being done in the code.
Comment #9
mattconnolly CreditAttribution: mattconnolly commenteddrewish: Would you please have a look at the function drupal_urlencode in "includes/common.inc" and the comments that precede it.
There are quite adequate comments about the intended purpose of the function that already exist. In any case, I have made a slight adjustment to the comments to reflect the improved behaviour of the function.
Since this patch makes no alteration to the intended use of this function, I consider that the existing comments in the file are completely appropriate to describe the function. This patch improves the function to better implement what the original intention was.
Attached are patches for both 6.6 and 7.x-dev.
Comment #10
David StraussPatch for consideration = patch (code needs review)
Comment #11
c960657 CreditAttribution: c960657 commentedThe patch double encodes more characters than necessary. Encoding only the necessary characters results in cleaner URLs. E.g. searching for the string
foo!
generates the URL http://example.net/search/node/foo%2521 even though the shorter URL http://example.net/search/node/foo%21 works fine. I think URLs should be as clean as possible.Comment #12
mattconnolly CreditAttribution: mattconnolly commentedI disagree. I think simpler code that is more reliable is more important than how a URL looks to a user.
For example, the difference between a ' ' (space) and '+' in a URL are decoded exactly the same but could point to different file names for attachments (where I discovered the bug - and i've also seen this bug affect the apache_solr search module when the '+' is used as a query modifier).
If rawurlencode escapes the character then it will be unescaped twice with clean urls. Just because '%21' works doesn't mean it's right, it's not RFC 1738 compliant (for the second urldecode). This could cause problems with some browsers, servers, proxies, etc that might be expecting only RFC 1738 URLs.
Try this test code:
Without patch, this fails:
With the patch, it succeeds:
Remember, with clean urls on, the url is decoded twice: firstly by mod_rewrite and then a second time when it is parsed by PHP. Therefore the url should be RFC 1738 compliant at *both* decodes to guarantee that it works.
I don't believe that an extra 2 characters per non-RFC 1738 character is a problem. I'd rather my URLs actually work.
-Matt
Comment #13
mattconnolly CreditAttribution: mattconnolly commentedc960657: you're earlier post about the 'B' flag almost fixes the issue, but not quite. It still fails on characters that have special meaning in URLs, such as '%' and '&'. (I think this might actually be a bug in Apache, since it sounds like it should fix this problem.)
Test case: with [B] redirect option, there's no need to double encode in drupal_urlencode, so that function would then look like this:
1. user enters 'apples & oranges' in search field,
2. redirects to: http://drupal/search/node/apples%20%26%20oranges
3. search field shows 'apples ', PHP's $_GET variable shows:
array(2) {
["q"]=>
string(19) "search/node/apples "
["oranges"]=>
string(0) ""
}
Considering the [B] option in RewriteRule (1) doesn't always work, and (2) is not available to users running apache < 2.2.9; the only solution I can see to guarantee that URLs work with clean urls is to go ahead with my patch.
I've tested this on Apache 2.2.6 (Mac OS X) and Apache 2.2.9 (Ubuntu) with Drupal versions from 6.3 to 6.8 and current 7.x-dev
So do I change the status to "patch (reviewed & tested by the community)" or do I need someone else to test it too?
Comment #14
voxpelli CreditAttribution: voxpelli commentedI'm attaching a patch for .htaccess that contains a workaround for the broken B-flag (see Apache's bug report).
With my patch Drupal are getting the same data no matter if clean urls are used or not. As a result of this is it's not necessary for drupal_urlencode() to do double encoding.
Comment #15
voxpelli CreditAttribution: voxpelli commentedUpdate to my previous patch in #14 - it contained a mistake
Comment #17
Damien Tournoud CreditAttribution: Damien Tournoud commentedSeems to work on my test bed. Let's try to submit this one again.
Comment #19
voxpelli CreditAttribution: voxpelli commentedA new cleaned up version of the patch which I've tested locally and I'm able to run the tests - so if this one doesn't work then it's a flaw in the testbot(?)
Comment #20
voxpelli CreditAttribution: voxpelli commentedComment #22
c960657 CreditAttribution: c960657 commentedI like this approach. AFAICT it allows us to completely get rid of drupal_urlencode() etc. and gives us cleaner URLs.
The patch should remove the double escaping of special characters in drupal_urlencode() that are now no longer necessary.
Also, I suggest using this slightly optimized regexp in .htaccess:
^[^ ]+ ([^? ]+)
Comment #23
drewish CreditAttribution: drewish commentedwhich versions of apache is this compatible with? and which has this been tested on?
Comment #24
c960657 CreditAttribution: c960657 commentedThe THE_REQUEST variable was present in the source code for Apache 1.3.0, i.e. the first Apache 1.3 release from 1998. So I would assume that it is safe to use.
I have successfully tested with Apache 2.2.3 that it is included in Debian Etch.
Comment #25
voxpelli CreditAttribution: voxpelli commentedI've the same impression as c960657 that this should be supported by Apache 1.3 and up
I also updated the patch and removed the now unnecessary double encoding in drupal_encode() and updated the regular expression used in the .htaccess to a modified version of the one suggested by c960657
I'm still a little new on this patch making process - especially making a patch for multiple files. This one should work - but it seems to fail to automatically find includes/common.inc - any suggestions are welcome.
The test bot will probably complain about this one as well - I haven't had time to file a bug for it yet.
Comment #27
gpk CreditAttribution: gpk commentedRunning an old-ish version of XAMMP (details below) the modified rewrite rule seems to fail completely and I end up being redirected to the XAMPP home page on localhost. I also tried using the original .htaccess with the NE flag appended to the existing RewriteRule directive. While this didn't cause any visible problems with the site I couldn't get the example searches at #2 to work. Presumably this could do with some tests?
###### ApacheFriends XAMPP (basic package) version 1.6.2 ######
+ Apache 2.2.4
+ MySQL 5.0.41
+ PHP 5.2.2
Comment #28
voxpelli CreditAttribution: voxpelli commented@gpk: I tried out a fresh install of XAMPP 1.6.2 on Windows XP and after I had decommented/activated the rewrite-module in the httpd.conf file and restarted Apache I could run all of the normally unsuccessful searches in #2 with success in a Drupal 7 that's maximum 1-2 days old that has been patched with my latest patch.
Comment #29
gpk CreditAttribution: gpk commented@28: thx for quick reply. I did wonder if I'd had finger trouble, or maybe my copy of D7 is too old. I'll give it another whirl when I have a chance. The only other thing that comes to mind is that the path to the site for my install is something like [edited] http://localhost/drupal/drupal-7.x-dev-xxx/ but I don't see that should make any difrerence.
BTW what's the benefit of capturing the part of the path that will become $_GET['q'] in the additional RewriteCond rather than in the main RewriteRule?
Comment #30
voxpelli CreditAttribution: voxpelli commented@29: I can confirm that my patch has a problem when the page isn't placed directly at the root of a domain.
The benefit of capturing it through a RewriteCond is that they will be sent encoded to Drupal in the same way as a non clean url q would.
I will try to fix this and create a new patch when I have time in the coming week.
Comment #31
gpk CreditAttribution: gpk commented>sent encoded to Drupal in the same way as a non clean url q would
OK, I'm not quite clear why this doesn't work when capturing the path in the RewriteRule - I seemed to get something like this to work per #1 but it's either too late in the evening now or else mod_rewrite has for me reverted to completely opaque black magic!
Comment #32
c960657 CreditAttribution: c960657 commentedThis patch combines your patch with tests and Windows specific changes from #238299: file_create_url should return valid URLs (the tests are adjusted to the new drupal_urlencode()). It looks like the approach works, even with URLs containing unusual characters. If you think the Windows changes are outside the scope of this bug, we can pursue these in another issue if you like.
Don't forget to update the documentation for drupal_urlencode() to reflect the new simplified behaviour.
mod_rewrite being black magic is a known bug :-)
Comment #33
c960657 CreditAttribution: c960657 commentedComment #34
c960657 CreditAttribution: c960657 commentedvoxpelli, by submitting my last patch I didn't intend to "take control" over this issue. I was just contributing some tests that I made for another related issue (adjusted to check the now even clean URLs). So feel free to proceed. Hope you are still up to the task :-)
Comment #35
voxpelli CreditAttribution: voxpelli commented:) I will proceed as soon as I've got time to do so - hopefully this weekend
Comment #36
c960657 CreditAttribution: c960657 commentedThis removes the need to double-encode certain characters.
It turned out to be difficult to use THE_REQUEST when Drupal was installed in a subdirectory, so instead I initialize $_GET[q] in drupal_initialize_variables().
D6 users upgrading to D7 will experience that URLs containing characters that were previously double-encoded do not work after the upgrade. I don't know if this should just be noted in the release notes, or whether we should add some kind of automatic redirection. Comments?
Comment #37
c960657 CreditAttribution: c960657 commentedUse REQUEST_URI instead of REDIRECT_URL, because the latter did not work with Apache+PHP-CGI.
Comment #39
c960657 CreditAttribution: c960657 commentedReroll.
Comment #40
Dave ReidAdding tag.
Comment #42
c960657 CreditAttribution: c960657 commentedReroll.
Comment #43
Dave ReidWow...the hackiness is just flabbergasting. :/ I'm not sure how I feel about this.
Comment #44
c960657 CreditAttribution: c960657 commentedYes, it is an ugly hack :-( But the existing workaround for the mod_rewrite limitation isn't particularly pretty either, and – as mentioned in this issue and in #238299: file_create_url should return valid URLs – the workaround in drupal_urlencode() still has some issues.
Also, it seems to confuse both core and module developers that drupal_urlencode() should only be used when encoding the path component and not when encoding query string parameters (see #310139: drupal_query_string_encode() should not call drupal_urlencode() and e.g. #298498: Drupal assumes mod_rewrite bugs which are not there). The proposed patch offers better DX and allows users to use the same URL-encoding function for both the path and query string parameters.
Of course, it is required that the proposed patch actually works, not only on Apache but also on other web servers. Comments from people running non-Apache servers are welcome. The existing behaviour is targeted at working around an Apache limitation, so I'd be interested in hearing about whether clean URLs can be implemented on non-Apache servers with the existing and/or with the proposed behaviour.
Comment #46
c960657 CreditAttribution: c960657 commentedReroll.
Comment #47
sunHuh?
This would effectively make the entire URL rewriting superfluous. (?)
Comment #48
c960657 CreditAttribution: c960657 commentedYes, the intention with this patch is to move parsing of clean URLs from .htaccess to drupal_initialize_variables(). mod_rewrite still takes care of not invoking PHP if the request refers to an actual file or directory.
Comment #49
drewish CreditAttribution: drewish commentedwow... that would be pretty cool... i wonder what it would do for our ability to setup clean URLs on other webserver. it seems like it'd give us a lot more flexibility there.
Comment #50
aclight CreditAttribution: aclight commentedsubscribing
Comment #51
kkaefer CreditAttribution: kkaefer commentedsubscribe
Comment #52
c960657 CreditAttribution: c960657 commented@sun, would you care to elaborate on #47? Or did my answer in #48 address your concerns?
Comment #53
sunHm. So, effectively, this means we are taking over the URL rewriting for Apache, because Apache does not do it right under certain circumstances.
Yes, this might open up clean URL support to other web servers, as drewish mentioned.
However, I am concerned whether people will still be able to apply custom rewrite rules this way. Take this for instance of what people sometimes need or want to do:
Point being - doing this:
results in the following
$_SERVER
values:...but the current patch only checks REQUEST_URI. If we would implement support for REDIRECT_URL - it's perfectly possible that other web servers assign a different env variable.
Comment #55
c960657 CreditAttribution: c960657 commentedReroll.
WRT to making custom mod_rewrite rules, I suggest using the following form instead:
I'm not sure what you mean here. This is only intended to work around an Apache bug, and Apache always sets REQUEST_URI. We could allow other web servers to hook into the same workaround by using request_uri() instead, but the source code for that function doesn't imply that Drupal supports clean URLs on servers that do not set REQUEST_URI.
Non-Apache web servers can still use clean URLs by setting ?q= using some rewriting layer.
This patch has only been tested on Apache (on Windows with PHP running as Apache module, and on Linux with PHP running as CGI as well as Apache module). I would love to hear some test results from people running Drupal on other platforms.
Comment #57
c960657 CreditAttribution: c960657 commentedReroll.
Comment #59
c960657 CreditAttribution: c960657 commentedReroll.
Comment #61
c960657 CreditAttribution: c960657 commentedReroll.
Comment #63
c960657 CreditAttribution: c960657 commentedReroll.
Comment #65
c960657 CreditAttribution: c960657 commentedTest bot glitch.
Comment #67
c960657 CreditAttribution: c960657 commentedSetting to previous status - testbot was broken (failed to install HEAD).
Comment #69
c960657 CreditAttribution: c960657 commentedReroll.
I reverted the renaming of drupal_urlencode() to drupal_encode_path() and Drupal.encodeURIComponent to Drupal.encodePath() that was recently done in #310139: drupal_query_string_encode() should not call drupal_urlencode(). With this patch the functions can be used with paths as well as query string arguments, so the arguments for renaming them no longer applies.
Comment #70
c960657 CreditAttribution: c960657 commentedComment #72
c960657 CreditAttribution: c960657 commentedReroll.
Comment #74
c960657 CreditAttribution: c960657 commentedReroll.
Comment #75
Aron NovakI reviewed the code and tried it out and it seems to be solid.
Comment #76
sunThe PHPDoc summary should be on one line.
This needs a separate summary (that's a description).
Lastly, this patch may break the stream wrappers patch... so I suggest to defer it until that landed.
I'm on crack. Are you, too?
Comment #77
c960657 CreditAttribution: c960657 commentedThanks for the reviews.
Updated patch, now that #517814: File API Stream Wrapper Conversion has landed. The PHPDoc summaries are not only one line.
The Windows-specific parts of the patch are carried over from the previous version. I currently don't have a Windows box to test on.
Comment #79
c960657 CreditAttribution: c960657 commentedTestbot glitch?
Comment #81
c960657 CreditAttribution: c960657 commentedReroll.
Comment #83
c960657 CreditAttribution: c960657 commentedTest bot glitch?
Comment #84
voxpelli CreditAttribution: voxpelli commentedGreat job c960657 in keeping this issue up to date!
I've tested out the core part of the patch and made some modification.
I also looked at the file-related part and did some modifications there - but I didn't verify any of the file-related functionality and I really think, as was discussed a very long time ago, that the file-related stuff might belong in a separate issue, but I think that it doesn't really matter as long as it doesn't block the core url encoding issue from being fixed.
Some feedback on the patch below and two patches with the feedback applied - one with and one without the file-related changes:
This is one very long line doing a lot of things which makes it hard to read.
It also uses the strtok() in a way that for me looks a bit awkward.
In this line the rawurldecode() is also in the wrong place - breaking a site placed in a directory with an encoded letter in it.
Also I think rawurldecode() needs to be replaced by urldecode() to get a consistent result for both clean and non-clean urls.
Why should the basename and the directory support backslashes here? I looked through the history of the patches and couldn't find a reason :/
If we should support backslashes - souldn't that be a separate issue that would include support in more function than file_create_filename()?
I think drupal_encode() should be used here?
-2 days to code freeze. Better review yourself.
Comment #85
voxpelli CreditAttribution: voxpelli commentedForgot to remove the tests for the backslash support - attaching a corrected version of the full patch.
Comment #86
voxpelli CreditAttribution: voxpelli commentedMistakenly commented out the tests instead of removing them completely - sorry - this should be a correct patch.
Comment #87
c960657 CreditAttribution: c960657 commentedYour changes look good.
WRT splitting file-related changes into a separate bug: In order to verify that the URL generation works properly on all platforms, you have to test with a variety of files, and in order to do that, you need some file-related changes - at least on Windows. I guess we could split the Windows issue into an issue of its own, though.
In PHP on Windows, backslash is treated as a directory separator, but on Linux it is just a regular character that can be used in filenames. In earlier versions of this patch (e.g. file_create_url-31.patch) I supported that the platforms did this differently (note the different expected result of the test). However, several places in file.inc seems to strip backslashes like they were forward slashes, so in the latest edition I chose to make backslash work as a separator on all platforms.
I think the best approach is the one that limit the need for separate code paths for Unix and Windows. It is my impression, that this is the approach where we just silently convert backslashes to regular slashes as soon as possible. Alternatively, on Linux we could translate backslash to underscore, like we do for the invalid characters on Windows.
There have been reports the MSIE mess up backslashes in URLs, even when encoded as %5C, but I could not reproduce that:
http://www.fi.muni.cz/~kas/blog/index.cgi/computers/png-transparency-con...
https://bugzilla.mozilla.org/show_bug.cgi?id=402804
If we decide to allow backslash in filenames on Linux, we should have a test covering this (with different outcome on Linux and Windows - e.g. the test used in file_create_url-31.patch).
This review is powered by Dreditor.
Comment #89
voxpelli CreditAttribution: voxpelli commentedThe backslashes would need to work consistently throughout all of Drupal's file functions so I think that we should separate that out as it's own issue - but you have looked at it more than me so you decide.
Regarding the tests - I didn't look at them that much since I found it very hard to follow what did what with all the encoded stuff and no comments on what was actually encoded.
Can you make any changes you think might be suitable in regards to separating out the windows issue and add some comments to the the different strings in the test and submit a new patch?
Comment #90
sunI'm not sure whether we shouldn't stuff a drupal_alter() in here...?
Or in other words, do developers still have a sane/easy way of altering $_GET['q'] before it is processed, if modifying the request URI via mod_rewrite in .htaccess?
Aside from that this patch looks good.
Comment #91
voxpelli CreditAttribution: voxpelli commentedYou could add a value to $_GET['q'] in .htaccess as you've always been able to - this only affects the default value.
I think a drupal_alter() this early in the bootstrap might have some performance implications since it would be executed even on aggressively cached pages so perhaps we shouldn't add one unless it's really needed?
Comment #92
c960657 CreditAttribution: c960657 commentedUpdated patch without the backslash and Windows parts.
In bootstrap.inc I changed
+ 2
to+ 1
. I'm not sure why you decided to use+ 2
. Note that dirname() may behave differently depending whether Drupal is installed in the site root or in a subdirectory.I agree that developers can still modify $_GET['q'] using .htaccess. The new code in bootstrap.inc only kicks in if $_GET['q'] is not defined.
Regarding the tests - I didn't look at them that much since I found it very hard to follow what did what with all the encoded stuff and no comments on what was actually encoded.
I'm not sure how to explain it. All characters are just passed through drupal_urlencode(). The point of the test is to make sure that all characters are passed through without consufing mod_rewrite or other parts of the chain.
Comment #94
c960657 CreditAttribution: c960657 commentedAh, the culprit was the rtrim() that was changed to trim(). Now it works both in the site root and in a subdirectory.
Comment #95
voxpelli CreditAttribution: voxpelli commentedThis can still get into Drupal 7 since it's a bug fix - right? I haven't got around to review the latest patch yet - but I'm sure it will work fine.
Comment #96
c960657 CreditAttribution: c960657 commentedIt's mostly a bug fix, though it does rename drupal_encode_path() to drupal_urlencode() and Drupal.encodePath() to Drupal.encodeURIComponent() (that was actually the names used in D6). This is just a question about function names, so I can easily roll a version without this change. Adding the API change for now.
I does break URL aliases for URLs with special characters in them, but I think those are very rare.
Comment #98
c960657 CreditAttribution: c960657 commentedReroll.
Comment #99
mattyoung CreditAttribution: mattyoung commented.
Comment #100
c960657 CreditAttribution: c960657 commentedReroll.
Comment #102
c960657 CreditAttribution: c960657 commentedReroll.
Comment #104
c960657 CreditAttribution: c960657 commentedReroll.
Comment #105
c960657 CreditAttribution: c960657 commentedUpdated version due to the API freeze. This patch is without the renaming of drupal_encode_path() back to drupal_urlencode() (see comment #69).
file_create_url-42.patch still works if an API change is acceptable.
Comment #106
Damien Tournoud CreditAttribution: Damien Tournoud commented1. This is the equivalent of:
2. There is no test for "exotic" URLs yet. We should test hitting one of those "exotic" URL using the internal browser.
This review is powered by Dreditor.
Comment #107
c960657 CreditAttribution: c960657 commenteddrupal_environment_initialize() now uses strtok() (like it did prior to #84).
The change to bootstrap.inc is tested in FileDownloadTest::testFileCreateUrl() when private files are used. However, I have added tests to modules/simpletest/tests/menu.test and modules/path/path.test for menu hooks and path aliases containing exotic characters.
Comment #108
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis looks very good. One minor nitpick:
In drupal_settings_initialize(), we use:
I'm not sure exactly why, but we should be consistent here.
This review is powered by Dreditor.
Comment #109
c960657 CreditAttribution: c960657 commentedI have changed both occurences to strip
\/
. I cannot imagine when it is relevant to strip,
. The line was introduced in #56634: Login and logout pb: inconsistent use of $base_url without any explanation. I also changed drupal_settings_initialize() to use rtrim() — SCRIPT_NAME always begins with a slash (not a backslash — those are only introduced by dirname()), so there is no need to strip it and add it in the following line.Comment #110
chx CreditAttribution: chx commentedWooohoo very nice patch here. But
(deleted lines iirrelevant) so why it is a problem that mod_rewrite unescapes the path if you unescape as well? I think you need to mention slashes in this comment and add/adjust some comments to .htaccess too.
Comment #113
c960657 CreditAttribution: c960657 commentedLike this?
Comment #114
c960657 CreditAttribution: c960657 commentedReroll.
Comment #115
catchchx asked me to take a look in irc. I'm not a position to do a proper review (about to go to bed), but nothing raised any flags on a quick read of the issue and the patch. There's some minor docs changes / variable renaming here but no API change beyond fixing what's broken.
Comment #116
chx CreditAttribution: chx commentedI tihink this is good to go.
Comment #117
sunShouldn't this be named drupal_rawurlencode() then like all other PHP wrapper functions?
And that said, we now use
@ingroup php_wrappers
See http://api.drupal.org/api/group/php_wrappers/7 for a full list.
The note could be reworded/clarified a bit - i.e. move "when passed through url()" to the beginning of the sentence.
Could we clarify the inline comment a bit more and explain why this is needed?
huh?!
I'm on crack. Are you, too?
Comment #118
Damien Tournoud CreditAttribution: Damien Tournoud commentedNo. This function encodes the path part of an URL. This is *not* a custom wrapper around rawurlencode(). Actually, I would argue that using rawurlencode() for the path part of an URL is (even if legal) a little bit too heavy. We actually don't need to encode much except '?' and non-ASCII characters in the path part of an URL. But that's for a different issue.
Back to needs work for the three other issues pointed by sun.
Comment #119
c960657 CreditAttribution: c960657 commentedThe function was named drupal_urlencode() in earlier versions of the patch (and in D6), because in this revised edition I think it can be considered a wrapper around urlencode() or rawurlencode(). It works for paths as well as query string arguments. The existing drupal_encode_path() on the other hand only works for paths, not query string arguments. If API changes are still allowed, I suggest we rename it back to drupal_urlencode() (or drupal_rawurlencode()) like it was in D6.
Comment #120
Damien Tournoud CreditAttribution: Damien Tournoud commented@c960657: again: no! Encoding paths and encoding query string arguments is a completely different matter. You don't need to encode a lot of things for paths (for example, you don't need to encode
'='
). It's a good thing to encode them differently.Comment #121
sunYeah, I mentioned it, because we discussed this function name and its innards to death in another issue, you remember? ;)
I think the proper function name would be drupal_rawurlencode(), because we are not invoking urlencode() but rawurlencode().
But, yeah, it's also late. Not sure how often this function is invoked throughout contrib though. But I'm pretty sure that removing a pretty large WTF would help module developers more, even if they have to slightly change their code.
Comment #122
voxpelli CreditAttribution: voxpelli commentedEither drupal_encode_path() is a wrapper of rawurlencode() and then it should perhaps should be renamed - or it's not a wrapper of it and then the comment shouldn't state that it is. So either way the patch would need a change - right?
Comment #123
c960657 CreditAttribution: c960657 commentedOkay, I get your point. Though rawurlencode() is sufficient for both purposes, we need separate functions for encoding paths and query parameters if we want to achieve as “clean” URLs as possible. So let's keep drupal_encode_path() for now. Then we can tune drupal_encode_path() and drupal_http_build_query() later (or perhaps create a separate function function for escaping query parameters). The Doxygen comments have been updated accordingly.
I have updated autocomplete.js to use JavaScript's built-in encodeURIComponent() directly instead of Drupal.encodePath, because it needs to append to both paths and query strings, depending on whether clean URLs are being used on the site. Luckily, the user never sees these URLs, so aesthetics doesn't matter here. The current implementation of Drupal.encodePath actually works with both paths and query parameters, but this may change.
Comment #124
sunQuestion: What happens to $_REQUEST['q'] with this patch? We currently teach everyone that $_GET['q'] gets you the internal system path, while $_REQUEST['q'] gets you the unmodified path, which may be a URL alias in case there is one.
Comment #125
voxpelli CreditAttribution: voxpelli commented@sun: Isn't using $_REQUEST to do that wrong anyway?
In the default settings of PHP $_REQUEST['q'] will use the initial value of $_POST['q'], $_COOKIE['q'] or $_SYSTEM['q'] if any of them are set prior to taking the initial value of $_GET['q'] so any form containing an element with the name 'q' will break $_REQUEST['q'] as well as any user having a cookie with that name and, if you're really unlucky, on some system it will even break if there's an environmental variable with the name 'q'.
The only 'q' valid as a path is $_GET['q'] and you can never be sure if the value in $_REQUEST['q'] comes from $_GET['q'] or not and therefor it should never be used.
If the unmodified path needs to be exposed it would need to be exposed through a custom global and/or special functions instead of through $_REQUEST['q'].
Comment #126
c960657 CreditAttribution: c960657 commentedThis introduces a new function for that purpose, requested_path().
Comment #127
c960657 CreditAttribution: c960657 commentedPlease ignore the last patch.
Comment #128
sunerrrr, what? ;)
Let's rename this to request_path().
"Implements" is the correct one. Also needs to be fixed elsewhere in this patch.
I'm on crack. Are you, too?
Comment #130
c960657 CreditAttribution: c960657 commentedDone.
It seems the Doxygen standards have changed since last time I looked. I better read them over again.
Comment #132
c960657 CreditAttribution: c960657 commentedThe new request_path() didn't work when un-clean URLs were used (the testbot does not use clean URLs AFAIR). This updated patch uses a slightly different strategy. I did not use drupal_static(), because the original path cannot change during the lifetime of a request.
Comment #133
sunStale function name here.
Aside from that, this patch looks ready to fly for me now.
This review is powered by Dreditor.
Comment #134
c960657 CreditAttribution: c960657 commentedDone.
Comment #135
sunThanks!
Comment #136
c960657 CreditAttribution: c960657 commentedReroll (manually resolved a conflict in modules/simpletest/tests/menu_test.module).
Comment #137
scor CreditAttribution: scor commentedThe comment refers to a $path variable which does not exist in this function.
Minor nitpick, this line contains an extra whitespace.
This review is powered by Dreditor.
Comment #138
cburschkaWhat scor said. Also:
This concatenation looks like left-over debug output.
If the drupalGetContent() is actually meant to be displayed there, it should be inserted into the localized string with a placeholder.
I'm on crack. Are you, too?
Comment #139
c960657 CreditAttribution: c960657 commentedDone.
Comment #140
scor CreditAttribution: scor commentedlooks good, #137 and #138 have been addressed. back to RTBC.
Comment #141
cburschkaThe @param should probably stay.
I just noticed this now - there doesn't really seem to be a reason for switching from $path to the less meaningful $text on this function. The function still processes a path.
I'm on crack. Are you, too?
These are code-style/doc fixes; set back to RTBC once addressed.
Comment #142
c960657 CreditAttribution: c960657 commentedDone.
Comment #143
c960657 CreditAttribution: c960657 commentedClean reroll.
Comment #144
scor CreditAttribution: scor commentedpatch #143 does not apply any more, patching file modules/path/path.test, Hunk #1 FAILED at 60.
Comment #145
int CreditAttribution: int commented#143: file_create_url-59.patch queued for re-testing.
Comment #146
c960657 CreditAttribution: c960657 commentedReroll.
Comment #147
c960657 CreditAttribution: c960657 commentedChasing HEAD.
Comment #148
aspilicious CreditAttribution: aspilicious commentedhttp://qa.drupal.org/pifr/test/26656 its green, lets give it a proper review so it can be committed...
Comment #149
tttt CreditAttribution: tttt commentedIs this patch just for D7? I am using D6.14 and whenever someone attaches a file with # in the filename, the resulting link goes to page not found. Will this patch fix that? Thanks!
Comment #150
scor CreditAttribution: scor commented@ttt: the latest patch is for Drupal 7 only. It's not recommended to patch core (Drupal 6 in your case) until this patch has been backported and committed to Drupal 6.
Comment #151
Dries CreditAttribution: Dries commentedI reviewed this patch and I can't see anything wrong with it. Tests pass, code looks good, and it is well documented. I'm marking this RTBC. Will commit this later unless someone objects.
Comment #152
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks.
Comment #153
scor CreditAttribution: scor commentedPer #8, there might be some of this patch which can be backported to Drupal 6.
Comment #154
voxpelli CreditAttribution: voxpelli commentedNo parts of the patch can be backported as I see it since it changes double encoding to single encoding and isn't compatible with the old URL:s. If this should somehow be fixed in Drupal 6 it needs to be an alternative fix that doesn't change the URL:s.
Comment #155
c960657 CreditAttribution: c960657 commentedHere is patch for D6 using the approach suggested in #238299: file_create_url should return valid URLs. It does not change the URLs but only makes sure that URLs generated by url() and file_create_url() map to the expected page/file. To test it, you should upload some files with exotic file names, and create some path aliases containing exotic characters.
WRT D7, I have created a follow-up issue for dealing with the Windows-specific problems (see comment #87-#92): #701500: Cannot download files with 8bit filenames on Windows
Comment #156
ptam CreditAttribution: ptam commented#25: drupal_url_encoding_issue_284899.patch queued for re-testing.
Comment #157
wmostrey CreditAttribution: wmostrey commentedPlease note that all % should be double encode if you're running Apache, but that none should be double encoded if you're running Nginx.
Currently
"drag&drop"
encodes to%22drag%2526drop%22
, with double encoding on the ampersand."drag&drop"
."drag%26drop"
.If everything would be double encoded then
"drag&drop"
would encode to%2522drag%2526drop%2522
and decode to%22drag%26drop%22
in Nginx. This is definitely not the expected behavior.Comment #158
crifi CreditAttribution: crifi commentedSince this also affects menu items of combined taxonomy paths it would be nice to patch this also in D6. See #251868: Menu doesn't play well with taxonomy/term/%/all pages
Comment #159
michaelfavia CreditAttribution: michaelfavia commentedRegression issue here: #995512: Autocomplete doesn't work with slashes. The committed change precludes using slashes in the autocomplete inputs and throws JS errors when it is done.
Comment #160
mattconnolly CreditAttribution: mattconnolly commentedSo, looks like a solution has been moved forward to drupal 7.
I'd like to once again propose a very simple patch for drupal 6.x which fixes the lack of correct double escaping when using clean urls.
Aside from c960657's repeated complaints about urls looking uglier on these characters, it is far safer, and more compatible (works with older versions of apache). Newer versions of apache have the [B] flag in the rewrite module, but this does not re-escape everything, so it is not a complete solution, and is also not available in versions of Apache earlier than 2.2.
The current code correctly double-escapes "&" and "#". Other characters are not double escaped, such as "+" and '%' especially which can cause problems, for example in search fields. I really don't see why this one is so difficult to accept for Drupal 6.
Comment #161
mattconnolly CreditAttribution: mattconnolly commentedIf I make it active, does that mean the patch will get tested on drupal 6.x ??
Comment #162
pillarsdotnet CreditAttribution: pillarsdotnet commentedNo, but if you make it "needs review" then the testbot will make sure that it *applies* to 6.x-dev.
Comment #163
abi2 CreditAttribution: abi2 commentedurl-encode.patch queued for re-testing.
Comment #164
dandaman CreditAttribution: dandaman commentedI'm a bit confused. When running a 6.x site on Nginx, I'm finding some of these special characters double-encoded and therefore not working. If I remove all the special str_replace data for clean_urls, it seems to work on basic testing. Is there any other changes I need to make? Which patch should I be using? What can we do to get this working out of the box on Drupal 6.x? Thanks for all your hard work in figuring out all these issues.
Comment #165
jh81 CreditAttribution: jh81 commentedLooks to me like the patch in #147 is already in Drupal 7.12. But I have a file with "+" and "&" that are still not opening correctly and giving me a file not found error. Any ideas on how I can resolve this?
Comment #166
x3cion CreditAttribution: x3cion commentedThis Bug is linked by Submitting patches - Name your patch. Since it's nearly 7 years old: If this is "won't fix" by now, the "Caution" part should be reworded, shouldn't it?