Drupal url problem with clean urls

mattconnolly - July 20, 2008 - 08:34
Project:Drupal
Version:7.x-dev
Component:file system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:needs review
Issue tags:API change, clean URL
Description

Drupal has a problem when clean urls are enabled.

Because mod_rewrite decodes the url query, anything in the query needs to be double encoded. The old code, and the code in the CVS patch you showed, only double encodes '%2f', '%26' and '%23' (most commonly used in URLs). However, it should double encode ALL %.

This problem can be seen with an attached file with a '+' in the name (eg, an attachment called 'apples+oranges.png'). This turns into '%2b' but the percent sign is not double encoded like it should be, and the link is broken with clean urls enabled.

This is corrected by the attached patch.

AttachmentSizeStatusTest resultOperations
url-encode.patch634 bytesIdleFailed: Failed to apply patch.View details | Re-test

#1

gpk - August 7, 2008 - 10:33

>the code in the CVS patch you showed
Are you referring to a patch here http://drupal.org/node/191116?

There is also a NE (noescape) flag to the RewriteRule directive (http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewriterule) which I use when I use mod_rewrite to force a leading www. on the domain name. (This prevents extra % encoding getting into the URL.) Is this any use here? Possibly not, in that 1) the issue is talking about mod_rewrite decoding not encoding, and 2) adding the NE flag to Drupal's main rewrite rule would probably cause all sorts of compatibility problems on site upgrade ...

(As you can see I've only got my head partially round this issue so apologies if I'm just muddying the water ..)

#2

fonant - September 1, 2008 - 10:16

This bug seems to be reproducible quite easily using Drupal's standard search facility:

Without the above patch, searching for:

"Fred & Ginger" works
"Fred # Ginger" works
"Fred / Ginger" works
"Fred // Ginger" works
"Fred /// Ginger" works

"Fred + Ginger" fails, returning with "Fred Ginger"
"Fred %2b Ginger" fails, returning with "Fred + Ginger"
"Fred %65 Ginger" fails, returning with "Fred e Ginger"

After applying the above patch, all the above searches work.

Running with clean URLs via mod_rewrite under Apache 2.2.6.

#3

c960657 - September 2, 2008 - 19:23
Status:needs review» duplicate

Marking as duplicate of #238299: file_create_url should return valid URLs that has a patch as well as a test case. The patch passes all the Fred & Ginger examples in comment #2.

I don't think the NE mod_rewrite flag is of any use here. The B flag looks like it does the trick (I haven't tried), but that is only available as of Apache 2.2:
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule

#4

c960657 - October 19, 2008 - 13:11

The B flag is actually only available as of Apache 2.2.8 (see "Add option to suppress URL unescaping" on http://www.apache.org/dist/httpd/CHANGES_2.2). But it may allow us to support even cleaner URLs.

#5

mattconnolly - November 4, 2008 - 01:41
Status:duplicate» needs review

c960657: Although there is some significant overlap, I disagree that this is a duplicate.

The patch in #238299 improves the situation by explicitly fixing the '+' and '#'. The rest of the code is specifically to do with naming files, which is fine (and another issue).

My patch simplifies the drupal_urlencode function to correctly double encode any '%xx' symbol when using clean urls.

This will solve the clean url problem for any module that passes strings by means of a URL, not just naming attached files. For example, this improves doing searches on a drupal site.

#6

c960657 - November 4, 2008 - 06:37
Status:needs review» active

The patch in #238299 also modifies drupal_urlencode(). Doesn't this fix the problems reported in this issue?

#7

mattconnolly - November 5, 2008 - 08:02

The code in the other patch has to do with file names and may fix the problem specifically for file attachments. It does include a change to the drupal_urlencode function which means that a '+' and '#' are double encoded. This is an improvement, however, the drupal_urlencode function only correctly double encodes '+' and '#'.

My patch ensures that all symbols are correctly double encoded when using clean urls for any function or module that uses the drupal_urlencode function. For example, this affects search query strings.

It is my opinion that both patches should be applied.

#8

drewish - November 6, 2008 - 02:46
Version:6.2» 7.x-dev
Status:active» needs work

would need to go into HEAD and then be backported. i'd like to see better commenting on what's being done in the code.

#9

mattconnolly - November 6, 2008 - 08:02
Status:needs work» active

drewish: Would you please have a look at the function drupal_urlencode in "includes/common.inc" and the comments that precede it.

There are quite adequate comments about the intended purpose of the function that already exist. In any case, I have made a slight adjustment to the comments to reflect the improved behaviour of the function.

Since this patch makes no alteration to the intended use of this function, I consider that the existing comments in the file are completely appropriate to describe the function. This patch improves the function to better implement what the original intention was.

Attached are patches for both 6.6 and 7.x-dev.

AttachmentSizeStatusTest resultOperations
url-encode+comments-6.6.patch1.22 KBIdleFailed: Failed to apply patch.View details | Re-test
url-encode+comments-7.x.patch1.22 KBIdleFailed: Failed to apply patch.View details | Re-test

#10

David Strauss - December 7, 2008 - 12:29
Status:active» needs review

Patch for consideration = patch (code needs review)

#11

c960657 - December 7, 2008 - 22:23
Status:needs review» needs work

The patch double encodes more characters than necessary. Encoding only the necessary characters results in cleaner URLs. E.g. searching for the string foo! generates the URL http://example.net/search/node/foo%2521 even though the shorter URL http://example.net/search/node/foo%21 works fine. I think URLs should be as clean as possible.

#12

mattconnolly - December 15, 2008 - 12:53
Status:needs work» needs review

I disagree. I think simpler code that is more reliable is more important than how a URL looks to a user.

For example, the difference between a ' ' (space) and '+' in a URL are decoded exactly the same but could point to different file names for attachments (where I discovered the bug - and i've also seen this bug affect the apache_solr search module when the '+' is used as a query modifier).

If rawurlencode escapes the character then it will be unescaped twice with clean urls. Just because '%21' works doesn't mean it's right, it's not RFC 1738 compliant (for the second urldecode). This could cause problems with some browsers, servers, proxies, etc that might be expecting only RFC 1738 URLs.

Try this test code:

<?php
    $url
= "Hello!/file-name+with+plus.jpg";
   
$encoded = drupal_urlencode($url);
    print
"original url  : $url\n";
    print
"drupal encoded: $encoded\n";
    print
"single decoded: ".urldecode($encoded)."\n"; // simulating decode done by mod_rewrite
   
print "double decoded: ".urldecode(urldecode($encoded))."\n"; // simulating decode done by php
?>

Without patch, this fails:

original url  : Hello!/file-name+with+plus.jpg
drupal encoded: Hello%21/file-name%2Bwith%2Bplus.jpg
single decoded: Hello!/file-name+with+plus.jpg
double decoded: Hello!/file-name with plus.jpg

With the patch, it succeeds:

original url  : Hello!/file-name+with+plus.jpg
drupal encoded: Hello%2521/file-name%252Bwith%252Bplus.jpg
single decoded: Hello%21/file-name%2Bwith%2Bplus.jpg
double decoded: Hello!/file-name+with+plus.jpg

Remember, with clean urls on, the url is decoded twice: firstly by mod_rewrite and then a second time when it is parsed by PHP. Therefore the url should be RFC 1738 compliant at *both* decodes to guarantee that it works.

I don't believe that an extra 2 characters per non-RFC 1738 character is a problem. I'd rather my URLs actually work.

-Matt

#13

mattconnolly - December 17, 2008 - 03:42

c960657: you're earlier post about the 'B' flag almost fixes the issue, but not quite. It still fails on characters that have special meaning in URLs, such as '%' and '&'. (I think this might actually be a bug in Apache, since it sounds like it should fix this problem.)

Test case: with [B] redirect option, there's no need to double encode in drupal_urlencode, so that function would then look like this:

<?php
function drupal_urlencode($text) {
  if (
variable_get('clean_url', '0')) {
    return
str_replace(array('%2F', '//'),
                       array(
'/', '/%252F'),
                      
rawurlencode($text));
  }
  else {
    return
str_replace('%2F', '/', rawurlencode($text));
  }
}
?>

1. user enters 'apples & oranges' in search field,
2. redirects to: http://drupal/search/node/apples%20%26%20oranges
3. search field shows 'apples ', PHP's $_GET variable shows:

array(2) {
["q"]=>
string(19) "search/node/apples "
["oranges"]=>
string(0) ""
}

Considering the [B] option in RewriteRule (1) doesn't always work, and (2) is not available to users running apache < 2.2.9; the only solution I can see to guarantee that URLs work with clean urls is to go ahead with my patch.

I've tested this on Apache 2.2.6 (Mac OS X) and Apache 2.2.9 (Ubuntu) with Drupal versions from 6.3 to 6.8 and current 7.x-dev

So do I change the status to "patch (reviewed & tested by the community)" or do I need someone else to test it too?

#14

voxpelli - December 29, 2008 - 15:51

I'm attaching a patch for .htaccess that contains a workaround for the broken B-flag (see Apache's bug report).

With my patch Drupal are getting the same data no matter if clean urls are used or not. As a result of this is it's not necessary for drupal_urlencode() to do double encoding.

AttachmentSizeStatusTest resultOperations
drupal_url_encoding_issue_284899.patch479 bytesIdleFailed: Failed to run tests.View details | Re-test

#15

voxpelli - December 29, 2008 - 16:51

Update to my previous patch in #14 - it contained a mistake

AttachmentSizeStatusTest resultOperations
drupal_url_encoding_issue_284899.patch483 bytesIdleFailed: Failed to run tests.View details | Re-test

#16

System Message - January 7, 2009 - 23:40
Status:needs review» needs work

The last submitted patch failed testing.

#17

Damien Tournoud - January 9, 2009 - 15:03
Status:needs work» needs review

Seems to work on my test bed. Let's try to submit this one again.

#18

System Message - January 9, 2009 - 15:25
Status:needs review» needs work

The last submitted patch failed testing.

#19

voxpelli - January 9, 2009 - 16:16

A new cleaned up version of the patch which I've tested locally and I'm able to run the tests - so if this one doesn't work then it's a flaw in the testbot(?)

AttachmentSizeStatusTest resultOperations
drupal_url_encoding_issue_284899.patch478 bytesIdleFailed: Failed to run tests.View details | Re-test

#20

voxpelli - January 9, 2009 - 16:16
Status:needs work» needs review

#21

System Message - January 9, 2009 - 16:30
Status:needs review» needs work

The last submitted patch failed testing.

#22

c960657 - January 10, 2009 - 21:31

I like this approach. AFAICT it allows us to completely get rid of drupal_urlencode() etc. and gives us cleaner URLs.

The patch should remove the double escaping of special characters in drupal_urlencode() that are now no longer necessary.

Also, I suggest using this slightly optimized regexp in .htaccess: ^[^ ]+ ([^? ]+)

#23

drewish - January 10, 2009 - 21:44

which versions of apache is this compatible with? and which has this been tested on?

#24

c960657 - January 10, 2009 - 22:16

The THE_REQUEST variable was present in the source code for Apache 1.3.0, i.e. the first Apache 1.3 release from 1998. So I would assume that it is safe to use.

I have successfully tested with Apache 2.2.3 that it is included in Debian Etch.

#25

voxpelli - January 10, 2009 - 23:11
Status:needs work» needs review

I've the same impression as c960657 that this should be supported by Apache 1.3 and up

I also updated the patch and removed the now unnecessary double encoding in drupal_encode() and updated the regular expression used in the .htaccess to a modified version of the one suggested by c960657

I'm still a little new on this patch making process - especially making a patch for multiple files. This one should work - but it seems to fail to automatically find includes/common.inc - any suggestions are welcome.

The test bot will probably complain about this one as well - I haven't had time to file a bug for it yet.

AttachmentSizeStatusTest resultOperations
drupal_url_encoding_issue_284899.patch1.05 KBIdleFailed: Failed to run tests.View details | Re-test

#26

System Message - January 10, 2009 - 23:15
Status:needs review» needs work

The last submitted patch failed testing.

#27

gpk - January 11, 2009 - 18:44

Running an old-ish version of XAMMP (details below) the modified rewrite rule seems to fail completely and I end up being redirected to the XAMPP home page on localhost. I also tried using the original .htaccess with the NE flag appended to the existing RewriteRule directive. While this didn't cause any visible problems with the site I couldn't get the example searches at #2 to work. Presumably this could do with some tests?

###### ApacheFriends XAMPP (basic package) version 1.6.2 ######

+ Apache 2.2.4
+ MySQL 5.0.41
+ PHP 5.2.2

#28

voxpelli - January 11, 2009 - 19:52

@gpk: I tried out a fresh install of XAMPP 1.6.2 on Windows XP and after I had decommented/activated the rewrite-module in the httpd.conf file and restarted Apache I could run all of the normally unsuccessful searches in #2 with success in a Drupal 7 that's maximum 1-2 days old that has been patched with my latest patch.

#29

gpk - January 11, 2009 - 21:55

@28: thx for quick reply. I did wonder if I'd had finger trouble, or maybe my copy of D7 is too old. I'll give it another whirl when I have a chance. The only other thing that comes to mind is that the path to the site for my install is something like [edited] http://localhost/drupal/drupal-7.x-dev-xxx/ but I don't see that should make any difrerence.

BTW what's the benefit of capturing the part of the path that will become $_GET['q'] in the additional RewriteCond rather than in the main RewriteRule?

#30

voxpelli - January 11, 2009 - 21:24

@29: I can confirm that my patch has a problem when the page isn't placed directly at the root of a domain.

The benefit of capturing it through a RewriteCond is that they will be sent encoded to Drupal in the same way as a non clean url q would.

I will try to fix this and create a new patch when I have time in the coming week.

#31

gpk - January 11, 2009 - 22:00

>sent encoded to Drupal in the same way as a non clean url q would
OK, I'm not quite clear why this doesn't work when capturing the path in the RewriteRule - I seemed to get something like this to work per #1 but it's either too late in the evening now or else mod_rewrite has for me reverted to completely opaque black magic!

#32

c960657 - January 12, 2009 - 18:43

This patch combines your patch with tests and Windows specific changes from #238299: file_create_url should return valid URLs (the tests are adjusted to the new drupal_urlencode()). It looks like the approach works, even with URLs containing unusual characters. If you think the Windows changes are outside the scope of this bug, we can pursue these in another issue if you like.

Don't forget to update the documentation for drupal_urlencode() to reflect the new simplified behaviour.

or else mod_rewrite has for me reverted to completely opaque black magic!

mod_rewrite being black magic is a known bug :-)

#33

c960657 - January 12, 2009 - 18:44
AttachmentSizeStatusTest resultOperations
file_create_url-22.patch12.91 KBIdleFailed: Failed to apply patch.View details | Re-test

#34

c960657 - January 22, 2009 - 23:06

voxpelli, by submitting my last patch I didn't intend to "take control" over this issue. I was just contributing some tests that I made for another related issue (adjusted to check the now even clean URLs). So feel free to proceed. Hope you are still up to the task :-)

#35

voxpelli - January 23, 2009 - 13:34

:) I will proceed as soon as I've got time to do so - hopefully this weekend

#36

c960657 - March 11, 2009 - 21:57
Status:needs work» needs review

This removes the need to double-encode certain characters.

It turned out to be difficult to use THE_REQUEST when Drupal was installed in a subdirectory, so instead I initialize $_GET[q] in drupal_initialize_variables().

D6 users upgrading to D7 will experience that URLs containing characters that were previously double-encoded do not work after the upgrade. I don't know if this should just be noted in the release notes, or whether we should add some kind of automatic redirection. Comments?

AttachmentSizeStatusTest resultOperations
file_create_url-23.patch16.42 KBIdleFailed: Failed to apply patch.View details | Re-test

#37

c960657 - March 12, 2009 - 21:43

Use REQUEST_URI instead of REDIRECT_URL, because the latter did not work with Apache+PHP-CGI.

AttachmentSizeStatusTest resultOperations
file_create_url-24.patch16.58 KBIdleFailed: Failed to apply patch.View details | Re-test

#38

System Message - April 1, 2009 - 01:15
Status:needs review» needs work

The last submitted patch failed testing.

#39

c960657 - April 1, 2009 - 15:38
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-25.patch16.59 KBIdleFailed: Failed to apply patch.View details | Re-test

#40

Dave Reid - April 2, 2009 - 18:59

Adding tag.

#41

System Message - April 22, 2009 - 11:25
Status:needs review» needs work

The last submitted patch failed testing.

#42

c960657 - April 22, 2009 - 20:24
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-26.patch16.9 KBIdleFailed: Failed to apply patch.View details | Re-test

#43

Dave Reid - April 23, 2009 - 01:56

Wow...the hackiness is just flabbergasting. :/ I'm not sure how I feel about this.

#44

c960657 - April 24, 2009 - 20:54

Yes, it is an ugly hack :-( But the existing workaround for the mod_rewrite limitation isn't particularly pretty either, and – as mentioned in this issue and in #238299: file_create_url should return valid URLs – the workaround in drupal_urlencode() still has some issues.

Also, it seems to confuse both core and module developers that drupal_urlencode() should only be used when encoding the path component and not when encoding query string parameters (see #310139: drupal_query_string_encode() should not call drupal_urlencode() and e.g. #298498: Drupal assumes mod_rewrite bugs which are not there). The proposed patch offers better DX and allows users to use the same URL-encoding function for both the path and query string parameters.

Of course, it is required that the proposed patch actually works, not only on Apache but also on other web servers. Comments from people running non-Apache servers are welcome. The existing behaviour is targeted at working around an Apache limitation, so I'd be interested in hearing about whether clean URLs can be implemented on non-Apache servers with the existing and/or with the proposed behaviour.

#45

System Message - April 26, 2009 - 22:30
Status:needs review» needs work

The last submitted patch failed testing.

#46

c960657 - April 27, 2009 - 22:58
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-27.patch16.89 KBIdleFailed: Failed to apply patch.View details | Re-test

#47

sun - May 3, 2009 - 20:40
Status:needs review» needs work

Huh?

-  # Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteCond %{REQUEST_URI} !=/favicon.ico
-  RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
+  RewriteRule ^ index.php [L]

This would effectively make the entire URL rewriting superfluous. (?)

#48

c960657 - May 3, 2009 - 21:29

Yes, the intention with this patch is to move parsing of clean URLs from .htaccess to drupal_initialize_variables(). mod_rewrite still takes care of not invoking PHP if the request refers to an actual file or directory.

#49

drewish - May 4, 2009 - 01:25

wow... that would be pretty cool... i wonder what it would do for our ability to setup clean URLs on other webserver. it seems like it'd give us a lot more flexibility there.

#50

aclight - May 4, 2009 - 21:13

subscribing

#51

kkaefer - May 5, 2009 - 08:45

subscribe

#52

c960657 - May 18, 2009 - 20:38

@sun, would you care to elaborate on #47? Or did my answer in #48 address your concerns?

#53

sun - May 18, 2009 - 21:28
Status:needs work» needs review

Hm. So, effectively, this means we are taking over the URL rewriting for Apache, because Apache does not do it right under certain circumstances.

Yes, this might open up clean URL support to other web servers, as drewish mentioned.

However, I am concerned whether people will still be able to apply custom rewrite rules this way. Take this for instance of what people sometimes need or want to do:

  # Rewrite static URLs.
  RewriteRule ^logout$ /drupalvb/logout

  # Rewrite current-style URLs of the form 'x' to the form 'index.php?q=x'.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

Point being - doing this:

RewriteRule ^user\/register$ node [L]
RewriteRule ^ index.php [L]

results in the following $_SERVER values:

  'REDIRECT_URL' => string '/node' (length=5)
  'REQUEST_URI' => string '/user/register' (length=14)

...but the current patch only checks REQUEST_URI. If we would implement support for REDIRECT_URL - it's perfectly possible that other web servers assign a different env variable.

#54

System Message - May 18, 2009 - 21:50
Status:needs review» needs work

The last submitted patch failed testing.

#55

c960657 - May 19, 2009 - 19:06
Status:needs work» needs review

Reroll.

WRT to making custom mod_rewrite rules, I suggest using the following form instead:

+  RewriteRule ^user\/register$ index.php?q=node [L]

...but the current patch only checks REQUEST_URI. If we would implement support for REDIRECT_URL - it's perfectly possible that other web servers assign a different env variable.

I'm not sure what you mean here. This is only intended to work around an Apache bug, and Apache always sets REQUEST_URI. We could allow other web servers to hook into the same workaround by using request_uri() instead, but the source code for that function doesn't imply that Drupal supports clean URLs on servers that do not set REQUEST_URI.

Non-Apache web servers can still use clean URLs by setting ?q= using some rewriting layer.

This patch has only been tested on Apache (on Windows with PHP running as Apache module, and on Linux with PHP running as CGI as well as Apache module). I would love to hear some test results from people running Drupal on other platforms.

AttachmentSizeStatusTest resultOperations
file_create_url-28.patch16.91 KBIdleFailed: Failed to apply patch.View details | Re-test

#56

System Message - May 25, 2009 - 10:36
Status:needs review» needs work

The last submitted patch failed testing.

#57

c960657 - May 25, 2009 - 20:58
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-29.patch16.92 KBIdleFailed: Failed to apply patch.View details | Re-test

#58

System Message - June 9, 2009 - 06:20
Status:needs review» needs work

The last submitted patch failed testing.

#59

c960657 - June 9, 2009 - 18:40
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-30.patch16.9 KBIdleFailed: Failed to apply patch.View details | Re-test

#60

System Message - June 11, 2009 - 06:20
Status:needs review» needs work

The last submitted patch failed testing.

#61

c960657 - June 11, 2009 - 17:23
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-31.patch16.62 KBIdleFailed: Failed to apply patch.View details | Re-test

#62

System Message - June 22, 2009 - 10:50
Status:needs review» needs work

The last submitted patch failed testing.

#63

c960657 - June 22, 2009 - 21:35
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-32.patch16.63 KBIdleFailed: Failed to apply patch.View details | Re-test

#64

System Message - June 24, 2009 - 01:10
Status:needs review» needs work

The last submitted patch failed testing.

#65

c960657 - June 24, 2009 - 21:27
Status:needs work» needs review

Test bot glitch.

#66

System Message - June 27, 2009 - 20:10
Status:needs review» needs work

The last submitted patch failed testing.

#67

c960657 - June 28, 2009 - 10:01
Status:needs work» needs review

Setting to previous status - testbot was broken (failed to install HEAD).

#68

System Message - July 4, 2009 - 10:30
Status:needs review» needs work

The last submitted patch failed testing.

#69

c960657 - July 6, 2009 - 16:22

Reroll.

I reverted the renaming of drupal_urlencode() to drupal_encode_path() and Drupal.encodeURIComponent to Drupal.encodePath() that was recently done in #310139: drupal_query_string_encode() should not call drupal_urlencode(). With this patch the functions can be used with paths as well as query string arguments, so the arguments for renaming them no longer applies.

AttachmentSizeStatusTest resultOperations
file_create_url-33.patch18.8 KBIdleFailed: Failed to apply patch.View details | Re-test

#70

c960657 - July 6, 2009 - 16:38
Status:needs work» needs review

#71

System Message - July 12, 2009 - 18:35
Status:needs review» needs work

The last submitted patch failed testing.

#72

c960657 - July 15, 2009 - 19:31
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-34.patch18.78 KBIdleFailed: Failed to apply patch.View details | Re-test

#73

System Message - August 17, 2009 - 09:41
Status:needs review» needs work

The last submitted patch failed testing.

#74

c960657 - August 17, 2009 - 16:03
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-35.patch18.76 KBIdleFailed: Failed to apply patch.View details | Re-test

#75

Aron Novak - August 17, 2009 - 16:38
Status:needs review» reviewed & tested by the community

I reviewed the code and tried it out and it seems to be solid.

#76

sun - August 17, 2009 - 17:13
Status:reviewed & tested by the community» needs review

+++ includes/common.inc 17 Aug 2009 10:52:17 -0000
@@ -3413,47 +3413,30 @@ function drupal_json($var = NULL) {
/**
- * Wrapper around urlencode() which avoids Apache quirks.
+ * Wrapper around rawurlencode() that does not escape slashes (for aesthetic
+ * reasons).
...
+function drupal_urlencode($text) {
+++ misc/drupal.js 17 Aug 2009 10:52:17 -0000
@@ -257,26 +257,24 @@ Drupal.freezeHeight = function () {
+ * Wrapper around encodeURIComponent() that does not escape slashes (for
+ * aesthetic reasons).
  */
...
+Drupal.encodeURIComponent = function (item, uri) {
+++ modules/simpletest/tests/file.test 17 Aug 2009 10:52:17 -0000
@@ -1940,18 +1940,194 @@ class FileDownloadTest extends FileTestC
+   * Test file_create_url() using FILE_DOWNLOADS_PRIVATE and clean URLs
+   * disabled.
+   */
+  function testFileCreateUrlPrivateCleanUrlDisabled() {

The PHPDoc summary should be on one line.

+++ modules/simpletest/tests/file.test 17 Aug 2009 10:52:17 -0000
@@ -1940,18 +1940,194 @@ class FileDownloadTest extends FileTestC
+  /**
+   * Check that the URL generated by file_create_url() for the specified file
+   * equals the specified URL, then fetch the URL and compare the contents to
+   * the file.
...
+  private function checkUrl($path, $expected_url) {

This needs a separate summary (that's a description).

Lastly, this patch may break the stream wrappers patch... so I suggest to defer it until that landed.

I'm on crack. Are you, too?

#77

c960657 - August 17, 2009 - 21:23

Thanks for the reviews.

Updated patch, now that #517814: File API Stream Wrapper Conversion has landed. The PHPDoc summaries are not only one line.

The Windows-specific parts of the patch are carried over from the previous version. I currently don't have a Windows box to test on.

AttachmentSizeStatusTest resultOperations
file_create_url-36.patch17.16 KBIdleFailed: Failed to apply patch.View details | Re-test

#78

System Message - August 26, 2009 - 11:49
Status:needs review» needs work

The last submitted patch failed testing.

#79

c960657 - August 26, 2009 - 18:34
Status:needs work» needs review

Testbot glitch?

#80

System Message - August 29, 2009 - 19:50
Status:needs review» needs work

The last submitted patch failed testing.

#81

c960657 - August 30, 2009 - 22:09
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-37.patch17.17 KBIdleFailed: Failed to apply patch.View details | Re-test

#82

System Message - September 2, 2009 - 21:40
Status:needs review» needs work

The last submitted patch failed testing.

#83

c960657 - September 3, 2009 - 15:25
Status:needs work» needs review

Test bot glitch?

#84

voxpelli - September 3, 2009 - 21:58

Great job c960657 in keeping this issue up to date!

I've tested out the core part of the patch and made some modification.

I also looked at the file-related part and did some modifications there - but I didn't verify any of the file-related functionality and I really think, as was discussed a very long time ago, that the file-related stuff might belong in a separate issue, but I think that it doesn't really matter as long as it doesn't block the core url encoding issue from being fixed.

Some feedback on the patch below and two patches with the feedback applied - one with and one without the file-related changes:

+++ includes/bootstrap.inc 30 Aug 2009 22:08:07 -0000
@@ -463,18 +463,26 @@ function drupal_environment_initialize()
+    $_GET['q'] = rawurldecode(substr(strtok($_SERVER['REQUEST_URI'], '?#'), strlen(rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\')) + 1));

This is one very long line doing a lot of things which makes it hard to read.

It also uses the strtok() in a way that for me looks a bit awkward.

In this line the rawurldecode() is also in the wrong place - breaking a site placed in a directory with an encoded letter in it.

Also I think rawurldecode() needs to be replaced by urldecode() to get a consistent result for both clean and non-clean urls.

+++ includes/file.inc 30 Aug 2009 22:08:08 -0000
@@ -862,18 +862,29 @@ function file_unmunge_filename($filename
+  // Support both "/" and "\" as directory separator.
+  $basename = strtr($basename, '\\', '/');
+  $directory = strtr($directory, '\\', '/');

Why should the basename and the directory support backslashes here? I looked through the history of the patches and couldn't find a reason :/

If we should support backslashes - souldn't that be a separate issue that would include support in more function than file_create_filename()?

+++ includes/stream_wrappers.inc 30 Aug 2009 22:08:08 -0000
@@ -577,19 +577,33 @@ class DrupalPublicStreamWrapper extends
+    return $GLOBALS['base_url'] . '/' . self::getDirectoryPath() . '/' . str_replace('%2F', '/', rawurlencode($path));

I think drupal_encode() should be used here?

-2 days to code freeze. Better review yourself.

AttachmentSizeStatusTest resultOperations
drupal_url_encoding_issue_284899_2.patch12.13 KBIdleFailed: 12802 passes, 3 fails, 0 exceptionsView details | Re-test
drupal_url_encoding_issue_284899_2_no_file.patch4.94 KBIdleFailed: 12832 passes, 9 fails, 0 exceptionsView details | Re-test

#85

voxpelli - September 4, 2009 - 17:40

Forgot to remove the tests for the backslash support - attaching a corrected version of the full patch.

AttachmentSizeStatusTest resultOperations
drupal_url_encoding_issue_284899_3.patch12.13 KBIdleFailed: Failed to apply patch.View details | Re-test

#86

voxpelli - September 5, 2009 - 10:27

Mistakenly commented out the tests instead of removing them completely - sorry - this should be a correct patch.

AttachmentSizeStatusTest resultOperations
drupal_url_encoding_issue_284899_4.patch11.79 KBIdleFailed: Failed to apply patch.View details | Re-test

#87

c960657 - September 5, 2009 - 10:37

Your changes look good.

WRT splitting file-related changes into a separate bug: In order to verify that the URL generation works properly on all platforms, you have to test with a variety of files, and in order to do that, you need some file-related changes - at least on Windows. I guess we could split the Windows issue into an issue of its own, though.

Why should the basename and the directory support backslashes here? I looked through the history of the patches and couldn't find a reason :/

In PHP on Windows, backslash is treated as a directory separator, but on Linux it is just a regular character that can be used in filenames. In earlier versions of this patch (e.g. file_create_url-31.patch) I supported that the platforms did this differently (note the different expected result of the test). However, several places in file.inc seems to strip backslashes like they were forward slashes, so in the latest edition I chose to make backslash work as a separator on all platforms.

I think the best approach is the one that limit the need for separate code paths for Unix and Windows. It is my impression, that this is the approach where we just silently convert backslashes to regular slashes as soon as possible. Alternatively, on Linux we could translate backslash to underscore, like we do for the invalid characters on Windows.

There have been reports the MSIE mess up backslashes in URLs, even when encoded as %5C, but I could not reproduce that:
http://www.fi.muni.cz/~kas/blog/index.cgi/computers/png-transparency-con...
https://bugzilla.mozilla.org/show_bug.cgi?id=402804

If we decide to allow backslash in filenames on Linux, we should have a test covering this (with different outcome on Linux and Windows - e.g. the test used in file_create_url-31.patch).

This review is powered by Dreditor.

#88

System Message - September 5, 2009 - 10:46
Status:needs review» needs work

The last submitted patch failed testing.

#89

voxpelli - September 5, 2009 - 11:20
Status:needs work» needs review

The backslashes would need to work consistently throughout all of Drupal's file functions so I think that we should separate that out as it's own issue - but you have looked at it more than me so you decide.

Regarding the tests - I didn't look at them that much since I found it very hard to follow what did what with all the encoded stuff and no comments on what was actually encoded.

Can you make any changes you think might be suitable in regards to separating out the windows issue and add some comments to the the different strings in the test and submit a new patch?

#90

sun - September 5, 2009 - 12:45

+++ includes/bootstrap.inc
@@ -469,6 +469,20 @@ function drupal_environment_initialize() {
+    $_GET['q'] = substr(urldecode($request_path), $base_path_len + 2);

I'm not sure whether we shouldn't stuff a drupal_alter() in here...?

Or in other words, do developers still have a sane/easy way of altering $_GET['q'] before it is processed, if modifying the request URI via mod_rewrite in .htaccess?

Aside from that this patch looks good.

#91

voxpelli - September 5, 2009 - 12:59

You could add a value to $_GET['q'] in .htaccess as you've always been able to - this only affects the default value.

I think a drupal_alter() this early in the bootstrap might have some performance implications since it would be executed even on aggressively cached pages so perhaps we shouldn't add one unless it's really needed?

#92

c960657 - September 5, 2009 - 13:56

Updated patch without the backslash and Windows parts.

In bootstrap.inc I changed + 2 to + 1. I'm not sure why you decided to use + 2. Note that dirname() may behave differently depending whether Drupal is installed in the site root or in a subdirectory.

I agree that developers can still modify $_GET['q'] using .htaccess. The new code in bootstrap.inc only kicks in if $_GET['q'] is not defined.

Regarding the tests - I didn't look at them that much since I found it very hard to follow what did what with all the encoded stuff and no comments on what was actually encoded.
I'm not sure how to explain it. All characters are just passed through drupal_urlencode(). The point of the test is to make sure that all characters are passed through without consufing mod_rewrite or other parts of the chain.

AttachmentSizeStatusTest resultOperations
file_create_url-38.patch14.16 KBIdleFailed: 12758 passes, 123 fails, 10 exceptionsView details | Re-test

#93

System Message - September 5, 2009 - 14:16
Status:needs review» needs work

The last submitted patch failed testing.

#94

c960657 - September 5, 2009 - 15:30
Status:needs work» needs review

Ah, the culprit was the rtrim() that was changed to trim(). Now it works both in the site root and in a subdirectory.

AttachmentSizeStatusTest resultOperations
file_create_url-39.patch14.16 KBIdleFailed: Failed to apply patch.View details | Re-test

#95

voxpelli - September 18, 2009 - 07:17

This can still get into Drupal 7 since it's a bug fix - right? I haven't got around to review the latest patch yet - but I'm sure it will work fine.

#96

c960657 - September 20, 2009 - 12:00
Issue tags:+API change

It's mostly a bug fix, though it does rename drupal_encode_path() to drupal_urlencode() and Drupal.encodePath() to Drupal.encodeURIComponent() (that was actually the names used in D6). This is just a question about function names, so I can easily roll a version without this change. Adding the API change for now.

I does break URL aliases for URLs with special characters in them, but I think those are very rare.

#97

System Message - September 21, 2009 - 08:45
Status:needs review» needs work

The last submitted patch failed testing.

#98

c960657 - September 21, 2009 - 21:34
Status:needs work» needs review

Reroll.

#99

mattyoung - September 22, 2009 - 06:40

.

#100

c960657 - September 23, 2009 - 15:21

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-40.patch14.18 KBIdleFailed: Failed to apply patch.View details | Re-test

#101

System Message - September 29, 2009 - 16:50
Status:needs review» needs work

The last submitted patch failed testing.

#102

c960657 - September 29, 2009 - 18:56
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-41.patch15.04 KBIdleFailed: Failed to apply patch.View details | Re-test

#103

System Message - October 11, 2009 - 22:50
Status:needs review» needs work

The last submitted patch failed testing.

#104

c960657 - October 18, 2009 - 19:58
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-42.patch15.07 KBIdlePassed: 14707 passes, 0 fails, 0 exceptionsView details | Re-test

#105

c960657 - October 31, 2009 - 20:05

Updated version due to the API freeze. This patch is without the renaming of drupal_encode_path() back to drupal_urlencode() (see comment #69).

file_create_url-42.patch still works if an API change is acceptable.

AttachmentSizeStatusTest resultOperations
file_create_url-43.patch13.17 KBIdlePassed: 14729 passes, 0 fails, 0 exceptionsView details | Re-test

#106

Damien Tournoud - October 31, 2009 - 20:48

+++ includes/bootstrap.inc 31 Oct 2009 19:56:01 -0000
@@ -478,18 +478,32 @@ function drupal_environment_initialize()
+    $request_path = $_SERVER['REQUEST_URI'];
+    $query_pos = strpos($_SERVER['REQUEST_URI'], '?');
+    if ($query_pos !== FALSE) {
+      $request_path = substr($request_path, 0, $query_pos);
+    }

1. This is the equivalent of:

<?php
$request_path
= strtok($_SERVER['REQUEST_URI'], '?');
?>

2. There is no test for "exotic" URLs yet. We should test hitting one of those "exotic" URL using the internal browser.

This review is powered by Dreditor.

#107

c960657 - October 31, 2009 - 21:53

drupal_environment_initialize() now uses strtok() (like it did prior to #84).

The change to bootstrap.inc is tested in FileDownloadTest::testFileCreateUrl() when private files are used. However, I have added tests to modules/simpletest/tests/menu.test and modules/path/path.test for menu hooks and path aliases containing exotic characters.

AttachmentSizeStatusTest resultOperations
file_create_url-44.patch19.35 KBIdlePassed: 14728 passes, 0 fails, 0 exceptionsView details | Re-test

#108

Damien Tournoud - November 1, 2009 - 11:54

This looks very good. One minor nitpick:

+++ includes/bootstrap.inc 31 Oct 2009 21:47:41 -0000
@@ -478,18 +478,28 @@ function drupal_environment_initialize()
+    $base_path_len = strlen(rtrim(dirname($_SERVER['SCRIPT_NAME']), '\\/'));

In drupal_settings_initialize(), we use:

<?php
$dir
= trim(dirname($_SERVER['SCRIPT_NAME']), '\,/');
?>

I'm not sure exactly why, but we should be consistent here.

This review is powered by Dreditor.

#109

c960657 - November 1, 2009 - 13:16

I have changed both occurences to strip \/. I cannot imagine when it is relevant to strip ,. The line was introduced in #56634: Login and logout pb: inconsistent use of $base_url without any explanation. I also changed drupal_settings_initialize() to use rtrim() — SCRIPT_NAME always begins with a slash (not a backslash — those are only introduced by dirname()), so there is no need to strip it and add it in the following line.

AttachmentSizeStatusTest resultOperations
file_create_url-45.patch20.14 KBIdlePassed: 14776 passes, 0 fails, 0 exceptionsView details | Re-test

#110

chx - November 29, 2009 - 08:13
Status:needs review» needs work

Wooohoo very nice patch here. But

<?php
 
// flag (this was added in Apache 2.2.8), because mod_rewrite unescapes the
  // path before passing it on to PHP.
   
$_GET['q'] = substr(urldecode($request_path), $base_path_len + 1);
?>

(deleted lines iirrelevant) so why it is a problem that mod_rewrite unescapes the path if you unescape as well? I think you need to mention slashes in this comment and add/adjust some comments to .htaccess too.

#113

c960657 - November 29, 2009 - 21:12
Status:needs work» needs review

Like this?

AttachmentSizeStatusTest resultOperations
file_create_url-46.patch20.49 KBIdlePassed on all environments.View details | Re-test
 
 

Drupal is a registered trademark of Dries Buytaert.