file_create_url should return valid URLs

bjaspan - March 24, 2008 - 23:04
Project:Drupal
Version:7.x-dev
Component:file system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:duplicate
Description

I noticed that when webform.module receives a file upload containing a space in the filename, it emails a link with that space included in the URL instead of escaped as + or %20. For example:

Submitted values are:
File: http://mysite.com/files/webform/barrytest/Barry Test.jpg

I think the problem lies in file_create_url(). It says:

<?php
 
switch (variable_get('file_downloads', FILE_DOWNLOADS_PUBLIC)) {
    case
FILE_DOWNLOADS_PUBLIC:
      return
$GLOBALS['base_url'] .'/'. file_directory_path() .'/'. str_replace('\\', '/', $path);
?>

Shouldn't $path be urlencode()d there? The patch is trivial but I do not have time at the moment to check to see whether it will break anything.

#1

bjaspan - March 24, 2008 - 23:07
Status:active» needs review

Here's an untested patch. It's aganst D5.7 but applies against HEAD too.

AttachmentSizeStatusTest resultOperations
file-create-url-238299-1.patch795 bytesIdleFailed: Failed to apply patch.View details | Re-test

#2

bjaspan - March 25, 2008 - 00:40

The previous patch is wrong. Here is a lightly tested version. I still have not verified that modifying file_create_url()'s output like this won't break anything, but at least the url is right now.

AttachmentSizeStatusTest resultOperations
file-create-url-238299-2.patch829 bytesIdleFailed: Failed to apply patch.View details | Re-test

#3

drewish - April 15, 2008 - 19:51

seems to work correctly for me, is there some reason we don't just pass the URL through url()? it would handle the escaping for us...

#4

Wim Leers - April 15, 2008 - 20:44

drewish: that was also my first thought.

#5

mooffie - April 17, 2008 - 20:11
Status:needs review» needs work

is there some reason we don't just pass the URL through url()?

At least four reasons:

1. url() doesn't urlencode() absolute URLs at all...which is what we plan to hand it.

...and if url() did urlencode() absolute urls:

2. "http://..." would turn, in the least horrific case, into "http%3A//", which would ruin it.

3. url() ultimately uses PHP's urlencode(), not rawurlencode(). The former translates spaces into '+'. When webservers serve files, they don't understand '+'. They must see '%20'. '+' is only valid in a query string.

4. urlencode() tranlates '/' into '%2F'. When serving files, Apache would return 404 for this. That's probably the reason Barry revised his patch.

As for the proposed patch:

It doesn't encode spaces in folder names, does it? So I'm marking this 'needs work'.

#6

mooffie - April 17, 2008 - 20:28

I just saw the following issue:

#154245: problem with filenames with '%' character

I'd say these two issues can be merged. I'm for marking this one a 'duplicate'.

#7

bjaspan - April 22, 2008 - 03:43

@#5: My patch does not address spaces in folder names because folder names are created on the server, not by an uploading user, so presumably the server knows what it is doing. I would not object to a patch that handled them, though.

@#6: I'm not familiar with the referenced issue but if it addresses the same problem and is older I have no obection to marking this issue as a duplicate.

#8

Dries - April 23, 2008 - 19:35

Would be great if we could create a test case for this.

#9

c960657 - April 24, 2008 - 21:04

file-create-url-238299-2.patch adds a /. to the path for files in the files folder (i.e. not in subfolders) like this:
http://example.com/sites/example/files/./foo.txt

This is probably mostly harmless, but I assume it is not intended.

For testing purposes I have been using a file called foo . # %20 % & + ! ' $ %2F %26 %23 é.gif.

#10

mgifford - June 7, 2008 - 18:27

I am interested in this solution. Far too often I'm having to tell folks how to create filenames that can be uploaded to the net neatly. I'd like to see the upload module to work like path_auto where it strips out all bad characters when it is creating the file. Just had a admin earlier today try to upload a file he'd named 'example #2.pdf' but really drupal should be smart enough to either provide an alert that this won't work, or rename it to 'example_2.pdf' which would be my preference.

Wouldn't it be nice if:
foo . # %20 % & + ! ' $ %2F %26 %23 é.gif.

became
foo_._!_e.gif

or something like that? Shouldn't human readable be a goal even for uploaded files?

#11

c960657 - August 14, 2008 - 21:40
Status:needs work» needs review

Here is an updated patch along with a unit test. A number of files with various evil filenames are created and then fetched through the webserver.

All tests passes on Linux. On Windows, some characters are not allowed in filenames AFAIK. Perhaps file_create_path() or similar could be extended to always return a valid filename no matter what garbage you throw at it. But this issue about filename clean-up is a subject worth its own bug.

So far the test only covers FILE_DOWNLOADS_PUBLIC (but this is not a problem wrt to the subject of this issue). I am not sure how to make a unit test implement hook_file_download() in order to make the files accessible via /system/files/...

AttachmentSizeStatusTest resultOperations
file_create_url.patch2.99 KBIdleFailed: Failed to apply patch.View details | Re-test

#12

c960657 - August 21, 2008 - 21:39

I have now added tests for private files as well, with and without clean URLs enabled. This revealed a bug in drupal_urlencode() that I also fixed.

I had to change url() to make it not cache the output from variable_get('clean_url', 0) in a static variable. Without this change, url() wouldn't respond to changes to the clean_url variable after the first call to url(). The comment in url() claimed that the value was cached for performance reasons, but I doubt the change has any noticeable impact - variable_get() is extremely fast.

The patch doesn't break any existing tests. I do get 2 exceptions in each of the DBLog and User modules, but that also happens with a clean CVS check-out.

#13

c960657 - August 23, 2008 - 20:27

Looks like I forgot to upload the actual patch mentiond in #12.

AttachmentSizeStatusTest resultOperations
file_create_url-2.patch8.81 KBIdleFailed: Failed to apply patch.View details | Re-test

#14

c960657 - August 23, 2008 - 20:48

Minor update (now uses $this->assertResponse(200)).

AttachmentSizeStatusTest resultOperations
file_create_url-3.patch8.82 KBIdleFailed: Failed to apply patch.View details | Re-test

#15

c960657 - August 25, 2008 - 20:42

I installed Drupal on Windows to see how it deals with filenames containing funny characters. Appearently – at least on my WinXP machine with PHP 5.2.6 – PHP assumes that all files are encoded using the Windows-1252 character set. If Drupal writes a file using file_put_contents('flødeskum.txt', 'stuff'), the filename reads flødeskum.txt in Explorer. If I create a file using Explorer with a filename containing a character not in Windows-1252 (e.g. a chinese character), readdir() returns ? instead of that character, and glob() doesn't find the file at all!

So it looks like Drupal cannot save files using their proper names on Windows. But luckily it isn't a big issue. The audience (i.e. everybody else except the site owner) wont notice any (big) difference, though. One exception: When using public files, the URLs contain more %-encoded characters than they do on Linux, and if you try to decode the %-encoding, you will not get the original filename but a mangled version.

Well, this updated patch tries to make the best of it on Windows. It also strips characters that are not allowed in filenames on Windows (those are * : ? " < > |). All the supplied tests now pass on Windows as well as on Linux.

AttachmentSizeStatusTest resultOperations
file_create_url-5.patch10.76 KBIdleFailed: Failed to apply patch.View details | Re-test

#16

c960657 - August 25, 2008 - 21:04

Slightly simplified by incorporating the patch from #284899: Drupal url problem with clean urls.

AttachmentSizeStatusTest resultOperations
file_create_url-6.patch10.54 KBIdleFailed: Failed to apply patch.View details | Re-test

#17

c960657 - August 26, 2008 - 10:56

It looks as if file_create_url-6.patch double-encodes more characters than necessary, so please review file_create_url-5.patch instead.

#18

c960657 - September 9, 2008 - 19:31

PHP on Windows' lack of support for UTF-8 appears to be confirmed by the last comment in this bug report.

#19

c960657 - September 13, 2008 - 15:03

Chasing HEAD.

Everybody, please try out the patch and see if you can come up with a filename that it doesn't support. For instance, you can test it using the following steps:

  1. Enable Upload module.
  2. Create a content node.
  3. Attach a file to that node - try to upload files with "exotic" filenames, e.g. containing control characters, non-Latin characters, spaces etc.
  4. View the node and verify that you can download the file.

Passes all tests (except two that also don't pass from a clean CVS checkout - this seemingly unrelated problem is reported in #299290: User tests exceptions).

AttachmentSizeStatusTest resultOperations
file_create_url-7.patch10.76 KBIdleFailed: Failed to apply patch.View details | Re-test

#20

c960657 - September 15, 2008 - 20:44

Chasing HEAD (following the big update of file.inc in #308434: Clean up file.inc ahead of hook_file and add unit tests.).

AttachmentSizeStatusTest resultOperations
file_create_url-8.patch10.93 KBIdleFailed: Failed to apply patch.View details | Re-test

#21

c960657 - September 24, 2008 - 20:45

Chasing HEAD.

AttachmentSizeStatusTest resultOperations
file_create_url-10.patch11.67 KBIdleFailed: Failed to apply patch.View details | Re-test

#22

c960657 - October 9, 2008 - 16:16

Chasing HEAD (following the big hook_file commit).

AttachmentSizeStatusTest resultOperations
file_create_url-11.patch11.45 KBIdleFailed: Failed to apply patch.View details | Re-test

#23

drewish - October 9, 2008 - 18:32
Status:needs review» needs work

i didn't look at the file.inc code skipping straight to the unit tests to see what the desired behavior was and had a very hard time following your unit tests. it seems like you've pushed too much code down into the _testFileCreateUrl() function. i'd suggest being more verbose and less "efficient" in terms of the code. helper functions are alright but the test should be a readable example of put in an input or series of inputs and check for an expected output.

feel free to hit me up on IRC if you'd like to discuss this.

#24

c960657 - October 9, 2008 - 23:29
Status:needs work» needs review

Reworked the unit tests based on IRC discussion with drewish.

AttachmentSizeStatusTest resultOperations
file_create_url-12.patch13.95 KBIdleFailed: Failed to apply patch.View details | Re-test

#25

drewish - October 10, 2008 - 00:21

The tests are much better now. They all pass on OSX with Apache2.

In file_create_url() I think I understand what's happening but I don't follow the comment:

+      if (substr(PHP_OS, 0, 3) == 'WIN') {
+        // PHP on Windows assumes that filenames are encoded in Windows-1252, but
+        // Drupal passes UTF-8-encoded filenames to filesystem functions, so
+        // non-US-ASCII characters end up mangled in the filesystem. We thus need
+        // to do the same mangling to the URLs in order for them to work.
+        $path = utf8_encode($path);
+      }

Looking at file_create_filename():

function file_create_filename($basename, $directory) {
-  $destination = $directory . '/' . $basename;
+  if (substr(PHP_OS, 0, 3) == 'WIN') {
+    // These characters are not allowed in Windows filenames
+    $basename = str_replace(array(':', '*', '?', '"', '<', '>', '|'), '_', $basename);
+  }
+
+  $destination = strtr($directory . '/' . $basename, '\\', '/');

It seem to me that the strtr is really windows specific code that could be moved up into the block where we filter the windows blacklist.

#26

drewish - October 14, 2008 - 20:42

#27

c960657 - October 20, 2008 - 19:26

The comment in file_create_url() is now more elaborate. Also, utf8_encode() is replaced with drupal_convert_to_utf8($path, 'Windows-1252'), because the former converts from ISO-8859-1. It actually turned out to mean a difference if one of the octets in the UTF-8-encoded string was in the range 0x80-0x9F where ISO-8859-1 and Windows-1252 differ.

Backslash is now only treated specially on Windows. On other platforms it is treated as any other character.

AttachmentSizeStatusTest resultOperations
file_create_url-14.patch16.16 KBIdleFailed: Failed to apply patch.View details | Re-test

#28

drewish - October 20, 2008 - 19:57

c960657, will this affect existing files?

#29

c960657 - October 20, 2008 - 21:14

The patch only affects URL generation and not how incoming requests are handled or how files are saved on the disk, so any existing URLs should continue to work.

The changes to the URL generation fixes all problems I could find using Apache 2.2 on WinXP and on Linux (on Linux I tested with PHP running as CGI as well as an Apache module). I cannot say how well it works on other platforms, though, but the unit tests will allow people to report any problems that may occur on their specific platform.

#30

drewish - November 3, 2008 - 02:09
Status:needs review» postponed

i'd like to have this wait on #329226: Store file_test.module's values in variables rather than globals so we can pass the file_test_file_download() header via the file_test_set_return() function rather than adding a separate variable.

#31

grendzy - November 5, 2008 - 21:02

#32

c960657 - November 7, 2008 - 17:15

Updated Doxygen comment for drupal_urlencode() (mod_rewrite actually unescapes all %-encoded characters).

AttachmentSizeStatusTest resultOperations
file_create_url-15.patch16.25 KBIdleFailed: Failed to apply patch.View details | Re-test

#33

drewish - November 8, 2008 - 04:09
Status:postponed» needs review

#329226: Store file_test.module's values in variables rather than globals got committed so here's a re-roll around it. all the tests look good.

AttachmentSizeStatusTest resultOperations
file_238299.patch12.77 KBIdleFailed: 7675 passes, 2 fails, 0 exceptionsView details | Re-test

#34

drewish - November 16, 2008 - 11:48

#35

aclight - November 16, 2008 - 14:32

Subscribing

#36

System Message - November 16, 2008 - 22:00
Status:needs review» needs work

The last submitted patch failed testing.

#37

drewish - November 17, 2008 - 08:03
Status:needs work» needs review

bumping after the test bot meltdown.

#38

c960657 - November 25, 2008 - 21:32

Updated to new comment convention (see #338403: Remove implementation of... comments from setUp(), getInfo() or tearDown()).

AttachmentSizeStatusTest resultOperations
file_create_url-16.patch14.78 KBIdleUnable to apply patch file_create_url-16.patchView details | Re-test

#39

c960657 - December 8, 2008 - 21:24

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-17.patch14.77 KBIdleFailed: Failed to install HEAD.View details | Re-test

#40

System Message - December 19, 2008 - 08:50
Status:needs review» needs work

The last submitted patch failed testing.

#41

c960657 - December 19, 2008 - 23:02
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-18.patch14.8 KBIdleFailed: Failed to apply patch.View details | Re-test

#42

c960657 - January 6, 2009 - 23:40

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-19.patch14.56 KBIdleFailed: Failed to install HEAD.View details | Re-test

#43

System Message - January 8, 2009 - 01:10
Status:needs review» needs work

The last submitted patch failed testing.

#44

c960657 - January 9, 2009 - 19:01
Status:needs work» needs review

Reroll.

AttachmentSizeStatusTest resultOperations
file_create_url-20.patch14.56 KBIdleFailed: Failed to apply patch.View details | Re-test

#45

System Message - January 10, 2009 - 14:05
Status:needs review» needs work

The last submitted patch failed testing.

#46

c960657 - January 10, 2009 - 15:16

Reroll.

Part of the previous patch was committed in #356721: Regression: Test failures with clean URLs, so there is now less to review :-)

AttachmentSizeStatusTest resultOperations
file_create_url-21.patch12.79 KBIdleFailed: Failed to apply patch.View details | Re-test

#47

c960657 - January 10, 2009 - 18:49
Status:needs work» needs review

#48

System Message - January 20, 2009 - 15:35
Status:needs review» needs work

The last submitted patch failed testing.

#49

c960657 - May 4, 2009 - 20:35
Status:needs work» duplicate

Work on this has proceeded in #284899: Drupal url problem with clean urls.

 
 

Drupal is a registered trademark of Dries Buytaert.