I noticed that when webform.module receives a file upload containing a space in the filename, it emails a link with that space included in the URL instead of escaped as + or %20. For example:

Submitted values are:
File: http://mysite.com/files/webform/barrytest/Barry Test.jpg

I think the problem lies in file_create_url(). It says:

  switch (variable_get('file_downloads', FILE_DOWNLOADS_PUBLIC)) {
    case FILE_DOWNLOADS_PUBLIC:
      return $GLOBALS['base_url'] .'/'. file_directory_path() .'/'. str_replace('\\', '/', $path);

Shouldn't $path be urlencode()d there? The patch is trivial but I do not have time at the moment to check to see whether it will break anything.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

bjaspan’s picture

Status: Active » Needs review
FileSize
795 bytes

Here's an untested patch. It's aganst D5.7 but applies against HEAD too.

bjaspan’s picture

FileSize
829 bytes

The previous patch is wrong. Here is a lightly tested version. I still have not verified that modifying file_create_url()'s output like this won't break anything, but at least the url is right now.

drewish’s picture

seems to work correctly for me, is there some reason we don't just pass the URL through url()? it would handle the escaping for us...

Wim Leers’s picture

drewish: that was also my first thought.

mooffie’s picture

Status: Needs review » Needs work

is there some reason we don't just pass the URL through url()?

At least four reasons:

1. url() doesn't urlencode() absolute URLs at all...which is what we plan to hand it.

...and if url() did urlencode() absolute urls:

2. "http://..." would turn, in the least horrific case, into "http%3A//", which would ruin it.

3. url() ultimately uses PHP's urlencode(), not rawurlencode(). The former translates spaces into '+'. When webservers serve files, they don't understand '+'. They must see '%20'. '+' is only valid in a query string.

4. urlencode() tranlates '/' into '%2F'. When serving files, Apache would return 404 for this. That's probably the reason Barry revised his patch.

As for the proposed patch:

It doesn't encode spaces in folder names, does it? So I'm marking this 'needs work'.

mooffie’s picture

I just saw the following issue:

#154245: problem with filenames with '%' character

I'd say these two issues can be merged. I'm for marking this one a 'duplicate'.

bjaspan’s picture

@#5: My patch does not address spaces in folder names because folder names are created on the server, not by an uploading user, so presumably the server knows what it is doing. I would not object to a patch that handled them, though.

@#6: I'm not familiar with the referenced issue but if it addresses the same problem and is older I have no obection to marking this issue as a duplicate.

Dries’s picture

Would be great if we could create a test case for this.

c960657’s picture

file-create-url-238299-2.patch adds a /. to the path for files in the files folder (i.e. not in subfolders) like this:
http://example.com/sites/example/files/./foo.txt

This is probably mostly harmless, but I assume it is not intended.

For testing purposes I have been using a file called foo . # %20 % & + ! ' $ %2F %26 %23 é.gif.

mgifford’s picture

I am interested in this solution. Far too often I'm having to tell folks how to create filenames that can be uploaded to the net neatly. I'd like to see the upload module to work like path_auto where it strips out all bad characters when it is creating the file. Just had a admin earlier today try to upload a file he'd named 'example #2.pdf' but really drupal should be smart enough to either provide an alert that this won't work, or rename it to 'example_2.pdf' which would be my preference.

Wouldn't it be nice if:
foo . # %20 % & + ! ' $ %2F %26 %23 é.gif.

became
foo_._!_e.gif

or something like that? Shouldn't human readable be a goal even for uploaded files?

c960657’s picture

Status: Needs work » Needs review
FileSize
2.99 KB

Here is an updated patch along with a unit test. A number of files with various evil filenames are created and then fetched through the webserver.

All tests passes on Linux. On Windows, some characters are not allowed in filenames AFAIK. Perhaps file_create_path() or similar could be extended to always return a valid filename no matter what garbage you throw at it. But this issue about filename clean-up is a subject worth its own bug.

So far the test only covers FILE_DOWNLOADS_PUBLIC (but this is not a problem wrt to the subject of this issue). I am not sure how to make a unit test implement hook_file_download() in order to make the files accessible via /system/files/...

c960657’s picture

I have now added tests for private files as well, with and without clean URLs enabled. This revealed a bug in drupal_urlencode() that I also fixed.

I had to change url() to make it not cache the output from variable_get('clean_url', 0) in a static variable. Without this change, url() wouldn't respond to changes to the clean_url variable after the first call to url(). The comment in url() claimed that the value was cached for performance reasons, but I doubt the change has any noticeable impact - variable_get() is extremely fast.

The patch doesn't break any existing tests. I do get 2 exceptions in each of the DBLog and User modules, but that also happens with a clean CVS check-out.

c960657’s picture

FileSize
8.81 KB

Looks like I forgot to upload the actual patch mentiond in #12.

c960657’s picture

FileSize
8.82 KB

Minor update (now uses $this->assertResponse(200)).

c960657’s picture

FileSize
10.76 KB

I installed Drupal on Windows to see how it deals with filenames containing funny characters. Appearently – at least on my WinXP machine with PHP 5.2.6 – PHP assumes that all files are encoded using the Windows-1252 character set. If Drupal writes a file using file_put_contents('flødeskum.txt', 'stuff'), the filename reads flødeskum.txt in Explorer. If I create a file using Explorer with a filename containing a character not in Windows-1252 (e.g. a chinese character), readdir() returns ? instead of that character, and glob() doesn't find the file at all!

So it looks like Drupal cannot save files using their proper names on Windows. But luckily it isn't a big issue. The audience (i.e. everybody else except the site owner) wont notice any (big) difference, though. One exception: When using public files, the URLs contain more %-encoded characters than they do on Linux, and if you try to decode the %-encoding, you will not get the original filename but a mangled version.

Well, this updated patch tries to make the best of it on Windows. It also strips characters that are not allowed in filenames on Windows (those are * : ? " < > |). All the supplied tests now pass on Windows as well as on Linux.

c960657’s picture

FileSize
10.54 KB

Slightly simplified by incorporating the patch from #284899: Drupal url problem with clean urls.

c960657’s picture

It looks as if file_create_url-6.patch double-encodes more characters than necessary, so please review file_create_url-5.patch instead.

c960657’s picture

PHP on Windows' lack of support for UTF-8 appears to be confirmed by the last comment in this bug report.

c960657’s picture

FileSize
10.76 KB

Chasing HEAD.

Everybody, please try out the patch and see if you can come up with a filename that it doesn't support. For instance, you can test it using the following steps:

  1. Enable Upload module.
  2. Create a content node.
  3. Attach a file to that node - try to upload files with "exotic" filenames, e.g. containing control characters, non-Latin characters, spaces etc.
  4. View the node and verify that you can download the file.

Passes all tests (except two that also don't pass from a clean CVS checkout - this seemingly unrelated problem is reported in #299290: User tests exceptions).

c960657’s picture

FileSize
10.93 KB

Chasing HEAD (following the big update of file.inc in #308434: Clean up file.inc ahead of hook_file and add unit tests.).

c960657’s picture

FileSize
11.67 KB

Chasing HEAD.

c960657’s picture

FileSize
11.45 KB

Chasing HEAD (following the big hook_file commit).

drewish’s picture

Status: Needs review » Needs work

i didn't look at the file.inc code skipping straight to the unit tests to see what the desired behavior was and had a very hard time following your unit tests. it seems like you've pushed too much code down into the _testFileCreateUrl() function. i'd suggest being more verbose and less "efficient" in terms of the code. helper functions are alright but the test should be a readable example of put in an input or series of inputs and check for an expected output.

feel free to hit me up on IRC if you'd like to discuss this.

c960657’s picture

Status: Needs work » Needs review
FileSize
13.95 KB

Reworked the unit tests based on IRC discussion with drewish.

drewish’s picture

The tests are much better now. They all pass on OSX with Apache2.

In file_create_url() I think I understand what's happening but I don't follow the comment:

+      if (substr(PHP_OS, 0, 3) == 'WIN') {
+        // PHP on Windows assumes that filenames are encoded in Windows-1252, but
+        // Drupal passes UTF-8-encoded filenames to filesystem functions, so
+        // non-US-ASCII characters end up mangled in the filesystem. We thus need
+        // to do the same mangling to the URLs in order for them to work.
+        $path = utf8_encode($path);
+      }

Looking at file_create_filename():

 function file_create_filename($basename, $directory) {
-  $destination = $directory . '/' . $basename;
+  if (substr(PHP_OS, 0, 3) == 'WIN') {
+    // These characters are not allowed in Windows filenames
+    $basename = str_replace(array(':', '*', '?', '"', '<', '>', '|'), '_', $basename);
+  }
+
+  $destination = strtr($directory . '/' . $basename, '\\', '/');

It seem to me that the strtr is really windows specific code that could be moved up into the block where we filter the windows blacklist.

drewish’s picture

c960657’s picture

FileSize
16.16 KB

The comment in file_create_url() is now more elaborate. Also, utf8_encode() is replaced with drupal_convert_to_utf8($path, 'Windows-1252'), because the former converts from ISO-8859-1. It actually turned out to mean a difference if one of the octets in the UTF-8-encoded string was in the range 0x80-0x9F where ISO-8859-1 and Windows-1252 differ.

Backslash is now only treated specially on Windows. On other platforms it is treated as any other character.

drewish’s picture

c960657, will this affect existing files?

c960657’s picture

The patch only affects URL generation and not how incoming requests are handled or how files are saved on the disk, so any existing URLs should continue to work.

The changes to the URL generation fixes all problems I could find using Apache 2.2 on WinXP and on Linux (on Linux I tested with PHP running as CGI as well as an Apache module). I cannot say how well it works on other platforms, though, but the unit tests will allow people to report any problems that may occur on their specific platform.

drewish’s picture

Status: Needs review » Postponed

i'd like to have this wait on #329226: Store file_test.module's values in variables rather than globals so we can pass the file_test_file_download() header via the file_test_set_return() function rather than adding a separate variable.

grendzy’s picture

c960657’s picture

FileSize
16.25 KB

Updated Doxygen comment for drupal_urlencode() (mod_rewrite actually unescapes all %-encoded characters).

drewish’s picture

Status: Postponed » Needs review
FileSize
12.77 KB

#329226: Store file_test.module's values in variables rather than globals got committed so here's a re-roll around it. all the tests look good.

drewish’s picture

aclight’s picture

Subscribing

Status: Needs review » Needs work

The last submitted patch failed testing.

drewish’s picture

Status: Needs work » Needs review

bumping after the test bot meltdown.

c960657’s picture

FileSize
14.78 KB
c960657’s picture

FileSize
14.77 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
14.8 KB

Reroll.

c960657’s picture

FileSize
14.56 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
14.56 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

FileSize
12.79 KB

Reroll.

Part of the previous patch was committed in #356721: Regression: Test failures with clean URLs, so there is now less to review :-)

c960657’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Closed (duplicate)

Work on this has proceeded in #284899: Drupal url problem with clean urls.

heacu’s picture

subscribe