Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
I noticed that when webform.module receives a file upload containing a space in the filename, it emails a link with that space included in the URL instead of escaped as + or %20. For example:
Submitted values are:
File: http://mysite.com/files/webform/barrytest/Barry Test.jpg
I think the problem lies in file_create_url(). It says:
switch (variable_get('file_downloads', FILE_DOWNLOADS_PUBLIC)) {
case FILE_DOWNLOADS_PUBLIC:
return $GLOBALS['base_url'] .'/'. file_directory_path() .'/'. str_replace('\\', '/', $path);
Shouldn't $path be urlencode()d there? The patch is trivial but I do not have time at the moment to check to see whether it will break anything.
Comment | File | Size | Author |
---|---|---|---|
#46 | file_create_url-21.patch | 12.79 KB | c960657 |
#44 | file_create_url-20.patch | 14.56 KB | c960657 |
#42 | file_create_url-19.patch | 14.56 KB | c960657 |
#41 | file_create_url-18.patch | 14.8 KB | c960657 |
#39 | file_create_url-17.patch | 14.77 KB | c960657 |
Comments
Comment #1
bjaspan CreditAttribution: bjaspan commentedHere's an untested patch. It's aganst D5.7 but applies against HEAD too.
Comment #2
bjaspan CreditAttribution: bjaspan commentedThe previous patch is wrong. Here is a lightly tested version. I still have not verified that modifying file_create_url()'s output like this won't break anything, but at least the url is right now.
Comment #3
drewish CreditAttribution: drewish commentedseems to work correctly for me, is there some reason we don't just pass the URL through url()? it would handle the escaping for us...
Comment #4
Wim Leersdrewish: that was also my first thought.
Comment #5
mooffie CreditAttribution: mooffie commentedAt least four reasons:
1. url() doesn't urlencode() absolute URLs at all...which is what we plan to hand it.
...and if url() did urlencode() absolute urls:
2. "http://..." would turn, in the least horrific case, into "http%3A//", which would ruin it.
3. url() ultimately uses PHP's urlencode(), not rawurlencode(). The former translates spaces into '+'. When webservers serve files, they don't understand '+'. They must see '%20'. '+' is only valid in a query string.
4. urlencode() tranlates '/' into '%2F'. When serving files, Apache would return 404 for this. That's probably the reason Barry revised his patch.
As for the proposed patch:
It doesn't encode spaces in folder names, does it? So I'm marking this 'needs work'.
Comment #6
mooffie CreditAttribution: mooffie commentedI just saw the following issue:
#154245: problem with filenames with '%' character
I'd say these two issues can be merged. I'm for marking this one a 'duplicate'.
Comment #7
bjaspan CreditAttribution: bjaspan commented@#5: My patch does not address spaces in folder names because folder names are created on the server, not by an uploading user, so presumably the server knows what it is doing. I would not object to a patch that handled them, though.
@#6: I'm not familiar with the referenced issue but if it addresses the same problem and is older I have no obection to marking this issue as a duplicate.
Comment #8
Dries CreditAttribution: Dries commentedWould be great if we could create a test case for this.
Comment #9
c960657 CreditAttribution: c960657 commentedfile-create-url-238299-2.patch adds a
/.
to the path for files in the files folder (i.e. not in subfolders) like this:http://example.com/sites/example/files/./foo.txt
This is probably mostly harmless, but I assume it is not intended.
For testing purposes I have been using a file called
foo . # %20 % & + ! ' $ %2F %26 %23 é.gif
.Comment #10
mgiffordI am interested in this solution. Far too often I'm having to tell folks how to create filenames that can be uploaded to the net neatly. I'd like to see the upload module to work like path_auto where it strips out all bad characters when it is creating the file. Just had a admin earlier today try to upload a file he'd named 'example #2.pdf' but really drupal should be smart enough to either provide an alert that this won't work, or rename it to 'example_2.pdf' which would be my preference.
Wouldn't it be nice if:
foo . # %20 % & + ! ' $ %2F %26 %23 é.gif.
became
foo_._!_e.gif
or something like that? Shouldn't human readable be a goal even for uploaded files?
Comment #11
c960657 CreditAttribution: c960657 commentedHere is an updated patch along with a unit test. A number of files with various evil filenames are created and then fetched through the webserver.
All tests passes on Linux. On Windows, some characters are not allowed in filenames AFAIK. Perhaps file_create_path() or similar could be extended to always return a valid filename no matter what garbage you throw at it. But this issue about filename clean-up is a subject worth its own bug.
So far the test only covers FILE_DOWNLOADS_PUBLIC (but this is not a problem wrt to the subject of this issue). I am not sure how to make a unit test implement hook_file_download() in order to make the files accessible via /system/files/...
Comment #12
c960657 CreditAttribution: c960657 commentedI have now added tests for private files as well, with and without clean URLs enabled. This revealed a bug in drupal_urlencode() that I also fixed.
I had to change url() to make it not cache the output from
variable_get('clean_url', 0)
in a static variable. Without this change, url() wouldn't respond to changes to theclean_url
variable after the first call to url(). The comment in url() claimed that the value was cached for performance reasons, but I doubt the change has any noticeable impact - variable_get() is extremely fast.The patch doesn't break any existing tests. I do get 2 exceptions in each of the DBLog and User modules, but that also happens with a clean CVS check-out.
Comment #13
c960657 CreditAttribution: c960657 commentedLooks like I forgot to upload the actual patch mentiond in #12.
Comment #14
c960657 CreditAttribution: c960657 commentedMinor update (now uses $this->assertResponse(200)).
Comment #15
c960657 CreditAttribution: c960657 commentedI installed Drupal on Windows to see how it deals with filenames containing funny characters. Appearently – at least on my WinXP machine with PHP 5.2.6 – PHP assumes that all files are encoded using the Windows-1252 character set. If Drupal writes a file using
file_put_contents('flødeskum.txt', 'stuff')
, the filename readsflødeskum.txt
in Explorer. If I create a file using Explorer with a filename containing a character not in Windows-1252 (e.g. a chinese character), readdir() returns?
instead of that character, and glob() doesn't find the file at all!So it looks like Drupal cannot save files using their proper names on Windows. But luckily it isn't a big issue. The audience (i.e. everybody else except the site owner) wont notice any (big) difference, though. One exception: When using public files, the URLs contain more %-encoded characters than they do on Linux, and if you try to decode the %-encoding, you will not get the original filename but a mangled version.
Well, this updated patch tries to make the best of it on Windows. It also strips characters that are not allowed in filenames on Windows (those are
* : ? " < > |
). All the supplied tests now pass on Windows as well as on Linux.Comment #16
c960657 CreditAttribution: c960657 commentedSlightly simplified by incorporating the patch from #284899: Drupal url problem with clean urls.
Comment #17
c960657 CreditAttribution: c960657 commentedIt looks as if file_create_url-6.patch double-encodes more characters than necessary, so please review file_create_url-5.patch instead.
Comment #18
c960657 CreditAttribution: c960657 commentedPHP on Windows' lack of support for UTF-8 appears to be confirmed by the last comment in this bug report.
Comment #19
c960657 CreditAttribution: c960657 commentedChasing HEAD.
Everybody, please try out the patch and see if you can come up with a filename that it doesn't support. For instance, you can test it using the following steps:
Passes all tests (except two that also don't pass from a clean CVS checkout - this seemingly unrelated problem is reported in #299290: User tests exceptions).
Comment #20
c960657 CreditAttribution: c960657 commentedChasing HEAD (following the big update of file.inc in #308434: Clean up file.inc ahead of hook_file and add unit tests.).
Comment #21
c960657 CreditAttribution: c960657 commentedChasing HEAD.
Comment #22
c960657 CreditAttribution: c960657 commentedChasing HEAD (following the big hook_file commit).
Comment #23
drewish CreditAttribution: drewish commentedi didn't look at the file.inc code skipping straight to the unit tests to see what the desired behavior was and had a very hard time following your unit tests. it seems like you've pushed too much code down into the _testFileCreateUrl() function. i'd suggest being more verbose and less "efficient" in terms of the code. helper functions are alright but the test should be a readable example of put in an input or series of inputs and check for an expected output.
feel free to hit me up on IRC if you'd like to discuss this.
Comment #24
c960657 CreditAttribution: c960657 commentedReworked the unit tests based on IRC discussion with drewish.
Comment #25
drewish CreditAttribution: drewish commentedThe tests are much better now. They all pass on OSX with Apache2.
In file_create_url() I think I understand what's happening but I don't follow the comment:
Looking at file_create_filename():
It seem to me that the strtr is really windows specific code that could be moved up into the block where we filter the windows blacklist.
Comment #26
drewish CreditAttribution: drewish commentedalso wanted to cross link to #135359: URLs broken for absolute file system paths
Comment #27
c960657 CreditAttribution: c960657 commentedThe comment in file_create_url() is now more elaborate. Also, utf8_encode() is replaced with drupal_convert_to_utf8($path, 'Windows-1252'), because the former converts from ISO-8859-1. It actually turned out to mean a difference if one of the octets in the UTF-8-encoded string was in the range 0x80-0x9F where ISO-8859-1 and Windows-1252 differ.
Backslash is now only treated specially on Windows. On other platforms it is treated as any other character.
Comment #28
drewish CreditAttribution: drewish commentedc960657, will this affect existing files?
Comment #29
c960657 CreditAttribution: c960657 commentedThe patch only affects URL generation and not how incoming requests are handled or how files are saved on the disk, so any existing URLs should continue to work.
The changes to the URL generation fixes all problems I could find using Apache 2.2 on WinXP and on Linux (on Linux I tested with PHP running as CGI as well as an Apache module). I cannot say how well it works on other platforms, though, but the unit tests will allow people to report any problems that may occur on their specific platform.
Comment #30
drewish CreditAttribution: drewish commentedi'd like to have this wait on #329226: Store file_test.module's values in variables rather than globals so we can pass the file_test_file_download() header via the file_test_set_return() function rather than adding a separate variable.
Comment #31
grendzy CreditAttribution: grendzy commentedAnother cross-link: #278770: file_create_url only returns absolute URL (containing the domain name)
Comment #32
c960657 CreditAttribution: c960657 commentedUpdated Doxygen comment for drupal_urlencode() (mod_rewrite actually unescapes all %-encoded characters).
Comment #33
drewish CreditAttribution: drewish commented#329226: Store file_test.module's values in variables rather than globals got committed so here's a re-roll around it. all the tests look good.
Comment #34
drewish CreditAttribution: drewish commentedmarked #312354: Attaching files with [#?]-characters results in broken link as a dupe.
Comment #35
aclight CreditAttribution: aclight commentedSubscribing
Comment #37
drewish CreditAttribution: drewish commentedbumping after the test bot meltdown.
Comment #38
c960657 CreditAttribution: c960657 commentedUpdated to new comment convention (see #338403: Use {@inheritdoc} on all class methods (including tests)).
Comment #39
c960657 CreditAttribution: c960657 commentedReroll.
Comment #41
c960657 CreditAttribution: c960657 commentedReroll.
Comment #42
c960657 CreditAttribution: c960657 commentedReroll.
Comment #44
c960657 CreditAttribution: c960657 commentedReroll.
Comment #46
c960657 CreditAttribution: c960657 commentedReroll.
Part of the previous patch was committed in #356721: Regression: Test failures with clean URLs, so there is now less to review :-)
Comment #47
c960657 CreditAttribution: c960657 commentedComment #49
c960657 CreditAttribution: c960657 commentedWork on this has proceeded in #284899: Drupal url problem with clean urls.
Comment #50
heacu CreditAttribution: heacu commentedsubscribe