Mime filename should be double quoted to handle filenames with spaces
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
Firefox does not handle filenames with spaces . When a user clicks on an attachment with spaces, the filename is truncated to the first whitespace. While IE & Safari both handle this, Firefox refuses to accept mime headers with unquoted filename parameters. According to Firefox's bugzilla/knowledgebase, Firefox's behavior is the correct behavior and it's a problem with most webservers or web applications. This problem can be easily corrected by surrounding the filename parameter with double quotes.
Current Implementation
In upload_file_download()
'Content-Disposition: '. $disposition .'; filename='. $nameSould be
'Content-Disposition: '. $disposition .'; filename="'. $name . '"'However, this breaks if the filename has double quotes within the name.
I'm not sure how to fix this. I don't know if this is the responsibility of mime_header_encode() or upload_file_download(). I'm reporting this bug, and also soliciting help.

#1
I failed to reproduce this on FF 1.5.0.6
I could both upload a file with a space in the name and then download it again.
If FF can't handle filenames with spaces in surely your upload wouldn't work either. Can you upload? (if so, ur FF is half broken by definition!).
Like I said thou, failed to reproduce this.
regards,
--AjK
for info:-
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
#2
Thanks for testing this.
I think I forgot to mention a crucial detail. This only happens on PRIVATE downloads. If private upload is turned on, Drupal handles writing the MIME header.
Upload works fine in both cases.
#3
I changed my test server to use "private". Now when I test this, it still works, downloads no problems.
I note that when I click the link, drupal adds a + symbol into the space (ie, it gets files/test+file_0.txt ) even thou there deff is a space in the file name??
regards,
--AjK
#4
AjK,
Thanks once again for testing this.
Please note that the files directory must not be accessible, ie it must not be a directory within the directory of your Drupal installation. If you simply switched from "public" to "private", this does not make the directory unavailable.
Move the "files" directory outside the Drupal directory, then turn on private file uploads. The URL will change to "[domain name]/system/files/[filename]".
Also note that "+" is synonymous with "%20" or "whitespace".
Please try again and let me know the results. Thanks again for the help.
#5
Your not going to like this. I moved my files/ directory well out of the web tree (/usr/local/TEST82614/files) and I still cannot reproduce your bug. I even ran Ethereal and captured the packets between my workstation and test server,
regards,
--AjK
#6
I can reproduce this issue. I wonder if the difference is that in AjK's case, the file in question is a text file and so is being served with an inline disposition? It would be worth testing again with a file that is served as "attachment" instead.
#7
This might have something to do with the fact that mime_header_encode() does not consider spaces as characters that need encoding. However, I remember many people were annoyed when we applied this blindly to e.g. email subjects, so we can't just make it stricter. Perhaps we need to define a strict and loose version of this?
#8
I definitely experienced this with PDF files.
#9
Here's what upload.module should do (I have here the same problem as described above) :
- See page 3 of http://www.ietf.org/rfc/rfc2183.txt :
NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters)parameter value containing only non-`tspecials' characters SHOULD be
represented as a single `token'. A short parameter value containing
only ASCII characters, but including `tspecials' characters, SHOULD
be represented as `quoted-string'. Parameter values longer than 78
characters, or which contain non-ASCII characters, MUST be encoded as
specified in [RFC 2184].
So, there is 3 ways to encode the file name, and Drupal only handles 2 of them.
There should be in upload.module something like this :
if (node_access('view', $node)) {
$name = mime_header_encode($file->filename);
// BEGIN PATCH ****
if ( there is a tspecials character in $name ) $name = '"' . $name . '"';
// END PATCH ****
$type = mime_header_encode($file->filemime);
// Serve images and text inline for the browser to display rather than download.
$disposition = ereg('^(text/|image/)', $file->filemime) ? 'inline' : 'attachment';
return array(
'Content-Type: '. $type .'; name='. $name,
'Content-Length: '. $file->filesize,
'Content-Disposition: '. $disposition .'; filename='. $name
);
}
Where :
tspecials := "(" / ")" / "<" / ">" / "@" /"," / ";" / ":" / "\" / <">
"/" / "[" / "]" / "?" / "="
; Must be in quoted-string,
; to use within parameter values
See page 12 of http://www.ietf.org/rfc/rfc2045.txt .
This include spaces because, as stated in rfc2183 : parameter value containing only non-`tspecials' characters SHOULD be represented as a single `token'.
Wish this helps solve the problem, because the problem is real around here too (Firefox, Ubuntu).
#10
Here's my current patch. Seems to work :
<?php
// PATCH : upload.module - drupal 4.7.3 ----------------------------------------
function upload_file_download($file) {
$file = file_create_path($file);
$result = db_query("SELECT f.* FROM {files} f WHERE filepath = '%s'", $file);
if ($file = db_fetch_object($result)) {
if (user_access('view uploaded files')) {
$node = node_load($file->nid);
if (node_access('view', $node)) {
// PATCH! ==============================================================
// CASE 1 : only ascii, one token, no tspecials.
$name = $file->filename;
if ( preg_match( '/[^\x20-\x7E]/', $name ) )
{
// CASE 2 : non-ascii --> needs encoding.
$name = mime_header_encode($file->filename);
}
else if ( preg_match( '/[ \(\)\<\>\@\,\;\:\\\"\/\[\]\?\=]/', $name ) )
{
// CASE 3 : n tokens, tspecials characters, SHOULD be 'quoted-string'.
// tsp : "(" ")" "<" ">" "@" "," ";" ":" "\" <"> "/" "[" "]" "?" "="
$name = "\"$name\"";
}
// =====================================================================
$type = mime_header_encode($file->filemime);
// Serve images and text inline for the browser to display rather than download.
$disposition = ereg('^(text/|image/)', $file->filemime) ? 'inline' : 'attachment';
return array(
'Content-Type: '. $type .'; name='. $name,
'Content-Length: '. $file->filesize,
'Content-Disposition: '. $disposition .'; filename='. $name
);
}
else {
return -1;
}
}
else {
return -1;
}
}
}
?>
#11
hello.
i also had a problem opening files with spaces in it.
oddly enough though, the problem only keeps appearing with the public setting.
when i changed to the private setting, it was okay. which is fine by me for now.
using 4.7.3 based off a wamp setup.
uploaded using core upload.module
localhosted, tested remotely.
#12
As I said before, this should be added to mime_header_encode() so it can be re-used (optionally!) by other code.
#13
I was also having problems with filenames with spaces being truncated in Firefox and the above patch has worked for me - thanks! My files are Private in a directory outside of the web directories. Just a thought - I have the Firefox extension DownThemAll! installed (brilliant when downloading from a document management repository) and I wondered whether this might have changed Firefox's default behaviour?
#14
An issue for me as well. Here is a patch file.
#15
this should be fixed in HEAD and then backported.
The patch doesn't conform to our coding style.
#16
@Steven: How you propose adding it to mime_header_encode() so that its use is optional?
@killes: What specifically is wrong with the coding style? The all caps CASE comments?
I've got 50 sites running 4.6 and 4.7, and this is affecting my users.
#17
Some things I caught:
- First comment should end with a period.
- Comments need cleanup, no CASE stuff, just explain in a clean sentence.
- String concat should look like '"'. $file->filename .'"'; with spaces between dots and non-quotes.
- Control structures should have a break after } so like...
if (cond) {
// Do something.
}
else {
// Something else.
}
http://drupal.org/node/318
#18
Here is the 4.7 patch re-rolled to fix the coding style, including the picayune missing period on the first comment (but not missing periods on other comments pre-existing in the module).
I would have created a 5.0 patch too, but this area of the upload.module has been extensively modified in 5.0, and the bug may not even exist there anymore. I don't have time to test it in 5.0 right now.
#19
I'll backport once this is committed to 5/HEAD.
#20
...
#21
I am unfortunately very confused. I just updated from Drupal 4.7.4 to 4.7.6. I see that the Content-Disposition lines have been removed, but I can't find any record of why. This causes files attached to content to always be downloaded as index.php, instead of the actual file name. I had previous hacked the module to fix the space problem by using the technique I posted at http://drupal.org/node/23759. I only now realize that that upload module is not the same as this upload module, which is frustrating.
#22
This patch adds an optional second parameter to mime_header_encode for quoted-string encoding of ASCII strings containing special characters (e.g. space). This would be helpful for contrib modules such as audio.
#23
This tweak also escapes the backslash, in addition to double quote.
#24
What needs to happen to get some movement on this issue? We have patches for both 4.7 and 5 posted here. This bug causes lots of problems with Firefox, because all modern file systems support filenames with embedded spaces.
#25
I think the regexp needs to be inverted. Instead of a blacklist of characters, let's have a whitelist. Only if the string consists of characters XYZ should we not quote it. Quoting isn't harmful anyway.
Also please pay attention to code style and readability... there are loads of unnecessarily escaped characters in there, especially in the regexp. The choice of regexp delimiter is also poor, as it forces more escaping. Regexps are unreadable enough as it is, let's not complicate them more.
We should also avoid phrasings like "special characters". This can mean a couple hundred things depending on context. Let's be clear what we're talking about: "make it safe for embedding as a MIME header token according to RFC XXXX" or something.
#26
The only reason I used a blacklist was to follow RFC 2183 -- http://drupal.org/node/82614#comment-143416 -- with regards to its "tspecials" characters. We could come up with our own more readable whitelist (maybe A-Za-z0-9 and _) but that wouldn't be strictly inline with the RFC..
#27
The only reason I used a blacklist was to follow RFC 2183 -- http://drupal.org/node/82614#comment-143416 -- with regards to its "tspecials" characters. We could come up with our own more readable whitelist (maybe A-Za-z0-9 and _) but that wouldn't be strictly inline with the RFC..
#28
Here's an updated patch for HEAD. I used a whitelist of OK alphanumerics (and underscore) rather than blacklisting the known "tspecials" characters.
#29
subscribing. looks like this'll get bumped to 7.
#31
wow.. this has been around for 3 yrs and still no fix.. scary.. subscribing..
#32
Still applies with offset but needs updating for coding standards concat operator spacing changes.
#33
Here ya go.
#34
Patch looks good and applies cleanly, but doesn't seem to actually solve the original issue...
The original issue, by the way, no longer applies to Firefox, as described, but does occur in Opera.
#35
1.) Looks like you've extended the API, but not actually used the extension. If you define a new parameter for the function, shouldn't it be used wherever the content-disposition header is generated?
2.) Perhaps the parameter should be called "$strict" or "$alphanumeric" instead of "$quoted"...
3.) Does the RFC forbid filenames being quote-encased if they *don't* contain a space? If it does not, why not apply the quotes universally and forget about checking this? All you need to do is
<?php'Content-Disposition: '. $disposition .'; filename="'. addslashes($name) . '"'
?>
As in the original issue, but with addslashes to escape double-quotes and backslashes.
#36
There is no longer any place in Drupal core where this is a problem. It's an API extension for contrib modules which do pass along arbitrary file names and such in headers.
I don't care what the parameter is called. Basically it should not be set for headers such as an ASCII e-mail subject where spaces do not need to be wrapped in quotes.
Yes according to the RFC you should only add the quotes when needed. Of course the main standard for e-mail and HTTP-related RFCs is interoperability not strictly following RFCs to the letter.
#37
I haven't tested D7 thoroughly, but this is definitely still a problem in 6.6 for certain browsers (Opera, FF<3), and neither upload_file_download() nor mime_header_encode() have changed according to api.drupal.org.
I've tried several approaches and combinations of approaches in this thread, and have not yet found a way to reliably allow spaces in filenames when the download method is private.
Can anyone else confirm the issue still exists?
#38
You can see here http://api.drupal.org/api/function/upload_file_download/4.7 that file names were passed in the HTTP headers but I was under the impression they no longer are in drupal core. Contrib modules such as audio module on the other hand do need to pass arbitrary file names in headers.
#39
OK, my mistake; apparently the old 'bug' has been replaced with a new one; I get same problem now even with a 'space-free' filename. I'll report back after further testing and/or start a new issue.
As for mfb's latest patch, it does extend the API as needed and applies cleanly, so I'm marking RTBC.
#40
Sorry but this really needs some unit tests.
#41
Well actually it looks like all of unicode.inc needs unit tests..
#42
Just reporting back as promised: my look-a-like issue was actually being caused by an error in fckeditor_file_download(), which has been fixed in the most recent rc.
#43
see #276453: Tests needed: unicode.inc
#44
Added a bare minimum of unit tests..
#45
The unit tests look good but I don't think the docs for the $quoted parameter will pass snuff with the core committers:
+ * @param boolean $quoted+ * If TRUE, perform quoted-string encoding of non-alphanumeric ASCII.
The boolean after @param doesn't seem to match the coding standards. And the description doesn't really make sense. What is quoted-string encoding? When and why would I want to use it? When would I not?
Edit: I forgot to mention that once that's corrected I'd say this is RTBC.
#46
bleh, too many coding standards.. :p
BTW it might be cool, if mime_header_encode() supported an array of header field parameters as input; these would automagically be assembled into a valid header string. I.e.
<?phpmime_header_encode(array('attachment' => NULL, 'filename' => $filemime, 'modification-date' => date('r')));
?>
<?php'attachment; filename=' . mime_header_encode($filename, TRUE) . '; modification-date=' . mime_header_encode(date('r'), TRUE);
?>
#47
That looks a lot better, could we reference the appropriate RFC in the comments? I'm assuming that would be 2045...
#48
There are at least 4 RFCs that come into play, but seems like 2045 is the best.
#49
The last submitted patch failed testing.
#50
#51
It's a good idea to improve standard compliance. I'm not sure about the name of the "quoted" parameter, but I can't think of a better one.
There are no code-style problems and the tests pass, so RTBC.
However, you may have to replace test strings like "my test.txt" with something randomly generated, which seems to be the norm,
#52
Any chance of getting this into a drupal 6.x release?
#53
The last submitted patch failed testing.
#54
#55
Not to drag out a 3-year old issue even longer, but...
Could someone help me out with some clear steps and/or sample code to reproduce this? Tried the steps described by John Hwang (private files, non-web-accessible directory) with upload.module in both Firefox 3.5rc1 and Safari 4.0, the file gets auto-%20ed.
I saw in #36 that this is no longer reproduceable in core. Not to be dense, but then why can't contrib do whatever upload module is doing to avoid the problem?
You can also read this as "We need a more 'real world' test." The current tests really are just testing PHP at this point.
#56
As I mentioned at #36, this is an API feature that currently has no bearing on core that I know of, but is of use to contrib modules that need to pass along arbitrary strings in headers.
#57
Since this isn't used in core I should add a test module, similar to file_test.module. Or, perhaps I could just reuse file_test.module and add a test to file.test.