Problem/Motivation
Non-US-ASCII characters are stripped out of uploaded filenames. PHP's basename() does not properly support streams or filenames beginning with international characters.
Proposed resolution
Replaces all instances of basename() with wrapper function drupal_basename(). Revised function strips unsafe characters and preserves international characters; includes a check for directory separator characters.
New tests and test changes:
- create, use, save, upload, download, and copy files with international filenames
- generation of URL for file with international filename
- 404 while fetching nonexistant randomly named file including international characters
- correction to recombine chunked headers in DrupalWebTestCase::curlHeaderCallback()
Remaining tasks
Needs review and documentation for drupal_basename().
User interface changes
None.
API changes
Adds drupal_basename().
Original report by [OnkelTem]
Hi,
I'm using D5 and got much troubles yesterday with uploading files with non-latin characters in file names: all of them had been stripping. This behaviour reproduces on both my Debian Etch servers with PHP5 (but not with PHP4).
After investigations I found, that standard PHP functions basename() is in charge for stripping. The comments on php.net conforms this, basename() is not safe:
http://php.net/manual/en/function.basename.php
Passing UTF-8 string like '????.txt' to basename() results to 'txt', cutting off filename.
Looks like there are no appropriate solutions or workarounds and basename() simply should be replaced with alternative.
Comment | File | Size | Author |
---|---|---|---|
#135 | blackout2.patch | 28.94 KB | droplet |
#133 | blackout.patch | 28.42 KB | droplet |
#133 | interdiff-between-D8.patch | 9 KB | droplet |
#132 | blackout.patch | 27.61 KB | droplet |
#132 | interdiff-between-D8.patch | 8.39 KB | droplet |
Comments
Comment #1
OnkelTem CreditAttribution: OnkelTem commentedUPD.
basename() is locale dependent.
For example, this works, no stripping:
setlocale(LC_ALL, 'ru_RU.UTF-8');
$file->filename = trim(basename($_FILES["files"]["name"][$source]), '.');
This doesn't:
setlocale(LC_ALL, 'C');
$file->filename = trim(basename($_FILES["files"]["name"][$source]), '.');
Comment #2
OnkelTem CreditAttribution: OnkelTem commentedchx ( http://drupal.org/user/9446 ) has created a replacement for basename() called drupal_basename().
Code is:
function drupal_basename($path,$prefix = '') {
$path = preg_replace('|^.+[\\/]|', '', $path);
if ($prefix) {
$path = preg_replace('|'. preg_quote($prefix) .'$|', '', $path);
}
return $path;
}
Function goes to file.inc (is it good)?
My first patch. Replaced all occurences of basename() in 'includes' and standard 'modules' dirs.
This patch is only for current Drupal 5.7 release!!!
Comment #3
chx CreditAttribution: chx commentedThis is against HEAD with a little doxygen.
Comment #4
Damien Tournoud CreditAttribution: Damien Tournoud commentedPatch does not apply (patch: **** malformed patch at line 100: Index: includes/locale.inc), but both the diagnostic and the implementation looks sane.
Also: this is a perfect usecase of unit testing. So this shouldn't go in without a good test plan.
Comment #5
chx CreditAttribution: chx commentedHere is a better patch. Damien, can you try writing test cases? Stuff like \0 in file names... maybe make them consisting all of 256 bytes.
Comment #6
drewish CreditAttribution: drewish commentedsubscribing
Comment #7
c960657 CreditAttribution: c960657 commentedFor paths like "/a/" and "/a///", the proposed drupal_basename() returns the empty string, but PHP's basename() returns "a". Is this intentional?
Comment #8
drewish CreditAttribution: drewish commentedreally needs some documentation for the parameters of drupal_basename. we could just copy and paste from php's docs.
Comment #9
drewish CreditAttribution: drewish commentedHere's a re-roll that fixes the failed hunk in scripts/run-tests.sh and adds some PHPDocs.
Comment #10
Damien Tournoud CreditAttribution: Damien Tournoud commentedComment #11
drewish CreditAttribution: drewish commentedNow that file.test landed I guess we could write some tests for this...
Comment #12
webchickYes, tests please. This looks like something relatively obscure which would not come up in the course of regular usage. Perfect for a test case.
Comment #13
drewish CreditAttribution: drewish commentedi'll write some tests once #203204: Uploaded files have the permissions set to 600 gets committed
Comment #14
drewish CreditAttribution: drewish commentedi think i'd put the wrong issue in the above post... i don't see what the that has to do with this issue. probably one of the other file_test.module patches.
Comment #15
munzirtaha CreditAttribution: munzirtaha commentedsubscribing
Comment #16
hapydoyzer CreditAttribution: hapydoyzer commentedsubscribing
Comment #17
OnkelTem CreditAttribution: OnkelTem commentedPatch for D6.13
Comment #18
Jean-Philippe Fleury CreditAttribution: Jean-Philippe Fleury commentedsubscribing
Comment #19
Gozi CreditAttribution: Gozi commentedThe early pasted and linked chx's safe baseneme function not fully compatible the PHP original basename. I wrote own UTF-8 safe basename function, that works without problem.
Comment #20
c960657 CreditAttribution: c960657 commentedThis should probably be combined with the new dirname() implementation in #701358: The file API presumes a hiearchical file storage.
Note that on Windows, dirname() also treats backslash as a directory separator.
Comment #21
Ivan Simonov CreditAttribution: Ivan Simonov commentedsubscribing
duplicates of this issue:
#712834: Upload module cuts Cyrillic filenames.
#724748: Drupal-7 cuts Cyrillic filenames
Comment #22
GoofyX CreditAttribution: GoofyX commentedSubscribing...
Comment #23
c960657 CreditAttribution: c960657 commentedNew shot.
Comment #24
bloodmud CreditAttribution: bloodmud commented#17: drupal_6_basename.patch queued for re-testing.
Comment #25
c960657 CreditAttribution: c960657 commentedReroll.
Comment #27
kotnik CreditAttribution: kotnik commentedJust came from #853994: File names aren't transliterated, so server configuration needs to support UTF8 filenames., and patch from #25 doesn't apply to HEAD.
Comment #28
kotnik CreditAttribution: kotnik commentedOkay, here's reworked patch from #25 that applies to current HEAD.
Comment #29
kotnik CreditAttribution: kotnik commentedMarking for review.
Comment #31
andypost#853994-21: File names aren't transliterated, so server configuration needs to support UTF8 filenames.
Supposed to add hook_requirements() about result of locale()
Comment #32
Damien Tournoud CreditAttribution: Damien Tournoud commented@andypost: no, we don't want to rely on the broken locale feature.
Comment #33
kotnik CreditAttribution: kotnik commentedFor the hell of it, I simply can't figure out why did the patch from #28 failed MimeType test.
Anybody got a clue?
Comment #34
andypost@kotnik test falls because simpletest stream wrapper is not registered... strange. Broken function file_get_mimetype()
You are using file_uri_target() in drupal_basename() with none-existent schema (simpletest://)
Try to run File API - File mimetypes test locally
Call to a member function getTarget() on a non-object
EDIT: found strange simpletest_test_stream_wrappers() in simpletest.module
Comment #35
andypostChanged function file_uri_target() to work with DrupalLocalStreamWrapper class because getTarget() method could not be implementation specific so we can use it's static implementation
Comment #36
kotnik CreditAttribution: kotnik commentedAndypost, thanks for explanation and for fixing the patch.
Comment #37
andypostSuppose this major because drupal used in multilingual sites
Comment #38
sun(minor) Wrong string concatenation spacing here.
Aside from that, this patch pretty much simply replaces every basename() with with drupal_basename() throughout core. Bonus points for @ingroup php_wrappers.
Thus, RTBC.
Powered by Dreditor.
Comment #39
andypostRe-roll against current HEAD with fixed string concatenation pointed in #38
Comment #40
sunDang. webchick correctly asked for tests for drupal_basename().
Comment #41
andypostAdded non-latin prefix for generated filename to FileTestCase::createFile()
This prefix contains spaces to make sure that all file-related tests work fine.
Also I change a FileSaveDataTest::testWithFilename() to use non-latin file name because this issue starts
Changed FileTokenReplaceTestCase::testFileTokenReplacement() to test non-latin names
Fixed RetrieveFileTestCase::testFileRetrieving() because non-latine names returned as urlencoded and this also related to this issue
Comment #42
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis looks very good, but I don't understand this change:
The initial code looks correct to me.
Comment #43
drewish CreditAttribution: drewish commenteddoes this intersect with #858528: file_uri_target() purpose is unclear at all?
Comment #44
andypost@Damien Tournoud, This is a same as file_get_mimetype()
So it's just fallback to base implementation if no wrapper-class defined
Should we make getTarget() is static method?
For example FileMimeTypeTest::testFileMimeTypeDetection() is using simpletest wrapper
Which has no implementation, so we should use Base class implementation
Comment #46
andypostAfter talking with DamZ on IRC we decide to change schema in testFileMimeTypeDetection() to 'public://' because 'simpletest://' has no implementation and file_uri_target() leave as is.
Comment #47
andypostFixed broken test, non-latin names should be encoded.
Comment #49
andypostNew strange headers appears $11 and $15
' =?UTF-8?B?V0k=?=" '
So explode() alerts notice so I add ':' if header have no ':'
Comment #50
andypostChanging status
Comment #51
Damien Tournoud CreditAttribution: Damien Tournoud commentedThat's because the mime encoder is splitting the header in two lines :(
Comment #52
mikeytown2 CreditAttribution: mikeytown2 commentedfyi; another one that doesn't work right is "drupal_realpath". Found that out via lots of tests to the test bot.
Look at the previous patches in here: http://drupal.org/node/818818#comment-3116590
Comment #53
andypostWe should find culprit of this headers
headers definition http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
Comment #54
andypostThis caused by mime_header_encode() which splits header on chunks but header function does not join them back
Comment #55
andypostFinally fixed version.
- Fixed DrupalWebTestCase::curlHeaderCallback() to join chunks
- Added test to check filename transfered correctly via private download method (this one uses mime_header_decode to make sure that header encoded correctly)
Comment #56
andypostFixed tab character in DrupalWebTestCase::curlHeaderCallback()
Comment #58
andypostSame patch with debug
Comment #60
andypostSuppose now should work, first header "HTTP/" should be excluded and end of headers detected with trim()
PS same patch for simpletest module #869408: curlHeaderCallback() should join headers if no colon found
Comment #61
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe criteria for a follow-up header is not "it doesn't contain ':'", but "it starts with at least one SP or HT".
Comment #62
andypostChanged comment and condition
Comment #63
dhthwy CreditAttribution: dhthwy commentedNice patch. Here's a few problems I found:
This should probably read:
Alternative:
Alternative:
Alternative:
Alternative:
Comment #64
dhthwy CreditAttribution: dhthwy commenteddang I was hoping I'd get that in before you rolled another one.
Comment #65
Damien Tournoud CreditAttribution: Damien Tournoud commentedGetting closer!
This needs to read
array_pop($this->headers) . ' ' . trim($header);
and the other should betrim($header)
We need to normalize the whitespace here, because:Comment #66
andypost@dhthwy, thanx a lot for wording!
@Damien Tournoud, Fixed, but not sure that we need a whitespace normalization here because mime_header_decode() cares about it.
Comment #67
andypost#66: 278425-drupal_basename7-d7.patch queued for re-testing.
Comment #68
andypost#66: 278425-drupal_basename7-d7.patch queued for re-testing.
EDIT: Commited #858528: file_uri_target() purpose is unclear
Comment #69
andypost#66: 278425-drupal_basename7-d7.patch queued for re-testing.
Comment #71
andypostRe-roll against HEAD
Comment #72
andypostAnother round, missed 2 hunks
Comment #74
andypostChanged a test testPrivateFileTransfer() to check the contents of file against checking headers.
Comment #75
andypostMaybe better to raise this to critical bacause none-ascii file uploads still broken
Comment #76
c960657 CreditAttribution: c960657 commentedThe latest reroll no longer checks the Content-Disposition header. Is that intentional?
Comment #77
andypostWhile debuging tests I found that this header is not send on test run. You can see this in #72 testing results
Comment #78
wwwwoicn CreditAttribution: wwwwoicn commentedSubscribing
Comment #79
droplet CreditAttribution: droplet commented- upload non-latin filenames, worked
- download, worked
- display, worked
- unlink, worked
patch with lots hunks but succeeded apply, should it Re-roll ?
Comment #80
andypostProbably this could go into 7.1
Comment #81
catchComment #82
chx CreditAttribution: chx commentedI have reopened http://bugs.php.net/bug.php?id=37738 but I do not hope for a quick resolution there and if there would be one it'd take years before we could use that PHP version (remember, we dropped to 5.2.4-ubuntu-patched as the minimal PHP from 5.2.5...) so yeah let's go with this version.
Comment #83
Damien Tournoud CreditAttribution: Damien Tournoud commentedI don't see the point of this. Could we please remove it?
Comment #84
c960657 CreditAttribution: c960657 commented@Damien: The purpose is to make sure that
drupal_basename('public://')
returns '', not 'public:'.Comment #85
Damien Tournoud CreditAttribution: Damien Tournoud commentedI see. We could really use unit tests here.
Comment #86
andypostReroll against D8 with removal of trim suggested by Damien
@Damien Any ideas for unit tests? Default basename() implementation is not trimming scheme so fixed
@chx Another duplicate of php bug http://bugs.php.net/bug.php?id=37268
Comment #87
andypostSame patch for D7, additionally fixes hunk within user.install
#86 and #87 has 2 new hunks at modules/update/update.module
Comment #88
andypostAny reviews? This bug is really annoying for me
Comment #89
sunTook me some good minutes to figure out what is actually being done here, and why.
Can we add three inline comments, please? (one comment per atomic operation block; separator setup + trim, basename, suffix)
Also, the ternary preg_match() logic is extremely compact. Nice trick -- but when writing this as a regular if control structure, then we could skip the additional $suffix operation if $uri is '' empty anyway.
5 days to next Drupal core point release.
Comment #90
droplet CreditAttribution: droplet commentedreroll only.
Comment #91
xlsoul CreditAttribution: xlsoul commentedAwesome! That PATCH is ok for my DRUPAL7, Thankyou!
But I have to say it is a very important BUG in DRUPAL, and It must be pay attention to.
Comment #92
webchickMarking needs review.
Comment #93
xjmTagging issues not yet using summary template.
Comment #94
097633 CreditAttribution: 097633 commented#74: 278425-drupal_basename10-d7.patch queued for re-testing.
Comment #95
andypostneeds docs and #89
Comment #96
droplet CreditAttribution: droplet commentedIt's another issue.
While your file system / OS isn't UTF-8, you will failed to upload a file with any UTF-8 characters.
eg.
After patched:
€ = failed.
€-abc.png = failed.
Before patched
€ = failed.
€-abc.png = okay. (renamed to -abc.png)
Probably it will got error on Windows.
Comment #97
catchI don't see any mention of http://drupal.org/project/transliteration on this issue, I've not used that module before but it exists specifically for this problem afaik.
Comment #98
droplet CreditAttribution: droplet commentedtransliteration converted filename to US-ASCII characters.
Comment #99
andypostWithout transliteration module we can't get actual filename if none-ascii letters are used in it. With transliteration we never get actual filename too :(
EDIT: actual means the original filename of file been uploaded.
This mostly usefull when filename is descriptive and has more than one word, without this patch I cant manage none-ascii filenames because core returns them trimmed to last word
Comment #100
aaron CreditAttribution: aaron commentedsubscribe
Comment #101
OnkelTem CreditAttribution: OnkelTem commentedUpdated patch from #87 for current stable D7 release - 7.8. Two hunks was failing on some reason.
Comment #102
Caligan CreditAttribution: Caligan commentedAdded issue summary.
Comment #103
andypostThanx for Summary, @Marie Wendel
Patch needs re-roll for D8 'cos core now lives in core folder
Comment #104
realitylooprerolled for /core
Comment #106
andypostAdded inline comments inside drupal_basename() (needs review and probably correction)
Added hunk for new image width test
PS: D7.9 patch also needs re-roll
Comment #107
andypostD7 re-roll
Comment #108
andypostActually needs review
Comment #109
droplet CreditAttribution: droplet commentedThe patch is okay but as my comment #96, it still failed with non-UTF8 system. While upload is failed, we get following message:
For end users, they don't know why it can't move. I think we could suggest users to rename and try again.
Comment #110
andypostAnother way is to store filenames as hash but this will require to bootstrap core to retirn public files
Comment #111
chx CreditAttribution: chx commentedWhile this patch looks OK (didnt review deeply so not RTBC'ing) the "If your file system / OS isn't UTF-8, uploading a file with any >127 UTF-8 characters fails." is a separate issue.
Comment #112
droplet CreditAttribution: droplet commented#106: 278425-drupal_basename12.patch queued for re-testing.
Comment #113
chx CreditAttribution: chx commented+ $filename = preg_replace('@' . preg_quote($suffix) . '$@', '', $filename);
should be preg_quote($suffix, '@'). There's a drupal_web_test_case.php hunk in the patch which looks good but a separate issue. And, it needs tests or discussion why we can't test this (locale dependency might stop testability).
Comment #114
rogical CreditAttribution: rogical commentedpatched in #107 tests well,
can't wait to remove the transliterlation.
Comment #115
droplet CreditAttribution: droplet commentedreroll +@113
we already added non-latin tests.
non-latin test
-1 days to next Drupal core point release.
Comment #116
droplet CreditAttribution: droplet commentedI cancelled one interdiff but also showed. 2 interdiff are same but with diff format.
Comment #118
droplet CreditAttribution: droplet commentedComment #120
gghchem CreditAttribution: gghchem commentedI am using windows xp sp3 and drupal 7.10,after patched the patch 278425-drupal_basename12-d7.patch,I find it's not works well.
sometimes I can upload a file and always i cannot upload the files with chinese names.It always shown:
Warning: move_uploaded_file(E:\xampp\htdocs\eee\sites\default\files\bee/北京市西城区2008—2009学年度初三年级第一学期期末测试化学试卷.doc) [function.move-uploaded-file]: failed to open stream: Invalid argument 在 drupal_move_uploaded_file() (行 1597 在 E:\xampp\htdocs\eee\includes\file.inc).
Warning: move_uploaded_file() [function.move-uploaded-file]: Unable to move 'E:\xampp\tmp\php6F.tmp' to 'E:\xampp\htdocs\eee\sites\default\files\bee/北京市西城区2008—2009学年度初三年级第一学期期末测试化学试卷.doc' 在 drupal_move_uploaded_file() (行 1597 在 E:\xampp\htdocs\eee\includes\file.inc).
文件上传错误。无法移动上传的文件。
I had asked so many pepole to help me,but the problem is still on.
Comment #121
aspilicious CreditAttribution: aspilicious commented1) You only should assign bugs to you when you're going to fix them
2) Please leave as major...
3) This needs to be fixed in D8 first, will be backported soon after
If you could install an empty D8 site and test the patch in #115 chances are bigger this will get commited to D7 soon.
Comment #122
rogical CreditAttribution: rogical commentedsee your path: E:\xampp\htdocs\eee\sites\default\files\bee/北京市西城区2008—2009学年度初三年级第一学期期末测试化学试卷.doc
this is not correct, but I just confused why it outputs an error path.
Comment #123
rogical CreditAttribution: rogical commentedsorry, this is an cocurrent posting.
Comment #124
aspilicious CreditAttribution: aspilicious commentedOw gghchem you have problems because your temporary folder is incorrect. You should change it in the file system settings to temp or something like that (without '/').
Comment #125
aspilicious CreditAttribution: aspilicious commentedStill not critical...
Comment #126
droplet CreditAttribution: droplet commentedRead from #96, actually it doesn't support Windows very well. (maybe Windows Sever works). anyone try to test with Mac OS ?
(It's a well-known PHP problem. PHP6 will has a better support of UTF8)
Comment #127
biydng CreditAttribution: biydng commentedI am using windows 7 Home and drupal 7.10. and ran the patch 278425-drupal_basename-115.patch. When I tried to upload a file with Chinese file name to a node, the file name appeared on the screen and the one saved in database are all in correct chinese characters, but the actual file saved in the public file directory is gibberish: for example 智慧.pps instead of 智慧.pps. This caused the "Page not found" error when I view the node and clicked on the file attachment link. The link path has the correct url encoded path and Chinese character filename. See the first two screenshots attached.
I have a feeling it is my server or php module that is messing up the encoding of the filename. I am using PHP 5.2.17 and apache 2.2 and I have set default charset to utf-8 in both. I noticed that when I opened the Apache online documentation, it is also in that weird encoding. Please help! Thanks!
Comment #128
tim.plunkett#127, please read #121
Comment #129
droplet CreditAttribution: droplet commentedstay away Windows. or save filename as transliteration/hash and a preprocess script to restore original file name during download. even if you use Linux, the latter is the best IMO.
我們先解決 Drupal 8,再修正 Drupal 7,不要變更主題狀態 :)
在 Windows 之下很容易出錯,盡量避免使用 Windows
或者自已寫一段程式,將檔案名以音譯或 HASH 保存,在下載時才換回正確檔案名 ( 像 Discuz 這類的處理方法 )
#113 code reviewed ?
#114 test reviewed. + myself.
ready to go ?
Comment #130
tim.plunkettThis looks good to me. The comments make sense, the function makes sense, and the replacements are good.
Comment #131
catchOK read through this, code has had lots of review, most of the patch is find/replace and there's added test coverage for the bug.
Committed/pushed to 8.x., moving to 7.x for backport. This will need a change notice once it's been committed to 7.x (or someone could do that now too of course).
Comment #132
droplet CreditAttribution: droplet commentedComment #133
droplet CreditAttribution: droplet commentedmissed one file.
Comment #134
oriol_e9gis not needed here?
user.install
Comment #135
droplet CreditAttribution: droplet commentedI think all file management stuff are needed
Comment #136
andypostManually checked all places in D7 for #135 patch - except extended comments for introduced drupal_basename() and small fix for #113 it's the same as #107
we really need change notice which could be compiled from issue summary and http://bugs.php.net/bug.php?id=37738
Great to see patch commited before upcoming 7.11 release
Comment #137
webchickRead through the patch and the comments here. Lots of good work. It's unfortunate we need to do an API change for this. :( But according to https://bugs.php.net/bug.php?id=37738, this is not getting fixed in PHP anytime soon.
Committed and pushed to 7.x.
We need a change notice for this, since this will affect contributed modules.
Comment #138
chx CreditAttribution: chx commentedhttp://drupal.org/node/1424840
Comment #139
nicxvan CreditAttribution: nicxvan commentedComment #141
migimaru CreditAttribution: migimaru commentedIs there a backport for D6?
If not, is it safe to simply copy the drupal_basename() function from the above patch and then change all uses of basename() in all files to drupal_basename()?
Comment #142
xjmComment #143
nicxvan CreditAttribution: nicxvan commentedComment #143.0
nicxvan CreditAttribution: nicxvan commentedUpdated issue summary.