Using basename() is not locale safe
OnkelTem - July 4, 2008 - 05:50
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | file system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | drewish |
| Status: | needs work |
Description
Hi,
I'm using D5 and got much troubles yesterday with uploading files with non-latin characters in file names: all of them had been stripping. This behaviour reproduces on both my Debian Etch servers with PHP5 (but not with PHP4).
After investigations I found, that standard PHP functions basename() is in charge for stripping. The comments on php.net conforms this, basename() is not safe:
http://php.net/manual/en/function.basename.php
Passing UTF-8 string like 'Тест.txt' to basename() results to 'txt', cutting off filename.
Looks like there are no appropriate solutions or workarounds and basename() simply should be replaced with alternative.

#1
UPD.
basename() is locale dependent.
For example, this works, no stripping:
setlocale(LC_ALL, 'ru_RU.UTF-8');
$file->filename = trim(basename($_FILES["files"]["name"][$source]), '.');
This doesn't:
setlocale(LC_ALL, 'C');
$file->filename = trim(basename($_FILES["files"]["name"][$source]), '.');
#2
chx ( http://drupal.org/user/9446 ) has created a replacement for basename() called drupal_basename().
Code is:
function drupal_basename($path,$prefix = '') {
$path = preg_replace('|^.+[\\/]|', '', $path);
if ($prefix) {
$path = preg_replace('|'. preg_quote($prefix) .'$|', '', $path);
}
return $path;
}
Function goes to file.inc (is it good)?
My first patch. Replaced all occurences of basename() in 'includes' and standard 'modules' dirs.
This patch is only for current Drupal 5.7 release!!!
#3
This is against HEAD with a little doxygen.
#4
Patch does not apply (patch: **** malformed patch at line 100: Index: includes/locale.inc), but both the diagnostic and the implementation looks sane.
Also: this is a perfect usecase of unit testing. So this shouldn't go in without a good test plan.
#5
Here is a better patch. Damien, can you try writing test cases? Stuff like \0 in file names... maybe make them consisting all of 256 bytes.
#6
subscribing
#7
For paths like "/a/" and "/a///", the proposed drupal_basename() returns the empty string, but PHP's basename() returns "a". Is this intentional?
#8
really needs some documentation for the parameters of drupal_basename. we could just copy and paste from php's docs.
#9
Here's a re-roll that fixes the failed hunk in scripts/run-tests.sh and adds some PHPDocs.
#10
#11
Now that file.test landed I guess we could write some tests for this...
#12
Yes, tests please. This looks like something relatively obscure which would not come up in the course of regular usage. Perfect for a test case.
#13
i'll write some tests once #203204: Uploaded files have the permissions set to 600 gets committed
#14
i think i'd put the wrong issue in the above post... i don't see what the that has to do with this issue. probably one of the other file_test.module patches.
#15
subscribing
#16
subscribing
#17
Patch for D6.13