Problem/Motivation

Non-US-ASCII characters are stripped out of uploaded filenames. PHP's basename() does not properly support streams or filenames beginning with international characters.

Proposed resolution

Replaces all instances of basename() with wrapper function drupal_basename(). Revised function strips unsafe characters and preserves international characters; includes a check for directory separator characters.

New tests and test changes:

- create, use, save, upload, download, and copy files with international filenames
- generation of URL for file with international filename
- 404 while fetching nonexistant randomly named file including international characters
- correction to recombine chunked headers in DrupalWebTestCase::curlHeaderCallback()

Remaining tasks

Needs review and documentation for drupal_basename().

User interface changes

None.

API changes

Adds drupal_basename().

Original report by [OnkelTem]

Hi,

I'm using D5 and got much troubles yesterday with uploading files with non-latin characters in file names: all of them had been stripping. This behaviour reproduces on both my Debian Etch servers with PHP5 (but not with PHP4).

After investigations I found, that standard PHP functions basename() is in charge for stripping. The comments on php.net conforms this, basename() is not safe:

http://php.net/manual/en/function.basename.php

Passing UTF-8 string like '????.txt' to basename() results to 'txt', cutting off filename.

Looks like there are no appropriate solutions or workarounds and basename() simply should be replaced with alternative.

CommentFileSizeAuthor
#135 blackout2.patch28.94 KBdroplet
#133 blackout.patch28.42 KBdroplet
#133 interdiff-between-D8.patch9 KBdroplet
#132 blackout.patch27.61 KBdroplet
#132 interdiff-between-D8.patch8.39 KBdroplet
#127 chineseFile_1.jpg54.94 KBbiydng
#127 pageNotFound.jpg34.03 KBbiydng
#115 278425-drupal_basename-115.patch28.7 KBdroplet
#115 interdiff.patch5.01 KBdroplet
#115 interdiff.patch9.73 KBdroplet
#114 Screenshot at 2011-12-29 13:54:09.png35.84 KBrogical
#107 278425-drupal_basename12-d7.patch28.58 KBandypost
#106 278425-drupal_basename12.patch28.69 KBandypost
#104 278425-basename-not-locale-safe-104.patch1.86 KBrealityloop
#101 drupal_basename11-d7.8.patch29.18 KBOnkelTem
#90 278425-drupal_basename11-1.patch27.61 KBdroplet
#87 278425-drupal_basename11-d7.patch28.11 KBandypost
#86 278425-drupal_basename11.patch27.54 KBandypost
#74 278425-drupal_basename10-d7.patch29.4 KBandypost
#72 278425-drupal_basename9-d7.patch29.11 KBandypost
#71 278425-drupal_basename8-d7.patch29.06 KBandypost
#66 278425-drupal_basename7-d7.patch29.62 KBandypost
#62 278425-drupal_basename7-d7.patch29.56 KBandypost
#60 278425-drupal_basename7-d7.patch29.65 KBandypost
#58 278425-drupal_basename6debug-d7.patch31.88 KBandypost
#56 278425-drupal_basename6-d7.patch29.94 KBandypost
#55 278425-drupal_basename6-d7.patch29.94 KBandypost
#49 278425-drupal_basename5-d7.patch27.66 KBandypost
#47 278425-drupal_basename4-d7.patch26.94 KBandypost
#46 278425-drupal_basename3-d7.patch26.17 KBandypost
#41 278425-drupal_basename2-d7.patch26.42 KBandypost
#39 278425-drupal_basename-d7.patch22.21 KBandypost
#35 278425-basename-4.patch22.2 KBandypost
#28 basename-3.patch19.78 KBkotnik
#25 basename-2.patch29.33 KBc960657
#23 basename-1.patch30.4 KBc960657
#17 drupal_6_basename.patch7.76 KBOnkelTem
#9 file_278425.patch13.71 KBdrewish
#5 drupal_basename.patch13.09 KBchx
#3 drupal_basename.patch13.05 KBchx
#2 basename_replacement_includes.patch3.5 KBOnkelTem
#2 basename_replacement_modules.patch2.99 KBOnkelTem
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

OnkelTem’s picture

UPD.

basename() is locale dependent.

For example, this works, no stripping:

setlocale(LC_ALL, 'ru_RU.UTF-8');
$file->filename = trim(basename($_FILES["files"]["name"][$source]), '.');

This doesn't:

setlocale(LC_ALL, 'C');
$file->filename = trim(basename($_FILES["files"]["name"][$source]), '.');

OnkelTem’s picture

chx ( http://drupal.org/user/9446 ) has created a replacement for basename() called drupal_basename().

Code is:

function drupal_basename($path,$prefix = '') {
$path = preg_replace('|^.+[\\/]|', '', $path);
if ($prefix) {
$path = preg_replace('|'. preg_quote($prefix) .'$|', '', $path);
}
return $path;
}

Function goes to file.inc (is it good)?

My first patch. Replaced all occurences of basename() in 'includes' and standard 'modules' dirs.

This patch is only for current Drupal 5.7 release!!!

chx’s picture

Status: Active » Needs review
FileSize
13.05 KB

This is against HEAD with a little doxygen.

Damien Tournoud’s picture

Status: Needs review » Needs work

Patch does not apply (patch: **** malformed patch at line 100: Index: includes/locale.inc), but both the diagnostic and the implementation looks sane.

Also: this is a perfect usecase of unit testing. So this shouldn't go in without a good test plan.

chx’s picture

Status: Needs work » Needs review
FileSize
13.09 KB

Here is a better patch. Damien, can you try writing test cases? Stuff like \0 in file names... maybe make them consisting all of 256 bytes.

drewish’s picture

subscribing

c960657’s picture

For paths like "/a/" and "/a///", the proposed drupal_basename() returns the empty string, but PHP's basename() returns "a". Is this intentional?

drewish’s picture

really needs some documentation for the parameters of drupal_basename. we could just copy and paste from php's docs.

drewish’s picture

Title: Using basename() is not safe » Using basename() is not local safe
FileSize
13.71 KB

Here's a re-roll that fixes the failed hunk in scripts/run-tests.sh and adds some PHPDocs.

Damien Tournoud’s picture

Title: Using basename() is not local safe » Using basename() is not locale safe
drewish’s picture

Now that file.test landed I guess we could write some tests for this...

webchick’s picture

Status: Needs review » Needs work

Yes, tests please. This looks like something relatively obscure which would not come up in the course of regular usage. Perfect for a test case.

drewish’s picture

Assigned: Unassigned » drewish
Status: Needs work » Postponed

i'll write some tests once #203204: Uploaded files have the permissions set to 600 gets committed

drewish’s picture

Status: Postponed » Needs work

i think i'd put the wrong issue in the above post... i don't see what the that has to do with this issue. probably one of the other file_test.module patches.

munzirtaha’s picture

subscribing

hapydoyzer’s picture

subscribing

OnkelTem’s picture

FileSize
7.76 KB

Patch for D6.13

Jean-Philippe Fleury’s picture

subscribing

Gozi’s picture

The early pasted and linked chx's safe baseneme function not fully compatible the PHP original basename. I wrote own UTF-8 safe basename function, that works without problem.

function basename_safe($path) {
        $path = rtrim($path,'/');
        $path = explode('/',$path);
        return end($path);
}
c960657’s picture

Assigned: drewish » Unassigned
Status: Needs work » Active

This should probably be combined with the new dirname() implementation in #701358: The file API presumes a hiearchical file storage.

Note that on Windows, dirname() also treats backslash as a directory separator.

Ivan Simonov’s picture

GoofyX’s picture

Subscribing...

c960657’s picture

Status: Active » Needs review
FileSize
30.4 KB

New shot.

bloodmud’s picture

#17: drupal_6_basename.patch queued for re-testing.

c960657’s picture

FileSize
29.33 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch, basename-2.patch, failed testing.

kotnik’s picture

kotnik’s picture

FileSize
19.78 KB

Okay, here's reworked patch from #25 that applies to current HEAD.

kotnik’s picture

Status: Needs work » Needs review

Marking for review.

Status: Needs review » Needs work

The last submitted patch, basename-3.patch, failed testing.

andypost’s picture

Damien Tournoud’s picture

@andypost: no, we don't want to rely on the broken locale feature.

kotnik’s picture

For the hell of it, I simply can't figure out why did the patch from #28 failed MimeType test.

Anybody got a clue?

andypost’s picture

@kotnik test falls because simpletest stream wrapper is not registered... strange. Broken function file_get_mimetype()
You are using file_uri_target() in drupal_basename() with none-existent schema (simpletest://)

Try to run File API - File mimetypes test locally
Call to a member function getTarget() on a non-object

EDIT: found strange simpletest_test_stream_wrappers() in simpletest.module

andypost’s picture

Status: Needs work » Needs review
FileSize
22.2 KB

Changed function file_uri_target() to work with DrupalLocalStreamWrapper class because getTarget() method could not be implementation specific so we can use it's static implementation

kotnik’s picture

Andypost, thanks for explanation and for fixing the patch.

andypost’s picture

Priority: Normal » Major

Suppose this major because drupal used in multilingual sites

sun’s picture

Status: Needs review » Reviewed & tested by the community
+++ includes/file.inc	19 Jul 2010 06:13:51 -0000
@@ -1948,6 +1955,32 @@ function drupal_dirname($uri) {
+    $uri = preg_replace('@'. preg_quote($suffix) .'$@', '', $uri);

(minor) Wrong string concatenation spacing here.

Aside from that, this patch pretty much simply replaces every basename() with with drupal_basename() throughout core. Bonus points for @ingroup php_wrappers.

Thus, RTBC.

Powered by Dreditor.

andypost’s picture

Re-roll against current HEAD with fixed string concatenation pointed in #38

sun’s picture

Status: Reviewed & tested by the community » Needs work

Dang. webchick correctly asked for tests for drupal_basename().

andypost’s picture

Status: Needs work » Needs review
FileSize
26.42 KB

Added non-latin prefix for generated filename to FileTestCase::createFile()
This prefix contains spaces to make sure that all file-related tests work fine.

$filepath = 'Файл для тестирования ' . $this->randomName(); // Russian, means 'file for testing'

Also I change a FileSaveDataTest::testWithFilename() to use non-latin file name because this issue starts

$filename = 'Текстовый файл.txt'; // Russian, 'text file.txt'

Changed FileTokenReplaceTestCase::testFileTokenReplacement() to test non-latin names

Fixed RetrieveFileTestCase::testFileRetrieving() because non-latine names returned as urlencoded and this also related to this issue

Damien Tournoud’s picture

This looks very good, but I don't understand this change:

 function file_uri_target($uri) {
   if ($scheme = file_uri_scheme($uri)) {
-    return file_stream_wrapper_get_instance_by_scheme($scheme)->getTarget($uri);
+    if ($wrapper = file_stream_wrapper_get_instance_by_uri($uri)) {
+      return $wrapper->getTarget($uri);
+    }
+    else {
+      // getTarget() is not implementation specific, so we can directly
+      // call it without an instance.
+      return DrupalLocalStreamWrapper::getTarget($uri);
+    }
   }
   return FALSE;
 }

The initial code looks correct to me.

drewish’s picture

does this intersect with #858528: file_uri_target() purpose is unclear at all?

andypost’s picture

@Damien Tournoud, This is a same as file_get_mimetype()
So it's just fallback to base implementation if no wrapper-class defined

Should we make getTarget() is static method?

For example FileMimeTypeTest::testFileMimeTypeDetection() is using simpletest wrapper

  public function testFileMimeTypeDetection() {
    $prefix = 'simpletest://';

Which has no implementation, so we should use Base class implementation

Status: Needs review » Needs work

The last submitted patch, 278425-drupal_basename2-d7.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
26.17 KB

After talking with DamZ on IRC we decide to change schema in testFileMimeTypeDetection() to 'public://' because 'simpletest://' has no implementation and file_uri_target() leave as is.

andypost’s picture

Fixed broken test, non-latin names should be encoded.

Status: Needs review » Needs work

The last submitted patch, 278425-drupal_basename4-d7.patch, failed testing.

andypost’s picture

New strange headers appears $11 and $15 ' =?UTF-8?B?V0k=?=" '

array ( 0 => 'HTTP/1.1 200 OK ', 1 => 'Date: Thu, 29 Jul 2010 05:24:39 GMT ', 2 => 'Server: Apache/2.2.9 (Debian) PHP/5.2.13 ', 3 => 'X-Powered-By: PHP/5.2.13 ZendServer/5.0 ', 4 => 'Set-Cookie: ZDEDebuggerPresent=php,phtml,php3; path=/ ', 5 => 'Expires: Sun, 19 Nov 1978 05:00:00 GMT ', 6 => 'Last-Modified: Thu, 29 Jul 2010 05:24:39 +0000 ', 7 => 'Cache-Control: private ', 8 => 'ETag: "1280381079" ', 9 => 'Content-Length: 87 ', 10 => 'Content-Disposition: inline; filename="=?UTF-8?B?0KTQsNC50Lsg0LTQu9GPINGC0LXRgdGC0LjRgNC+0LLQsNC90LjRjyBtSmE3Nk4=?= ', 11 => ' =?UTF-8?B?V0k=?=" ', 12 => 'x-foo: Bar ', 13 => 'Vary: Accept-Encoding ', 14 => 'Content-Type: text/plain; name="=?UTF-8?B?0KTQsNC50Lsg0LTQu9GPINGC0LXRgdGC0LjRgNC+0LLQsNC90LjRjyBtSmE3Nk4=?= ', 15 => ' =?UTF-8?B?V0k=?=" ', 16 => ' ', )

So explode() alerts notice so I add ':' if header have no ':'

        if (strpos($header, 'HTTP/') === 0) {
          $name = ':status';
          $value = $header;
        }
        else {
+          if (strpos($header, ':') === FALSE) {
+            $header .= ':';
+          }
           list($name, $value) = explode(':', $header, 2);
           $name = strtolower($name);        }
andypost’s picture

Status: Needs work » Needs review

Changing status

Damien Tournoud’s picture

That's because the mime encoder is splitting the header in two lines :(

mikeytown2’s picture

fyi; another one that doesn't work right is "drupal_realpath". Found that out via lots of tests to the test bot.
Look at the previous patches in here: http://drupal.org/node/818818#comment-3116590

andypost’s picture

We should find culprit of this headers
headers definition http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2

andypost’s picture

This caused by mime_header_encode() which splits header on chunks but header function does not join them back

andypost’s picture

Finally fixed version.

- Fixed DrupalWebTestCase::curlHeaderCallback() to join chunks
- Added test to check filename transfered correctly via private download method (this one uses mime_header_decode to make sure that header encoded correctly)

andypost’s picture

Fixed tab character in DrupalWebTestCase::curlHeaderCallback()

Status: Needs review » Needs work

The last submitted patch, 278425-drupal_basename6-d7.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
31.88 KB

Same patch with debug

Status: Needs review » Needs work

The last submitted patch, 278425-drupal_basename6debug-d7.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
29.65 KB

Suppose now should work, first header "HTTP/" should be excluded and end of headers detected with trim()

PS same patch for simpletest module #869408: curlHeaderCallback() should join headers if no colon found

Damien Tournoud’s picture

Status: Needs review » Needs work

The criteria for a follow-up header is not "it doesn't contain ':'", but "it starts with at least one SP or HT".

andypost’s picture

Status: Needs work » Needs review
FileSize
29.56 KB

Changed comment and condition

dhthwy’s picture

Status: Needs review » Needs work

Nice patch. Here's a few problems I found:

+ * Gets the filename from a given path.
+ *
+ * PHP's basename() does not properly support streams or filenames beginning
+ * with a non-US-ASCII character.
+ * @see http://bugs.php.net/bug.php?id=37738
+ *
+ * @see basename()
+ * @ingroup php_wrappers
+ */

This should probably read:

+ * Gets the filename from a given path.
+ *
+ * PHP's basename() does not properly support streams or filenames beginning
+ * with a non-US-ASCII character.
+ *
+ * @see http://bugs.php.net/bug.php?id=37738
+ * @see basename()
+ *
+ * @ingroup php_wrappers
+ */
// Headers starts with HTTP version and ends with empty line. They could be
+    // splited on chunks so they should be joined while been receiced. Details
+    // in rfc2616 http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2

Alternative:

// Headers start with an HTTP version and ends with an empty line. They could be
+    // split in chunks so they should be joined while being received. Details
+    // in rfc2616 http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
+      // Prefix with non-latin characters used to ensure that all file-related
+      // tests works with international filenames.

Alternative:

+      // Prefix with non-latin characters to ensure that all file-related
+      // tests work with international filenames.
+    // URLs could not contains characters outside the ASCII set so has to be
+    // encoded.

Alternative:

+    // URLs can't contain characters outside the ASCII set so $filename has to be
+    // encoded.
+    // Test the filename transfered correclly.

Alternative:

+    // Test that the file transfered correctly.
dhthwy’s picture

Status: Needs work » Needs review

dang I was hoping I'd get that in before you rolled another one.

Damien Tournoud’s picture

Getting closer!

-    $this->headers[] = $header;
+    // Header fields can be extended over multiple lines by preceding each
+    // extra line with at least one SP or HT. They should be joined on receive.
+    // Details are in RFC2616 section 4.
+    if ($header[0] == ' ' || $header[0] == "\t") {
+      $this->headers[] = array_pop($this->headers) . $header;
+    }
+    else {
+      $this->headers[] = $header;
+    }

This needs to read array_pop($this->headers) . ' ' . trim($header); and the other should be trim($header) We need to normalize the whitespace here, because:

"Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT."

"Any LWS that occurs between field-content MAY be replaced with a single SP before interpreting the field value or forwarding the message downstream."

andypost’s picture

@dhthwy, thanx a lot for wording!

@Damien Tournoud, Fixed, but not sure that we need a whitespace normalization here because mime_header_decode() cares about it.

andypost’s picture

#66: 278425-drupal_basename7-d7.patch queued for re-testing.

andypost’s picture

andypost’s picture

#66: 278425-drupal_basename7-d7.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 278425-drupal_basename7-d7.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
29.06 KB

Re-roll against HEAD

andypost’s picture

Another round, missed 2 hunks

Status: Needs review » Needs work

The last submitted patch, 278425-drupal_basename9-d7.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
29.4 KB

Changed a test testPrivateFileTransfer() to check the contents of file against checking headers.

andypost’s picture

Maybe better to raise this to critical bacause none-ascii file uploads still broken

c960657’s picture

The latest reroll no longer checks the Content-Disposition header. Is that intentional?

andypost’s picture

While debuging tests I found that this header is not send on test run. You can see this in #72 testing results

wwwwoicn’s picture

Subscribing

droplet’s picture

Status: Needs review » Reviewed & tested by the community

- upload non-latin filenames, worked
- download, worked
- display, worked
- unlink, worked

patch with lots hunks but succeeded apply, should it Re-roll ?

andypost’s picture

Probably this could go into 7.1

catch’s picture

Version: 7.x-dev » 8.x-dev
Issue tags: +Needs backport to D7
chx’s picture

I have reopened http://bugs.php.net/bug.php?id=37738 but I do not hope for a quick resolution there and if there would be one it'd take years before we could use that PHP version (remember, we dropped to 5.2.4-ubuntu-patched as the minimal PHP from 5.2.5...) so yeah let's go with this version.

Damien Tournoud’s picture

Status: Reviewed & tested by the community » Needs work
+  if (file_uri_scheme($uri)) {
+    $uri = file_uri_target($uri);
+  }

I don't see the point of this. Could we please remove it?

c960657’s picture

@Damien: The purpose is to make sure that drupal_basename('public://') returns '', not 'public:'.

Damien Tournoud’s picture

I see. We could really use unit tests here.

andypost’s picture

Status: Needs work » Needs review
FileSize
27.54 KB

Reroll against D8 with removal of trim suggested by Damien

@Damien Any ideas for unit tests? Default basename() implementation is not trimming scheme so fixed

  $file = 'public://aa.txt';
  print 'basename:' . basename($file) . ".\n";
// returns - basename:aa.txt.
  $file = 'public://';
  print 'basename:' . basename($file) . ".\n";
// returns - basename:public:.

@chx Another duplicate of php bug http://bugs.php.net/bug.php?id=37268

andypost’s picture

Same patch for D7, additionally fixes hunk within user.install

#86 and #87 has 2 new hunks at modules/update/update.module

andypost’s picture

Any reviews? This bug is really annoying for me

sun’s picture

Status: Needs review » Needs work
+++ b/includes/file.inc
@@ -2210,6 +2210,30 @@ function drupal_dirname($uri) {
+function drupal_basename($uri, $suffix = NULL) {
+  $separators = '/';
+  if (DIRECTORY_SEPARATOR != '/') {
+    $separators .= DIRECTORY_SEPARATOR;
+  }
+  $uri = rtrim($uri, $separators);
+  $uri = preg_match('@[^' . preg_quote($separators, '@') . ']+$@', $uri, $matches) ? $matches[0] : '';
+  if ($suffix) {
+    $uri = preg_replace('@' . preg_quote($suffix) . '$@', '', $uri);
+  }
+  return $uri;
+}

Took me some good minutes to figure out what is actually being done here, and why.

Can we add three inline comments, please? (one comment per atomic operation block; separator setup + trim, basename, suffix)

Also, the ternary preg_match() logic is extremely compact. Nice trick -- but when writing this as a regular if control structure, then we could skip the additional $suffix operation if $uri is '' empty anyway.

5 days to next Drupal core point release.

droplet’s picture

reroll only.

xlsoul’s picture

Awesome! That PATCH is ok for my DRUPAL7, Thankyou!

But I have to say it is a very important BUG in DRUPAL, and It must be pay attention to.

webchick’s picture

Status: Needs work » Needs review

Marking needs review.

xjm’s picture

Tagging issues not yet using summary template.

097633’s picture

#74: 278425-drupal_basename10-d7.patch queued for re-testing.

andypost’s picture

Status: Needs review » Needs work

needs docs and #89

droplet’s picture

It's another issue.

While your file system / OS isn't UTF-8, you will failed to upload a file with any UTF-8 characters.

eg.
After patched:
€ = failed.
€-abc.png = failed.

Before patched
€ = failed.
€-abc.png = okay. (renamed to -abc.png)

Probably it will got error on Windows.

catch’s picture

I don't see any mention of http://drupal.org/project/transliteration on this issue, I've not used that module before but it exists specifically for this problem afaik.

droplet’s picture

transliteration converted filename to US-ASCII characters.

andypost’s picture

Without transliteration module we can't get actual filename if none-ascii letters are used in it. With transliteration we never get actual filename too :(

EDIT: actual means the original filename of file been uploaded.

This mostly usefull when filename is descriptive and has more than one word, without this patch I cant manage none-ascii filenames because core returns them trimmed to last word

aaron’s picture

subscribe

OnkelTem’s picture

FileSize
29.18 KB

Updated patch from #87 for current stable D7 release - 7.8. Two hunks was failing on some reason.

Caligan’s picture

Added issue summary.

andypost’s picture

Thanx for Summary, @Marie Wendel

Patch needs re-roll for D8 'cos core now lives in core folder

realityloop’s picture

Status: Needs work » Needs review
FileSize
1.86 KB

rerolled for /core

Status: Needs review » Needs work

The last submitted patch, 278425-basename-not-locale-safe-104.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
28.69 KB

Added inline comments inside drupal_basename() (needs review and probably correction)

Added hunk for new image width test

PS: D7.9 patch also needs re-roll

andypost’s picture

Status: Needs review » Needs work
FileSize
28.58 KB

D7 re-roll

andypost’s picture

Status: Needs work » Needs review

Actually needs review

droplet’s picture

The patch is okay but as my comment #96, it still failed with non-UTF8 system. While upload is failed, we get following message:

File upload error. Could not move uploaded file.

For end users, they don't know why it can't move. I think we could suggest users to rename and try again.

andypost’s picture

Another way is to store filenames as hash but this will require to bootstrap core to retirn public files

chx’s picture

While this patch looks OK (didnt review deeply so not RTBC'ing) the "If your file system / OS isn't UTF-8, uploading a file with any >127 UTF-8 characters fails." is a separate issue.

droplet’s picture

#106: 278425-drupal_basename12.patch queued for re-testing.

chx’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

+ $filename = preg_replace('@' . preg_quote($suffix) . '$@', '', $filename);

should be preg_quote($suffix, '@'). There's a drupal_web_test_case.php hunk in the patch which looks good but a separate issue. And, it needs tests or discussion why we can't test this (locale dependency might stop testability).

rogical’s picture

patched in #107 tests well, test

can't wait to remove the transliterlation.

droplet’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
9.73 KB
5.01 KB
28.7 KB

reroll +@113

we already added non-latin tests.

+++ b/core/modules/simpletest/tests/file.testundefined
@@ -2220,11 +2222,14 @@ class FileSaveDataTest extends FileHookTestCase {
+    $filename = 'Текстовый файл.txt';

+++ b/core/modules/system/system.testundefined
@@ -2240,7 +2240,7 @@ class RetrieveFileTestCase extends DrupalWebTestCase {
+    $filename = 'Файл для тестирования ' . $this->randomName();

non-latin test

-1 days to next Drupal core point release.

droplet’s picture

I cancelled one interdiff but also showed. 2 interdiff are same but with diff format.

Status: Needs review » Needs work

The last submitted patch, interdiff.patch, failed testing.

droplet’s picture

Status: Needs work » Needs review
gghchem’s picture

Version: 8.x-dev » 7.10
Assigned: Unassigned » gghchem
Priority: Major » Critical

I am using windows xp sp3 and drupal 7.10,after patched the patch 278425-drupal_basename12-d7.patch,I find it's not works well.
sometimes I can upload a file and always i cannot upload the files with chinese names.It always shown:
Warning: move_uploaded_file(E:\xampp\htdocs\eee\sites\default\files\bee/北京市西城区2008—2009学年度初三年级第一学期期末测试化学试卷.doc) [function.move-uploaded-file]: failed to open stream: Invalid argument 在 drupal_move_uploaded_file() (行 1597 在 E:\xampp\htdocs\eee\includes\file.inc).
Warning: move_uploaded_file() [function.move-uploaded-file]: Unable to move 'E:\xampp\tmp\php6F.tmp' to 'E:\xampp\htdocs\eee\sites\default\files\bee/北京市西城区2008—2009学年度初三年级第一学期期末测试化学试卷.doc' 在 drupal_move_uploaded_file() (行 1597 在 E:\xampp\htdocs\eee\includes\file.inc).
文件上传错误。无法移动上传的文件。
I had asked so many pepole to help me,but the problem is still on.

aspilicious’s picture

Version: 7.10 » 8.x-dev
Assigned: gghchem » Unassigned
Priority: Critical » Major

1) You only should assign bugs to you when you're going to fix them
2) Please leave as major...
3) This needs to be fixed in D8 first, will be backported soon after

If you could install an empty D8 site and test the patch in #115 chances are bigger this will get commited to D7 soon.

rogical’s picture

Version: 8.x-dev » 7.10
Priority: Major » Critical

see your path: E:\xampp\htdocs\eee\sites\default\files\bee/北京市西城区2008—2009学年度初三年级第一学期期末测试化学试卷.doc

this is not correct, but I just confused why it outputs an error path.

rogical’s picture

Version: 7.10 » 8.x-dev

sorry, this is an cocurrent posting.

aspilicious’s picture

Ow gghchem you have problems because your temporary folder is incorrect. You should change it in the file system settings to temp or something like that (without '/').

aspilicious’s picture

Priority: Critical » Major

Still not critical...

droplet’s picture

Read from #96, actually it doesn't support Windows very well. (maybe Windows Sever works). anyone try to test with Mac OS ?

(It's a well-known PHP problem. PHP6 will has a better support of UTF8)

biydng’s picture

Version: 8.x-dev » 7.10
Category: bug » support
Priority: Major » Critical
Issue tags: -API addition, -Needs backport to D7
FileSize
34.03 KB
54.94 KB

I am using windows 7 Home and drupal 7.10. and ran the patch 278425-drupal_basename-115.patch. When I tried to upload a file with Chinese file name to a node, the file name appeared on the screen and the one saved in database are all in correct chinese characters, but the actual file saved in the public file directory is gibberish: for example 智慧.pps instead of 智慧.pps. This caused the "Page not found" error when I view the node and clicked on the file attachment link. The link path has the correct url encoded path and Chinese character filename. See the first two screenshots attached.

I have a feeling it is my server or php module that is messing up the encoding of the filename. I am using PHP 5.2.17 and apache 2.2 and I have set default charset to utf-8 in both. I noticed that when I opened the Apache online documentation, it is also in that weird encoding. Please help! Thanks!

tim.plunkett’s picture

Version: 7.10 » 8.x-dev
Category: support » bug
Priority: Critical » Major
Issue tags: +API addition, +Needs backport to D7

#127, please read #121

droplet’s picture

stay away Windows. or save filename as transliteration/hash and a preprocess script to restore original file name during download. even if you use Linux, the latter is the best IMO.

我們先解決 Drupal 8,再修正 Drupal 7,不要變更主題狀態 :)
在 Windows 之下很容易出錯,盡量避免使用 Windows
或者自已寫一段程式,將檔案名以音譯或 HASH 保存,在下載時才換回正確檔案名 ( 像 Discuz 這類的處理方法 )

#113 code reviewed ?
#114 test reviewed. + myself.

ready to go ?

tim.plunkett’s picture

Status: Needs review » Reviewed & tested by the community

This looks good to me. The comments make sense, the function makes sense, and the replacements are good.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

OK read through this, code has had lots of review, most of the patch is find/replace and there's added test coverage for the bug.

Committed/pushed to 8.x., moving to 7.x for backport. This will need a change notice once it's been committed to 7.x (or someone could do that now too of course).

droplet’s picture

Status: Patch (to be ported) » Needs review
FileSize
8.39 KB
27.61 KB
droplet’s picture

missed one file.

oriol_e9g’s picture

is not needed here?

user.install

$file->filename = basename($file->uri);
droplet’s picture

FileSize
28.94 KB

I think all file management stuff are needed

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Manually checked all places in D7 for #135 patch - except extended comments for introduced drupal_basename() and small fix for #113 it's the same as #107

we really need change notice which could be compiled from issue summary and http://bugs.php.net/bug.php?id=37738

Great to see patch commited before upcoming 7.11 release

webchick’s picture

Title: Using basename() is not locale safe » Change notice for: Using basename() is not locale safe
Category: bug » task
Priority: Major » Critical
Status: Reviewed & tested by the community » Active
Issue tags: -Needs backport to D7 +Needs change record

Read through the patch and the comments here. Lots of good work. It's unfortunate we need to do an API change for this. :( But according to https://bugs.php.net/bug.php?id=37738, this is not getting fixed in PHP anytime soon.

Committed and pushed to 7.x.

We need a change notice for this, since this will affect contributed modules.

chx’s picture

Title: Change notice for: Using basename() is not locale safe » Using basename() is not locale safe
Category: task » bug
Status: Active » Fixed
nicxvan’s picture

Assigned: Unassigned » nicxvan

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

migimaru’s picture

Is there a backport for D6?
If not, is it safe to simply copy the drupal_basename() function from the above patch and then change all uses of basename() in all files to drupal_basename()?

xjm’s picture

Priority: Critical » Major
Issue tags: -Needs change record
nicxvan’s picture

Assigned: nicxvan » Unassigned
nicxvan’s picture

Issue summary: View changes

Updated issue summary.