As it stands, Imagefield saves an uploaded image file with its original filename, character for character. For example:

"this%20file.jpg" is saved as "this%20file.jpg"
"that+file.jpg" is saved as "that+file.jpg"
etc...

Filenames with encoded entities (or %'s in general) pose a problem when trying to access the image via a URL. Currently they can break the new image upload previews for Imagefield, direct access to the image via URL, scaled versions via imagecache, thickboxed versions, etc.

All that is really needed is a little filename cleanup in _imagefield_widget_prepare_form_values() to avoid this.

Right after:

  // Attach new files 
  if ($file = file_check_upload($fieldname . '_upload')) {
    $file = (array)$file;

We can clean things up by adding:

    // urldecode and clean up the filename to get rid of problems
    $file['filename'] = str_replace("%","",urldecode($file['filename']));

It's a very simple change to avoid problems down the line and (if nothing else) clean up some very ugly filenames people seem to get when using saved web images.
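
To make the effect of that one-liner concrete, here is a minimal standalone sketch. The wrapper name example_clean_filename() is hypothetical and not part of Imagefield; only urldecode() and str_replace() are real PHP functions. Note that urldecode() also turns "+" into a space, and that any "%" not followed by two hex digits survives decoding and is then stripped.

```php
<?php
// Hypothetical wrapper around the one-line cleanup proposed above.
// It is not an Imagefield function; it only illustrates the behaviour.
function example_clean_filename($filename) {
  // Decode %XX sequences (and "+" to a space), then drop stray "%" characters.
  return str_replace("%", "", urldecode($filename));
}

print example_clean_filename("this%20file.jpg") . "\n"; // this file.jpg
print example_clean_filename("that+file.jpg") . "\n";   // that file.jpg
print example_clean_filename("100%.jpg") . "\n";        // 100.jpg
```

The decoded names still contain spaces, so this only addresses the "%" problem and leaves other special characters untouched.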

Comment   File                         Size        Author
#2        imagefield-enc-chars.patch   633 bytes   Moonshine

Comments

catch commented:

Status: Reviewed & tested by the community » Needs work

patch needs to be in unified diff format. This change looks sensible though.

Moonshine commented:

Status   File size
new      633 bytes

Well, hopefully this will do. I've made several other changes to imagefield, so the line numbering may be off. It's just the simple addition described above.

sun commented:

Title: Image filenames with encoded characters need cleaning » Encoded chars in filename are not cleaned

Good catch; however, that is not enough. I was a bit shocked that there is neither such a filename cleanup in Drupal's file.inc/upload.module nor a function to clean a filename yet, something like file_make_url_safe().

Any special char needs to be removed from a filename. For example, try downloading this:
http://testdrupal/files/1 Copy of %58 2% &# in €ur%C3%B6.jpg
(Example link)

Ran some tests:
While %20 seems to be correctly converted and @ or $ seem to be no problem, other characters, especially # or &, will break a link pointing to such a file.
Bear in mind that almost any filesystem is able to save files with such filenames. In my tests, Drupal correctly decoded almost all chars except € and ö before generating the file. Regarding German umlauts, I'm not sure whether they were misinterpreted by Apache running on Windows, since I've seen some of our clients using such filenames on production sites running on Unix.
However, characters like # will definitely break the URI (and not the filename).
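
As a rough illustration of the "#" case, the sketch below uses PHP's parse_url() to show how a raw "#" in a filename ends up as a URL fragment instead of part of the path, and how rawurlencode() avoids that. The filename "a#b.jpg" is made up for this example; the host is the one from the example link above.

```php
<?php
// Made-up filename containing a "#", placed under the example host.
$filename = "a#b.jpg";

// Unescaped: everything after "#" is parsed as the fragment, not the path.
$broken = parse_url("http://testdrupal/files/" . $filename);
print $broken['path'] . "\n";     // /files/a
print $broken['fragment'] . "\n"; // b.jpg

// Escaped: the "#" becomes %23 and stays inside the path.
$fixed = parse_url("http://testdrupal/files/" . rawurlencode($filename));
print $fixed['path'] . "\n";      // /files/a%23b.jpg
```

So the file on disk is fine; it is the link built from the raw name that points somewhere else.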

I wonder whether this bug should really be discussed solely for Imagefield (the tests were run with upload.module).
I'd suggest moving this issue to Drupal's file system queue.
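
A cleanup function of the kind suggested above could look roughly like the following. This is only a sketch under stated assumptions: the name example_file_make_url_safe() is hypothetical (no such function exists in Drupal), and the whitelist of letters, digits, dot, underscore and hyphen is an arbitrary conservative choice. It discards non-ASCII characters rather than transliterating them.

```php
<?php
// Hypothetical helper; not part of Drupal core or Imagefield.
function example_file_make_url_safe($filename) {
  // Decode any %XX sequences first so they are judged as real characters.
  $filename = urldecode($filename);
  // Replace each run of characters outside the whitelist with one underscore.
  // The pattern works byte-wise, so multi-byte characters are replaced too.
  $filename = preg_replace('/[^A-Za-z0-9._-]+/', '_', $filename);
  // Avoid leading or trailing underscores.
  return trim($filename, '_');
}

print example_file_make_url_safe("a%20b%23c&d.jpg") . "\n"; // a_b_c_d.jpg
print example_file_make_url_safe("1 Copy of %58 2% &# in €ur%C3%B6.jpg") . "\n";
```

A real implementation would also have to decide how to handle names that collapse to nothing and names that collide after cleaning.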

sun commented:

Priority: Normal » Critical

smk-ka commented:

There is a new module that takes care of transliteration and cleaning of filenames:

Transliterate filenames

sun commented:

Status: Needs work » Closed (won't fix)

Excellent! file_translit works for me.