When I try to create a directory, or upload or browse a file, with special characters in its name, I receive an error.

For instance, when I try to create this directory: Gehäuse
it creates a directory named: gehuse

When I create this directory manually on the server via the shell and then try to browse it, I see this: Geh�use

Both the shell and Apache display "Gehäuse" correctly.

I'm not sure if this is caused by your module or by PHP itself.

Comments

JamieR’s picture

File and directory names are run through this function to remove odd characters, because bad file names cause problems where I work. If you don't like it, you could comment out the two lines and simply return $name. That will fix the renaming issue.

/**
 * Returns a filename based on the $name parameter that has been
 * stripped of special characters, had its spaces changed to underscores,
 * and been shortened to 20 characters.
 */
function _sanitize_filename($name) {
  $name = preg_replace("/[^[:space:]a-zA-Z0-9*_.-]/", "", $name);
  return substr(str_replace(' ','_',$name),0,20);
}

The other issue, getting the ? when browsing, probably has to do with the local character encoding you are running on the web server. I know Drupal is all UTF-8, so you might need to run the web server in UTF-8 as well. Just a thought. Fileshare isn't doing anything in between.

Hope that helps!
Jamie.
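
A minimal sketch of how one might check whether a file name on disk is valid UTF-8, which is one way to narrow down where the ? comes from. The helper name is_utf8_name() is hypothetical and not part of Fileshare; it relies only on the fact that preg_match() with the /u modifier fails on malformed UTF-8.

```php
<?php
// Hypothetical helper, not part of Fileshare: reports whether a file name
// is valid UTF-8. preg_match() with the /u modifier returns FALSE on
// malformed UTF-8 input, so a result of 1 means the bytes are valid.
function is_utf8_name($name) {
  return preg_match('//u', $name) === 1;
}

$utf8_name   = "Geh\xC3\xA4use"; // "Gehäuse" encoded as UTF-8
$latin1_name = "Geh\xE4use";     // "Gehäuse" encoded as ISO-8859-1

var_dump(is_utf8_name($utf8_name));   // bool(true)
var_dump(is_utf8_name($latin1_name)); // bool(false)
```

A name that fails this check was probably written to disk in a single-byte charset such as ISO-8859-1, and a UTF-8 page will show a replacement character for it.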

clandmeter’s picture

Well, I'm not sure what it is, but it's for sure a big problem (at least for me).

I am running a LAMP server based on Ubuntu Dapper LTS, which is also all UTF-8, but that is exactly the problem.
I had problems with the charset on the console and in Apache, but when I switch my locale to a non-UTF-8 one, all is fine again.

And I see that you are having the same problem on your site: http://jamieruderman.com/?q=node/240

As for whether this is Drupal-specific, I've run a test to see if that is true:

When I create a new page in Drupal with an attachment, it doesn't have this problem. I can attach a file called "Périphériques" without problems and download it again.

I also tried to see if this could be PHP-specific, but when I use the "scandir" function in PHP it displays the directories correctly.
My Apache is showing these characters correctly.
Drupal is also showing these characters correctly.

One thing which is strange to me: when I upload a file on my site with your module it works without problems, but when I try it on your site it doesn't. Creating a directory gives problems both on your site and on mine.

To summarize:

Listing directories with special characters shows these strange ? symbols and generates errors.
Creating a directory with special characters filters them out, so it is created without them.
Uploading files with special characters works without problems and the names are not filtered (except on your site).

About the directory listing: could that be a JavaScript error? The directory creation problem could be caused by one of your filters, but I'm not sure.

JamieR’s picture

Version: 4.7.x-1.x-dev » 4.7.x-2.0

So I commented out line 606 on jamieruderman.com:
//$name = preg_replace("/[^[:space:]a-zA-Z0-9*_.-]/", "", $name);
and the fileshare uploads and folders I made worked.

Have a look: http://jamieruderman.com/?q=node/240

But FTP did not, as you said. However, I attribute that to the local machine that created the file being a Mac and not running the OS in UTF-8, but that's just a guess. When a file is uploaded through the web, I believe either the browser or the web server does some character conversion that FTP does not.

However, I would like this to work without having to remove line 606. Do you think you could rewrite that to include foreign characters? This is probably a problem for other languages as well.

Thanks! Jamie.
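
One possible direction for such a rewrite, sketched under assumptions: the function name sanitize_filename_unicode() and the exact set of allowed characters are hypothetical, and the 20-character truncation is left out because substr() counts bytes and could cut a multi-byte character in half. The idea is to keep the whitelist approach but use PCRE's Unicode properties with the /u modifier so that letters and digits from any script survive.

```php
<?php
// Hypothetical variant of _sanitize_filename(); the name and the allowed
// character set are assumptions, not the module's actual code.
function sanitize_filename_unicode($name) {
  // \p{L} matches a letter and \p{N} a number in any script. The /u
  // modifier makes PCRE treat both pattern and subject as UTF-8.
  $clean = preg_replace('/[^\p{L}\p{N}\s_.-]/u', '', $name);
  if ($clean === NULL) {
    // preg_replace() returns NULL when the subject is not valid UTF-8;
    // fall back to an ASCII-only filter in that case.
    $clean = preg_replace('/[^[:space:]a-zA-Z0-9_.-]/', '', $name);
  }
  return str_replace(' ', '_', $clean);
}

// These literals assume the source file itself is saved as UTF-8.
echo sanitize_filename_unicode("Gehäuse"), "\n";
echo sanitize_filename_unicode("Périphériques <test>.txt"), "\n";
```

With UTF-8 input the accented letters are kept, while characters such as < and > are still removed.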

clandmeter’s picture

Version: 4.7.x-2.0 » 4.7.x-1.x-dev

I just did some more tests and found out that when creating directories, "_sanitize_filename" strips the special characters (maybe a more appropriate name for this function would be in order?).

So now that I can create and upload files with special characters, I still have two problems.

1. Files/directories I upload via FTP will not be UTF-8 compatible. This means the Fileshare module cannot read them.
2. Files/directories uploaded via Fileshare will look very strange when you log in with FTP or on the console.

Wouldn't it be possible to store these files/directories in an ISO charset?

JamieR’s picture

I'm not going to try to manage encodings with Fileshare. UTF-8 is clearly the best and the future solution to all encoding problems. If this is an issue, I would look into changing the encoding of your local system. I'm afraid I don't know anything more; I would just have to start searching on Google. I'm sure this isn't an isolated issue.

Jamie.

JamieR’s picture

I was able to set the encoding format to utf-8 in my ftp client (http://www.panic.com/transmit/) and I imagine any other decent ftp client would be able to do the same. Once I changed to utf-8 I was able to upload without having the encoding problems. See http://jamieruderman.com/?q=node/240 the file named "péripécoding_utf8_ftp.txt" was uploaded via ftp.

So there you have it. And once the ftp client is set to utf-8 the files read correctly as well.

Only thing left is to re-write _sanitize_filename() to support all characters... can you help?

clandmeter’s picture

I've looked in my FlashFXP FTP client but cannot find this option. I will try FileZilla later on.
I think the FTP server should also support UTF-8, and from what I have checked, proftpd 1.2.1 (the one I'm currently using) does not support it. Which ftpd software are you using?

As for sanitizing filenames, I can try to find a solution. Could you first tell me why you are filtering all these characters, so I know what I am looking for? Is there a security issue regarding filenames?

JamieR’s picture

Status: Active » Fixed

I made the revisions to HEAD and 4.7.x-1.x ... I think... but I'm still trying to figure out how to update the 4.7.x-2.0 download... I hate trying to manage the CVS system... anyway you can see what I did here:
http://cvs.drupal.org/viewcvs/drupal/contributions/modules/fileshare/fil...

I've been told that it is a security risk to allow all characters. I think the blacklist should take care of any problems, and also avoid any cross-platform file name issues.

Thanks, Jamie.
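
For illustration only, a blacklist filter along these lines could look like the sketch below. The function name and the list of blocked characters are assumptions; they are not taken from the linked commit. The list covers path separators, characters reserved on Windows, and ASCII control characters, and it leaves all bytes above 0x7F alone so UTF-8 names pass through unchanged.

```php
<?php
// Hypothetical blacklist filter; the character list is an assumption and
// is NOT copied from the Fileshare commit.
function sanitize_filename_blacklist($name) {
  // Remove path separators, characters reserved on Windows, and ASCII
  // control characters (including the NUL byte).
  $clean = preg_replace('#[\\\\/:*?"<>|\x00-\x1F]#', '', $name);
  // Strip leading dots so names such as ".." or ".htaccess" cannot be
  // created through the upload form.
  $clean = ltrim($clean, '.');
  return str_replace(' ', '_', $clean);
}

echo sanitize_filename_blacklist("../etc/passwd"), "\n";
echo sanitize_filename_blacklist("Geh\xC3\xA4use"), "\n";
```

A blacklist is easier to get wrong than a whitelist, since anything not listed is allowed, so the list should be reviewed against the platforms the files will be copied to.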

clandmeter’s picture

I've just looked at your commit and I think it will work in my situation.

As for the other problem: I have found an FTP client which can list files/directories created in UTF-8, but when I want to download a file or directory I cannot. This is probably because my ftpd software (proftpd) doesn't (yet) support UTF-8. Now, I know UTF-8 is the way to go and is superior, but what if lots of programs do not support it? This would mean international users can only use Fileshare to access/manage files within Drupal.

OK, so let's just say this is a problem we can't fix. Could we then still allow users to upload files via FTP and afterwards convert those files to UTF-8, maybe with an option which says "Convert all files in this fileshare to UTF-8?"

Maybe with something like: convmv -r --notest -f iso-8859-1 -t utf8 $directory

Or you could create a function with utf8_encode so you won't need the convmv binary.
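
A rough sketch of the PHP-only route, under stated assumptions: the function names are hypothetical, the source charset is assumed to be ISO-8859-1 as in the convmv example, and iconv() is used here in place of utf8_encode() (which is deprecated as of PHP 8.2). The directory walk is not recursive.

```php
<?php
// Hypothetical helper: converts a name to UTF-8 unless it already is
// valid UTF-8. The ISO-8859-1 default is an assumption.
function name_to_utf8($name, $from = 'ISO-8859-1') {
  if (preg_match('//u', $name) === 1) {
    return $name; // Already valid UTF-8 (pure ASCII also lands here).
  }
  return iconv($from, 'UTF-8', $name);
}

// Hypothetical helper: renames the entries directly inside $dir whose
// names are not valid UTF-8 and returns how many were renamed.
function convert_dir_to_utf8($dir) {
  $renamed = 0;
  foreach (scandir($dir) as $entry) {
    if ($entry === '.' || $entry === '..') {
      continue;
    }
    $fixed = name_to_utf8($entry);
    // Skip failed conversions, unchanged names, and name collisions.
    if ($fixed === FALSE || $fixed === $entry || file_exists("$dir/$fixed")) {
      continue;
    }
    if (rename("$dir/$entry", "$dir/$fixed")) {
      $renamed++;
    }
  }
  return $renamed;
}

echo bin2hex(name_to_utf8("Geh\xE4use")), "\n";
```

One caveat: a single-byte name whose bytes happen to form valid UTF-8 would be left untouched, so this check is a heuristic rather than a guarantee.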

Anonymous’s picture

Status: Fixed » Closed (fixed)

druvision’s picture

Project: Fileshare » Drupal core
Version: 4.7.x-1.x-dev » 7.x-dev
Component: Module » language system

This is still an issue, for all uploaded files.

I've opened a new issue for core: Supporting unicode file names

geshan’s picture

I'm getting a similar problem with an RSS feed being fetched in the form of a JS widget. When I test it in a simple PHP file it works fine, but weird characters are displayed in Drupal. I tried removing $scripts and $styles as well, but it did not help.

"http://www.handball-welt.de/o.red.c/modules/ hbwnewsbox.php?Gender=1&cstliga=8&cstlang=1&cstlmt=5&width=510"

Above is the widget code. It works fine in a simple PHP file.