Swedish characters Å,Ä,Ö,å,ä,ö in file names display as a square.
My web site uses UTF-8, is that perhaps the problem?

Comments

zewa’s picture

No, it's because fopen() and its relatives in PHP are written against the ANSI standard, not UTF-8.
There is no real workaround for this, so it's probably best to rename the files you want to download.

Anders Östberg’s picture

Thanks.
This is a major problem, and if that is how Filebrowser will work I'll have to look for another solution.

zewa’s picture

At least, that is how the old version used to work.

Greetings
Zewa

Yoran’s picture

Status: Active » Postponed (maintainer needs more info)

Yes, this is a major problem, but I can't reproduce it. I also use UTF-8 and French, and I have no problem with our specific characters (in filenames and descriptions).

Can you have a look at the page source code to see whether this is a font problem or an encoding problem?

zewa’s picture

Mmhm... can you tell me what version of PHP you are using, Yoran?

I use PHP 5.2.5 + PHP 4.4.8 + PEAR with the Xampp Package 1.6.6a for development.

Greetings
Zewa

Anders Östberg’s picture

The page's charset is utf-8 and font is Arial. If I force the browser to display the page using "Western European (Windows)" encoding, the national characters are displayed correctly. PHP version is 5.2.9-1.

Anders Östberg’s picture

Additional info: I tried adding utf8_encode() to the display-name output, and the characters are then displayed correctly, so I assume the cause is that the file names are not being converted to UTF-8. I couldn't make this work properly for the file URL, though. I don't know how and where in the code to convert all characters correctly, so I'll have to leave this to the maintainer.

Anders Östberg’s picture

Version: 6.x-2.0-rc9 » 6.x-2.0-rc10

Still a problem with rc10.

Yoran’s picture

Sorry for my late answer.

I tried to understand what was going on, and my guess is that your filesystem is not using UTF-8 as its filename encoding and PHP's readdir is just passing on what the filesystem gives it, whatever the encoding. I tried an EXT3 filesystem with ISO-8859-15 encoding, and I can reproduce the issue.

So the problem with your solution (utf8_encode) is that it will not work in every situation, as a filesystem can use many kinds of encoding. Perhaps a better solution is to use mb_convert_encoding instead. Can you give this a try:

replace (in filebrowser.module)
     'display-name' => $file_name,
by
     'display-name' => mb_convert_encoding($file_name, "UTF-8","ISO-8859-1"),

If this works, perhaps I can add a new setting, "filesystem encoding", with UTF-8 as the default. What do you think?
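As a rough sketch of what such a setting could drive (the function name and the `$fs_encoding` parameter are hypothetical, not existing filebrowser code), the conversion could be wrapped in a small helper so the source encoding is no longer hard-coded:

```php
<?php
// Hypothetical helper, not part of filebrowser: convert a raw file name
// read from disk into UTF-8 for display.
function example_fs_name_to_utf8($name, $fs_encoding = 'UTF-8') {
  // Nothing to do when the file system already stores names as UTF-8.
  if (strcasecmp($fs_encoding, 'UTF-8') === 0) {
    return $name;
  }
  // Requires the mbstring extension.
  return mb_convert_encoding($name, 'UTF-8', $fs_encoding);
}

// "\xC5" is "Å" and "\xF6" is "ö" in ISO-8859-1.
$raw = "\xC5ngstr\xF6m.txt";
$display = example_fs_name_to_utf8($raw, 'ISO-8859-1');
```

With the default value the name passes through unchanged, so sites whose filesystem is already UTF-8 would see no difference.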

Anders Östberg’s picture

That works but I'm afraid this is only half the solution.
The trickier part is the url, I can't make the links work with any kind of encoding/conversion.

zewa’s picture

Well, this part can be handled via Pathauto. It can rewrite your URL links.
Filebrowser then has to recode them to the encoding needed.

Greetings
Zewa

Yoran’s picture

Well, I see :/ I could use some kind of transliteration, but I fear duplicates with this solution.

Perhaps a hash can do the job...
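A minimal sketch of the hash idea, purely as an illustration (the helper name is hypothetical and nothing like this exists in filebrowser yet). Since a hash is one-way, a lookup from token back to path would still have to be stored somewhere:

```php
<?php
// Hypothetical helper: derive a URL-safe token from the raw file system path.
function example_path_token($raw_path) {
  // md5() returns 32 lowercase hex characters, whatever bytes go in.
  return md5($raw_path);
}

// Two names that could transliterate to the same ASCII string ("a.txt")
// still get different tokens, because the raw bytes differ.
$token_a = example_path_token("a.txt");
$token_b = example_path_token("\xE4.txt"); // "ä.txt" in ISO-8859-1
```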

Anders Östberg’s picture

I'm unfortunately running out of time to get my site into production so I'll have to give up on this now and implement some other Windows/Swedish-specific solution. I'll keep an eye on filebrowser though, it's a great idea and exactly what I need if it only could handle UTF-8 and national characters. Thanks for your efforts in building this and trying to resolve the problem.

XaraX’s picture

replace (in filebrowser.module)
     'display-name' => $file_name,
by
     'display-name' => mb_convert_encoding($file_name, "UTF-8","ISO-8859-9"),

This workaround works for Turkish characters. As you pointed out in your post, a new "filesystem encoding" setting would be useful IMHO.

Thanks, Yoran.

Anders Östberg’s picture

I've tried that, and many similar conversions, but it only solves half the problem. The display name is correct in the browser, but the link to the file or subdirectory is still incorrect.

Yoran’s picture

Status: Postponed (maintainer needs more info) » Fixed

Well, I finally found a way to handle national characters when the underlying fs is not UTF-8. The main problem was URLs (as usual), so I added a new db table that associates each file with a numeric id.

This way there are no filename problems in URLs. I also reworked all the code to introduce an "FS encoding" option in the "folder presentation" group of the node. I tested it a lot this afternoon on an ISO-8859-15 folder structure, with success. The main idea is that UTF-8 stays the internal storage encoding, and every string is converted when needed before it is passed to PHP I/O functions.
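To illustrate the idea only (this is a simplified sketch, not the actual module code: the array stands in for the new db table, and all names here are hypothetical), resolving an id to the bytes the filesystem expects could look like this:

```php
<?php
// Hypothetical stand-in for the new database table: numeric id => UTF-8 path.
$example_files = array(
  1 => "rapporter/\xC3\x85rsredovisning.pdf", // "Årsredovisning.pdf" in UTF-8
);

// Convert an internally stored UTF-8 path to the file system's encoding
// just before it is handed to PHP I/O functions such as fopen().
function example_utf8_to_fs($path, $fs_encoding) {
  return mb_convert_encoding($path, $fs_encoding, 'UTF-8');
}

// Resolve an id taken from the URL to the byte string the file system expects.
// Returns NULL when the id is unknown.
function example_resolve($fid, array $files, $fs_encoding) {
  if (!isset($files[$fid])) {
    return NULL;
  }
  return example_utf8_to_fs($files[$fid], $fs_encoding);
}

$fs_path = example_resolve(1, $example_files, 'ISO-8859-15');
```

The URL only ever carries the numeric id, so no file name bytes need to survive URL encoding.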

The result is pretty stable (see the next -DEV tarball or CVS HEAD). Browsing is smooth and, more importantly, the new table is kept synchronized with the fs. Later on I think we can use this table to store cached data (with a lifetime), descriptions for files edited by users, etc.

Tarball generation is also working and now uses the same channel as plain "private" download. Actually there is no "public" download anymore; everything goes through filebrowser/download/{FID}. Unfortunately, I didn't manage to fix the national character problem inside zip files. I'll try using an external zip command (on Unix at least).

Well, please test and tell me how this version is working for you all.

Anders Östberg’s picture

Not bad at all! :-)

With a presentation encoding of Windows-1252 I can browse files and directories with national characters now - nice job.

A couple of issues:

- I get a warning message each time new files are browsed: "user warning: Field 'path' doesn't have a default value query: INSERT INTO node_dir_listing_content (nid) VALUES (134) in \includes\common.inc on line 3422."

- A downloaded file gets the complete file system path as its file name (with underscores instead of ":", "\" etc)

- National characters in the downloaded file's name are not translated from UTF-8
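For the last two points, here is one possible approach as a sketch. The function names are made up, I have not checked how filebrowser builds its download headers, and support for the `filename*` parameter varies between browsers, so treat it as an assumption:

```php
<?php
// Hypothetical helper: keep only the last path component of a UTF-8 path.
// Both "\" and "/" are treated as separators, since the path comes from Windows.
function example_download_name($utf8_path) {
  $normalized = str_replace('\\', '/', $utf8_path);
  $pos = strrpos($normalized, '/');
  return $pos === FALSE ? $normalized : substr($normalized, $pos + 1);
}

// Hypothetical helper: build a Content-Disposition header line with a plain
// ASCII fallback name plus a percent-encoded UTF-8 name.
function example_content_disposition($utf8_name) {
  // Replace every byte outside printable ASCII with "_" (byte-wise, no /u flag).
  $fallback = preg_replace('/[^\x20-\x7E]/', '_', $utf8_name);
  // Quotes and backslashes would break the quoted fallback value.
  $fallback = str_replace(array('"', '\\'), '_', $fallback);
  return 'Content-Disposition: attachment; filename="' . $fallback . '"'
    . "; filename*=UTF-8''" . rawurlencode($utf8_name);
}

$name = example_download_name("C:\\files\\\xC3\x85rsredovisning.pdf");
$header = example_content_disposition($name);
```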

Yoran’s picture

Good news :)

About issues now :
1/ Can you have a look at the content of this table? Normally all columns should be filled. I don't really like this insert query, as it looks like the Drupal schema has not been updated.

2/ I'll have a look at this, it should be easy.

3/ Yes, I know. This is a problem I will not be able to solve soon, as it looks like there is an internal issue with PHP/Zip and national characters. We tried with Gozi (#707792: ZIP archive UTF-8 filenames problem) to find the right charset conversion, with no success. I am thinking more and more about replacing this dodgy library with a call to the zip command.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.