Swedish characters in file names

Anders Östberg - May 14, 2009 - 21:21
Project: Filebrowser
Version: 6.x-2.0-rc10
Component: Directory Listing Pages
Category: bug report
Priority: normal
Assigned: Unassigned
Status: postponed (maintainer needs more info)
Description

Swedish characters Å,Ä,Ö,å,ä,ö in file names display as a square.
My web site uses UTF-8, is that perhaps the problem?

#1

zewa - May 15, 2009 - 05:26

No, it's because fopen and its relatives in PHP treat file names as plain byte (ANSI) strings, not UTF-8.
There is no real workaround for this, so it's kinda best to rename the files you want to download.

#2

Anders Östberg - May 15, 2009 - 07:29

Thanks.
This is a major problem, and if that is how Filebrowser will work I'll have to look for another solution.

#3

zewa - May 15, 2009 - 16:44

At least that used to be how the old version worked.

Greetings
Zewa

#4

Yoran - May 18, 2009 - 17:58
Status: active » postponed (maintainer needs more info)

Yes, this is a major problem, but I can't reproduce it. I also use UTF-8 and French, and I have no problem with our language-specific characters (in file names or descriptions).

Can you have a look at the page source code to see whether this is a font problem or an encoding problem?

#5

zewa - May 18, 2009 - 19:03

Mmhm ... can you tell me which version of PHP you are using, Yoran?

I use PHP 5.2.5 + PHP 4.4.8 + PEAR with the Xampp Package 1.6.6a for development.

Greetings
Zewa

#6

Anders Östberg - May 18, 2009 - 19:23

The page's charset is utf-8 and font is Arial. If I force the browser to display the page using "Western European (Windows)" encoding, the national characters are displayed correctly. PHP version is 5.2.9-1.

#7

Anders Östberg - May 19, 2009 - 14:08

Additional info: I tried adding utf8_encode() to the display-name output, and the characters are then displayed correctly, so I assume the cause is that the file names are not converted to UTF-8. I couldn't make this work properly for the file URL though. I don't know how and where in the code to correctly convert all characters, so I'll have to leave this to the maintainer.
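
To illustrate the symptom described above, here is a minimal sketch. It assumes the filesystem hands PHP file names as single-byte ISO-8859-1 (close to the "Western European (Windows)" encoding mentioned earlier) and that the mbstring extension is available. The file name is made up for the example.

```php
<?php
// "Östberg.txt" as a Latin-1 filesystem would hand it to PHP:
// the letter Ö is the single byte 0xD6. (Hypothetical file name.)
$raw_name = "\xD6stberg.txt";

// Those bytes are not valid UTF-8, which is why a UTF-8 page
// cannot render the character.
$is_valid_utf8 = mb_check_encoding($raw_name, 'UTF-8');

// Converting from the assumed source encoding yields the two-byte
// UTF-8 form of Ö (0xC3 0x96).
$display_name = mb_convert_encoding($raw_name, 'UTF-8', 'ISO-8859-1');
```

This only covers the displayed name. The bytes used to open the file on disk are unchanged.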

#8

Anders Östberg - May 21, 2009 - 15:43
Version: 6.x-2.0-rc9 » 6.x-2.0-rc10

Still a problem with rc10.

#9

Yoran - June 16, 2009 - 22:11

Sorry for my late answer.

I tried to understand what was going on, and my guess is that your filesystem does not use UTF-8 as its file name encoding. PHP's readdir just passes on whatever the filesystem gives it, in whatever encoding that is. I tried an ext3 filesystem with ISO-8859-15 file names, and I can reproduce the issue.

So the problem with your solution (utf8_encode) is that it will not work in every situation, since a filesystem can use many different encodings. Perhaps a better solution is to use mb_convert_encoding instead. Can you give this a try:

replace (in filebrowser.module)
     'display-name' => $file_name,
by
     'display-name' => mb_convert_encoding($file_name, "UTF-8","ISO-8859-1"),

If this works, perhaps I can add a new setting, "filesystem encoding", with UTF-8 as the default. What do you think?
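
A rough sketch of how such a "filesystem encoding" setting could be applied to the display name. The function name and the $fs_encoding parameter are hypothetical and not part of filebrowser.module. The sketch assumes mbstring is available and that the configured encoding is one mbstring knows.

```php
<?php
/**
 * Hypothetical helper: convert a file name read from disk to UTF-8.
 * $fs_encoding stands in for the proposed "filesystem encoding" setting.
 */
function filebrowser_sketch_display_name($file_name, $fs_encoding = 'UTF-8') {
  // Nothing to do when the filesystem already stores names as UTF-8.
  if (strcasecmp($fs_encoding, 'UTF-8') === 0) {
    return $file_name;
  }
  // Otherwise convert from the configured encoding to UTF-8.
  return mb_convert_encoding($file_name, 'UTF-8', $fs_encoding);
}
```

With the default of UTF-8 the name passes through untouched, so sites that already work would not change behaviour. A site like the one reported here would set the value to ISO-8859-1 or similar.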

#10

Anders Östberg - June 19, 2009 - 16:02

That works, but I'm afraid this is only half the solution.
The trickier part is the URL: I can't make the links work with any kind of encoding/conversion.
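
One possible direction for the link half, as a sketch only: put the UTF-8 form of the name in the URL, and convert it back to the filesystem's encoding before touching the disk. The function names and the $fs_encoding parameter are hypothetical. This is not how filebrowser.module builds its links, and it assumes the conversion round-trips without loss, which should hold for single-byte encodings such as ISO-8859-1.

```php
<?php
// Hypothetical helpers; names and parameters are assumptions.

// Build the path component of a link from the raw on-disk name.
function sketch_name_to_url_part($raw_name, $fs_encoding) {
  $utf8 = mb_convert_encoding($raw_name, 'UTF-8', $fs_encoding);
  // Percent-encode the UTF-8 bytes so the link is plain ASCII.
  return rawurlencode($utf8);
}

// Turn the path component back into the bytes the filesystem expects.
function sketch_url_part_to_name($url_part, $fs_encoding) {
  $utf8 = rawurldecode($url_part);
  return mb_convert_encoding($utf8, $fs_encoding, 'UTF-8');
}
```

The name that reaches fopen or readdir comparisons is then the same byte string the filesystem returned in the first place.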

#11

zewa - July 3, 2009 - 07:36

Well, this part can be handled via Pathauto. It can rewrite your URL links.
Filebrowser then has to recode them to the encoding needed.

Greetings
Zewa

#12

Yoran - July 19, 2009 - 10:41

Well, I see :/ I could use some kind of transliteration, but I fear duplicates with this solution.

Perhaps a hash can do the job...
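
A minimal sketch of the hash idea, with hypothetical function names: index each directory entry by a hash of its raw bytes, put the hash in the link, and look the raw name up again on request. md5 is used here only as an example of a byte-level hash.

```php
<?php
// Hypothetical sketch: index a directory's entries by a hash of the raw name.
function sketch_hash_index(array $raw_names) {
  $index = array();
  foreach ($raw_names as $raw_name) {
    // md5() works on bytes, so the on-disk encoding does not matter.
    $index[md5($raw_name)] = $raw_name;
  }
  return $index;
}

// Resolve a hash taken from a URL; returns NULL for unknown hashes.
function sketch_hash_lookup(array $index, $hash) {
  return isset($index[$hash]) ? $index[$hash] : NULL;
}
```

Unlike transliteration, two names such as "Ö.txt" and "O.txt" keep distinct keys, and the link itself contains only ASCII characters.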

#13

Anders Östberg - July 19, 2009 - 13:06

Unfortunately I'm running out of time to get my site into production, so I'll have to give up on this for now and implement some other Windows/Swedish-specific solution. I'll keep an eye on Filebrowser though; it's a great idea and exactly what I need, if only it could handle UTF-8 and national characters. Thanks for your efforts in building this and trying to resolve the problem.

#14

XaraX - January 8, 2010 - 15:53

replace (in filebrowser.module)
     'display-name' => $file_name,
by
     'display-name' => mb_convert_encoding($file_name, "UTF-8","ISO-8859-9"),

This workaround works for Turkish characters. As you pointed out in your post, a new "filesystem encoding" setting would be useful IMHO.

Thanks, Yoran.

#15

Anders Östberg - January 11, 2010 - 15:44

I've tried that, and many similar conversions, but it only solves half the problem. The display name is correct in the browser, but the link to the file or subdirectory is still incorrect.
