Non-English characters in CSV file

Robrecht Jacques - September 16, 2006 - 08:32
Project: Node import
Version: HEAD
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: duplicate
Description

See: http://drupal.org/node/29574#comment-62792

node_import uses fgetcsv,
which should take the LANG environment variable into account
(en_US.UTF-8, in my case).

However, all non-English (in my case, Hebrew) characters are still imported as question marks.

Any help would be appreciated.

Environment: Drupal 4.6, PHP 4.3.11

#1

orionvortex - October 19, 2006 - 06:09

I would also like this feature. I really hope that someone comes up with a workaround soon so that we can import non-English characters.

#2

Robrecht Jacques - June 1, 2007 - 12:32
Status: active » duplicate

Duplicate of the issue "import file with greek characters". The way to do it is to save your file as UTF-8.

#3

mademarest - October 20, 2007 - 11:13

Is there a "for dummies" explanation?

I'm using Excel 2007.

How 'bout a way of doing it without "programming"?

Thanks for your help!!

(Turkish CSV into Drupal without losing foreign characters... thanks)

#4

netgenius - March 27, 2008 - 11:46

Just to add, I can't get the import to work either: accented characters (such as í) cause the string to be truncated at that point. I've tried both 8-bit and Unicode versions of the source file.

http://www.php.net/fgetcsv says:

"Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."

... which makes me think that maybe node_import needs to ask what character set the import file is using, and then set LANG accordingly. However, that also implies that a Unicode source ought to work, and it doesn't for me. Maybe I need some other PHP settings?
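
One way to narrow this down is to check what the file's bytes actually are before importing. The sketch below is in Python rather than PHP and is not part of node_import; the function name `guess_encoding` and the sample row are made up for illustration. It also assumes that a "Unicode" file saved from a Windows tool is typically UTF-16 with a byte-order mark, which would not be read as UTF-8 and might explain why the Unicode version failed too.

```python
# Sketch: classify the raw bytes of a CSV file before importing it.
# The function name and the sample row are illustrative only.

def guess_encoding(data: bytes) -> str:
    """Return a rough label for how `data` appears to be encoded."""
    # UTF-16 files written by Windows tools usually start with a byte-order mark.
    if data.startswith(b"\xff\xfe") or data.startswith(b"\xfe\xff"):
        return "utf-16"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: most likely a one-byte code page such as cp1252.
        return "single-byte"


row = "caf\u00ed,1\n"  # a row containing the accented character í
samples = {
    "utf-8": row.encode("utf-8"),
    "single-byte": row.encode("cp1252"),
    "utf-16": row.encode("utf-16"),
}
for expected, raw in samples.items():
    print(expected, "->", guess_encoding(raw))
```

If the file comes back as "single-byte" or "utf-16", it would need to be re-saved as UTF-8 before a UTF-8 locale can read it correctly. Plain ASCII files are reported as "utf-8", since ASCII is a subset of UTF-8.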

#5

onelittleant - July 12, 2008 - 18:58

Thanks to Robrecht!

I'm on Windows Vista. The solution was to open the .csv file in Notepad, choose Save As, and change the encoding to UTF-8. My import data truncation problems were eliminated!

This change worked for unusual characters in English content (like curly double quotes) as well as German characters like ß, ä, etc.
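
For anyone who has many files and would rather not open each one in Notepad, the same re-save can be scripted. This is only a sketch in Python, not something node_import provides; the function name `convert_to_utf8` is made up, and the default source encoding of cp1252 (Western European Windows) is an assumption. A Turkish file would more likely be cp1254 and a Hebrew one cp1255.

```python
# Sketch: re-save a CSV file as UTF-8, like Notepad's "Save As" with
# the encoding set to UTF-8. The source encoding is an assumption and
# must match how the file was really saved.

def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Read src_path in source_encoding and write it to dst_path as UTF-8."""
    # newline="" leaves line endings exactly as they are in the file.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, `convert_to_utf8("products.csv", "products-utf8.csv")` would write a UTF-8 copy next to the original (both file names are placeholders). If the wrong source encoding is given, the output will contain the wrong characters, so it is worth opening the result and checking a few accented words before importing.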

 
 
