When I import CSV files containing Danish characters like "æ ø å", that character and the rest of the text in that field are lost.

Input file
"Vi prøver lige igen"

Output
"Vi p"

I have tried changing the text encoding of my CSV file, without luck.


Comments

alex_b’s picture

Character encoding must be UTF-8 for the parser to work properly. Care to post the file?

Nchase’s picture

I have German characters like öäü. The problem is: whenever I change the encoding from ANSI to UTF-8, it breaks the node title, which is no longer mapped. When I change it back to ANSI, the title is mapped but the special characters are no longer imported.

nicholasThompson’s picture

I've also had issues with the node title containing double-encoded characters. I ended up having to use decode_entities($value) to tidy them up. I wonder if there is a similar issue with modified characters such as Ü...

PS: Have you tried 6.x-1.x-dev (HEAD)?

Nchase’s picture

Version: 6.x-1.0-alpha9 » 6.x-1.x-dev

I'm using latest dev and the issue is still the same: not mapping title using utf-8 encoded csv.

alex_b’s picture

Title: Danish characters in csv import » CSV import problems with characters like æ ø å ö ä ü

I'm giving this a more general title. A sample CSV file that breaks would help a lot.

steinmb’s picture

Are you sure that the source file (CSV) is in UTF-8? I just imported a few Norwegian CSV files that contain æøå in the title fields, and all the nodes created were encoded correctly.

Nchase’s picture

Yes, it is UTF-8. I opened a new issue for the title mapping: http://drupal.org/node/724080 . Perhaps you'd like to check the file I uploaded at the new issue to confirm that it is UTF-8?

Summit’s picture

FileSize
580 bytes

Hi,

Attached is an example file. Note "Landhaus Höreageer"; this needs to be encoded with the right characters.

Thanks a lot for going into this!
Greetings,
Martijn

mlsamuelson’s picture

Git didn't want to let me apply this patch against 6.x-1.x-dev so I've made it against HEAD instead.

Given that http://drupal.org/node/724080 indicates there isn't actually an issue when UTF-8 is used, I've disregarded that facet of this issue, and attempted to solve the problem of non-UTF-8 encoded files causing issues (mostly with MySQL), by doing a check for when the CSV file is not UTF-8, and then converting it to UTF-8.

I was never able to get mb_convert_encoding() to handle the conversion correctly without either knowing the source encoding or providing a number of candidate encodings to check against (see http://stackoverflow.com/questions/1002089/get-file-encoding for more on this). As a result I've used utf8_encode(), which assumes you're coming from ISO-8859-1 (which I'm guessing is what people will be dealing with in 80% of the cases).

This isn't a perfect fix, but it's a start and perhaps someone with more encoding-fu can suggest changes. In its current state it should ease the pain of those with users uninterested in doing their own UTF-8 conversions.
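
For readers who want to see the idea in isolation, here is a minimal sketch of "check whether the data is UTF-8, and convert it if not". It is not the code from the patch: example_ensure_utf8() is a hypothetical helper name, the validity check uses preg_match() with the /u modifier, and iconv() stands in for utf8_encode(). The ISO-8859-1 default is the same guess described above, so a file in any other single-byte encoding would be converted to the wrong characters.

```php
<?php
// Hypothetical helper for illustration only; not part of Feeds or the patch.
function example_ensure_utf8($text, $assumed_source = 'ISO-8859-1') {
  // preg_match() with the /u modifier returns FALSE when the subject is
  // not valid UTF-8, so a result of 1 means the text can be left alone.
  if (preg_match('//u', $text) === 1) {
    return $text;
  }
  // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
  return iconv($assumed_source, 'UTF-8', $text);
}

// "ø" is the single byte 0xF8 in ISO-8859-1.
$latin1 = "Vi pr\xF8ver lige igen";
echo example_ensure_utf8($latin1), "\n"; // prints "Vi prøver lige igen"
```

Text that is already valid UTF-8 passes through unchanged, so the helper can be applied to every line without checking the file first.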

marco.giaco’s picture

Version: 6.x-1.x-dev » 6.x-1.0-beta10
FileSize
5.78 KB

We are importing from several different encodings, and I've been facing the same problem.
Unfortunately I could not trust the results of mb_detect_encoding() (it cannot distinguish between ISO-8859-1 and windows-1252).
The only way I found to get around this was to let the user specify the right encoding on the import form and to recode from the user-selected encoding to UTF-8 inside the iterator's next method.
I'm attaching a patch (made on the 6.x-1.0-beta10 version) with this approach.
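
To illustrate the approach (this is a sketch, not the attached patch), the function below recodes one line from an encoding chosen by the user. example_recode_line() and its parameter names are hypothetical, and in the patch the conversion happens inside the iterator rather than in a standalone function. The sample byte 0x80 shows why the choice matters: it is the euro sign in windows-1252 but a control character in ISO-8859-1.

```php
<?php
// Hypothetical helper for illustration only; not the code from the patch.
function example_recode_line($line, $source_encoding) {
  // Nothing to do when the user says the file is already UTF-8.
  if (strcasecmp($source_encoding, 'UTF-8') === 0) {
    return $line;
  }
  $converted = iconv($source_encoding, 'UTF-8', $line);
  if ($converted === FALSE) {
    throw new RuntimeException('Could not convert line from ' . $source_encoding);
  }
  return $converted;
}

$line = "Price: 100 \x80";
// Decoded as windows-1252, byte 0x80 becomes the euro sign (E2 82 AC in UTF-8).
echo bin2hex(substr(example_recode_line($line, 'Windows-1252'), -3)), "\n";
// Decoded as ISO-8859-1, the same byte becomes control character U+0080 (C2 80).
echo bin2hex(substr(example_recode_line($line, 'ISO-8859-1'), -2)), "\n";
```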

paolomainardi’s picture

Subscribe

kardave’s picture

Drupal uses UTF8 all the time. Sources may not.

A good solution could be to force any non-UTF8 characters to be converted into UTF8.
I grabbed a solution from here: http://stackoverflow.com/questions/910793/php-detect-encoding-and-make-e...

I can now import iso8859-2 encoded CSV files like a charm. Finally.

David

Berliner-dupe’s picture

Version: 6.x-1.0-beta10 » 6.x-1.x-dev

Is new information available?

I use the dev version, but characters like ä and ö are not imported.

The patch from #9 does not help.

The CSV is UTF-8.

Can anyone help me?

Regards Matthias

Frank Ralf’s picture

Try adding the following code in line 199 of CSV parser:

$line = iconv('ISO-8859-1', 'UTF-8', $line);

(Taken from http://www.drupalcenter.de/node/40991 )

bgm’s picture

UTF-8 with feeds works for me, but did you test the patch proposed in #1005400: Remove BOM from UTF-8 files in Drupal 6? ?
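
As a rough illustration of why a BOM can matter (a sketch only, not the patch from that issue): the three BOM bytes at the start of a UTF-8 file end up attached to the first header name, so a lookup for that column can fail. Whether this is what happened to the title mapping reported earlier in this thread is not confirmed. example_strip_utf8_bom() is a hypothetical helper name.

```php
<?php
// Hypothetical helper for illustration only; not the code from #1005400.
function example_strip_utf8_bom($text) {
  $bom = "\xEF\xBB\xBF";
  // Remove the BOM only when it is the very first thing in the text.
  if (strncmp($text, $bom, 3) === 0) {
    return substr($text, 3);
  }
  return $text;
}

$first_line = "\xEF\xBB\xBFtitle,body";

// With the BOM in place, the first column is "\xEF\xBB\xBFtitle", not "title".
var_dump(in_array('title', explode(',', $first_line), TRUE)); // bool(false)

// After stripping the BOM, the header matches as expected.
var_dump(in_array('title', explode(',', example_strip_utf8_bom($first_line)), TRUE)); // bool(true)
```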

Berliner-dupe’s picture

#14

Try adding the following code in line 199 of CSV parser:
$line = iconv('ISO-8859-1', 'UTF-8', $line);

Thank you ....

but ... I think this doesn't work for the Feeds dev version.

franz’s picture

Heine’s picture

#10 is the correct approach (note: I did not test the patch), because ISO-8859-1 and windows-1252 are not equivalent (e.g. for €).

The CSV in #8 appears to have been proper UTF-8 that has been converted to UTF-8 again while being interpreted as a single-byte encoding. The source CSV is broken; Feeds is not (in this case).
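
The effect can be reproduced in a few lines. This is an illustrative sketch that assumes the single-byte encoding involved was ISO-8859-1, which is not confirmed for the file in #8; the variable names are made up for the example.

```php
<?php
// "ö" is the two bytes C3 B6 in UTF-8.
$utf8 = "H\xC3\xB6r";

// Treating those bytes as ISO-8859-1 and converting "to UTF-8" again turns
// each byte into its own character: C3 becomes "Ã", B6 becomes "¶".
$double = iconv('ISO-8859-1', 'UTF-8', $utf8);
echo bin2hex($double), "\n"; // prints 48c383c2b672

// Under the same assumption, the damage can be undone by converting back.
$repaired = iconv('UTF-8', 'ISO-8859-1', $double);
var_dump($repaired === $utf8); // bool(true)
```

The double-encoded result is itself valid UTF-8, so a check that only tests for UTF-8 validity would not flag it.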

franz’s picture

@Heine, feeds is only broken in the issue I mentioned above regarding UTF-8 (multibyte in general)

manuelBS’s picture

twistor’s picture

Status: Active » Closed (outdated)