When I import CSV files containing Danish characters such as "æ ø å", that character and the rest of the text in the field are lost.
Input file:
"Vi prøver lige igen"
Output:
"Vi p"
I have tried changing the text encoding of my CSV file, without luck.
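A quick way to check whether a file really is UTF-8 (the question raised in #1 and #6) is to test each line. A minimal sketch, assuming PHP with PCRE; `find_non_utf8_lines()` is a hypothetical helper, not part of Feeds. It relies on `preg_match()` with the `/u` modifier returning `false` when the subject is not valid UTF-8.

```php
<?php
/**
 * Hypothetical helper: returns the 1-based numbers of the lines in
 * $contents that are not valid UTF-8.
 */
function find_non_utf8_lines(string $contents): array
{
    $bad = [];
    // Split on any common line ending; no /u here, so bytes are not validated yet.
    $lines = preg_split("/\r\n|\n|\r/", $contents);
    foreach ($lines as $index => $line) {
        // With /u, preg_match() returns false for an invalid UTF-8 subject.
        if (preg_match('//u', $line) !== 1) {
            $bad[] = $index + 1;
        }
    }
    return $bad;
}

// "ø" is the single byte 0xF8 in ISO-8859-1 but the two bytes 0xC3 0xB8 in UTF-8.
$latin1 = "title\nVi pr\xF8ver lige igen\n";
$utf8   = "title\nVi pr\xC3\xB8ver lige igen\n";

print_r(find_non_utf8_lines($latin1)); // reports line 2
print_r(find_non_utf8_lines($utf8));   // reports nothing
```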
Comment | File | Size | Author |
---|---|---|---|
#10 | feed_csv_parser_encoding.patch | 5.78 KB | marco.giaco |
#9 | feeds-special-characters-in-non-utf8-csv-parser-704532.patch | 1.35 KB | mlsamuelson |
#8 | Feeds-csv-example.txt | 580 bytes | Summit |
Comments
Comment #1
alex_b: Character encoding must be UTF-8 for the parser to work properly. Care to post the file?
Comment #2
Nchase: I have German characters like öäü. The problem: whenever I change the encoding from ANSI to UTF-8, the node title breaks and is no longer mapped. When I change it back to ANSI, the title is mapped but the special characters are not imported.
Comment #3
nicholasThompson: I've also had issues with the node title containing double-encoded characters. I ended up having to use decode_entities($value) to tidy them up. I wonder whether there is a similar issue with accented characters such as Ü...
PS: Have you tried 6.x-1.x-dev (HEAD)?
Comment #4
Nchase: I'm using the latest dev version and the issue is the same: the title is not mapped when using a UTF-8 encoded CSV.
Comment #5
alex_b: I'm giving this a more general title. A sample CSV file that breaks would help a lot.
Comment #6
steinmb: Are you sure the source file (CSV) is UTF-8? I just imported a few Norwegian CSV files that contain æøå in the title fields, and all nodes created were encoded correctly.
Comment #7
Nchase: Yes, it is UTF-8. I opened a new issue for the title mapping: http://drupal.org/node/724080 . Perhaps you could check the file I uploaded there to confirm that it is UTF-8?
Comment #8
Summit: Hi,
Attached is an example file. Note "Landhaus Höreageer": this needs to be converted to the correct characters.
Thanks a lot for going into this!
Greetings,
Martijn
Comment #9
mlsamuelson: Git didn't want to let me apply this patch against 6.x-1.x-dev, so I've made it against HEAD instead.
Given that http://drupal.org/node/724080 indicates there is no actual problem when UTF-8 is used, I've disregarded that facet of this issue and instead tried to solve the problem of non-UTF-8 files causing errors (mostly in MySQL): the parser checks whether the CSV file is UTF-8 and, if it is not, converts it to UTF-8.
I was never able to get mb_convert_encoding() to handle the conversion correctly without telling it the source encoding or giving it a list of candidate encodings to check against (see http://stackoverflow.com/questions/1002089/get-file-encoding for more on this). As a result I've used utf8_encode(), which assumes the input is ISO-8859-1 (which I'm guessing covers about 80% of cases).
This isn't a perfect fix, but it's a start, and perhaps someone with more encoding-fu can suggest changes. In its current state it should ease the pain for sites whose users are not interested in doing their own UTF-8 conversions.
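The check-then-convert approach described in #9 can be sketched as follows. This is not the patch itself: `latin1_fallback_to_utf8()` is a hypothetical name, and the byte mapping is written out by hand because it is what `utf8_encode()` does (that function is deprecated as of PHP 8.2).

```php
<?php
/**
 * Hypothetical helper: leave valid UTF-8 untouched, otherwise assume the
 * text is ISO-8859-1 and convert it to UTF-8 (the utf8_encode() mapping).
 */
function latin1_fallback_to_utf8(string $text): string
{
    // Already valid UTF-8: converting again would double-encode it.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $out = '';
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($text[$i]);
        if ($byte < 0x80) {
            // ASCII bytes are identical in both encodings.
            $out .= $text[$i];
        } else {
            // ISO-8859-1 bytes 0x80-0xFF are code points U+0080-U+00FF,
            // which UTF-8 encodes as two bytes: 110000xx 10xxxxxx.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

echo latin1_fallback_to_utf8("Vi pr\xF8ver lige igen"), "\n"; // Vi prøver lige igen
```

The ISO-8859-1 assumption is wrong for Windows-1252 files: there, bytes 0x80 to 0x9F (including 0x80, the euro sign) are printable characters, but this mapping turns them into invisible C1 control characters.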
Comment #10
marco.giaco: We import from several different encodings and have been facing the same problem.
Unfortunately I could not trust the results of mb_detect_encoding (it cannot distinguish between ISO-8859-1 and Windows-1252).
The only workaround I found was to let the user specify the correct encoding on the import form and then convert from the user-selected encoding to UTF-8 inside the iterator's next() method.
I'm attaching a patch (made against the 6.x-1.0-beta10 version) that takes this approach.
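The approach from #10 (let the user choose the source encoding, then convert each line before it is split into fields) might look roughly like this. A minimal sketch, assuming the iconv extension is available; `read_csv_as_utf8()` is a hypothetical function, not the iterator from the patch.

```php
<?php
/**
 * Hypothetical sketch: yield CSV rows from $handle, converting each line
 * from a user-selected $sourceEncoding to UTF-8 before parsing it.
 *
 * Limitation: reading with fgets() means quoted fields that span several
 * lines are not handled.
 *
 * @param resource $handle
 */
function read_csv_as_utf8($handle, string $sourceEncoding, string $delimiter = ','): Generator
{
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
            $converted = iconv($sourceEncoding, 'UTF-8', $line);
            if ($converted === false) {
                throw new RuntimeException("Line is not valid $sourceEncoding");
            }
            $line = $converted;
        }
        // Enclosure and escape characters are passed explicitly.
        yield str_getcsv($line, $delimiter, '"', '\\');
    }
}

// Usage: an in-memory ISO-8859-1 file with a semicolon delimiter.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "name;city\nJ\xF8rgen;\xC5rhus\n");
rewind($handle);
foreach (read_csv_as_utf8($handle, 'ISO-8859-1', ';') as $row) {
    echo implode(' | ', $row), "\n"; // "name | city", then "Jørgen | Århus"
}
```

Converting before splitting keeps the delimiter and quote handling in one place; it works for ASCII-compatible single-byte encodings such as ISO-8859-1, ISO-8859-2 and Windows-1252, but not for UTF-16 input.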
Comment #11
paolomainardi: Subscribe
Comment #12
kardave: Drupal uses UTF-8 throughout; sources may not.
A good solution could be to force non-UTF-8 characters to be converted to UTF-8.
I grabbed a solution from here: http://stackoverflow.com/questions/910793/php-detect-encoding-and-make-e...
I can now import ISO-8859-2 encoded CSV files without any problems. Finally.
David
Comment #13
Berliner-dupe: Is there any new information?
I'm using the dev version, but characters like ä, ö and so on are not imported.
The patch from #9 does not help.
The CSV is UTF-8.
Can anyone help me?
Regards Matthias
Comment #14
Frank Ralf: Try adding the following code at line 199 of the CSV parser:
$line = iconv('ISO-8859-1', 'UTF-8', $line);
(Taken from http://www.drupalcenter.de/node/40991 )
Comment #15
bgm: UTF-8 with Feeds works for me, but did you test the patch proposed in #1005400: Remove BOM from UTF-8 files in Drupal 6?
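For reference, stripping a UTF-8 byte order mark (the subject of #1005400) amounts to removing three leading bytes. A minimal sketch; `strip_utf8_bom()` is a hypothetical helper. If the BOM is left in place it becomes part of the first header cell, which is one way a UTF-8 file can stop matching a mapping for its first column.

```php
<?php
/**
 * Hypothetical helper: remove a leading UTF-8 byte order mark (EF BB BF).
 * Some editors write a BOM at the start of UTF-8 files; without this step
 * a first header "title" is really the bytes EF BB BF followed by "title".
 */
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

$firstLine = "\xEF\xBB\xBFtitle,body";
$headers = str_getcsv(strip_utf8_bom($firstLine), ',', '"', '\\');
var_dump($headers[0] === 'title'); // bool(true)
```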
Comment #16
Berliner-dupe: #14
Thank you, but I don't think this works with the Feeds dev version.
Comment #17
franz: Is this related? #1487670: CSV Parser: incorrect UTF-8 interpretation (looks like limited number of imported fields). Could you try the patch in that issue?
Comment #18
Heine: #10 is the correct approach (note: I did not test the patch), because ISO-8859-1 and Windows-1252 are not equivalent (e.g. for €).
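The euro-sign difference can be seen by decoding the same byte with both encodings. A minimal sketch, assuming the iconv extension and an iconv implementation that accepts the `Windows-1252` name (glibc does):

```php
<?php
// The single byte 0x80, decoded as ISO-8859-1 and as Windows-1252.
$byte = "\x80";

$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);
$asCp1252 = iconv('Windows-1252', 'UTF-8', $byte);

// ISO-8859-1: the invisible C1 control character U+0080 (UTF-8: C2 80).
echo bin2hex($asLatin1), "\n"; // c280
// Windows-1252: the euro sign U+20AC (UTF-8: E2 82 AC).
echo bin2hex($asCp1252), "\n"; // e282ac
```

Because many byte sequences are valid in both encodings, a detector such as `mb_detect_encoding()` often has nothing to tell them apart by, which is why letting the user choose, as in #10, is more reliable.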
The CSV in #8 appears to contain proper UTF-8 that was converted to UTF-8 a second time while being interpreted as a single-byte encoding. In this case the source CSV is broken, not Feeds.
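This kind of double encoding can sometimes be undone: if UTF-8 text was read as ISO-8859-1 and encoded to UTF-8 again, converting it back to ISO-8859-1 recovers the original bytes. A heuristic sketch, assuming the iconv extension; `undo_double_utf8()` is hypothetical, and fixing the source file remains the better option. It does not handle text that went through Windows-1252, and it would also "repair" text that legitimately contains sequences such as "Ã¶".

```php
<?php
/**
 * Hypothetical heuristic: undo one round of "UTF-8 read as ISO-8859-1 and
 * encoded to UTF-8 again", which turns "ö" (C3 B6) into "Ã¶" (C3 83 C2 B6).
 */
function undo_double_utf8(string $text): string
{
    // Only valid UTF-8 can be double-encoded UTF-8.
    if (preg_match('//u', $text) !== 1) {
        return $text;
    }
    // A code point above U+00FF cannot come from an ISO-8859-1 round trip
    // (and iconv could not convert it back anyway).
    if (preg_match('/[^\x{00}-\x{FF}]/u', $text) === 1) {
        return $text;
    }
    $candidate = iconv('UTF-8', 'ISO-8859-1', $text);
    // Keep the result only if it is itself valid UTF-8; correctly encoded
    // input such as "ö" becomes the lone byte F6 here and is rejected.
    if ($candidate !== false && preg_match('//u', $candidate) === 1) {
        return $candidate;
    }
    return $text;
}

echo undo_double_utf8("Landhaus H\xC3\x83\xC2\xB6r"), "\n"; // Landhaus Hör
```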
Comment #19
franz: @Heine, Feeds is only broken in the case I mentioned above, regarding UTF-8 (multibyte in general).
Comment #20
manuelBS (at Bright Solutions GmbH): Isn't this related to #1428272: Added support of encoding conversions to the CSV Parser?
Comment #21
MegaChriz (at WebCoo): Perhaps other related issues:
#1283512: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\x8Es, fi...' for column 'field_product_description_value' at row
#2236589: Importing CSV with special characters gives errors
Comment #22
twistor