Closed (outdated)
Project:
Feeds
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
4 Feb 2010 at 12:37 UTC
Updated:
16 Jun 2016 at 22:47 UTC
Comments
Comment #1
alex_b commented
Character encoding must be UTF-8 for the parser to work properly. Care to post the file?
Comment #2
nchase commented
My CSV contains German characters such as ö, ä and ü. The problem is that whenever I change the file's encoding from ANSI to UTF-8, the node title breaks: it isn't mapped anymore. When I change the encoding back to ANSI, the title is mapped, but the special characters are no longer imported.
Comment #3
nicholasthompson
I've also had issues with the node title containing double-encoded characters. I ended up having to use decode_entities($value) to tidy them up. I wonder if there is a similar issue with modified characters such as Ü...
PS: Have you tried 6.x-1.x-dev (HEAD)?
Comment #4
nchase commented
I'm using the latest dev and the issue is still the same: the title is not mapped when the CSV is UTF-8 encoded.
Comment #5
alex_b commented
I'm giving this a more general title. A sample CSV file that breaks would help a lot.
Comment #6
steinmb commented
Are you sure that the source file (CSV) is in UTF-8? I just imported a few Norwegian CSV files that contain æøå in the title fields, and all the nodes created were encoded correctly.
Comment #7
nchase commented
Yes, it is UTF-8. I opened a new issue for the title mapping: http://drupal.org/node/724080 . Perhaps you'd like to check the file I uploaded to the new issue to confirm that it is UTF-8?
Comment #8
summit commented
Hi,
I've attached an example file. Notice "Landhaus Höreageer": this needs to be encoded to the right characters.
Thanks a lot for going into this!
Greetings,
Martijn
Comment #9
mlsamuelson commented
Git didn't want to let me apply this patch against 6.x-1.x-dev, so I've made it against HEAD instead.
Given that http://drupal.org/node/724080 indicates there isn't actually an issue when UTF-8 is used, I've disregarded that facet of this issue and attempted to solve the problem of non-UTF-8 encoded files causing issues (mostly with MySQL) by checking whether the CSV file is UTF-8 and, if it is not, converting it to UTF-8.
I was never able to get mb_convert_encoding() to handle the conversion correctly without either telling it the source encoding or providing a list of candidate encodings to check against (see http://stackoverflow.com/questions/1002089/get-file-encoding for more on this). As a result I've used utf8_encode(), which assumes the input is ISO-8859-1 (which I'm guessing is what people will be dealing with in 80% of the cases).
This isn't a perfect fix, but it's a start, and perhaps someone with more encoding-fu can suggest changes. In its current state it should ease the pain of those whose users are uninterested in doing their own UTF-8 conversions.
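The check-then-convert idea described in this comment can be sketched as follows. This is not the code from the attached patch: the function name example_to_utf8() is hypothetical, the sketch assumes PHP 7 or later with the mbstring extension, and it uses mb_convert_encoding() with an explicit source encoding in place of utf8_encode(). Like the patch, it simply assumes that anything that is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper for illustration; not the patch from this comment.
// Assumption: input that is not valid UTF-8 is ISO-8859-1.
function example_to_utf8(string $text, string $assumed = 'ISO-8859-1'): string {
  // Input that already validates as UTF-8 is returned untouched,
  // so it cannot be encoded a second time.
  if (mb_check_encoding($text, 'UTF-8')) {
    return $text;
  }
  // Convert from the assumed single-byte encoding to UTF-8.
  return mb_convert_encoding($text, 'UTF-8', $assumed);
}

// "ö" stored as the single ISO-8859-1 byte 0xF6.
$latin1 = "Landhaus H\xF6reageer";
echo bin2hex(example_to_utf8($latin1)), "\n";
```

The validity check is what keeps a file that is already UTF-8 from being converted twice. The guess about the source encoding remains a guess, which is the limitation this comment acknowledges.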
Comment #10
marco.giaco commented
We are importing from several different encodings, and I've been facing the same problem.
Unfortunately I could not trust the results of mb_detect_encoding() (it does not distinguish between ISO-8859-1 and windows-1252).
The only way I found to get around this was to let the user specify the right encoding on the import form and to recode from the user-selected encoding to UTF-8 inside the iterator's next method.
I'm attaching a patch (made against the 6.x-1.0-beta10 version) with this approach.
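A minimal sketch of the per-line recoding described here, assuming PHP 7 or later with the iconv extension. It is not the attached patch: example_recode_line() is a hypothetical name, and $source_encoding stands in for whatever value the user picks on the import form.

```php
<?php
// Hypothetical helper for illustration; not the patch from this comment.
// $source_encoding represents the encoding chosen by the user.
function example_recode_line(string $line, string $source_encoding): string {
  // Nothing to do when the user declares the file to be UTF-8.
  if (strcasecmp($source_encoding, 'UTF-8') === 0) {
    return $line;
  }
  $converted = iconv($source_encoding, 'UTF-8', $line);
  if ($converted === false) {
    throw new InvalidArgumentException("Cannot convert from $source_encoding");
  }
  return $converted;
}

// Byte 0x80 is the euro sign in Windows-1252 but a control character
// in ISO-8859-1, so the same input yields different UTF-8 output.
$line = "Price;10 \x80";
echo bin2hex(example_recode_line($line, 'Windows-1252')), "\n";
echo bin2hex(example_recode_line($line, 'ISO-8859-1')), "\n";
```

The two calls produce different bytes for the same input line, which is why a detection function cannot reliably choose between these two encodings and why asking the user is the more dependable design.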
Comment #11
paolomainardi commented
Subscribe
Comment #12
kardave commented
Drupal uses UTF-8 all the time. Sources may not.
A good solution could be to force any non-UTF-8 characters to be converted to UTF-8.
I grabbed a solution from here: http://stackoverflow.com/questions/910793/php-detect-encoding-and-make-e...
I can now import ISO-8859-2 encoded CSV files like a charm. Finally.
David
Comment #13
Berliner-dupe commented
Is there any new information available?
I use the dev version, but characters like ä and ö are not imported.
The patch from #9 does not help.
The CSV is UTF-8.
Can anyone help me?
Regards Matthias
Comment #14
frank ralf commented
Try adding the following code at line 199 of the CSV parser:
$line = iconv('ISO-8859-1', 'UTF-8', $line);
(Taken from http://www.drupalcenter.de/node/40991 )
Comment #15
bgm commented
UTF-8 with Feeds works for me, but did you test the patch proposed in #1005400: Remove BOM from UTF-8 files in Drupal 6?
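For context, a UTF-8 byte order mark is the three bytes EF BB BF at the very start of a file. If it is not removed, it becomes part of the first field of the first line, which in a CSV is usually the first column header. The sketch below shows one way to strip it. It is not the patch from #1005400: example_strip_utf8_bom() is a hypothetical name, and whether a BOM is involved in the files discussed in this thread is an assumption, not something established here.

```php
<?php
// Hypothetical helper for illustration; not the patch from #1005400.
function example_strip_utf8_bom(string $text): string {
  $bom = "\xEF\xBB\xBF";
  // Only strip when the text really starts with the three BOM bytes.
  if (strncmp($text, $bom, 3) === 0) {
    return substr($text, 3);
  }
  return $text;
}

// A header line as it might be read from a file saved with a BOM.
$header = "\xEF\xBB\xBFtitle;body";
var_dump(example_strip_utf8_bom($header) === 'title;body');
```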
Comment #16
Berliner-dupe commented
#14
Thank you ....
but ... I think this doesn't work for the Feeds dev version.
Comment #17
franz
Is this related? #1487670: CSV Parser: incorrect UTF-8 interpretation (looks like limited number of imported fields) - Could you try the patch on this issue?
Comment #18
heine commented
#10 is the correct approach (note: I did not test the patch), because ISO-8859-1 and windows-1252 are not equivalent (e.g. for €).
The CSV in #8 appears to have been proper UTF-8 that was converted to UTF-8 again while being interpreted as a single-byte encoding. The source CSV is broken; Feeds is not (in this case).
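The double encoding described here can be reproduced with a few lines of PHP. This is an illustrative sketch only, assuming PHP with the mbstring extension and assuming the single-byte encoding involved was ISO-8859-1. The sample string is modelled on the name quoted in #8; the actual attachment was not inspected.

```php
<?php
// "Höreageer" as correct UTF-8: "ö" is the two bytes C3 B6.
$correct = "H\xC3\xB6reageer";

// Misinterpret those UTF-8 bytes as ISO-8859-1 and convert them to
// UTF-8 "again". Each of the two bytes becomes its own character.
$double = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');
echo $double, "\n"; // prints "HÃ¶reageer"

// Reversing the step recovers the original, but only because the
// wrong interpretation was ISO-8859-1 and no bytes were lost.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $correct);
```

This is also why blindly applying a fixed ISO-8859-1 to UTF-8 conversion to every line is risky: on a file that is already UTF-8 it produces exactly this kind of damage.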
Comment #19
franz
@Heine, Feeds is only broken in the issue I mentioned above regarding UTF-8 (multibyte in general).
Comment #20
manuelBS commented
Isn't this related to #1428272: Added support of encoding conversions to the CSV Parser?
Comment #21
megachriz
Perhaps other related issues:
#1283512: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\x8Es, fi...' for column 'field_product_description_value' at row
#2236589: Importing CSV with special characters gives errors
Comment #22
twistor commented