Subject, plus minor fixes so that processing empty parser results won't break the batch operation (it did before).

* CSV parser configuration page and importer start screen:
- source file encoding selection
- checkbox to verify the source encoding
* Parser:
- converts the encoding of the source data
- checks the source encoding and throws an exception if it differs


Requires: mbstring PHP extension

#82 feeds-encoding_support_CSV-1428272-82.patch (5.21 KB) by acouch
PASSED: [[SimpleTest]]: [MySQL] 5,078 pass(es).
#52 feeds-encoding_support_CSV-1428272-52.patch (5.07 KB) by eosrei
PASSED: [[SimpleTest]]: [MySQL] 4,355 pass(es).
#50 feeds-encoding_support_CSV-1428272-50.patch (6.43 KB) by eosrei
PASSED: [[SimpleTest]]: [MySQL] 4,355 pass(es).
#45 feeds-patch.png (18.07 KB) by msti
#41 feeds-encoding_support_CSV-1428272-41.patch (6.89 KB) by liquidcms
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch feeds-encoding_support_CSV-1428272-41.patch.
#37 n4.txt (14 bytes) by spuki
#35 feeds-encoding_support_CSV-1428272-35.patch (6.89 KB) by Jerenus
PASSED: [[SimpleTest]]: [MySQL] 3,843 pass(es).
#34 feeds-encoding_support_CSV-1428272-34.patch (9.42 KB) by Jerenus
PASSED: [[SimpleTest]]: [MySQL] 3,845 pass(es).
#30 price_win1251.csv_.txt (143.25 KB) by Stan Shevchuk
#27 feeds-add-encoding-support-1428272-0.patch (6.94 KB) by OnkelTem
PASSED: [[SimpleTest]]: [MySQL] 3,849 pass(es).
#13 adding_encoding_conversion_support-1428272-12.patch (10.19 KB) by derhasi
FAILED: [[SimpleTest]]: [MySQL] 3,737 pass(es), 51 fail(s), and 2 exception(s).
#6 adding_encoding_conversion_support_2.patch (9.83 KB) by OnkelTem
FAILED: [[SimpleTest]]: [MySQL] 3,737 pass(es), 51 fail(s), and 2 exception(s).
#3 adding_encoding_conversion_support.patch (5.4 KB) by OnkelTem
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch adding_encoding_conversion_support_0.patch.
#1 Edit importer: Товары из бухгалтерии | ФОР-Дубна.png (86.26 KB) by OnkelTem
adding_encoding_conversion_support.patch (6.26 KB) by OnkelTem
PASSED: [[SimpleTest]]: [MySQL] 3,845 pass(es).



Cool features. I need this - I must check it.


Guys, you're awesome!
Well done, great work, you clearly know what you're doing!

Off to test it.

Regards, Alexey



new patch (5.4 KB)
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch adding_encoding_conversion_support_0.patch.

Supplying a patch against the 7.x-2.x git version.
Since the commerce_feeds dev branch hasn't been updated yet to accommodate the major changes in the feeds 2.x git branch (I get the error message: Class FeedsCommerceProductProcessor contains 2 abstract methods ....), I had no chance to test the patch.



Version:7.x-2.0-alpha4» 7.x-2.x-dev
Status:Active» Needs review

Hi, your patch doesn't work:

:41: trailing whitespace.

:43: trailing whitespace.

:121: trailing whitespace.
$defaults = $this->configDefaults();
:125: trailing whitespace.
'#collapsible' => TRUE,
error: patch failed: plugins/
error: plugins/ patch does not apply

I tried to use it with the latest dev version.

Please contact me via email or Skype: awa_77.

Version:7.x-2.x-dev» 7.x-2.0-alpha4
new patch (9.83 KB)
FAILED: [[SimpleTest]]: [MySQL] 3,737 pass(es), 51 fail(s), and 2 exception(s).

Fixed patch for 7.x-2.0-alpha4.

* Fixed errors in the previous (#0) patch: added the missing encoding conversion for cells that span more than one line.

* REWORKED parser. This is a serious change. I believe the original CSV parser did things the wrong way, treating an arbitrary double quote as a delimiter. IMO, this violates the CSV rules, which state that a field should be [fully] double quoted when something uncommon appears in it. For example:

;A string with a "double quote in it;

should definitely break the parsing process. The correct variant would be:

;"A string with a ""double quote in it";

But the current implementation accepts the former and keeps appending subsequent lines to the field's value until the next double quote.
My patch throws an exception with a corresponding message and stops parsing.
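For reference, the quoting rule described here is also the one PHP's built-in CSV parser follows. A minimal sketch, using str_getcsv() with ';' as the delimiter and the corrected sample line from this comment:

```php
<?php
// The correctly quoted variant from the comment above, parsed with
// PHP's built-in str_getcsv(), delimiter ';'.
$line = ';"A string with a ""double quote in it";';
$fields = str_getcsv($line, ';');
// The doubled quote inside the quoted field collapses to a single quote,
// and the field ends at the closing quote, not at an arbitrary point.
print_r($fields);
```

The quoted field comes back as a single value containing one literal double quote, with empty fields on either side of the two delimiters.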

p.s. Moving the issue back to alpha4. This is odd, I know, but I got lost in the versions.


I have import problems related to encodings (my CSV files are ISO-Latin-1 encoded), and as re-encoding them is not my favorite option, I tried to find a solution.
This patch seems like a nice deal, but I can't apply it. I have Feeds 7.x-2.0-alpha4, but here is what I get:

$ patch < adding_encoding_conversion_support_2.patch
patching file
Hunk #1 FAILED at 343.
1 out of 1 hunk FAILED -- saving rejects to file
patching file
Hunk #1 FAILED at 74.
Hunk #2 FAILED at 92.
Hunk #3 FAILED at 194.
Hunk #4 FAILED at 232.
Hunk #5 FAILED at 258.
Hunk #6 FAILED at 320.
6 out of 6 hunks FAILED -- saving rejects to file
patching file
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 101.
Hunk #3 FAILED at 145.
Hunk #4 FAILED at 154.
Hunk #5 FAILED at 180.
5 out of 5 hunks FAILED -- saving rejects to file

Am I missing something here?

Does patch -p1 < patchfile work for you?

Status:Needs review» Needs work

Yes, it does:

$ patch -p1 < adding_encoding_conversion_support_2.patch
patching file includes/
patching file libraries/
patching file plugins/

Unfortunately my iso-latin encoded CSV file could not be imported and I had to convert it.


Would you send me an example file?
aneganov at gmail d0t com

Status:Needs work» Needs review

The patch works for me.

Cleaned it up for coding style and renamed it so the issue is referenced (@see).

@polom, did you set the "Source file encoding" in the "Settings for CSV Parser"?

new patch (10.19 KB)
FAILED: [[SimpleTest]]: [MySQL] 3,737 pass(es), 51 fail(s), and 2 exception(s).

*argh*, sorry, I forgot to attach the patch.

Version:7.x-2.0-alpha4» 7.x-2.x-dev
Status:Needs review» Needs work

Do you have a test for this?

OnkelTem, could you provide a test for that?

@derhasi, I would gladly make one if I knew how to do that :) Really, I've never created unit tests before.

There is a patch in the issue summary, and one in #13.
Which one should I use? I applied #13 and ran update.php; nothing has changed so far, I still get the error:
SQLSTATE[HY000]: General error: 1366 Incorrect string value

Please provide a copy of an example data file that produces the error.

Category:feature» bug

In my case, I use the Feeds, Mailhandler and Mail Comment modules (latest dev versions on D7).
The following error messages appear:

Mailbox was checked and contained 1 messages.
Warning message SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xA0
<...' for column 'comment_body_value' at row 1
Error message SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xA0
<...' for column 'message' at row 1

The drupal error log shows:

PDOException: in field_sql_storage_field_storage_write() (line 448 of /home/gezond/public_html/modules/field/modules/field_sql_storage/field_sql_storage.module).

and the feeds log shows:
SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xA0asdf ...' for column 'message' at row 1

The data imported is a plain Gmail message like 'hello' (and I have set UTF-8 for outgoing messages).

Do your mails come in CSV format?

Status:Needs work» Needs review

Seems to work really well for me. I don't know the test framework very well either, or I'd be right on it. I found a bunch of other issues which I believe this solves, and closed them as duplicates. Hope that's not too presumptuous.
#1605628: Encoding problems while importing from CSV
#1471950: CSV Parser: Check and Convert the Encoding
#1220606: ‚ or ’ character in Feed
#1319142: csv in ISO 8859-15/EURO

Category:bug» feature

We probably need to split this issue, since #6 reports a serious bug in the parser.
What do you think?

Yes, these are very separate issues.

new patch (6.94 KB)
PASSED: [[SimpleTest]]: [MySQL] 3,849 pass(es).

Separating encoding support from the rest.

Status:Needs review» Needs work

+++ b/libraries/ParserCSV.inc
@@ -324,4 +342,37 @@ class ParserCSV {
+      if (function_exists('mb_convert_encoding')) {

This should use extension_loaded(). Move this check to the top of the function; don't check twice.

+++ b/plugins/FeedsCSVParser.inc
@@ -185,6 +191,40 @@ class FeedsCSVParser extends FeedsParser {
+    $form['encoding'] = array(

This whole fieldset should be conditional on the mb library.

+++ b/plugins/FeedsCSVParser.inc
@@ -185,6 +191,40 @@ class FeedsCSVParser extends FeedsParser {
+    if (function_exists('mb_list_encodings')) {

Should use extension_loaded().
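The extension_loaded() suggestion might look like the following sketch; the function name and parameters here are illustrative, not the patch's actual code:

```php
<?php
// Sketch: one mbstring check, done up front with extension_loaded()
// instead of repeating function_exists() for each mb_* symbol.
function csv_convert_encoding($data, $to_encoding, $from_encoding) {
  if (!extension_loaded('mbstring')) {
    // mbstring missing: pass the data through unchanged.
    return $data;
  }
  if ($from_encoding !== $to_encoding) {
    $data = mb_convert_encoding($data, $to_encoding, $from_encoding);
  }
  return $data;
}
```

extension_loaded('mbstring') answers "is the extension there?" once, whereas function_exists() checks a single symbol at a time and would have to be repeated for mb_list_encodings().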

Could you please provide an example CSV file that needs this functionality?

new file (143.25 KB)

Hi twistor,

I'm attaching a sample CSV file that requires conversion. Any chance of seeing the patch committed to dev?

I need encoding conversion for Spanish characters: which patch should I use against the current alpha7 release?

Thanks so much for your time on this essential module!

Please can this be committed - this would be so useful as I feel pain at the moment every time I have to import Euro characters!

To the last 3 posters, see #29. The patch still needs work before it can be committed.

new patch (9.42 KB)
PASSED: [[SimpleTest]]: [MySQL] 3,845 pass(es).

Here is a patch based on all the previous ones. I made some modifications to get it working.

new patch (6.89 KB)
PASSED: [[SimpleTest]]: [MySQL] 3,843 pass(es).

The final one.

Status:Needs work» Needs review

new file (14 bytes)

It's not working for me. The file encoding is Windows-1251, but it was detected as CP936. So I just commented out the lines that check the encoding:

//$encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
//$this->encoding = mb_detect_encoding($data, $encode_array);
// Convert encoding if needed
if ($this->from_encoding != $this->to_encoding) {
  $data = mb_convert_encoding($data, $this->to_encoding, $this->from_encoding);
}

I downloaded the patch and it seems to work for checking the encoding of individual lines. However, I ran into a case where the file I was attempting to import was encoded such that fgets() did not recognize the end-of-line character and returned the full contents of the file. At that point it is too late for this patch to help.

What I found to resolve this issue and enable Feeds to import my ASCII-encoded file was ini_set("auto_detect_line_endings", TRUE). It seems that PHP, without this setting, does not separate lines which end only in \r (the classic Mac carriage-return character).

Hope this is useful to someone else as it took me a while to find.
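The workaround above can be reproduced in a few lines. A sketch; note the setting must be enabled before the stream is opened, and it was deprecated in PHP 8.1, long after this thread:

```php
<?php
// Without auto_detect_line_endings, fgets() does not treat a bare \r
// (classic Mac line ending) as end-of-line and returns the whole file
// as one "line". With the setting enabled, lines split correctly.
ini_set('auto_detect_line_endings', TRUE);

$fh = fopen('php://memory', 'r+');
fwrite($fh, "row one\rrow two\rrow three\r");
rewind($fh);
$first = fgets($fh);  // only the first row, not the whole buffer
fclose($fh);
```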

I have an issue similar to #37. I saved a file from Excel (2007) as .csv. The file has 5 lines, 3 of which contain odd non-ASCII dashes. When hitting the detect-encoding line in the patch, as spuki states, those 3 lines are detected as CP936.

NOTE: the suggestion in #37 about commenting out those lines doesn't do anything; mb_detect_encoding() doesn't modify the data, it simply determines the encoding. With or without the array of encoding types listed, it still determines these lines are CP936, and the part of the code which then does the conversion still runs (that check is simply to make sure we don't re-encode data that is already UTF-8).

So the only issue (and maybe not fixable) is that when converting those dashes from CP936 to UTF-8, the dash is simply removed. This is not a great solution, but without the patch and the encoding conversion, the Feeds import fails on these characters.

ALSO, as noted in #1140194: SQLSTATE[HY000]: General error: 1366 Incorrect string value for a field with accents, if I open the Excel-saved CSV in Notepad and then re-save it as UTF-8, the dashes import correctly (with or without this patch; although I agree it would be nice if Feeds could handle this).

It turns out that losing characters when using PHP to convert from CP936 to UTF-8 is a bug in PHP 5.3, fixed in PHP 5.4.0.

new patch (6.89 KB)
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch feeds-encoding_support_CSV-1428272-41.patch.

BTW, the patch in #35 does not adhere to Drupal coding standards. I think the attached one is somewhat closer.

Status:Needs review» Needs work

The last submitted patch, feeds-encoding_support_CSV-1428272-41.patch, failed testing.

Status:Needs work» Needs review

#35 works for me


new file (18.07 KB)


Status:Needs review» Reviewed & tested by the community

Hi, for me too! Setting this to RTBC ok?
Greetings, Martijn

Just to be clear: is #35 the patch that will be committed?

Status:Reviewed & tested by the community» Needs work

As was already pointed out, the patch in #35 breaks Drupal coding standards.

Honestly the to, from, check encoding business seems too complicated. Feeds is complicated enough. Is there a valid reason why we can't just automatically detect the encoding, and use that without any user interaction?

twistor: Because sometimes the automatic detection is not reliable. It's nice to have auto-detection, but it shouldn't be the only option.

BTW, if you think Feeds itself is too complicated, see this project.

Status:Needs work» Needs review
new patch (6.43 KB)
PASSED: [[SimpleTest]]: [MySQL] 4,355 pass(es).

Here is #35 plus the coding standard changes from #41. Should pass tests. Works great for me to import international characters. Thanks everyone.



Have we reached a conclusion on whether user interaction should be required?

new patch (5.07 KB)
PASSED: [[SimpleTest]]: [MySQL] 4,355 pass(es).

I've refactored this a bit:

  • UTF-8 conversion is forced to stop the "SQLSTATE[HY000]: General error: 1366 Incorrect string value" error.
  • Removed the optional "Check encoding" checkbox.
  • Made the encoding conversion function fail if the encoding detection fails.
  • Removed the encoding check function (it seems extraneous?).
  • Removed the exception thrown when fixEncoding() cannot locate mbstring. I left the notice on the import form; maybe it should be in the install requirements?
  • Made $detected_encoding a local variable.
  • Added additional comments.
  • Reduced the overall amount of code changes.

It needs simpletests and possibly more refactoring to fit better within the current code architecture.
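A rough sketch of the behaviour that list describes (an illustration under assumptions, not the code of the patch itself; only the fixEncoding() name comes from the patch, and the candidate encoding list is hypothetical):

```php
<?php
// Sketch: forced conversion to UTF-8, hard failure when detection fails,
// and no exception when mbstring is missing (a form notice instead).
function fix_encoding($data, $from_encodings = 'UTF-8, ISO-8859-1, Windows-1251') {
  if (!extension_loaded('mbstring')) {
    return $data;  // The import form shows a notice instead of throwing.
  }
  // Strict detection against the configured candidate encodings.
  $detected_encoding = mb_detect_encoding($data, $from_encodings, TRUE);
  if ($detected_encoding === FALSE) {
    throw new Exception('Source encoding could not be detected.');
  }
  // Conversion to UTF-8 is forced, to avoid the SQLSTATE 1366 errors.
  if ($detected_encoding !== 'UTF-8') {
    $data = mb_convert_encoding($data, 'UTF-8', $detected_encoding);
  }
  return $data;
}
```

With strict detection, valid UTF-8 input passes through untouched, while single-byte input falls through to the next candidate encoding and gets converted.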

Hi, I installed the latest patch (#52). I have a CSV with terms in a column containing a special character (ë). When I import the CSV (UTF-8), the term is not recognized: the term select field stays empty. I have been struggling with this for a week. The only thing that works is to make a fake term without the special character, then try to change it by deleting the term and merging it with the intended taxonomy term with the special character, or by changing the term name once it is imported.

@johan2 Have you tried any other encoding options? Windows-1252 worked for my imports.

I am working on a Power Mac with Excel 2001. My best option for saving from Excel is "Windows CSV". I checked the file with Smultron and TextMate and it shows no problems. Text imports fine into a text field, except when it has to go to a term field with special characters; then it goes wrong, even when changing the "ë" to "ë".
I also suspected the file... is it really UTF-8? I found more issues concerning this.

Component:Feeds Import» Code

It must have something to do with the taxonomy. When I export a term to CSV from the site, it exports the "ë" as "ë". I know my Excel is not the newest, but through TextEdit, Smultron or TextMate the text can be exported to UTF-8, even changing the "ë" manually. Only for the term does the issue stay unsolved; other text fields have no problems and import correctly.

What also happens: if you adapt the same term, changing "ë" to "e" in both the taxonomy and the CSV file, the import can still go wrong, because something stays in memory. I don't know what it is, but flushing caches is not enough, nor is updating aliases. Abruptly changing to another term that was already in the taxonomy works... So once it goes wrong, you are in for a lot of worries, since you are not sure what to do.

I have some more information about what happens. It seems a great deal of the problem can be caused by the upload process. When creating new terms is allowed on import (because otherwise the term stays empty), the CSV file creates a new term several times for the same column with the same term, and in that process it ends up with ? ? ? around each character, so many duplicates are created. I installed the merge module (7.x-1) and noticed this behaviour there. Before, I had merge 7.x-2-dev and the replace module; I uninstalled them. The good news is that the special characters are now coming through.

But also good news is that the special characters are passing.

So, do you mean this feature patch is working correctly?

I did some more testing and ended up installing OpenOffice. Some of the characters in Excel didn't export correctly, which was not obvious to discover. In OpenOffice Calc (the spreadsheet) I noticed that some characters had ? around them, and this was also the case in the Feeds import. Once I deleted these and exported the sheet to CSV again, the Feeds import worked. The problem was that these ? were invisible in a text editor or TextMate. Importing Excel files into OpenOffice goes through a wizard and the software is free, so for me this is solved ;-)

Thanks for the great module. I just imported over 10,000 records with all kinds of data: title, number (mixed: 1a, 1.1, 2, etc.), category, subcategory, dates, description, certificate number... and in no more than a few minutes everything was in my views without problems. So thanks again!!

Just installed #52, working like a charm now! :) Thanx so much!!!

Status:Needs review» Reviewed & tested by the community

Yep worked!
Setting this to RTBC ok?
greetings, Martijn

Issue tags:+Needs tests

Thanks for all the work!

I'd love to have a test for it though.


I've been getting the SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\x92t message in my logs when Feeds tries to process a mail message from Mailhandler.

I've worked out that the word "don't" in the text is stopping it from working properly, because of the apostrophe (\x92, a Windows-1252 right single quote).

I've tried implementing the patch as suggested above, but that still doesn't seem to work.

Also, I'm not sure where the GUI shown in the screenshot above, for changing the source file encoding, is located?

Thanks in advance.

@sphankin: The screenshot above shows the import form, located at import/name_of_importer.
If you don't see the options from the screenshot, the patch is not applied properly.

Hi @msti, thanks for the reply. I've followed the patch instructions and I still can't see anything/it isn't fixing the issue. As far as I can tell (looking through the patch) the patch is being applied properly. Would it make a difference if I'm importing an email instead of a CSV file? Thanks.

@spankin: this is only for the CSV parser.

I'll be able to look into writing a simpletest later this month.

@eosrei - thank you, that would be great!

@eosrei - Hi. Any development on the email parser?



@sphankin: This patch is only for the CSV parser. I don't have time/need to work on the email parser. Please create a separate feature request issue for email functionality. I still hope to be able to implement the needed simpletests for this patch, but it is feature complete as is.

I used this patch with success on a project. I'm seriously just waiting for tests to commit it.

Status:Reviewed & tested by the community» Needs work

What franz said, needs tests.


@franz or @twistor, could a test just involve adding international characters to one of the csv files in feeds or should it have its own test?

Ideally, we should have an individual assertion that verifies if the code works well with the encoding. What matters most is to make sure the tests appropriately cover the possibilities and provide an easy output in case it fails. That's my take on it at least.

#52 is the way to go. Thanks for this patch. What is the status on getting this into the next build?

If someone can provide a file or files that works only with the new conversion that they would be comfortable being added to the project I can write the assertion.

#52 worked for me. Thank you for the patch!

Anyone up for the tests? Than I think this can be committed, right?
Greetings, Martijn

So all we need is a CSV file with a non-UTF-8 encoding to test the feature. If someone can provide one, then acouch writes the assertions and we commit.

Status:Needs work» Needs review
new patch (5.21 KB)
PASSED: [[SimpleTest]]: [MySQL] 5,078 pass(es).

I found that the patch in #52 doesn't allow the encoding setting to be changed on a per-node basis. I added

$form['encoding']['#default_value'] = isset($source_config['encoding']) ? $source_config['encoding'] : $form['encoding']['#default_value'];

to sourceForm(), which fixed this. I didn't get a chance to test this, so I'm setting it to 'needs review'. I'll reset it if the test passes, and will still wait for a file to write encoding tests against.

Status:Needs review» Needs work

This should be "Needs Work" until tests are written. I've got three higher priority projects in front of the project needing this, so I cannot work on it.


Fixing screenshot

Patch in #52 solved the problem for me as well. (Sorry I didn't test #82 — I was short on time and didn't need the additional functionality it offered so I went for the earlier one with proven success). Wish I could write tests to help get this committed — it's great!

Somewhat related, but maybe needs a new issue:

If the column headings have non-standard characters (mine have French characters), then that column is skipped on import. As I mentioned in #39 above:

I was getting the import to crash when it encountered a non-English character. The patch in #52 stopped the crashing, but it is simply removing those characters; that is certainly not a solution.

Ughh!! At a bit of a loss with this... as I mentioned in #86 (and #39), I have the patch from #52 and my import no longer crashes on FR characters, but they get dropped.

I have looked into the code a bit more, and this is making less sense the more I look.

Taking the code from the patch, I do this in a devel/php window:

$data = 'Date entrée';
$data = mb_convert_encoding($data, 'UTF-8', 'UTF-8');

and the result is: Date entrée

But when I use a debugger and step through the same code in the Feeds function fixEncoding(), after running mb_convert_encoding() the characters are lost, exactly as occurs when doing the import.

My guess is this has something to do with devel/php possibly having some HTML encoding in the mix that the import function doesn't have; but I'm also confused why so many people seem to be having success with the patch in #52.

Is it possibly due to using PHP 5.3 instead of PHP 5.4?

I admit I do not understand all the aspects of dealing with other character sets, but I ran these tests to see whether the issue is PHP-version dependent.

Running a test file via PHP on the command line (Windows), I set this as my test file:

echo mb_convert_encoding('Bibliothèque', 'UTF-8', 'UTF-8');

results are:

C:\Program Files (x86)\nusphere\phped\php54>php -f test.php
C:\Program Files (x86)\nusphere\phped\php54>cd ../php53
C:\Program Files (x86)\nusphere\phped\php53>php -f test.php
C:\Program Files (x86)\nusphere\phped\php53>

I am sure the ? is just a display issue with the command shell, but it is clear that PHP 5.3 and PHP 5.4 act differently.

I'll set up PHP 5.4 for the web server and try the CSV import to be sure.

Fixed. In the fixEncoding() function I replaced this line:

$data = mb_convert_encoding($data, $this->to_encoding, $detected_encoding);

with this one:

$data = utf8_encode($data);

@liquidcms: utf8_encode() only converts a string encoded in ISO-8859-1 to UTF-8. It doesn't cover any other conversions.

mb_convert_encoding() converts between any listed encodings and is multi-byte string capable.

While this code may need some help, utf8_encode() is not the answer. Sorry.
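The difference can be shown in two lines (a sketch; utf8_encode() was later deprecated in PHP 8.2, but was current in this thread's PHP 5 era):

```php
<?php
// utf8_encode() is exactly a fixed ISO-8859-1 to UTF-8 conversion.
$latin1 = "caf\xE9";  // "café" in ISO-8859-1 (0xE9 is é)
$a = utf8_encode($latin1);
$b = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
// $a and $b are identical, but only mb_convert_encoding() can also
// convert from Windows-1251, CP936, or any other source encoding.
```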

@eosrei, thanks for the info.

I do not doubt what you are saying, but utf8_encode() does work. :)

Also, the code in the patch, mb_detect_encoding($data, $this->from_encoding), returns UTF-8 for my data (not ISO-8859-1),

and mb_convert_encoding() certainly does not work (although, as I have said, perhaps it is only broken for PHP < 5.4).

Status:Needs work» Needs review

Status:Needs review» Needs work

Setting issue status back. No need to test until tests are written.

The patch in comment #82 works for me with these PHP versions:


I had no need to use utf8_encode() instead of mb_convert_encoding(); I think mb_convert_encoding() works really well here. However, I had to specify the exact charset of my CSV file (in the CSV parser configuration form) to get it to work. The "auto" option doesn't work for me.

I really think this patch must make it into a recommended release ASAP, because this is a key feature for a lot of Drupal sites in Spanish, French, German, etc.

Thank you so much!!!

#82 worked great for me.