Currently, Feeds uses ParserCSV defined in libraries/ParserCSV.inc which parses the CSV file itself, unnecessarily. It is a much better idea to use the native PHP function fgetcsv. It allows for configurable enclosure and escape characters as well as the delimiter (needed for issues like #1356718: The escape and enclosure characters should be configurable). According to the discussion over at #1014678: CSVParser can't parse TSV Files: double quotes still special, it's also faster.
Older versions of PHP may not cut it, but I don't think this is a huge problem. Most OS distributions are already using PHP 5.3.
Changelog:
- 5.3.0: The escape parameter was added
- 4.3.5: fgetcsv() is now binary safe
- 4.3.0: The enclosure parameter was added
| Comment | File | Size | Author |
|---|---|---|---|
| #9 | ParserCSV.inc_.patch | 3.79 KB | linuzboy |
| #3 | use_fgetcsv_instead_of_own_parser-1369874-3.patch | 6.79 KB | colan |
Comments
Comment #1
colanI'll start by getting what I can from the D6 version over at #1014678: CSVParser can't parse TSV Files: double quotes still special.
Comment #2
mrharolda commentedNice one, I'll be following this!
Feeds is becoming increasingly importent because almost all our clients have existing (non-drupal) websites already. I'll be migrating a lot of data in the future ;)
Comment #3
colanThis took a lot longer than I thought because the original patch was so divergent from what's now in the development tree. Anyway, here's what I've got so far. I haven't even had a chance to test this myself yet, but I wanted to post something as soon as I finished coding to get some feedback.
We'll see how it runs tomorrow.
Comment #4
colanSorry folks. Client priorities are changing again.
Comment #5
colanLeaving this with a more descriptive title so hopefully someone else can pick it up.
Comment #6
colanMarking #1161628: Better CSV Parser as a duplicate. Yes, that one's older, but we need to go from D7 down anyway.
Comment #7
emackn commentedI like this change. But I think it should happen in libraries/ParserCSV.inc instead of FeedsCSVParser.inc.
This doesn't break the existing tests and also allows for the existing tests to serve as a start point for more tests. YAY!
Comment #8
colanAdding tag.
Comment #9
linuzboy commentedYes, this is much more easier and applicable.
Just replace the line parser codes with the single line
Comment #10
kthullForget where I found this on d.o, but just add this to the end of your settings.php file and you're good to go:
Comment #11
gaëlg#9 did not work for me. Before,
\"gave\in my field, and after, it gives\"(no change). It should give".Comment #12
gaëlgAnd #3 gave
\Comment #13
gaëlgMy problem comes from this PHP bug. So it works only if the escape character is
".To get such a CSV file from PHPMyAdmin, you need to select "CSV for MS Excel" as file format. Then no need to patch the module, it just works.
Comment #14
twistor commentedRemoving tag.
Release blockers are about broken/missing functionality. Trying to replace the most common parser right before a release is not a good idea. The current CSV parser has its issues, but it is relatively well tested. There are numerous known issues with the fgetcsv() in various versions of PHP, mostly stemming from the fact that there's no spec for CSV files.
I love the idea of replacing a large portion of our own code, with C. I'd just like to see it mature a bit.
Comment #15
gaele commentedThank you GaëlG for the workaround (#13). You saved my day.
Comment #16
franzWhy is it that #9 doesn't set the enclosure when calling str_getcsv()? Why are there 2 backslashes instead of one for the escape argument? I think the whole idea is fine, but providing it can cause instability and some errors, we won't be able to fix. I think it's better to fix current parser.
http://ca.php.net/manual/en/function.str-getcsv.php
Comment #17
twistor commentedstr_getcsv() has too many bugs. It's not worth it.
Comment #18
pdcarto commented[Edit - whoops! I conflated this with https://www.drupal.org/node/2443721. Sorry!]
Changing this to "Closed (won't fix)" because the proposed solution is not acceptable ignore the reported issue, which is that there are problems parsing csv with non-enclosing double-quotes. If using str_getcsv is not an acceptable solution, then it seems to me the only other options are to either fix ParserCSV, or to say that the reported issue is not a actually problem.
Comment #19
mrharolda commentedMy guess is that str_getcsv() contains less bugs than the current custom implementation.
Also I can't find what's wrong with str_getcsv() as I'm using it for years now. Might be coincidence too.
Comment #20
twistor commented#2443721: CSV import - Unclosed double quote problem