Given the potential number of parameters for a parse run (delimiter, skip first line, timeout, column names, and what not), it might be a good idea to not stick everything into a single function signature. One could create a single $options argument (like for D6 url() and the likes), but making a class out of the parser looks just as nice and has the additional advantage that it's possible to store state after parsing is done.
Which is nice for the timeout parameter, as it doesn't require a clumsy by-reference argument to return the success state and/or number of parsed lines.
In preparation of finally uploading an initial version of my new Transformations module (sorry for being late), here's a patch that introduces a ParserCSV class, and makes parser_csv_parse() a mere convenience wrapper.
| Comment | File | Size | Author |
|---|---|---|---|
| parser_csv_parseobject.diff | 12.55 KB | jpetso |
Comments
Comment #1
jpetso commentedOh. And introduces the timeout parameter. Untested, but should probably work.
Comment #2
alex_b commentedI like that.
Need to review.
Comment #3
alex_b commentedThis patch passes the tests from patch http://drupal.org/node/350405#comment-1275756 .
I'm fine with worrying about the timeout parameter later.
As far as I'm concerned this is RTBC - nice work!
We should think about the future of our little parser as it's growing. There is also powerful stuff out there (random pick: http://code.google.com/p/php-csv-parser/ ) - jpetso, what are your thoughts?
Comment #4
jpetso commentedI'm completely fine with getting rid of custom code if there are no regressions in standards compliance. Didn't know the project on Google Code (I only made my way through the code comments in the PHP documentation). On first glance though, this specific project seems unsuitable to me in the current state as it uses PHP's fgetcsv() (which is not compliant to the CSV specification), and also it goes slightly against my personal needs as it doesn't take line iterators but only file paths. That's a direct consequence of using fgetcsv(), of course.
I don't want to impede the progress of this module though - if there are good reasons for switching to another parser (like speed, scalability or features) then I'll try to accommodate to it, or "fork" my iterator-based parser back to an implementation detail of my module. As long as using the one in parser_csv is feasible, I'll try to stick to it.
Comment #5
alex_b commentedAgreed.
I don't have a candidate that would take the place of the current parser - that would take a bit more research. You also seem pretty committed to building a solid parser, so I don't have a reason for replacing it :-)
However, I'm thinking that the CSV Parser portion of this project by itself could take advantage from being a Drupal independent project. You could use it in any PHP environment, right?
So what I'm thinking is we could make parser_csv a FeedAPI integration for an independent CSVParser project. I'm just wondering where we would host the CSVParser - any preferences? Google Code? http://code.google.com/hosting/createProject
Either way, for now, I would like to commit this patch as-is.
Comment #6
jpetso commentedWell, as far as my needs are concerned I'm mainly done building it, no promises that I'll be further improving it a lot. (Although an output row iterator that avoids storing the full array in memory would be teh awesome. Maybe I'll try that still.)
Right. On the one hand, that would increase the maintenance effort (file syncing, another repository to get and issue queue to subscribe to, can't depend on Drupal's Simpletest framework, etc.), assuming we stay the only project using the parser. But on the other hand we might gain more users (= contributors?) and also make it easier for Drupal modules to include just the parser in case they don't want to depend on or be part of the parser_csv package. Reasons for making the parser into a separate upstream are there, although I'm relatively indifferent whether to do this or not as the pros and cons pretty much seem to outweigh one another.
Google Code would also be ok, but my preference would be a hosting platform that does distributed version control systems. Like, say, Canonical's Launchpad - which uses Bazaar as VCS (including a very nice overview of private branches), bug tracking works ok, and unlike Google Code is going to go open source this summer. I've always wanted to try out Bazaar for real, it's the only major VCS that I haven't yet investigated in depth :P
I don't want to decide this all by myself though, your opinions and preferences are of course just as valuable as mine.
I won't stop you from doing so, it's my patch after all :P
Comment #7
alex_b commentedThank you for your input. I created a separate issue here: #380206: Break out CSV parser into Drupal independent project .
I don't plan on acting on a split for the moment, but if we'd like to get serious about this, splitting out or replacing by comparable (compliant) existing parser looks like the way to go.
Committed the patch - again, nice work. I like how CSV Parser is coming together.
Comment #8
alex_b commented