Make parser_csv_parse() into a parser class [#376263]

Given the potential number of parameters for a parse run (delimiter, skip first line, timeout, column names, and what not), it might be a good idea to not stick everything into a single function signature. One could create a single $options argument (like for D6 url() and the likes), but making a class out of the parser looks just as nice and has the additional advantage that it's possible to store state after parsing is done.

Which is nice for the timeout parameter, as it doesn't require a clumsy by-reference argument to return the success state and/or number of parsed lines.

In preparation of finally uploading an initial version of my new Transformations module (sorry for being late), here's a patch that introduces a ParserCSV class, and makes parser_csv_parse() a mere convenience wrapper.

Comment	File	Size	Author
	parser_csv_parseobject.diff	12.55 KB	jpetso

Comments

Comment #1

jpetso commented 18 February 2009 at 13:44

Oh. And introduces the timeout parameter. Untested, but should probably work.

Comment #2

alex_b commented 18 February 2009 at 15:23

I like that.

Need to review.

Comment #3

alex_b commented 21 February 2009 at 16:26

Status:

Needs review

» Reviewed & tested by the community

This patch passes the tests from patch http://drupal.org/node/350405#comment-1275756 .

I'm fine with worrying about the timeout parameter later.

As far as I'm concerned this is RTBC - nice work!

We should think about the future of our little parser as it's growing. There is also powerful stuff out there (random pick: http://code.google.com/p/php-csv-parser/ ) - jpetso, what are your thoughts?

Comment #4

jpetso commented 21 February 2009 at 17:01

I'm completely fine with getting rid of custom code if there are no regressions in standards compliance. Didn't know the project on Google Code (I only made my way through the code comments in the PHP documentation). On first glance though, this specific project seems unsuitable to me in the current state as it uses PHP's fgetcsv() (which is not compliant to the CSV specification), and also it goes slightly against my personal needs as it doesn't take line iterators but only file paths. That's a direct consequence of using fgetcsv(), of course.

I don't want to impede the progress of this module though - if there are good reasons for switching to another parser (like speed, scalability or features) then I'll try to accommodate to it, or "fork" my iterator-based parser back to an implementation detail of my module. As long as using the one in parser_csv is feasible, I'll try to stick to it.

Comment #5

alex_b commented 21 February 2009 at 18:09

I'm completely fine with getting rid of custom code if there are no regressions in standards compliance.

Agreed.

I don't have a candidate that would take the place of the current parser - that would take a bit more research. You also seem pretty committed to building a solid parser, so I don't have a reason for replacing it :-)

However, I'm thinking that the CSV Parser portion of this project by itself could take advantage from being a Drupal independent project. You could use it in any PHP environment, right?

So what I'm thinking is we could make parser_csv a FeedAPI integration for an independent CSVParser project. I'm just wondering where we would host the CSVParser - any preferences? Google Code? http://code.google.com/hosting/createProject

Either way, for now, I would like to commit this patch as-is.

Comment #6

jpetso commented 22 February 2009 at 11:17

You also seem pretty committed to building a solid parser, so I don't have a reason for replacing it :-)

Well, as far as my needs are concerned I'm mainly done building it, no promises that I'll be further improving it a lot. (Although an output row iterator that avoids storing the full array in memory would be teh awesome. Maybe I'll try that still.)

However, I'm thinking that the CSV Parser portion of this project by itself could take advantage from being a Drupal independent project. You could use it in any PHP environment, right?

Right. On the one hand, that would increase the maintenance effort (file syncing, another repository to get and issue queue to subscribe to, can't depend on Drupal's Simpletest framework, etc.), assuming we stay the only project using the parser. But on the other hand we might gain more users (= contributors?) and also make it easier for Drupal modules to include just the parser in case they don't want to depend on or be part of the parser_csv package. Reasons for making the parser into a separate upstream are there, although I'm relatively indifferent whether to do this or not as the pros and cons pretty much seem to outweigh one another.

So what I'm thinking is we could make parser_csv a FeedAPI integration for an independent CSVParser project. I'm just wondering where we would host the CSVParser - any preferences? Google Code?

Google Code would also be ok, but my preference would be a hosting platform that does distributed version control systems. Like, say, Canonical's Launchpad - which uses Bazaar as VCS (including a very nice overview of private branches), bug tracking works ok, and unlike Google Code is going to go open source this summer. I've always wanted to try out Bazaar for real, it's the only major VCS that I haven't yet investigated in depth :P

I don't want to decide this all by myself though, your opinions and preferences are of course just as valuable as mine.

Either way, for now, I would like to commit this patch as-is.

I won't stop you from doing so, it's my patch after all :P

Comment #7

alex_b commented 22 February 2009 at 16:46

Thank you for your input. I created a separate issue here: #380206: Break out CSV parser into Drupal independent project .

I don't plan on acting on a split for the moment, but if we'd like to get serious about this, splitting out or replacing by comparable (compliant) existing parser looks like the way to go.

Committed the patch - again, nice work. I like how CSV Parser is coming together.

Comment #8

alex_b commented 22 February 2009 at 16:46

Status:

Reviewed & tested by the community

» Fixed

Comment #9

8 March 2009 at 16:50

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Make parser_csv_parse() into a parser class

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

News items

Our community

Documentation

Drupal code base

Governance of community