Alright, so let's start with a simple example. For this tutorial (and other subsequent CSV import/export HOWTOs), we'll use an example CSV file called courses.csv. Assuming it was exported from Microsoft Excel, it contains semicolons instead of regular commas as delimiters. Here are the contents; just paste them into your favorite text editor (I know you love Kate) and save them as a new file:

Course Name;Instructor;Units;Unit length (hours);Max number of attendants
Taekwondo;George Wright;12;1.0;20
Aikido;Lukas Gangoly;24;1.5;30
Hands-on pressure point demonstration;Christian Budai;1;0.5;3

In the second CSV HOWTO, we'll import the data rows into Drupal nodes. For that, we will need a list of field names so that the CSV fields can be properly mapped to node/CCK fields. Therefore, this first tutorial shows how to extract the first line of the file as a list of text items (strings) that other operations can use. This first one won't write any data yet, so don't worry about breaking your Drupal. Mind that for the CSV operations used in here, you need the CSV Transformations extension module installed in addition to Transformations' core modules (API and UI).

Creating the pipeline

Let's create a new transformation pipeline at the "Transformation pipelines" page (admin/build/transformations) of the "Site building" section in Drupal's administration area. Click the "Add" tab to create a new pipeline, and enter a name for it, e.g. "Extract first row of CSV file". After clicking "Create", Transformations UI displays an empty pipeline with just the "Pipeline parameters" and "Pipeline outputs" blocks shown. Both are not yet connected to any actual input or output slots. (An empty circle means "not connected". A black, filled circle means "connected".)

Adding operations

Now we'll tell Transformations how to process the data. This is done by adding operations to the pipeline, and connecting their slots appropriately. Operations are designed to be used in a mix-and-match fashion, so you need to assemble a few of them in order to get a useful transformation. The operations for this tutorial will consist of the following steps:

  1. Load the CSV text from disk.
  2. Process ("parse") the CSV text, turning the plain text into actual data structures. In the case of CSV, the result of this operation is a list of rows, where a single row is a list of text items (one for each column).
  3. Extract the first row of the above list structure, so that we get our desired result: the first CSV line as a list of text items.
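The three steps above can be sketched in plain Python; this is not Transformations' actual (PHP) implementation, just a conceptual equivalent, with the file contents inlined from the example above so the sketch is self-contained:

```python
import csv

# Step 1: load the CSV text "from disk", line by line.
# (Inlined here for self-containment; in practice you would read
# it from courses.csv instead.)
csv_text = """Course Name;Instructor;Units;Unit length (hours);Max number of attendants
Taekwondo;George Wright;12;1.0;20
Aikido;Lukas Gangoly;24;1.5;30
Hands-on pressure point demonstration;Christian Budai;1;0.5;3
"""
text_lines = csv_text.splitlines()

# Step 2: parse the CSV text lines into a list of records,
# where each record is a list of text items (one per column).
records = list(csv.reader(text_lines, delimiter=";"))

# Step 3: extract the first item from the list -- the header row.
first_row = records[0]
print(first_row)
```

Running this prints the header row as a list of strings, which is exactly the pipeline output shown further below.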

So we just need to communicate that plan to Transformations. Click "Add operation" and select the "Read file contents, line by line" operation (in the "Files and directories" category), then add it. Also, add the "CSV text lines to list of records" (in "CSV") and "Extract single item from list" (in "Lists and structures") operations. Looks better, eh? As you might have guessed, the slots on the left are input slots, and the ones on the right are output slots. Input slots that are required but not yet connected are marked red; you need to connect at least those in order for the pipeline to run.

Connecting input/output slots

Now let's define the data flow by connecting the slots. When executing, Transformations will determine the execution order of the operations depending on how the various slots are hooked up, considering connection dependencies. An operation that is not connected at all won't be executed in a pipeline.

Slots are connected by first clicking on the endpoint of one slot (the circle next to the slot label), then clicking on the endpoint of the other side of the connection. If jQuery UI is enabled, you can also drag and drop one endpoint onto the other one - feels better and saves a page reload. (Valid connections are highlighted when you drag and hover over a valid connection target.) Disconnecting works by clicking on a connected endpoint, then clicking again on the "X" that appears where the circle has been. An input slot can only be connected to a single output, but an output slot can feed multiple inputs. So when you hook up an already connected input slot to another output slot, its previous data source will be disconnected. Unfortunately, Transformations can't figure out compatibility between different slots yet, so you need to figure out for yourself which data types can be connected - either by making a good guess (should work most of the time) or by looking it up in the code. Er.

So, connect the slots! We want to make the file path and the CSV delimiter a dynamic parameter (to be provided only when the pipeline is executed), so we'll define both as pipeline parameters. Connect the "File path" slot to the "New parameter" pseudo-slot (creates a new parameter), then connect the "Delimiter" slot to "New parameter" (which appeared again, in addition to the "File path" parameter that you just assigned). Also, connect the "Text lines" output with the "CSV text lines" input, the "List of records" output with the "Item list" input, and the "Extracted item" with a "New (pipeline) output". That's it, we're set! Press the "Save" button to permanently store the pipeline in the database.

Executing the pipeline

As a final step, let's try out the pipeline by opening the "Execute" tab. The two pipeline parameters (file path and CSV delimiter) can be filled in there. The file path parameter requires a local path, so if you're not running Transformations on a local webserver, you need to manually upload the file beforehand (for example via FTP, SSH or upload.module), and then specify the local path on the server (e.g. "sites/default/files/courses.csv", or wherever you uploaded it to). The delimiter slot has a default value (the comma), which will be used unless you uncheck the "Use the default value for this input" option. Our CSV fields are separated by semicolons instead of commas, so in this case we actually want to uncheck this option and enter a semicolon (";") into the text field.
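To see why overriding the delimiter matters, here is a quick Python illustration (again not Transformations code, just the same parsing idea) of what happens to one of our semicolon-separated lines with the default comma versus an explicit semicolon:

```python
import csv

line = "Taekwondo;George Wright;12;1.0;20"

# With the default comma delimiter, nothing gets split:
# the whole line comes back as a single field.
comma_parsed = next(csv.reader([line]))
print(comma_parsed)

# With the delimiter overridden to ";", the line splits
# into its five actual fields.
semicolon_parsed = next(csv.reader([line], delimiter=";"))
print(semicolon_parsed)
```

With the comma default you get one useless mega-field per row, which is why we uncheck the default and pass ";" instead.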

Now hit "Execute". If all went well, you'll see a visualization of the pipeline output, like this:

Extracted item ("item"):

0 { "Course Name" }
1 { "Instructor" }
2 { "Units" }
3 { "Unit length (hours)" }
4 { "Max number of attendants" }

That's it for this first tutorial! You can now use this pipeline to have a look at all your CSV headers in a list format, and more importantly, the results of this pipeline can be passed to other pipelines, like the one in the second HOWTO for CSV data.

Step by step

  • Upload a CSV file to a folder that Drupal can read.
  • Click the "Add" tab to create a new pipeline, and enter a name for it, e.g. "Extract first row of CSV file".
  • Click "Add operation". Select "Read file contents, line by line". Click "Next".
  • Click "Add operation". Select "CSV text lines to list of records". Click "Next".
  • Click "Add operation". Select "Extract single item from list". Click "Next".
  • Connect "Text lines" to "CSV text lines".
    • If jQuery UI is enabled, drag one endpoint onto the other; otherwise, click one endpoint and then the other.
  • Connect "File path" and "Delimiter" each to "New parameter" in the pipeline parameters.
  • Connect "List of records" to "Item list".
  • Connect "Extracted item" to "New output" in the pipeline outputs.
  • Click the "Save" button.
  • Click "Execute":
    • Enter ";" as the delimiter.
    • Enter the CSV file path.

The export of this pipeline

array (
  4 => 
  array (
    'name' => 'Extract first row of CSV file',
    'properties' => 
    array (
      'transformations_ui:position:TfWholeStringAtOnceFromFile-1' => 
      array (
        'z-index' => '4',
        'top' => '0',
        'left' => '189',
      ),
      'transformations_ui_persistence_id' => 4,
    ),
    'operations' => 
    array (
      'TfTextLinesFromFile-1' => 
      array (
        'name' => 'TfTextLinesFromFile',
        'label' => 'Read file contents (line by line)',
      ),
      'TfRecordsFromCSVTextLines-1' => 
      array (
        'name' => 'TfRecordsFromCSVTextLines',
        'label' => 'CSV text lines to list of records',
      ),
      'TfListExtractSingleItem-1' => 
      array (
        'name' => 'TfListExtractSingleItem',
        'label' => 'Extract single item from list',
      ),
    ),
    'sources' => 
    array (
      'TfWholeStringAtOnceFromFile-1' => 
      array (
        'filepath' => 
        array (
          'entity' => 1,
          'data' => '',
        ),
      ),
      'TfRecordsFromCSVTextLines-1' => 
      array (
        'csvTextLines' => 
        array (
          'entity' => 'TfTextLinesFromFile-1',
          'key' => 'textLines',
        ),
        'delimiter' => 
        array (
          'entity' => '2',
          'key' => 'delimiter',
        ),
      ),
      'TfTextLinesFromFile-1' => 
      array (
        'filepath' => 
        array (
          'entity' => '2',
          'key' => 'filepath',
        ),
      ),
      'TfListExtractSingleItem-1' => 
      array (
        'list' => 
        array (
          'entity' => 'TfRecordsFromCSVTextLines-1',
          'key' => 'records',
        ),
      ),
    ),
  ),
)

Comments

domesticat:

This is a great tutorial ... the problem? The Drupal Transformations module has a critical bug dating back to July 2009 (#532464: Error adding operation "Set fields in node object") that you'll hit when you try to use the "Set fields in node object" operation needed in part 2 of this tutorial.

Until that bug is fixed, don't bother trying this tutorial.