Hello, I needed to parse CSV files without column headers, but FeedsCSVParser.inc requires the first row to name the columns, so I quickly added an option to get around that. It allows mapping "column0", "column1", ... "columnN" to destination targets. I'm not sure how to provide a patch, sorry; this is my first time sharing.

<?php
// $Id: FeedsCSVParser.inc,v 1.8 2010/03/29 04:02:37 alexb Exp $

/**
 * Parses a given file as a CSV file.
 */
class FeedsCSVParser extends FeedsParser {

  /**
   * Implementation of FeedsParser::parse().
   */
  public function parse(FeedsImportBatch $batch, FeedsSource $source) {

    // Parse.
    feeds_include_library('ParserCSV.inc', 'ParserCSV');
    $iterator = new ParserCSVIterator(realpath($batch->getFilePath()));
    $source_config = $source->getConfigFor($this);
    $parser = new ParserCSV();
    $parser->setDelimiter($source_config['delimiter']);
    $parser->setSkipFirstLine(FALSE);
    $rows = $parser->parse($iterator);
    unset($parser);

    // Apply titles in lower case.
    // @todo Push this functionality into ParserCSV.
    // Added: Check whether the first row is a header.
    if (!empty($source_config['header'])) {
      $header = array_shift($rows);
      foreach ($header as $i => $title) {
        $header[$i] = drupal_strtolower($title); // Use lower case only.
      }
    }
    // Added: Otherwise create fictitious column names.
    else {
      $header = array();
      if (count($rows)) {
        for ($i = 0; $i < count($rows[0]); $i++) {
          $header[$i] = 'column' . $i;
        }
      }
    }
    $result_rows = array();
    foreach ($rows as $i => $row) {
      $result_row = array();
      foreach ($row as $j => $col) {
        $result_row[$header[$j]] = $col;
      }
      $result_rows[$i] = $result_row;
    }
    unset($rows);

    // Populate batch.
    $batch->setItems($result_rows);
  }

  /**
   * Override parent::getSourceElement() to use only lower keys.
   */
  public function getSourceElement($item, $element_key) {
    $element_key = drupal_strtolower($element_key);
    return isset($item[$element_key]) ? $item[$element_key] : '';
  }

  /**
   * Define defaults.
   */
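  public function sourceDefaults() {
    return array(
      'delimiter' => $this->config['delimiter'],
      // Added: Default the per-source header setting to the importer's.
      'header' => $this->config['header'],
    );
  }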

  /**
   * Source form.
   *
   * Show mapping configuration as a guidance for import form users.
   */
  public function sourceForm($source_config) {
    $form = array();
    $form['#weight'] = -10;

    $mappings = feeds_importer($this->id)->processor->config['mappings'];
    $sources = $uniques = array();
    foreach ($mappings as $mapping) {
      $sources[] = $mapping['source'];
      if ($mapping['unique']) {
        $uniques[] = $mapping['source'];
      }
    }

    $items = array(
      t('Import !csv_files with one or more of these columns: !columns.', array('!csv_files' => l(t('CSV files'), 'http://en.wikipedia.org/wiki/Comma-separated_values'), '!columns' => implode(', ', $sources))),
      format_plural(count($uniques), t('Column <strong>!column</strong> is mandatory and considered unique: only one item per !column value will be created.', array('!column' => implode(', ', $uniques))), t('Columns <strong>!columns</strong> are mandatory and values in these columns are considered unique: only one entry per value in one of these columns will be created.', array('!columns' => implode(', ', $uniques)))),
    );
    $form['help']['#value'] = '<div class="help">'. theme('item_list', $items) .'</div>';

    $form['delimiter'] = array(
      '#type' => 'select',
      '#title' => t('Delimiter'),
      '#description' => t('The character that delimits fields in the CSV file.'),
      '#options' => drupal_map_assoc(array(',', ';')),
      '#default_value' => isset($source_config['delimiter']) ? $source_config['delimiter'] : ',',
    );
    // Added: Support header checkbox.
    $form['header'] = array(
      '#type' => 'checkbox',
      '#title' => t('Header'),
      '#description' => t('If the CSV source contains column header values, check here.'),
      '#default_value' => isset($source_config['header']) ? $source_config['header'] : 0,
    );
    return $form;
  }

  /**
   * Define default configuration.
   */
  public function configDefaults() {
    // Updated: Support header checkbox
    return array('delimiter' => ',', 'header' => '0');
  }

  /**
   * Build configuration form.
   */
  public function configForm(&$form_state) {
    $form = array();
    $form['delimiter'] = array(
      '#type' => 'select',
      '#title' => t('Default delimiter'),
      '#description' => t('Default field delimiter.'),
      '#options' => drupal_map_assoc(array(',', ';')),
      '#default_value' => $this->config['delimiter'],
    );
    // Added: Support header checkbox
    $form['header'] = array(
      '#type' => 'checkbox',
      '#title' => t('Header'),
      '#description' => t('If the CSV source contains column header values, check here.'),
      '#default_value' => $this->config['header'],
    );
    return $form;
  }
}

Comments

Grayside’s picture

Glad to see you contributing back!

The CVS Example section might help you figure out patching. By "change the files" it means to rename the file you changed. filename.inc.new might work.

Possibly this should be a separate parser extending FeedsCSVParser. If that's not the direction to take, the checkbox will need to be kept, but should default to headers being "active" if that's the most common case.

It would also be nice if the Mapper UI excluded the need to name the columns entirely, since it can be handled dynamically by setting the order of mappings.

brst t’s picture

Mapping columns and skipping first row - like it's done in the node_import module, though hopefully not requiring an additional 8 linear steps and no saved settings.

Importing csv to nodes, ahh. Getting taxonomy nicely, though I had to insert a dummy top root and am missing the node-reference field. Small sacrifices easily fixed and worked around.

imclean’s picture

Version: 6.x-1.0-alpha14 » 6.x-1.0-beta4
Status: Active » Needs review
FileSize
2.65 KB

This would be a great feature to have. Here's a patch for beta4 based on the code by mvierow. It requires you to name the source mappings column0, column1, column2, etc., which is less than ideal because that's also how the target ends up being named.

It'd be good to be able to enter whatever source name you'd like (therefore target name as well) while not having column headers in the CSV. Similar to Grayside's idea, but retaining the ability to label the columns yourself.

Also, being able to reorder mappings without removing them and adding them again would be handy.

alex_b’s picture

Status: Needs review » Needs work

Why do we bother with such complicated header names, why not just number them 1, 2, 3, 4, 5?

alex_b’s picture

Or even better 0, 1, 2, 3 - that way we may get away without setting the array keys explicitly, instead just using the default numeric array keys.
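
To illustrate the suggestion above, here is a minimal sketch in plain PHP. It is not the Feeds code: `example_key_rows()` is a hypothetical helper, and `strtolower()` stands in for `drupal_strtolower()`. It shows that rows parsed without a header can be passed through untouched, because PHP already keys them 0, 1, 2, ...

```php
<?php
/**
 * Hypothetical helper, not part of Feeds: keys each parsed CSV row either by
 * lower-cased header titles or by the row's own default numeric positions.
 */
function example_key_rows(array $rows, $has_header) {
  if (!$has_header || empty($rows)) {
    // Without a header row the default keys 0, 1, 2, ... are already usable.
    return $rows;
  }
  // Assumption: strtolower() stands in for drupal_strtolower() here.
  $header = array_map('strtolower', array_shift($rows));
  $result = array();
  foreach ($rows as $row) {
    $item = array();
    foreach ($row as $j => $col) {
      // Fall back to the numeric position if a row is longer than the header.
      $key = isset($header[$j]) ? $header[$j] : $j;
      $item[$key] = $col;
    }
    $result[] = $item;
  }
  return $result;
}

$rows = array(array('Title', 'Body'), array('Hello', 'World'));
$with_header = example_key_rows($rows, TRUE);
$without_header = example_key_rows($rows, FALSE);
```

With a header, the single data row is keyed by `title` and `body`; without one, both rows come back keyed by position, so a mapping source of `0` or `1` would address the first or second column.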

imclean’s picture

Sounds like a good idea. Given more time I would like to delve deeper into the inner workings of the Feeds module and all its files.

Perhaps some context to my particular requirements is in order. We are using Feeds with the Data module to import CSV data into database tables. The fields are automatically named based on the Source mappings, which in this case is column0, column1. Not ideal but it works.

What files do I need to look at to be able to both:

1. Use default array key numbering as the internal header names

2. Name the source mapping whatever I like

I'm happy to look into implementing it, I'd just like a little help on direction.

alex_b’s picture

1. This should be just how the parser returns it, no modifications to the keys necessary.
2. Create the fields using admin/build/data before mapping to them.

imclean’s picture

1. Thanks.
2. Ah yes, that works well.

I'm wondering if it would be possible to handle this from within Feeds so it's processor independent. It would also keep some basic configuration in the one place.

Would it make sense to be able to specify or select a source column or name and specify a target name within the mappings settings?

imclean’s picture

Actually point 2 doesn't quite work the way I've done it. I can't create a mapping with the source 0.

alex_b’s picture

#9 - why is that?

imclean’s picture

The function addMapping() in plugins/FeedsDataProcessor.inc checks to see if the source is empty:

  if (empty($source) || empty($target)) {
    return;
  }

PHP's empty() function considers 0 and "0" to be empty.
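
A small sketch of that behavior, with one possible stricter check. `example_is_missing()` is a hypothetical name used for illustration only; it is not what the module does.

```php
<?php
/**
 * Hypothetical stand-in for the check quoted above; not the module's code.
 * Treats only NULL and the empty string as "missing", so 0 and "0" pass.
 */
function example_is_missing($value) {
  return $value === NULL || $value === '';
}

$candidates = array('0', 0, '', NULL, 'column0');
$results = array();
foreach ($candidates as $value) {
  // Each entry is array(result of empty(), result of the stricter check).
  $results[] = array(empty($value), example_is_missing($value));
}
```

Here `empty()` reports the sources `'0'` and `0` as empty, while the stricter check only rejects `''` and `NULL`.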

I disabled the above check to see if that was the problem and a new field "__num_0" was created in the database. This new field still doesn't appear as a source in the mappings but it does appear in the target select list.

imclean’s picture

Version: 6.x-1.0-beta4 » 6.x-1.0-beta5
Status: Needs work » Needs review
FileSize
2.54 KB

The array keys method seems to work with beta5 which simplifies things a bit.

patcon’s picture

Version: 6.x-1.0-beta5 » 6.x-1.0-beta9

While the file has changed a bit, this still works in beta9 when applied manually... Sorry, no patch at the moment. Thanks very much for contributing this!

Any chance of getting any commit-love on this? An affirmative on that might give me the kick in the pants that I need in order to learn how to roll a patch :)

patcon’s picture

FYI, I'm running into a slight issue that might be a decent reason to use col0, col1, col2, etc. (if it's all the same to everyone else).

I'm using a module that uses the source field name in a theming function, and due to a coding bug, it's reading the zero and taking it as a false, and the function is throwing an error. I understand that this is a downstream problem that needs to be fixed anyhow, but using colx might save users a few headaches and warnings with other feed contrib modules. I knew enough to figure out what was causing it, but others might not :)

imclean’s picture

If the other module is the problem then that's where it should be fixed. Should be easy enough, report it in that module's issue queue if you haven't already.

twistor’s picture

@patcon, that's not a bad idea. This is a problem in PHP. When numeric strings such as "0" are used as array keys, PHP converts them to integers. This causes all sorts of problems with Drupal's Form API. That said, I think starting with 1 should solve the problem.
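
The key conversion can be seen in a few lines of plain PHP. This is only an illustration; the array contents are made up.

```php
<?php
// Made-up mapping: two numeric-string keys and one non-numeric key.
$mappings = array('0' => 'title', '1' => 'body', 'col2' => 'tags');
$keys = array_keys($mappings);

// PHP stored the first two keys as integers; 'col2' stays a string.
$types = array_map('gettype', $keys);

// A loose truth test treats the first key (integer 0) as FALSE,
// which is the kind of downstream bug described in this thread.
$loose = (bool) $keys[0];
// A strict comparison keeps 0 as a valid key.
$strict = ($keys[0] !== NULL && $keys[0] !== '');
```

`$types` ends up as `integer`, `integer`, `string`, and only the strict test accepts the key `0`.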

patcon’s picture

@imclean, I'm with you, but it seems like it's an arbitrary decision here, so would it not make sense to be pro-active in one place rather than forcing others to be reactive, potentially in many places?

alex_b’s picture

Status: Needs review » Needs work

@patcon I'd rather avoid 'col'x or starting with 1 and fix this problem where it needs to be fixed.

#12 looks good. One issue: we need to invert the option, so that 0 or empty means 'use headers' and 1 or TRUE means 'don't use headers'. Otherwise we have a backwards compatibility problem, because existing configurations do not have the setting at all. Most likely we'll want to reflect this in the UI as well: a "No headers" checkbox with the description "Check if the imported CSV file does not start with a header row".
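
A rough sketch of why the inverted flag is the safer default. The key name `no_headers` and both function names are assumptions made for illustration; this is not the committed patch.

```php
<?php
/**
 * Hypothetical defaults: 'no_headers' is an assumed key name modeled on the
 * "No headers" label; 0 keeps the existing "first row is a header" behavior.
 */
function example_config_defaults() {
  return array('delimiter' => ',', 'no_headers' => 0);
}

/**
 * Hypothetical check: decides whether the first row is treated as a header.
 */
function example_uses_headers(array $stored_config) {
  // Configurations saved before the option existed simply lack the key, so
  // merging in the defaults leaves them reading headers as before.
  $config = $stored_config + example_config_defaults();
  return empty($config['no_headers']);
}

$old_importer = array('delimiter' => ';');
$new_importer = array('delimiter' => ',', 'no_headers' => 1);
```

An importer saved before the change (`$old_importer`) still uses headers, and only an importer that explicitly sets the flag (`$new_importer`) skips them. With the original `header` flag, a missing or empty value would have meant "no header row", changing the behavior of existing importers.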

imclean’s picture

Status: Needs work » Needs review
FileSize
2.32 KB

I thought setting the default option would take care of backwards compatibility when upgrading, although I haven't tested this. Either way, this patch is against beta9 incorporating alex_b's suggestion in #18.

alex_b’s picture

Version: 6.x-1.0-beta9 » 6.x-1.x-dev
Status: Needs review » Reviewed & tested by the community

I thought setting the default option would take care of backwards compatibility when upgrading

This way we're on the safe side...

This is RTBC now but will need a port to 7.x 2.x - this patch is simple, I'll do that on the fly.

alex_b’s picture

D6 minor clean ups and D7 port.

alex_b’s picture

Title: CSV parser without column headers » Support parsing CSV files without column headers
Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Reviewed & tested by the community » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

hanoii’s picture

It's not my intention to reopen old issues, but I am struggling to understand how no_header works.

First, it seems that I need to manually set the column sources to 0, 1, 2, etc. for this to work properly. Is that right?

Second, what's the point of having the No headers option on the import page? If you change that setting there and use a CSV WITH a header, as the option invites you to do, then the 0, 1, 2 sources won't work any more and you would need to manually edit the source details again. Is that right? I believe the option should only be available on the import definition, and appear on the actual import form only as an informative legend. Is any of this worth a patch?