The MigrateSourceCSV migrate source class allows CSV files to be used as a source.

The class constructor has the following parameters:

  • $path: The full system filepath to the source CSV file.
  • $csvcolumns: An array describing the CSV file's columns. Keys are integers (or may be omitted), values are an array of field name then description. This may be left empty if the CSV file has a header row: see below.
  • $options: An array of options. See below for options relevant here.
  • $fields: Optional - keys are field names, values are descriptions. Use to override the default descriptions, or to add additional source fields which the migration will add via other means (e.g., prepareRow()).

The options used by the CSV source class are:

  • header_rows: The number of rows to count as headers. If this is set, you can pass an empty array for $csvcolumns.
  • embedded_newlines: Set to TRUE if your input file has embedded newlines which throw the record count off. Setting this does make getting the record count significantly slower.
  • 'length', 'delimiter', 'enclosure', 'escape': These are passed as parameters to the PHP fgetcsv() function.

When specifying the csvcolumn parameter, the key to each name/description array is the zero-based column number. For example, if the first (0) column of the CSV has the unique ID, the fourth (3) column has a first name, and the fifth (4) column has a last name, specify the columns like this:

    $columns = array(
      0 => array('id', 'User ID'),
      3 => array('fname', 'First Name'),
      4 => array('lname', 'Last Name'),
    );
    $this->source = new MigrateSourceCSV('/path/to/myfile.csv', $columns);

As this implies, you can skip columns that you are not importing.

For a complete example of using MigrateSourceCSV, see the migrate_example_baseball module bundled with Migrate.

Note that the map is still typically a MigrateSQLMap, and the source key uses the same schema structure Drupal uses for database column definitions with the same possible data types of 'char', 'varchar', 'int', 'float', 'numeric' ('text' and 'blob' should not be used for keys and will cause the MySQL error "Syntax error or access violation: 1170 BLOB/TEXT column used in key specification without a key length.")

Using the CSV row number as the map value

The CSV source class adds a property 'csvrownum' to each row, giving its position in the CSV file. This may be used as the value for the mapping.

Invalid data value

When importing a CSV file, you may get the message "Invalid data value given. Be sure it matches the required data type and format". This usually indicates incompatibility between the line endings in the CSV file and your system, and with PHP 5.3 or later it can be addressed by setting the PHP auto_detect_line_endings option to 1. If it's not convenient to set this in php.ini, you can set it at runtime in your migration constructor:

ini_set('auto_detect_line_endings', TRUE);

CSV with header rows

If the CSV has header rows, you pass a value for the header_rows parameter, as in the example below.

$this->source = new MigrateSourceCSV($this->path, $this->csvcolumns(), array('header_rows' => 1));

If header_rows is set, you can pass an empty array for $csvcolumns.

If the CSV has a header row and header_rows is not defined you can get Invalid values errors like "Migration failed with source plugin exception: Invalid data value given. Be sure it matches the required data type and format."

Comments

dsdeiz’s picture

A note for CSV files which have new lines is to set the option embedded_newlines. Otherwise, each line in the CSV file is considered as one row when migrate module computes the number of rows.

Arnold French

Drupal Developer/Themer
Promet Solutions Inc.
4001 N. Ravenswood Suite # 503B
Chicago, IL 60613

Drupal Web Development
Hosting

fonant’s picture

If

  • you have a CSV file with a header row that lists the column titles, and
  • the header row column titles are valid as PHP object property names, and
  • you set 'header_rows' => TRUE, and
  • you pass an empty array for the $csvcolumns argument,

then you can use the CSV header row values as source names in field mappings. This means you can re-order or change the CSV file columns later without breaking the mapping!

However, if your CSV header row column titles do not form valid PHP object property names, then you cannot use this method. Instead you have to specify the source column names to use (valid as PHP object property names) in the $csvcolumns array. The mapping is then done in the order of the CSV columns, so that you can't re-order the CSV file columns later without breaking the mapping (unless you write additional custom code to generate the mappings yourself from the CSV header row).

http://www.fonant.com - Fonant Ltd - Quality websites

fonant’s picture

Here's a bit of code I wrote that reads the first header row of the source CSV file, modifies the column/field titles to be valid by stripping non-alphanumeric characters and then CamelCasing the resulting string:


class MyCSVMigration extends Migration {

  public function __construct($arguments) {
    parent::__construct($arguments);

    $path = '/path/to/the/csv/file.csv';
    $cols = array();
    $file = fopen($path, 'r');
    $headers = fgetcsv($file);
    foreach ($headers as $index => $header) {
      if ($header == '') {
        $header = 'Column ' . ($index + 1);
      }
      $headerparts = preg_split('/[^a-zA-Z0-9]+/', $header);
      $headerparts = array_map('ucfirst', $headerparts);
      $headervar = implode('', $headerparts);
      $cols[] = array(
        $headervar,
        $header);
    }
    $options = array(
      'header_rows' => 1,
      'embedded_newlines' => TRUE);
    $fields = array();
    $this->source = new MigrateSourceCSV($path, $cols, $options, $fields);
...

So a CSV header row like:

"Item ID","Weight (kg)","Length/Width/Height (m)","Colour of exterior"

would automatically result in source variables for each column named:

  'ItemId'
  'WeightKg'
  'LengthWidthHeightM'
  'ColourOfExterior'

which can be manipulated in prepareRow($row) as member variables named:

  $row->ItemId
  $row->WeightKg
  $row->LengthWidthHeightM
  $row->ColourOfExterior

This means that the column order in the CSV file can change, so long as the column titles are not changed from the ones used in mappings.

A bit more robust than hard-coding the column definitions.

http://www.fonant.com - Fonant Ltd - Quality websites