Basics

When migrating directly from a database system which is supported by the Drupal database API, and accessible to the destination server where you're running Migrate, use the MigrateSourceSQL source class. In its simplest form, just construct a query of the source data you need and pass it to the MigrateSourceSQL constructor:

$query = db_select('example_pages', 'p')
         ->fields('p', array('pgid', 'page_title', 'page_body'));
$this->source = new MigrateSourceSQL($query);

Do not execute the query - execution will be controlled by the Migrate module.

It is important to understand that the query you use as your source must return a single row for each object to be created. While it may seem natural at first blush to join to, say, a "category" table which has multiple rows for a given content item, that will produce multiple rows in the source query - see the topic Multiple source data rows for suggestions on dealing with such situations.

The above example assumes that the example_pages table is in your default Drupal database - usually, however, your source data will be in a different database. If you have defined a connection named external_db in settings.php, your query above would look like:

$query = Database::getConnection('default', 'external_db')
         ->select('example_pages', 'p')
         ->fields('p', array('pgid', 'page_title', 'page_body'));

See Cross-database migrations for more details.

Source field overrides

On the migration info pages provided by migrate_ui, descriptions for source fields are displayed. The MigrateSourceSQL class generates the list of source fields directly from the query, and defaults their descriptions to tablename.fieldname, but you can provide more descriptive text if you prefer. You can also add source fields not actually in the query to the displayed list (for example, if you are adding data to the input row in prepareRow()). This is done by passing an array of descriptions keyed by fieldname as the second argument to the constructor. See below for an example.

Specifying a count query

To display the total number of available records in the source, MigrateSourceSQL by default uses the count query automatically generated by Drupal for the source query ($query->countQuery()). Sometimes the source query may be quite complex and the generated count query slower than it needs to be, and you may know a more efficient way to determine the count (quite often, all that's needed is a count of the base table). In cases like this, you can pass an explicit count query to use in place of the generated one.

This example, from WineWineMigration in migrate_example's wine.inc, demonstrates overriding source fields and specifying a count query:

$query = db_select('migrate_example_wine', 'w')
         ->fields('w', array('wineid', 'name', 'body', 'excerpt', 'accountid',
          'image', 'posted', 'last_changed', 'variety', 'region', 'rating'));
$query->leftJoin('migrate_example_wine_category_wine', 'cwbw',
  "w.wineid = cwbw.wineid");
$query->leftJoin('migrate_example_wine_categories', 'bw',
  "cwbw.categoryid = bw.categoryid AND bw.type = 'best_with'");
// Gives a single comma-separated list of related terms
$query->groupBy('w.wineid');
$query->addExpression('GROUP_CONCAT(bw.categoryid)', 'best_with');

$count_query = db_select('migrate_example_wine', 'w');
$count_query->addExpression('COUNT(wineid)', 'cnt');

// Better descriptions for wineid and name; best_vintages is added in prepareRow()
$source_fields = array(
  'wineid' => t('Wine ID in the old system'),
  'name' => t('The name of the wine'),
  'best_vintages' => t('What years were best for this wine?'),
);

$this->source = new MigrateSourceSQL($query, $source_fields, $count_query);

Joining (or not) to the map table

For each migration class you define, Migrate creates a map table (which keeps track of the relationships between source rows and destination objects) and a message table (which logs messages relative to the source rows triggering them). On import, the map table is checked against the rows in the source query to determine which rows need to be processed. For any other source class, this must be done as a separate operation - a source row is fetched, and then the map table is separately checked. However, MigrateSourceSQL has a more efficient option available - under the right circumstances, the map table can be joined to the source query, placing the burden of selecting rows on the database engine. Those circumstances are

  • The source query is on the same connection as the map message table (either they're both in the default Drupal database, or the map and message tables are created in the external source database), or
  • The source data is in a different database than the map and message tables, but
    • The databases are both MySQL, and
    • The databases are on the same host and port, and
    • They are accessed using the same credentials (username and password). If defining the connection in settings.php, use the same username and password in both the default and the legacy DB $databases arrays.

By default, MigrateSourceSQL assumes the join is possible. If you know that the join will not work, you can pass the map_joinable option to tell the source class to do things the hard way:

$this->source = new MigrateSourceSQL($query, array(), NULL,
  array('map_joinable' => FALSE));

Batching of query results

When a select statement is executed in Drupal's database API, all results of the query are fetched into memory. In high-volume migrations, you can easily go beyond any reasonable memory_limit. Migrate 2.6 introduced batching of query results - based on a batch_size option (defaulting to 1000 rows), only that number of rows is fetched at a time - when the batch has been processed, another batch of batch_size rows is fetched. In Migrate 2.6 the batching process is not optional (although it can effectively be defeated by setting a batch_size larger than the number of rows in your query), and proved to be problematic in some scenarios. Migrate 2.7 recognizes a batch_size of 0 to indicate batching should not be done, and also defaults the batch_size to 0. If you have a scenario where your migration queries absorb too much memory, set the batch_size to a number of rows which can be handled in memory. It's important to note that batching will set map_joinable to FALSE - joining the map table would prevent the batching from working properly.

$this->source = new MigrateSourceSQL($query, array(), NULL,
  array('batch_size' => 10000));

Comments

cameron prince’s picture

Be sure you look at Cross-database migrations. The first example above doesn't list all of the needed arguments to connect to different database, which are as follows:

$this->source = new MigrateSourceSQL($query, array(), NULL, array('map_joinable' => FALSE));

scresante’s picture

It isn't obvious at first glance but it should be mentioned, that the migrate module squashes case to lower, so that it is not necessary to use uppercase characters in $query->fields $source_fields or $this->addFieldMappings
It is safe (and in fact fixes issues with "Source(s) not used in a mapping) to use lowercase in all these instances.
Correct me if I'm wrong, but I just spent an hour pulling my hair out before testing this and realizing this.

pdcarto’s picture

As best I can tell, MigrateSourceSQL requires the use of dynamic queries (does not permit static queries), and also as best I can tell, dynamic queries do not permit the use of field names that contain spaces and other special characters. The logical thing would be to enclose these field names in the '`' character, as one would do in a static query, but these are stripped out in the dynamic query, along with all spaces and other special characters like parentheses, with the result that the query fails against the database. I discovered this when trying to use MS Access databases for migrate sources. After converting those databases to mysql, many field names retained these spaces and other special characters.

The best solution I found was to create a mysql view (stored query) that adjusts the field names, and then use that as the migration source table.

pdcarto’s picture