MigrateSourceSQL
Basics
When migrating directly from a database system which is supported by the Drupal database API, and accessible to the destination server where you're running Migrate, use the MigrateSourceSQL source class. In its simplest form, just construct a query of the source data you need and pass it to the MigrateSourceSQL constructor:
$query = db_select('example_pages', 'p')
->fields('p', array('pgid', 'page_title', 'page_body'));
$this->source = new MigrateSourceSQL($query);
Do not execute the query - execution will be controlled by the Migrate module.
It is important to understand that the query you use as your source must return a single row for each object to be created. While it may seem natural at first blush to join to, say, a "category" table which has multiple rows for a given content item, that will produce multiple rows in the source query - see the topic Multiple source data rows for suggestions on dealing with such situations.
The above example assumes that the example_pages table is in your default Drupal database - usually, however, your source data will be in a different database. If you have defined a connection named external_db in settings.php, your query above would look like:
$query = Database::getConnection('default', 'external_db')
->select('example_pages', 'p')
->fields('p', array('pgid', 'page_title', 'page_body'));
See Cross-database migrations for more details.
Source field overrides
On the migration info pages provided by migrate_ui, descriptions for source fields are displayed. The MigrateSourceSQL class generates the list of source fields directly from the query, and defaults their descriptions to tablename.fieldname, but you can provide more descriptive text if you prefer. You can also add source fields not actually in the query to the displayed list (for example, if you are adding data to the input row in prepareRow()). This is done by passing an array of descriptions keyed by fieldname as the second argument to the constructor. See below for an example.
Specifying a count query
To display the total number of available records in the source, MigrateSourceSQL by default uses the count query automatically generated by Drupal for the source query ($query->countQuery()). Sometimes the source query may be quite complex and the generated count query slower than it needs to be, and you may know a more efficient way to determine the count (quite often, all that's needed is a count of the base table). In cases like this, you can pass an explicit count query to use in place of the generated one.
This example, from WineWineMigration in migrate_example's wine.inc, demonstrates overriding source fields and specifying a count query:
$query = db_select('migrate_example_wine', 'w')
->fields('w', array('wineid', 'name', 'body', 'excerpt', 'accountid',
'image', 'posted', 'last_changed', 'variety', 'region', 'rating'));
$query->leftJoin('migrate_example_wine_category_wine', 'cwbw',
"w.wineid = cwbw.wineid");
$query->leftJoin('migrate_example_wine_categories', 'bw',
"cwbw.categoryid = bw.categoryid AND bw.type = 'best_with'");
// Gives a single comma-separated list of related terms
$query->groupBy('w.wineid');
$query->addExpression('GROUP_CONCAT(bw.categoryid)', 'best_with');
$count_query = db_select('migrate_example_wine', 'w');
$count_query->addExpression('COUNT(wineid)', 'cnt');
// Better descriptions for wineid and name; best_vintages is added in prepareRow()
$source_fields = array(
'wineid' => t('Wine ID in the old system'),
'name' => t('The name of the wine'),
'best_vintages' => t('What years were best for this wine?'),
);
$this->source = new MigrateSourceSQL($query, $source_fields, $count_query);
Joining (or not) to the map table
For each migration class you define, Migrate creates a map table (which keeps track of the relationships between source rows and destination objects) and a message table (which logs messages relative to the source rows triggering them). On import, the map table is checked against the rows in the source query to determine which rows need to be processed. For any other source class, this must be done as a separate operation - a source row is fetched, and then the map table is separately checked. However, MigrateSourceSQL has a more efficient option available - under the right circumstances, the map table can be joined to the source query, placing the burden of selecting rows on the database engine. Those circumstances are
- The source query is on the same connection as the map message table (either they're both in the default Drupal database, or the map and message tables are created in the external source database), or
- The source data is in a different database than the map and message tables, but
- The databases are both MySQL, and
- The databases are on the same host and port, and
- They are accessed using the same credentials (username and password). If defining the connection in settings.php, use the same
usernameandpasswordin both the default and the legacy DB$databasesarrays.
By default, MigrateSourceSQL assumes the join is possible. If you know that the join will not work, you can pass the map_joinable option to tell the source class to do things the hard way:
$this->source = new MigrateSourceSQL($query, array(), NULL,
array('map_joinable' => FALSE));
Batching of query results
When a select statement is executed in Drupal's database API, all results of the query are fetched into memory. In high-volume migrations, you can easily go beyond any reasonable memory_limit. Migrate 2.6 introduced batching of query results - based on a batch_size option (defaulting to 1000 rows), only that number of rows is fetched at a time - when the batch has been processed, another batch of batch_size rows is fetched. In Migrate 2.6 the batching process is not optional (although it can effectively be defeated by setting a batch_size larger than the number of rows in your query), and proved to be problematic in some scenarios. Migrate 2.7 recognizes a batch_size of 0 to indicate batching should not be done, and also defaults the batch_size to 0. If you have a scenario where your migration queries absorb too much memory, set the batch_size to a number of rows which can be handled in memory. It's important to note that batching will set map_joinable to FALSE - joining the map table would prevent the batching from working properly.
$this->source = new MigrateSourceSQL($query, array(), NULL,
array('batch_size' => 10000));
Help improve this page
You can:
- Log in, click Edit, and edit this page
- Log in, click Discuss, update the Page status value, and suggest an improvement
- Log in and create a Documentation issue with your suggestion