To import content from XML feeds, there are three primary approaches you can use with the classes provided by Migrate: MigrateSourceList, MigrateSourceMultiItems, and MigrateSourceXML.

Using MigrateSourceList

If you are using an XML-based web service where one service URL returns a list of IDs to be migrated, and a second URL, when given an ID from the first as a parameter, returns the data for that specific object, you should use MigrateSourceList.

From migrate_example's wine.inc, consider this source definition:

    // $xml_folder could be a local file directory, or (more usually) a URL for a web service
    $list_url = $xml_folder . 'index.xml';
    // Each ID retrieved from the list URL will be plugged into :id in the
    // item URL to fetch the specific objects.
    $item_url = $xml_folder . ':id.xml';

    // We use the MigrateSourceList class for any source where we obtain the list
    // of IDs to process separately from the data for each item. The listing
    // and item are represented by separate classes, so for example we could
    // replace the XML listing with a file directory listing, or the XML item
    // with a JSON item.
    $this->source = new MigrateSourceList(new MigrateListXML($list_url),
      new MigrateItemXML($item_url));
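
To make the list/item split concrete, here is a minimal standalone sketch of the idea, not the module's own code. It assumes a hypothetical listing document whose top-level elements each hold one ID, and a made-up example.com item URL. It only shows how each ID from the listing would be substituted for :id to produce the URL of one item.

```php
<?php
// Hypothetical listing document (an assumption for illustration): each
// top-level element holds one source ID.
$list_xml = '<ids><id>0001</id><id>0002</id></ids>';
// Assumed item URL pattern; :id is the placeholder described above.
$item_url = 'http://example.com/xml/:id.xml';

$list = simplexml_load_string($list_xml);
$item_urls = array();
foreach ($list->children() as $id_element) {
  // Substitute each ID for the :id placeholder to get that item's URL.
  $item_urls[] = str_replace(':id', (string) $id_element, $item_url);
}
print_r($item_urls);
```

Because the listing and the item are handled separately, either half could be swapped out (for example, a directory listing instead of an XML listing) without touching the other.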

Using MigrateSourceMultiItems

If you have a single URL which contains all items to be migrated, without a separate source of the IDs, use MigrateSourceMultiItems. Here is another example from wine.inc:

    $items_url = $xml_folder . 'positions.xml';
    $item_xpath = '/positions/position';  // relative to document
    $item_ID_xpath = 'sourceid';         // relative to item_xpath and gets assembled
                                         // into full path /positions/position/sourceid

    $items_class = new MigrateItemsXML($items_url, $item_xpath, $item_ID_xpath);
    $this->source = new MigrateSourceMultiItems($items_class, $fields);
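
The way the two xpaths combine can be tried outside of Drupal with plain SimpleXML. The following is a rough sketch under stated assumptions: the sample XML is made up to stand in for positions.xml, and the loop only illustrates how an item xpath and an item-relative ID xpath could be evaluated, not how MigrateItemsXML is actually written.

```php
<?php
// Made-up sample data standing in for positions.xml.
$xml = simplexml_load_string(
  '<positions>'
  . '<position><sourceid>5</sourceid><name>Taster</name></position>'
  . '<position><sourceid>7</sourceid><name>Vintner</name></position>'
  . '</positions>'
);

$item_xpath = '/positions/position';  // relative to document
$item_ID_xpath = 'sourceid';          // relative to each item

$ids = array();
foreach ($xml->xpath($item_xpath) as $item) {
  // The ID xpath is evaluated with the item element as its context node.
  $id_nodes = $item->xpath($item_ID_xpath);
  $ids[] = (string) $id_nodes[0];
}
print_r($ids);
```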

Using MigrateSourceXML

The previous approaches use the PHP SimpleXML extension. This extension is, as the name implies, simple to use, but it depends on reading and parsing the entire source XML file into memory. Sometimes you may need to deal with a very large XML file - for example, WordPress exports all blog content into a single XML file, which can be very large for a busy site - and even if you have enough memory (and can set the PHP memory_limit high enough) to hold it, parsing it is very slow.

For these situations, we provide MigrateSourceXML - this class is based on the PHP XMLReader extension, which reads and parses the XML file incrementally so there is never a large memory impact at any given time. As usual, we offer some sample code from wine.inc:

    $xml_folder = DRUPAL_ROOT . '/' . drupal_get_path('module', 'migrate_example') . '/xml/';
    $items_url = $xml_folder . 'producers2.xml';
    $item_xpath = '/producers/producer';  // relative to document
    $item_ID_xpath = 'sourceid';          // relative to item_xpath
    $this->source = new MigrateSourceXML($items_url, $item_xpath, $item_ID_xpath,
      $fields);
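
For readers unfamiliar with XMLReader, the incremental style of parsing looks roughly like the sketch below. This is an illustration of the general technique, not the code inside MigrateSourceXML; the sample data and the choice to hand each element to SimpleXML are assumptions made for the example.

```php
<?php
// Made-up sample data standing in for producers2.xml.
$source_xml = '<producers>'
  . '<producer><sourceid>0001</sourceid><name>Montes</name></producer>'
  . '<producer><sourceid>0002</sourceid><name>Lolonis</name></producer>'
  . '</producers>';

$reader = new XMLReader();
// A real import would open() a file or URL; XML() reads from a string.
$reader->XML($source_xml);

$names = array();
while ($reader->read()) {
  // Match <producer> elements directly under the root (depth 1).
  if ($reader->nodeType == XMLReader::ELEMENT
      && $reader->name == 'producer' && $reader->depth == 1) {
    // Only this one element is parsed into a SimpleXML object.
    $element = simplexml_load_string($reader->readOuterXml());
    $names[] = (string) $element->sourceid . ': ' . (string) $element->name;
  }
}
$reader->close();
print_r($names);
```

Only one producer element is held as a parsed object at a time, which is what keeps memory use flat regardless of how many items the file contains.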

Yes, this is very similar to MigrateSourceMultiItems. So, if MigrateSourceXML provides performance benefits over MigrateSourceMultiItems, why use MigrateSourceMultiItems? The answer is that the SimpleXML-based approach provides more flexibility: because the entire XML file is pre-parsed and in memory, the full xpath syntax is available. MigrateSourceXML does not have the whole XML structure at its disposal, so it cannot apply xpaths globally. It pulls one XML element (identified by $item_xpath) at a time, and xpaths local to that element will work. However, $item_xpath itself supports only a limited syntax (basically a simple fully-qualified path to the elements you want), and xpaths on field mappings (see below) cannot reach outside of the current element (for example, to reference fields in the parent element).
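
The difference in xpath reach can be reproduced with plain SimpleXML. The sketch below assumes made-up sample data (the region attribute is invented for the example) and models the "one element at a time" situation by re-parsing a single element on its own; it illustrates the limitation rather than reproducing the module's internals.

```php
<?php
// Made-up sample data; the region attribute is an assumption for illustration.
$source_xml = '<producers region="Chile">'
  . '<producer><sourceid>0001</sourceid></producer>'
  . '</producers>';

// Whole document in memory: an xpath can climb from the item to its parent.
$doc = simplexml_load_string($source_xml);
$items = $doc->xpath('/producers/producer');
$from_full_doc = $items[0]->xpath('../@region');

// One element parsed on its own: there is no parent element left to climb to.
$fragment = simplexml_load_string($items[0]->asXML());
$from_fragment = $fragment->xpath('../@region');

var_dump((string) $from_full_doc[0]);
var_dump(empty($from_fragment));
```

If a migration needs values from a parent or sibling element, that is a reason to prefer the SimpleXML-based source despite its memory cost.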

In all cases

Whichever approach you took above for defining your source, you need to derive your base migration class from XMLMigration rather than Migration. Doing so allows you to specify xpaths for each field you are migrating:

    $this->addFieldMapping('title', 'name')
         ->xpath('name');

This mapping definition tells Migrate to apply the xpath /producers/producer/name (where the $item_xpath is /producers/producer) to the source XML, and to put the value at that xpath into the source row's "name" field. Then, when the mappings are processed, the source row's "name" field goes into the destination node's "title" field.
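
The two steps described above (xpath to source row, then source row to destination) can be mimicked in a few lines of standalone PHP. This is only a sketch under assumptions: the $field_xpaths table and the plain stdClass objects are hypothetical stand-ins for what Migrate manages internally, and the sketch keeps only the first xpath match.

```php
<?php
// Made-up item element, as it might be selected by /producers/producer.
$item = simplexml_load_string(
  '<producer><sourceid>0001</sourceid><name>Montes</name></producer>'
);

// Source field name => xpath relative to the item (hypothetical table).
$field_xpaths = array('name' => 'name');

// Step 1: evaluate each xpath and store the value on the source row.
$row = new stdClass();
foreach ($field_xpaths as $source_field => $xpath) {
  $result = $item->xpath($xpath);
  // Keep the first match as a string, or NULL when the xpath finds nothing.
  $row->$source_field = !empty($result) ? (string) $result[0] : NULL;
}

// Step 2: the mapping copies source field "name" to destination field "title".
$node = new stdClass();
$node->title = $row->name;
print $node->title . "\n";
```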

Comments

cdesautels

How do you pass content inside a CDATA wrapper?

marco-s

To answer this question: I've overridden the xml() method of the MigrateItemsXML class to pass the LIBXML_NOCDATA option to the simplexml_load_file() function.

    class MyMigration extends XMLMigration {
      // Constructor signature assumed; match the one your Migrate version uses.
      public function __construct($arguments) {
        parent::__construct($arguments);
        //...
        $items_class = new MigrateItemsXMLCustom($items_url, $item_xpath, $item_ID_xpath);
        $this->source = new MigrateSourceMultiItems($items_class);
        //...
      }
    }

    class MigrateItemsXMLCustom extends MigrateItemsXML {
      public function &xml() {
        if (!empty($this->currentUrl)) {
          // We have to add the 'LIBXML_NOCDATA' option, because of the existing CDATA values.
          $this->currentXml = simplexml_load_file($this->currentUrl, 'SimpleXMLElement', LIBXML_NOCDATA);
          if ($this->currentXml === FALSE) {
            Migration::displayMessage(t(
              'Loading of !currentUrl failed:',
              array('!currentUrl' => $this->currentUrl)
            ));
            foreach (libxml_get_errors() as $error) {
              Migration::displayMessage(self::parseLibXMLError($error));
            }
          }
          else {
            $this->registerNamespaces($this->currentXml);
          }
        }
        return $this->currentXml;
      }
    }

sgarsot

I am working on an XML migration using the Migrate module (implementing a class that extends XMLMigration). The XML document I'm working with is extremely large, and from a deeply nested context I want to go back up to the root element, just to get the numpages value. The context is something like:

    $item_xpath = '/feed/entry/relationship/related/object';

I am using the following code to get the number of pages:

    $this->addFieldMapping('field_pages', 'pages')->xpath('/feed/pagination/@numpages');

But it doesn't work. I have also tried using ../../../.. but I get the same result (a NULL value).

Thanks