I'm wondering what the best way is to update an entity reference during a two-step migration process. I read about createStub(), but I'm not sure whether it is what I'm looking for, or whether it is the best way to handle what I need.

My first migration brings in all specimen data from a csv file; the second brings in image data from another csv file. The actual image files are not being moved, so no actual "file" is being saved. I have a generated uuid in the csv files to connect the two. The uuid for the specimen is stored in a "furi" field.
The specimen has an entity reference field to image called "standard_image". The field identifies the image that represents the specimen: there may be multiple images for a given specimen, and the person who added the specimen picked which one they wanted as the "main" image.

So, when migrating the images, I need to update the standard_image field of the existing specimen node with the new Image node. Should I do this somehow with createStub(), and if so, how? Or should I simply make use of the complete() method, look up the specimen node using the uuid / furi, and update accordingly?

Comments

mikeryan’s picture

Status: Active » Postponed (maintainer needs more info)

Since it's the specimens that have references to the images, the simplest thing to do would be to import the images first, then all you have to do is reference the image migration as sourceMigration in the entity reference field mapping in the specimen migration. If you must migrate the specimens first, then by implementing createStub in your image migration, running the specimen migration will create image stubs that should get filled in when the image migration runs.

rbruhn’s picture

@mikeryan - Yes, I thought about running the images first. I'm just not familiar enough with the inner workings of Migration to know if the UUIDs need to be stored. Perhaps you can answer that? For example:

An ID column in the images CSV contains the Specimen UUID. When importing Specimens, this UUID is going into a field called FURI. When images are imported, the FURI field is populated by a url where the images come from. For example: http://remotesite.com/99999.

So if I'm importing images first, do I need to store that UUID somewhere? Or does it get stored in the migration tables and I don't need to worry about mapping it?

mikeryan’s picture

If you use the UUID column in the images CSV as the source key in the MigrateSQLMap call, then your specimen migration would map the UUID it has to the entity reference field and add a ->sourceMigration('MyImageMigration') and it should all work out.

dropfen’s picture

If the image key (or some other column) is the same in both migrations, you can use it to relate the image migration to the node migration like this:

In the image migration, define the map property with this value:

$this->map = new MigrateSQLMap($this->machineName,
  array(
    // the_img_id is the unique column; it should also be available
    // in the other csv file so the two can be related.
    'the_img_id' => array(
      'type' => 'int',
    ),
  ),
  MigrateDestinationFile::getKeySchema()
);

And in the other migration, to attach the migrated images to the node, define it in your field mapping:

$this->addFieldMapping('your_image_field', 'the_img_id')
        ->sourceMigration('YourImageMigrationClass');

That should work.

rbruhn’s picture

@dropfen - I can't do the images first. I did not notice that in mikeryan's first response he said, "Since it's the specimens that have references to the images..."
It's not the Specimens that have references to images, but images referencing specimens. Because the UUIDs in the Image csv have duplicates, they cannot be used as the sourceid in *_map.

I'm going to have to use createStub() and figure out how to do that.

dropfen’s picture

what is the relationship "value" between your images and the Specimens?

rbruhn’s picture

Not sure what you are asking for. The Specimens have multiple images associated with them. Specimen UUIDs in the csv are unique. The UUIDs in the Images csv point back to the Specimen they belong to.

Then, each Specimen has an entity reference field called "standard_image" which will be an image node id. In the specimen csv, this is represented by a column "standardImage" with a number value (i.e. 589037) that was an image id in the old database. The standard image is simply the default image a user assigns to the specimen.

dropfen’s picture

Oh sorry, I understand. In this case, you could use your ImageMigration source to update your specimens, without running any ImageMigration.

https://drupal.org/node/1117454

rbruhn’s picture

@dropfen - I looked at that link and I'm not sure how it is supposed to help me. The image migration also needs to create nodes itself. Meaning, Image is also a node type and contains fields about the image. So I have to run a migration for that.

However, beyond worrying about updating the entity reference field, I'm at a point where I cannot even connect the images to the specimens I imported. I'm running into the issue of duplicate values in the UUID column from the image.csv not importing into the map table. I need the UUIDs to match against the Specimen UUIDs, but the id field of *_map is a primary key, so no dupes are allowed.

For example:

$this->destination = new MigrateDestinationNode('bir_specimen');

// Id is the name of the UUID column in specimen.csv
$this->map = new MigrateSQLMap($this->machineName,
  array(
    'id' => array(
      'type' => 'varchar',
      'length' => 255,
      'not null' => TRUE,
    ),
  ),
  MigrateDestinationNode::getKeySchema()
);

That sets up the UUID fields in the *_map table for specimens. I then need to migrate the images so their nodes are created, and somehow connect them using the UUID from the images.csv file.

// "Machine name" of the image node type.
$this->destination = new MigrateDestinationNode('bir_image');

// id is the UUID column in images.csv, and represents the specimen it belongs to
$this->map = new MigrateSQLMap($this->machineName,
  array(
    'id' => array(
      'type' => 'varchar',
      'length' => 255,
      'not null' => TRUE,
    ),
  ),
  MigrateDestinationNode::getKeySchema()
);

$this->addFieldMapping('specimen_id', 'nid')->sourceMigration('Specimen')
      ->description(t('The assignment of images to the respective specimen'));

In this case, not all the images are imported due to their having duplicate UUIDs (which are really the Specimen UUIDs the images belong to). The specimen_id field above is simply an entity reference using the nid of the specimen.
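
To illustrate why rows go missing, here is a minimal plain-PHP sketch. It is not Migrate code: it only models the map table as an array keyed by source id, which behaves like a primary key. The sample rows, the 'identifier' values and the variable names are made up for illustration.

```php
<?php
// Hypothetical rows from images.csv: 'id' is the specimen UUID,
// 'identifier' is the image's own id. All values are invented.
$image_rows = array(
  array('id' => 'uuid-a', 'identifier' => 'img-101'),
  array('id' => 'uuid-a', 'identifier' => 'img-102'),
  array('id' => 'uuid-b', 'identifier' => 'img-103'),
);

// Stand-in for the *_map table: at most one entry per source id.
$map = array();
foreach ($image_rows as $row) {
  if (!isset($map[$row['id']])) {
    // Only the first row for a given key gets an entry.
    $map[$row['id']] = $row['identifier'];
  }
}

// Three rows went in, but only two distinct keys remain.
echo count($image_rows) . " rows, " . count($map) . " map entries\n";
```

Any row that shares a specimen UUID with an earlier row has no key of its own, so it cannot be tracked separately.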

I found some issues about using prepareKey() to make unique values, and I see how I can use that to get all the images migrated into the table, but then how do I tell it what image key matches what specimen key (since they would then be different) and get the specimen nid to map to specimen_id?

I guess this is a whole different issue from what I began with.

dropfen’s picture

Okay, the fact that your specimens have no relation to the images, but the images do to the specimens, makes it difficult to get these image nodes into your specimens just by running the (trivial) migration process.

There are some other methods that may help, like handleSourceMigration() to get data from another migration, but I'm not familiar with it, so try it out. I hope you find a smart solution for this and post your results.

But if you just need a relation between your images and specimens, you could map your specimens to your images. Maybe it's a stupid idea, but that way you would have your data in your db and would then just need to switch the relations by running a custom script, or with Views Bulk Operations and a little PHP code...

rbruhn’s picture

@dropfen - Thanks for mentioning handleSourceMigration(). I was able to solve the issue using that.

To sum up what I did:
First, it should be mentioned that the title of this issue, my original problem with the entity reference, is not what this solution fixes. Instead, I ran into the problem of migrating relational data from my csv files, because the sourceId in *_map does not allow duplicates.

specimen.csv contained unique UUIDs in column Id.
images.csv contained the specimen UUIDs they belonged to. There are duplicate values in the Id column because more than one image can be associated with a specimen.

To handle this, I created a unique key for the images using prepareKey()

public function prepareKey($source_key, $row) {
  $key = array();
  $row->id_key = $row->id . '_' . $row->identifier;
  $key['id_key'] = $row->id_key;

  return $key;
}

id_key is what is stored in the sourceId of *_map table.
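
As a rough plain-PHP illustration of why the combined key works (again not Migrate code), the sketch below builds the same kind of key outside of any migration. It assumes the identifier column is unique per image; the helper name build_image_key() and the sample values are hypothetical.

```php
<?php
// Hypothetical helper mirroring the prepareKey() logic above:
// specimen UUID + '_' + the image's own identifier.
function build_image_key($specimen_uuid, $image_identifier) {
  return $specimen_uuid . '_' . $image_identifier;
}

// Invented sample rows; two images share the same specimen UUID.
$image_rows = array(
  array('id' => 'uuid-a', 'identifier' => 'img-101'),
  array('id' => 'uuid-a', 'identifier' => 'img-102'),
  array('id' => 'uuid-b', 'identifier' => 'img-103'),
);

$keys = array();
foreach ($image_rows as $row) {
  $keys[] = build_image_key($row['id'], $row['identifier']);
}

// Every row now has its own key, even where the UUID repeats.
$all_unique = count($keys) === count(array_unique($keys));
var_dump($all_unique);
```

The trade-off is that the stored key no longer equals the specimen UUID, which is why the specimen lookup has to be done separately.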

I then added a custom source field to my csv columns.

34 => array('specimen_id', t('Custom source field')),

Then I added a mapping from the custom field to my entity reference field (it holds the nid of the specimen the image is associated with).

$this->addFieldMapping('specimen_id', 'specimen_id');

In prepareRow(), I used handleSourceMigration() to retrieve the destination node id for the specimen using the UUID (id column) from the image.csv file. I also discovered whoever dumped the csv files did so without checking for matching UUIDs. So I check if a corresponding specimen even exists, and if not, create a message for it and skip it.

// Check if specimen uuid exists and skip if false.
$row->specimen_id = $this->handleSourceMigration('MorphbankSpecimen',$row->id);
if (empty($row->specimen_id)) {
  $message = "No corresponding Specimen UUID for {$row->id}.";
  $this->queueMessage($message, 3);
  return FALSE;
}
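
The lookup-and-skip logic can be sketched in plain PHP, outside of Migrate, roughly as follows. The array $specimen_map stands in for the specimen migration's map table, lookup_specimen_nid() is a hypothetical stand-in for what handleSourceMigration() returns in this case, and all UUIDs and nids are invented.

```php
<?php
// Stand-in for the specimen *_map table: source UUID => destination nid.
$specimen_map = array('uuid-a' => 11, 'uuid-b' => 12);

// Hypothetical helper: the destination nid, or NULL when the UUID
// was never migrated.
function lookup_specimen_nid(array $map, $uuid) {
  return isset($map[$uuid]) ? $map[$uuid] : NULL;
}

// Invented image rows; the last one points at a specimen that does not exist.
$image_rows = array(
  array('id' => 'uuid-a', 'identifier' => 'img-101'),
  array('id' => 'uuid-a', 'identifier' => 'img-102'),
  array('id' => 'uuid-missing', 'identifier' => 'img-999'),
);

$imported = array();
$skipped = array();
foreach ($image_rows as $row) {
  $nid = lookup_specimen_nid($specimen_map, $row['id']);
  if (empty($nid)) {
    // No matching specimen: record the row and skip it.
    $skipped[] = $row['identifier'];
    continue;
  }
  // Several images may legitimately resolve to the same specimen nid.
  $imported[$row['identifier']] = $nid;
}

print_r($imported);
print_r($skipped);
```

Because the lookup goes by the specimen UUID rather than by the image's own map key, many images can point at one specimen without any conflict in the image map table.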

dropfen’s picture

So, you got this:

Image1
-- specimen_id = Spec1_UUID
Image2
-- specimen_id = Spec1_UUID
Image3
-- specimen_id = Spec2_UUID

???

If so, are you sure you need the custom map key and the handleSourceMigration method? I think the only thing you need is to add the FieldMapping for specimen_id, taken from id.

In your ImageMigration Class:

// specimen_id is your entityreference field, and id is the UUID (from the specimen)
$this->addFieldMapping('specimen_id', 'id')
  ->sourceMigration('MorphbankSpecimen');

If you got this structure:

Specimen1
-- images (Image1, Image2)
Specimen2
-- images (Image3)

It's great, and I really didn't understand the trick of how the code works :)

rbruhn’s picture

@dropfen - In doing something like this....

// specimen_id is your entityreference field, and id is the UUID (from the specimen)
$this->addFieldMapping('specimen_id', 'id')
  ->sourceMigration('MorphbankSpecimen');

... you would need the connection between the Specimens and Images. Migrate treats those _map tables as a one-to-one relationship. Since I had duplicate UUIDs for the Images to connect them to the Specimens, it required making a unique key for the Image _map table. The only way to connect the two was using the handleSourceMigration method.

In other words, using the field mapping you posted above, the code would see that specimen_id equals the id of the specimen from the previous specimen migration... but would not know which specimen to use.

-----

If I had imported the images first, I would still have had to create a unique key for the _map table due to duplicates, and then create stubs for the specimens so I had the node ids to store in specimen_id. I would also have had to create a field to store the specimen UUIDs in the Image csv. During the specimen migration, I would then have had to use the UUID and specimen_id to make it all fit together.

It was easier simply to import Specimens first. Hope that explains it.

pifagor’s picture

Status: Postponed (maintainer needs more info) » Closed (outdated)