Here's a simplified scenario of my case:

  • Source: CSV. Sample row: "Peter", "Pan", 1330855930
  • Destination: Node entity

The source column definitions:

array(
  0 => array('firstname', 'First name'),
  1 => array('lastname', 'Last name'),
  2 => array('updated', 'Last updated'),
);

I added this highwater mark:

$this->highwaterField = array(
  'name' => 'updated',
  'type' => 'int',
);

I expect that, on the second import, Migrate will skip all records that were previously imported and have not changed. I didn't touch the CSV file between the imports. I can see that {migrate_status}.highwater holds the greatest "updated" value from the source.

I used both:

$ drush migrate-import MyMigrate
$ drush migrate-import MyMigrate --update

Both import everything again. My intention is to keep my Drupal site in sync with the CSV, so this is not a one-time migration. Performance is therefore critical: I need only fresh data (new and updated records) to come in.
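To make the expectation concrete, here is a minimal sketch in plain PHP of the behavior I would expect from a highwater mark. It does not use the Migrate API: rows_to_import() is a hypothetical helper, and the second sample row is invented for illustration.

```php
<?php
// Hypothetical helper, not part of Migrate: keep only the rows whose
// highwater field is strictly greater than the stored mark.
function rows_to_import(array $rows, $field, $highwater) {
  return array_values(array_filter($rows, function ($row) use ($field, $highwater) {
    return $row->$field > $highwater;
  }));
}

// Illustrative rows; only the first one comes from the sample above.
$rows = array(
  (object) array('firstname' => 'Peter', 'lastname' => 'Pan', 'updated' => 1330855930),
  (object) array('firstname' => 'Wendy', 'lastname' => 'Darling', 'updated' => 1330855990),
);

// First import: nothing is stored yet, so every row qualifies.
$first = rows_to_import($rows, 'updated', 0);

// The stored mark becomes the greatest "updated" value seen so far.
$highwater = max(array_map(function ($row) {
  return $row->updated;
}, $first));

// Second import on the unchanged file: no row should qualify.
$second = rows_to_import($rows, 'updated', $highwater);
```

With an unchanged file, the second pass should select no rows at all, which is not what I observe.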

Comments

claudiu.cristea’s picture

It seems that Migrate uses the unprepared timestamp value when deciding whether to process the row. In my case I implemented a prepareRow() method to convert the CSV's millisecond timestamps to regular UNIX timestamps.

  public function prepareRow($row) {
    // Convert milliseconds timestamp to UNIX timestamp.
    if (isset($row->updated)) {
      $row->updated = floor($row->updated / 1000);
    }
  }

In includes/source.inc we have:

      // 5. So, we are using highwater marks. Take the row if its highwater field
      //    value is greater than the saved marked, otherwise skip it.
      elseif ($row->{$this->highwaterField['name']} > $this->activeMigration->getHighwater()) {
        // Fall through
      }

So...

  • $row->{$this->highwaterField['name']} will be in milliseconds (unprepared)
  • $this->activeMigration->getHighwater() will be in seconds, because the value was prepared before it was stored.

I think the above elseif statement must use the prepared $row->{$this->highwaterField['name']}.
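A small worked example of the mismatch, in plain PHP with illustrative values only (the variable names are hypothetical and nothing here calls Migrate):

```php
<?php
// Illustrative values, not taken from a real import.
$raw_updated = 1330855930000;                  // milliseconds, as read from the CSV
$prepared = (int) floor($raw_updated / 1000);  // seconds, as prepareRow() would produce
$stored_highwater = $prepared;                 // what was saved after the first import

// Comparing the raw value: milliseconds always exceed the stored seconds,
// so the row is taken again on every import.
$raw_check = $raw_updated > $stored_highwater;

// Comparing the prepared value: an unchanged row is skipped.
$prepared_check = $prepared > $stored_highwater;
```

Here $raw_check is TRUE and $prepared_check is FALSE, which matches the behavior I see: every row passes the highwater test on each run.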

claudiu.cristea’s picture

Title: Highwater & CSV not working » Row highwater field is checked unprepared
Assigned: Unassigned » claudiu.cristea
Status: Active » Needs review
Attached file: status new, size 1.71 KB

Here's a patch. I admit that cloning the $row object is not very nice, but I didn't want to mess up all the logic there by preparing the row at an early stage. This works for me.

mikeryan’s picture

Priority: Major » Normal
Status: Needs review » Needs work

No, there's got to be a better way. Doubling the calls to prepareRow() for the sake of an edge case is not acceptable. Better would be to find a way to insert a call to prepareRow() just before testing the highwater mark, and make sure it doesn't get called again below if it was called here...
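A rough sketch of that idea in plain PHP, assuming a simple flag is enough to track whether the row was already prepared. SketchSource, accept() and $prepareCalls are hypothetical names for illustration; this is not the code in includes/source.inc.

```php
<?php
// Hypothetical stand-in for a source class, for illustration only.
class SketchSource {
  public $prepareCalls = 0;
  protected $highwater;

  public function __construct($highwater = NULL) {
    $this->highwater = $highwater;
  }

  // Same conversion as the prepareRow() shown earlier in this issue.
  public function prepareRow($row) {
    $this->prepareCalls++;
    $row->updated = (int) floor($row->updated / 1000);
  }

  // Returns TRUE when the row should be imported.
  public function accept($row) {
    $prepared = FALSE;
    if ($this->highwater !== NULL) {
      // Prepare just before the highwater test, and remember that we did.
      $this->prepareRow($row);
      $prepared = TRUE;
      if ($row->updated <= $this->highwater) {
        return FALSE;
      }
    }
    // Only prepare here if it did not already happen above.
    if (!$prepared) {
      $this->prepareRow($row);
    }
    return TRUE;
  }
}

$source = new SketchSource(1330855930);
$unchanged = $source->accept((object) array('updated' => 1330855930000));
$newer = $source->accept((object) array('updated' => 1330855990000));
```

In this sketch each row is prepared exactly once, whether or not a highwater mark is in use.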

mikeryan’s picture

Status: Needs work » Needs review
Attached file: status new, size 2.56 KB

The attached patch is what I was talking about - does it address the issue for you?

Thanks.

claudiu.cristea’s picture

Status: Needs review » Reviewed & tested by the community

Works as expected with #4. I had to apply it manually because the branch is ahead.

Thank you!

mikeryan’s picture

Assigned: claudiu.cristea » mikeryan
Status: Reviewed & tested by the community » Fixed

Committed for D6 and D7, thanks!

claudiu.cristea’s picture

Status: Fixed » Needs work

Checked twice... Now existing, untouched records are updated every import :(

I tested and found that this condition is satisfied every time:

      // 2. If the row is not in the map (we have never tried to import it before),
      //    we always want to try it.
      elseif (!isset($row->migrate_map_sourceid1)) {
        // Fall through
      }

This means that the existing rows don't have the migrate_map_sourceid1 property set at this point. I'm not sure whether this comes from your patch.
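To show why this check defeats the highwater test, here is a reduced sketch in plain PHP containing only the two conditions quoted in this issue (numbers 2 and 5). The other numbered checks are omitted, and should_import() is a hypothetical function, not Migrate code.

```php
<?php
// Hypothetical reduction of checks 2 and 5; the remaining checks are left out.
function should_import($row, $field, $highwater) {
  // 2. Not in the map: always try the row.
  if (!isset($row->migrate_map_sourceid1)) {
    return TRUE;
  }
  // 5. Highwater: take the row only if its field exceeds the saved mark.
  elseif ($row->$field > $highwater) {
    return TRUE;
  }
  return FALSE;
}

// An unchanged row (updated equals the stored mark), with and without map data.
$unmapped = (object) array('updated' => 1330855930);
$mapped = (object) array('updated' => 1330855930, 'migrate_map_sourceid1' => 'row-1');

$unmapped_result = should_import($unmapped, 'updated', 1330855930);
$mapped_result = should_import($mapped, 'updated', 1330855930);
```

If the map property is missing, the row is taken before the highwater comparison is ever reached, so an unchanged row is imported again.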

mikeryan’s picture

Status: Needs work » Fixed

The line comes from #1529362: Migrate not respecting existing map statuses. I don't see why a previously-imported record would not have migrate_map_sourceid1 set...

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.