I have been studying the classes within migrate_d2d to implement a migration. At the moment, I am confused by the DrupalMigration::newOnly property.

This is the property in the class along with the documentation:

abstract class DrupalMigration extends Migration {
  ...

  /**
   * Set to TRUE to suppress highwater marks or track_changes (i.e., to only
   * import new items, skipping updated items, on subsequent imports).
   *
   * @var bool
   */
  protected $newOnly = FALSE;
  ...
}

This is how it is used:

class DrupalFile6Migration extends DrupalFileMigration {
  ...
  public function __construct(array $arguments) {
    parent::__construct($arguments);

    if (!$this->newOnly) {
      $this->highwaterField = array(
        'name' => 'timestamp',
        'alias' => 'f',
        'type' => 'int',
      );
    }
    ...
  }
}

The variable name and behaviour are confusing me. My understanding is that highwater marks cause only new or changed items to be imported. So, if I only want to import new (or updated) records, I would define highwaterField. Is that correct?

Now, the property name newOnly tells me that if I set it to TRUE, only new records will be imported. However, the code defines a highwaterField only if newOnly is FALSE, which suggests that new (or changed) records are the ones imported when newOnly is FALSE. I am not able to reconcile the name with this behaviour.

I searched the documentation and blog posts but couldn't find anything concrete on newOnly. Can anyone explain the reasoning behind the name and/or behaviour of this variable?
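To make the control flow in question concrete, here is a stripped-down, self-contained sketch. The classes below are stand-ins that only mirror the excerpts quoted above (they are not the real Migrate or migrate_d2d classes), ExampleNewOnlyFileMigration is a hypothetical subclass, and highwaterField is made public purely so the result can be inspected. It assumes nothing else in the constructor chain resets newOnly.

```php
<?php
// Stand-in for the Migrate base class (hypothetical, heavily simplified).
abstract class Migration {
  // Public here only so this sketch can inspect it.
  public $highwaterField = array();

  public function __construct(array $arguments) {}
}

// Mirrors the property quoted above.
abstract class DrupalMigration extends Migration {
  protected $newOnly = FALSE;
}

// Mirrors the constructor logic quoted above; the intermediate
// DrupalFileMigration class is left out to keep the sketch short.
class DrupalFile6Migration extends DrupalMigration {
  public function __construct(array $arguments) {
    parent::__construct($arguments);

    if (!$this->newOnly) {
      $this->highwaterField = array(
        'name' => 'timestamp',
        'alias' => 'f',
        'type' => 'int',
      );
    }
  }
}

// Hypothetical subclass that overrides the default to suppress highwater marks.
class ExampleNewOnlyFileMigration extends DrupalFile6Migration {
  protected $newOnly = TRUE;
}

$default = new DrupalFile6Migration(array());
$new_only = new ExampleNewOnlyFileMigration(array());

var_dump(!empty($default->highwaterField));   // bool(true): highwater field defined
var_dump(!empty($new_only->highwaterField));  // bool(false): highwater field suppressed
```

In this sketch the flag is fixed in the class definition, so the choice is made up front rather than per run.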

Lastly, thank you for the module. Awesome work!

Comments

mikeryan’s picture

Status: Active » Postponed (maintainer needs more info)

By default, with newOnly not set, the highwater field is defined. With the highwater field defined, the migration will import all source records whose highwater field is higher than the last saved highwater mark - i.e., all records that have been added, as well as all those that have been changed.

When newOnly is set, the highwater field is not defined. The migration then behaves in the default manner: any incoming records that were previously imported (i.e., those recorded in the map table) are skipped, and only new records (those not in the map table) are imported. Without the highwater field defined, the migration does not recognize changed records, so it does not import them.
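As a rough illustration of the two behaviours, the following self-contained sketch models which source rows would be picked up on a later run. It is a simplified model, not Migrate's actual selection code: rows_to_import() and its parameters are hypothetical names, and the map table is reduced to an array keyed by source ID.

```php
<?php
/**
 * Simplified model of which source rows get imported on a later run.
 * Not Migrate's real code; the function and parameter names are hypothetical.
 */
function rows_to_import(array $source, array $map, $highwater_field = NULL, $last_highwater = 0) {
  $ids = array();
  foreach ($source as $id => $row) {
    if (!isset($map[$id])) {
      // Not in the map table yet: always imported.
      $ids[] = $id;
    }
    elseif ($highwater_field !== NULL && $row[$highwater_field] > $last_highwater) {
      // Previously imported, but changed since the saved highwater mark.
      $ids[] = $id;
    }
  }
  return $ids;
}

$source = array(
  1 => array('timestamp' => 100),  // imported earlier, unchanged
  2 => array('timestamp' => 250),  // imported earlier, changed on the source since
  3 => array('timestamp' => 300),  // added on the source since the last run
);
$map = array(1 => TRUE, 2 => TRUE);  // source IDs recorded in the map table
$last_highwater = 200;

// newOnly = FALSE (default): highwater field defined, so new and changed rows.
print implode(',', rows_to_import($source, $map, 'timestamp', $last_highwater)) . "\n";  // 2,3

// newOnly = TRUE: no highwater field, so only rows missing from the map table.
print implode(',', rows_to_import($source, $map)) . "\n";  // 3
```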

Does this clarify things?

wonder95’s picture

"With the highwater field defined, the migration will import all source records whose highwater field is higher than the last saved highwater mark - i.e., all records that have been added, as well as all those that have been changed."

My understanding was that you had to specify if you wanted existing records to be updated. The drush command has the --update option, and the UI has the "Update" checkbox, but what you said here and what the highwater marks docs page says suggest otherwise.

How do the options I mentioned above work in concert with $newOnly? newOnly is only set in the constructor, so it doesn't look like it can be set on the fly. Do you have to know up front whether a particular migration will only migrate new data?

Thanks.

mikeryan’s picture

Status: Postponed (maintainer needs more info) » Active

When the issue status is "Postponed (maintainer needs more info)" and you're providing more info, please set the status to "Active" - postponed issues are at the bottom of the priority list for review.

mikeryan’s picture

Status: Active » Postponed (maintainer needs more info)

With the highwater mark set, all source records with updated timestamps since the last import will be imported - those added since then are imported for the first time, and any that were previously imported then changed on the source side are updated. Using --update with highwater marks is problematic: #2379289: migrate-import --update does not seem to work as expected, if map is not joinable, due to highwater field?.

What newOnly does is simply disable the highwater mark, so the migration behaves in the normal way - it will import only new content (it will not automatically update items that were changed on the source side, as highwater marks would do). If you add --update when newOnly is set, it works as it does normally - it sets every previously-imported item's needs_update to MigrateMap::STATUS_NEEDS_UPDATE, so they all get reimported (not just changed items) in addition to any new items being imported. This feature was added in #1956834: Remove highwater fields or make them configurable, for the use case where someone did not want changed records on the source to be re-imported - in hindsight, maybe it would have been better to call it suppressHighwater or something like that...
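The effect of --update without a highwater field can be sketched as follows. This is a self-contained model rather than Migrate's implementation: the two constants are local stand-ins for the map-row status values, and flag_all_for_update() and rows_to_import_new_only() are hypothetical helper names.

```php
<?php
// Local stand-ins for the map-row status values; not the real MigrateMap class.
const STATUS_IMPORTED = 0;
const STATUS_NEEDS_UPDATE = 1;

// Models what --update does: flag every previously imported row for update.
function flag_all_for_update(array $map) {
  return array_fill_keys(array_keys($map), STATUS_NEEDS_UPDATE);
}

// Models selection with no highwater field: rows missing from the map
// table, plus rows explicitly flagged as needing an update.
function rows_to_import_new_only(array $source_ids, array $map) {
  $ids = array();
  foreach ($source_ids as $id) {
    if (!isset($map[$id]) || $map[$id] === STATUS_NEEDS_UPDATE) {
      $ids[] = $id;
    }
  }
  return $ids;
}

$source_ids = array(1, 2, 3);
// Rows 1 and 2 were imported on an earlier run; row 3 is new on the source.
$map = array(1 => STATUS_IMPORTED, 2 => STATUS_IMPORTED);

// Plain import: only the new row.
print implode(',', rows_to_import_new_only($source_ids, $map)) . "\n";  // 3

// Import with --update: everything previously imported, plus the new row.
print implode(',', rows_to_import_new_only($source_ids, flag_all_for_update($map))) . "\n";  // 1,2,3
```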

Does that address your question?

badrange’s picture

I am not changing issue status since I did not ask the original question, but I can say that for me the explanation is helpful, and it would be highly appreciated if you took the time to add this clarification to the docs.

It has been a while since I did my first and so far only migration, but I remember misunderstanding what the --update flag actually did, which made the migration less smooth than it could have been. I also didn't understand the actual consequences of the newOnly flag. I was using the migrate_d2d module, which of course uses highwater marks to the fullest.

For a Migrate newbie, a sentence like the following could be helpful: "If you have changed the migration code, and need to update all content in the target system to reflect your changes, use --update to force a remigration of everything".