Problem/Motivation
Entities created through the UI are validated (required fields confirmed to be present, field values confirmed to be in proper ranges, etc.) through form validation - the entity API (save()) does not do validation, so if migrated data is to be validated it needs to be done explicitly. As benjy says:
I think we've discussed this elsewhere but no, entity->save() doesn't cause validation, you have to do $entity->validate() for that which opens up a can of worms with migrate.
Still, no reason not to permit those who choose to do so to open that can...
Proposed resolution
Add a "validate" option to entity destinations - when true, call $entity->validate().
destination:
plugin: entity:node
validate: true
Remaining tasks
- Write the code (should take about 2 minutes).
- Write tests (slightly longer).
User interface changes
N/A
API changes
Adds a configuration option to entity destinations.
Data model changes
None.
Release notes snippet
The ability to validate content entities during a migration is available. On the destination just set validation: true
. See https://www.drupal.org/node/3073707 for more info.
Comment | File | Size | Author |
---|---|---|---|
#81 | 2745797-81.patch | 14.3 KB | BR0kEN |
#72 | 2745797-72--8.6.x.patch | 12.14 KB | BR0kEN |
Comments
Comment #3
mikeryanComment #4
harings_rob CreditAttribution: harings_rob commentedI am working on this.
Comment #5
harings_rob CreditAttribution: harings_rob commentedHi Mikeryan,
Attached is a "pre patch", could you let me know if this is the correct approach?
Comment #6
dawehnerYou also have to do something with the returned data of the validation.
Comment #7
heddnre #4, that is the right direction. But you need to throw a row exception if the validation fails.
Comment #8
mikeryanAs dawehner points out, we need to handle failed validations. I'm thinking probably the best thing to do if the returned violation list is non-empty would be to throw an exception - we'll need a new exception class, MigrateEntityViolationException or somesuch, with the violation list as an argument. MigrateExecutable would catch this and save any messages found to the migration message table via $this->saveMessage(). See ContentEntityForm::flagViolations() to see how to iterate the violation list. Actually, you could basically copy this to MigrateExecutable, replace the setErrorByName call with saveMessage(), and remove the getFormDisplay() call.
TRUE and FALSE should be all-caps.
Comment #9
harings_rob CreditAttribution: harings_rob commentedAttached is the full patch including the tests.
I am not sure about the exception class as I have never done this before.
Please let me know where I can improve.
Regards,
Comment #10
harings_rob CreditAttribution: harings_rob commentedComment #12
mikeryanThe meat of it looks good, some cleanup and rearrangement suggestions below. I've tested locally, and it works well in practice.
I'm not sure why RequirementsException (which I presume you were looking at) is under an Exception directory - other migrate exceptions, and exceptions defined in other core modules, are directly in src, so I would move this up to the Drupal\migrate namespace.
I'm going back and forth on whether this should extend MigrateException as is done here (so it will get caught without any other changes), or should extend \Exception and MigrateExecutable should explicitly catch this exception. I'm leaning towards keeping it this way, but I can be easily swayed if others disagree...
But, since we're never going to pass these parameters, there's no reason to have them - we can let the MigrateException constructor default them.
Shouldn't that be $violations->getEntityViolations() as it is in ContentEntityForm::flagViolations()? Well, I guess the fact that it's \Traversable means it will work... I'll see when I test locally. Later - yes, works just fine for reals, so this is OK.
True, there's no point to this parameter, but fixing that is out of scope for this issue.
Similarly, coding standard fixes are out of scope (I believe there's an existing core issue to update to short array throughout core via automation).
s/True/TRUE/, s/false/FALSE/
I'm prepared to RTBC with the above issues addressed, thanks!
Comment #13
dawehnerJust a quick comment. We are kind of trying to not make the mistake again to define new exceptions inside the main root namespace. Collection exceptions in its own namespace IMHO removes clutter for example.
This is loosing some information, for example which violation message applied to which field of an entity. Maybe we should extract that information as well, or at least store the entire list of violations?
Comment #14
heddnComment #15
heddnHere's the result of some testing. We need to handle this a little more gracefully, or no one is going to change the validate flag. If I disable validate, then I get zero errors/warnings. If I enable it, dies right away and I have to run it with '--no-halt-on-error'.
Comment #17
iMiksuLets try to investigate why tests failed.
Comment #18
biguzis CreditAttribution: biguzis at Wunder commentedComment #19
biguzis CreditAttribution: biguzis at Wunder commentedIn MigrateEntityViolationException.php changed line #26 from
$messages[] = $violation->getRoot() . ' : ' . $violation->getPropertyPath() . ' : ' . $violation->getMessage();
to$messages[] = $violation->getMessage();
Comment #20
biguzis CreditAttribution: biguzis at Wunder commentedComment #21
heddnCan we get a real simple manual migration where we turn on validation? In my example in #15, I was migrating from D6 so there was very likely some bad data. It would be nice to see a manual test to see how validation works in real life. What errors get thrown, etc.
And we also should add some tests. And an interdiff on #19 is always good.
Comment #22
biguzis CreditAttribution: biguzis at Wunder commentedInterdiff for #19
Comment #23
mikeryanManually tested, hacking migrate_example (setting field_migrate_example_country to required and removing the country data for one row), I get the following in the message table:
So, that brings us back to dawehner's earlier comment:
Unfortunately, getEntityViolations() (as well as the iterator over $violations) does not return field-specific information. It looks like to be able to include field names in the messages, we need to iterate over getFieldNames(), calling getByField() for each name (see EntityConstraintViolationListInterface).
Comment #24
heddnAt this point, probably not as novice. Removing tag. @biguzis, feel free to continue working on it though. Don't get scared away.
Comment #25
heddnRe-titling to reflect this only support content entities, not config.
Comment #26
joachim CreditAttribution: joachim as a volunteer commented> you have to do $entity->validate() for that which opens up a can of worms with migrate.
Could someone explain more what the can of worms is?
If you don't validate, then either:
- a) your migration will crash, due to data being too long / wrong format / not in the allowed values. That's the same overall effect as validating, in that you have to deal with the problem before you can complete the migration, but not as nice DX.
- b) you migration runs, but you have created bad data in your site. So you've just postponed the problem of cleaning it up.
I don't see why this needs to be an option -- or at the very least, validation should be enabled by default.
Comment #27
joachim CreditAttribution: joachim commentedI don't think that's quite right. With this, ANY value in the 'validation' property will cause validation. So these all would:
The new configuration item should be documented at the top of the class too. (The others aren't, but there's an issue to document process plugins, so destination plugins should start documenting their options too.)
I tried this patch with my own Drupal6-Drupal migration. As it happens, I have a custom process plugin which populates a boolean field.
So I tried setting that to return something that would break it: 'cake'.
This happened:
Breaking that line apart to see which method on the exception returns something non-stringy, it's:
I'm going to add: I really really think this should be classed as a bug, not a feature. As I've just been reminded by intentionally crashing my migration, when this happens you have to go through the steps of resetting it and possibly rolling it back. That's not a graceful failure.
Comment #28
heddnRe #26: "Can of worms", I think stem from the experience of running several D6/D7 => D8 migrations for clients. I always do spot checking of data. Even going as far as looking at the last 10-20 most recent posts of a various pieces of content. Issues arise when there is a malformed malformed D6 piece of data that didn't work in D6, but no one knew (or cared). In many cases, D6 doesn't have the same level of data validation that D8 does. And typically we are dealing with very old, legacy link data or youtube videos. Most of my clients don't care or have the budget to fix all the data.
If there is a DB level problem where the data get's truncated or won't fit into the DB, those still get logged as errors and should get fixed. But if the format for the youtu.be link is spelled youtub.e, that wouldn't get fixed and in many cases it is easier to fix in the new system of D8, where validation will throw errors when re-saving the content. But most clients don't have the money, time, resources to fix up their broken data. So it just stays broken.
BTW, I'm all for adding it as an option. But I'm not sure how often I'll turn it on.
Comment #30
renaudcuny CreditAttribution: renaudcuny at OSCE: Organization for Security and Co-operation in Europe commentedRe-rolling patch for 8.3.x.
Comment #31
renaudcuny CreditAttribution: renaudcuny at OSCE: Organization for Security and Co-operation in Europe commentedComment #32
mikeryanComment #33
mikeryanMy comments in #2745797-23: Add option to content entity destinations for validation and joachim's in #2745797-27: Add option to content entity destinations for validation still need to be addressed.
Comment #35
piggito CreditAttribution: piggito as a volunteer and at Skilld commented/modules/migrate/src/Plugin/migrate/destination/EntityRevision.php
overrides save function so I think we should call entity validation outside of save function in order to target EntityRevisions (and EntityReferenceRevisions) aswell.I'm providing a patch just applying this change over the patch in #30.
Comment #36
andypostComment #37
heddnShouldn't we have some type of interface changes for needsValidation and validateEntity on MigrateDestinationInterface. But we cannot do that without breaking BC. But then we cannot be sure that all things are validable. Maybe that's OK. And we only care about validating content entities.
Let's not call this entity in the name. Not all things migrated are destined for an entity. It is more re-usable if the exception is more generic.
A lot of times, we just check if isset. And return that value. if it is set, then we assume the configuration was desired.
Comment #38
quietone CreditAttribution: quietone as a volunteer commentedA bit of a nit. But all, or nearly all, of the migration integration tests do this in the setup() method. Can this be moved into setup?
Comment #39
piggito CreditAttribution: piggito as a volunteer and at Skilld commentedRerolling patch and applying changes suggested in #37 and #38
Comment #41
quietone CreditAttribution: quietone as a volunteer commentedWas looking at the documentation and found a few things.
Let's have something more meaningful here. Maybe 'Thrown when migrated content fails validation' or some such.
Let's be more descriptive, something like 'Gets the validation configuration key.'?
Since this is adding a new configuration key it will need documenting. This should be done after #2862662: Add documentation to EntityContentBase destination plugin is in.
This is testing the validation of an entity and that exceptions are thrown, not the EntityContentBase destination, which is tested in MigrateEntityContentBaseTest. It needs changing.
This block is setup, is there a reason it can't be in the setup() method?
Comment #42
jofitz CreditAttribution: jofitz at ComputerMinds commented1-5: done.
Comment #44
heddnThis is shaping up nicely.
Let do call this with Entity in the name. It only works with ECB and entities.
Let's add a MigrateValidatableInterface that has needsValidation and validate on it. Then implement that in ECB. That way we can easily add validation to other destinations or sources or anything as we see fit. Then in ECB, we can just call validate/needsValidation because it is an instanceof MigrateValidatableInterface.
Can we show a code example below where we describe what happens with when validate is flagged true?
Comment #45
joachim CreditAttribution: joachim commentedI've just spotted what I think is a pretty big inconsistency.
For migrations, we don't validate entities at all, and we're only considering adding an option.
However, EntityContentBase::processStubRow() checks all fields for whether they are required:
which is effectively doing a large part of entity validation!
Comment #46
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedNice patch, thanks. However the error reporting can sometimes be a bit unhelpful...
1)
The validation fails as I assigned a string into a bool. The error should report which field is the problem.
EDIT: Also it should report which row/entity is the problem in a way that the end user can understand - source_ids_hash is not so easy to deal with. As another example, a message such as "The username contains an illegal character." without indication of which user is not so easy to resolve.
2) The validation fails when static_map doesn't match. There is no error, just a skipped row. There is no entry in the messages to indicate what the cause was.
Comment #49
BR0kENComment #50
heddnThese changes should be reverted.
I don't think we want to do this here. Config entities extend from here and shouldn't be validated. They already do their own validation internally. Rather add this interface to ECB.
Maybe name this
isValidationEnabled
?Suggestion, "Run entity and field validation before saving an entity."
Can we add some comments why we need the asserts? This pattern isn't very common yet, so some rational in a comment could assist.
As part of this, let's also pass the full entity to the exception too and list the entity id in the message. This is to respond to #46
Comment #51
BR0kEN#50.1 do you think this should go in a separate issue since that documentation isn't related to the class it's added to?
Comment #52
BR0kEN#50.2, in my opinion, it's better to have this for all type of entities because custom destination plugin is able to provide an additional, migration-related level of validations easily.
Comment #53
heddnre #51: If you feel we need to remove those comments, then open another issue. It doesn't belong in this one and is just extra noise.
re #52. I see your point, but most people don't create custom destinations. They are rare and usually an indication of code smell. 98% of the time the ones provided by core are enough. And even when someone does find a reason to create one, custom content destinations extend from ECB. If they don't, they should.
Comment #54
BR0kEN#53.1 agree.
#53.2, yeah but I also meant configuration entities. Everyone who'll extend `EntityConfigBase` (btw, `ECB` abbr valid for both `EntityConfigBase` and `EntityContentBase`) is also have the ability to provide a validation. I'd prefer to be consistent and handle all entity types from core.
Comment #55
BR0kENComment #56
heddnI'm still not fully convinced that we need to add validation for config entities. They already do their own validation on save from my experience. So adding the validation on the entitycontentbase should be be enough.
Comment #57
BR0kENIn most scenarios that method will simply be executed with no payload during the import of configurations. I agree that mostly it shouldn't be needed.
My points are:
- we implement a consistent solution for both config and content entity types;
- we leave an ability for an overridden/custom destination plugin to implement their own validation.
Let's leave this in NR now, waiting for other opinions. I'm not against restricting this to the content entity only, so will do this once someone else say it's not a good idea to provide this solution for config destination too.
Comment #58
BR0kENI rethink a bit the need of having this for config entities.
Comment #59
BR0kENOoops.
Comment #61
heddnThis is really shaping up nicely. Just a couple nits.
Can we use
!empty
? That way if someone sets to false, this won't trigger.It would be cleaner to use
assertArrayNotHasKey
orassertCount(2, $this->messages)
Comment #62
BR0kENComment #63
heddnCosmetic only changes in #62. They'll hopefully come back green. All feedback addressed. We have test coverage. I think we're good to go.
Comment #64
BR0kENThe same patch for 8.6.x.
Comment #65
BR0kEN[node ]: This value should not be null.
have to contain a field/property name to which it's assigned.[node ]
to[node]
.$entity->isValidationRequired()
isTRUE
even if$this->isEntityValidationEnabled()
isFALSE
.Comment #66
BR0kENComment #67
BR0kENComment #70
heddnHow about MESSAGES_SEPARATOR?
This is a change from current levels of validation. I'm not sure we want to do this. It could have backwards compatibility implications.
Comment #71
BR0kEN#70.1 makes sense. Why not?
#70.2 the
ContentEntityBase::preSave()
will throw an exception if an entity has to be validated. So I assume having this condition we simplify the process of configuring the migrations.Comment #72
BR0kENComment #73
BR0kENComment #74
rosinegrean CreditAttribution: rosinegrean at PitechPlus commentedComment #75
BR0kENComment #76
heddnI think we want to wrap the entity check
isValidationRequired
intoisEntityValidationRequired
. And log an error or warning if the entity doesn't support validation but someone has configured validation.Comment #77
mikelutzSetting to NW per #76
Comment #78
BR0kENThe
Drupal\Core\Entity\ContentEntityInterface
cannot to not have the validation since it implements theDrupal\Core\Entity\FieldableEntityInterface
that defines thevalidate()
method.Comment #79
BR0kENComment #80
heddnAlmost there.
Nit: this seems a little odd wording. Could we phrase it:
The entity to check for required validation.
Comment #81
BR0kENComment #82
heddnThis has gone through extensive reviews at this point. We've got tests and I think we are covered. The only odd thing about this patch is the use of asserts. We don't typically see them, but that isn't up to me to decide. So bumping over to RTBC and we'll see what additional feedback we might receive.
Comment #83
BR0kENEnsures each violation is of a correct type.
Needed because methods below expect a content entity.
Also, using
assert()
you replace the PHPDoc comment for IDE autocompletion.Generally saying, use of asserts helps to make sure everything goes as expected during development. For production-like environments the code enclosed in
assert()
is simply invisible for interpreter.https://www.drupal.org/docs/8/api/runtime-assertions
Comment #85
BR0kENComment #86
alexpottDiscussed on the cross initiative call around whether this should be the new default. @heddn pointed out that it is often easier to fix issues on the new site rather than fix them prior to the migration. On the other hand some of our Drupal 8 minor update issues have been cause been invalid entities created by migrate. We agreed to create follow-up to discuss what the default should be.
Comment #87
larowlanThis looks great, but we need a change-record here to announce the new feature and a release notes snippet in the issue summary.
Tagging and setting to NR for those.
Comment #88
BR0kENAdded change record - https://www.drupal.org/node/3073707. Can I help with the release note?
Comment #89
heddnAdded a release note and reviewed and updated the CR. With that, I think we're good for RTBC again.
Comment #90
larowlanComment #92
larowlanCommitted 17f9026 and pushed to 8.8.x. Thanks!
Published the change record 🎉
Nice work folks
Comment #94
Wim Leers#86:
Nearly 4 months have passed, and that follow-up has not yet been created.
Because of the huge consequences of migrations that don't validate entities, and because the discussion in #86 is not accessible to me now, I'm taking the atypical and usually inappropriate action of reopening this closed issue, so that the follow-up is filed. It needs to provide sufficient context to allow Migrate contributors to help Drupal get to a place closer to where Drupal migrations do validate all migrated entities. Invalid/incorrect migrations cause problems for content authors (when they're editing content and validation errors occur for unchanged fields), for decoupled application developers (when JSON:API fails due to incompletely/incorrectly migrated data model definitions) as well as for site owners (when minor-to-minor update paths fail, as @alexpott already pointed out in #86).
Please forgive me 🙏😇
Comment #95
mikelutzThe discussion in #86 is preserved for posterity at https://youtu.be/9rBkBl4uGlg?t=1265
I'd rather just open the follow-up rather than re-opening this issue,so I opened #3095456: [Plan] Discuss strategy for long-term use of entity validation in the migration system
Nothing to forgive, my friend. Thank you for stirring the pot on these important discussions that have fallen by the wayside.
Comment #96
Wim LeersMuch appreciated! 🙏🥳