Problem/Motivation
I am migrating wimleers.com from D7 to D8. It contains comments from the D5/D6 days. It looks like this:

Note that the language column is empty ('') for the upper half of the rows displayed, but set to the unspecified language code ('und') for the bottom half. The upper half are comments upgraded from D5/D6. The bottom half are comments created in D7.
When migrating comments with entity validation turned on (see #2745797: Add option to content entity destinations for validation and #3095456: [Plan] Discuss strategy for long-term use of entity validation in the migration system), this causes lots of validation errors:

Also there's cases when comments become "orphaned" and sometimes they using language which disabled or deleted, so this cases also needs "fallback" to some undefined language - ref #3044158: Orphaned comments Error: Call to a member function getOwner()
Proposed resolution
Detect this edge case in \Drupal\comment\Plugin\migrate\source\d7\Comment and automatically fix it.
Remaining tasks
None.
User interface changes
None.
API changes
None.
Data model changes
None.
Release notes snippet
TBD
Comments
Comment #2
wim leersFix + test coverage
Comment #3
wim leersThis was triggering 473 validation errors for me, with #2 applied zero 🤓
Comment #4
wim leersBeen told that I should put this in
migration system:)Comment #5
quietone commentedI have only glanced at this patch and this jumped out at me.
Instead of passing an empty string in the pipeline we normally use the default_value process plugin to set the value in the pipeline. For example as is done in the node migrations.
Comment #6
wim leersI was first going to do that, but … I wasn't 100% clear on how to configure
\Drupal\migrate\Plugin\migrate\process\DefaultValueto only treat''like this, and notNULL,0,'0'et cetera (see https://www.php.net/manual/en/function.empty.php, which is whatDefaultValueuses).Comment #8
quietone commentedComment #9
wim leersTo be clear: I'm happy to change it to that, but I think the current patch allows for less room for error.
Comment #10
quietone commentedWell I reckon that NULL, 0 and '0' would be 'bad' data and it would be perfectly reasonable to attempt to migrate the row with a langcode of 'und'.
This will also need a migration test. I mean update the fixture with a new comment and make assertions in d7/MigrateCommentTest.
Comment #11
quietone commentedComment #12
wim leersGood point! I was just reasoning from the perspective "this is an actual observed known-to-occur invalid value resulting from prior Drupal upgrades". But I like your reasoning more :)
I looked for the appropriate place to add test coverage, and thought #2's was correct. But I see now that is just sort of a unit test, I hadn't found
\Drupal\Tests\comment\Kernel\Migrate\d7\MigrateCommentTestyet. That looks great! Thanks for the guidance! 🙏Comment #13
andypostI think this fallback should be needed not only in case of bad data but for case when no such language exists (I faced with orphaned comments in language which were deleted years ago)
Comment #14
wim leers#13: oh, great, can you update the issue summary? 🙏
Comment #15
andypostupdated with first found #3044158: Orphaned comments Error: Call to a member function getOwner() but it was very common case for d5/d6 era
Comment #16
huzookaComment #17
huzookaComment #18
wim leersYay, that test-only patch failed! Now let's bring back the fix from #2 (and perhaps tweak some test expectations) to make it pass again.
Comment #19
huzookaI had to update
Drupal\Tests\migrate_drupal_ui\Functional\d7\Upgrade7Testsince we have one more comments.Also removed the
use Drupal\Core\Language\LanguageInterface;line fromDrupal\comment\Plugin\migrate\source\d7\Commentfor test-only patch.Addressed #10, but not with the
default_valueplugin: I still prefer usingLanguageInterface::LANGCODE_NOT_SPECIFIEDinstead of hard-codingundin the migration.Comment #21
wim leersIncremented the expected comment count: makes sense.
I think all feedback has been addressed. I think this is ready.
This is now just blocked on migration system maintainer review. Thanks, @huzooka! 🙏👍
Comment #22
quietone commentedI'll review this during the coming week.
Comment #23
wim leersLovely, thank you @quietone!
Comment #24
quietone commented@huzooka and @Wim Leers, thanks for work here, this was easy to review.
This look really good and there is failing patch as well. The test and changes to the fixture are doing exactly what needs to be done.
This is just setting a default if the language is not defined and should be done in the process pipeline, not the source plugin. Doing so in the source plugin adds code that requires testing and maintenance. We can avoid that by using the pipeline.
And I think a comment in the pipeline would be helpful as well.
Then with part of this comment moved to the pipeline this can be shorter. Maybe just
'Tests a comment migrated from Drupal 6 to Drupal 7 that did not have a language.' or something like that
Changes not needed if the default value is set in the process pipepline.
Comment #25
quietone commentedUpdating tags
Comment #26
huzookaAddressed everything from #24.
I removed the previous additions of the comment source plugin test
Drupal\Tests\comment\Kernel\Plugin\migrate\source\d7\CommentTest, and set the default 'und' value in the process pipeline.@quietone, @Wim Leers, @andypost: thank you all!
Comment #27
wim leersD'oh. This makes so much sense 😅 Thanks, @quietone!
I think it's okay for me to RTBC since there's basically nothing left from my first patch iteration 🤓
Comment #28
huzookaComment #29
huzookaComment #30
alexpottCommitted and pushed 642b7653d4 to 9.0.x and 17ef9e3e9a to 8.9.x and fc579acaa2 to 8.8.x. Thanks!
Backported to 8.8.x as a migrate bugfix.
Comment #34
wim leersLovely, thanks! Since it's already been backported all the way to 8.8, I think this can be marked ? :)