Hello all, it’s time for the weekly migration initiative meeting. The meeting will take place in Slack, in various threads.
mikelutz |
#3061571: If no Drupal 8 equivalent filter is found during migration, drop transformation-only filters, so that body fields show up |
wimleers (he/him) |
WDYT about my proposal? :slightly_smiling_face: |
mikelutz |
I think the concept is a fine way to go, at least as an interim step. Still wish we could do something with php_code. |
mikelutz |
But the fact that we can’t shouldn’t stop this from going forward. |
wimleers (he/him) |
The best thing I can think of with php_code is to replace the `<?php` and `?>` (and everything between them) with PHP-CODE-REMOVED-STORED-IN-KEY-VALUE-AT-%KEY%, with the value for %KEY% of course being everything in those PHP tags. That’d allow us to safely remove that filter, while not losing the data. |
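A rough sketch of that placeholder idea. Everything here is illustrative, not an agreed design: the key-value collection name, the placeholder format, and the key derivation are all assumptions.

```php
<?php

/**
 * Replace embedded PHP snippets in a body value with placeholders and
 * stash the removed code in Drupal's key-value store.
 *
 * Sketch only: collection name and placeholder format are made up.
 */
function stash_php_snippets(string $body): string {
  $store = \Drupal::keyValue('migrate_removed_php_code');
  return preg_replace_callback(
    '/<\?php.*?\?>/s',
    function (array $matches) use ($store) {
      // Derive a stable key from the snippet itself so re-running the
      // migration stores the same snippet under the same key.
      $key = hash('sha256', $matches[0]);
      $store->set($key, $matches[0]);
      return "PHP-CODE-REMOVED-STORED-IN-KEY-VALUE-AT-$key";
    },
    $body
  );
}
```

A developer could later read the snippet back with `\Drupal::keyValue('migrate_removed_php_code')->get($key)` and decide what to do with it.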
mikelutz |
I wouldn’t mind stripping it out by default if the contrib php_code module isn’t installed. I see no real reason to try to save it in kv store though. At that point you need a developer to make it useful anyway, so you might as well have the developer do a custom migration if you want the data for something. |
mikelutz |
Remember, not migrating something isn’t data loss. You can’t lose the data if it’s still in the legacy database. |
mikelutz |
Could also do a core filter that strips the php code out without executing it. We would have to deal with the contrib php_code filter with the same name though. Maybe have a core remove_php filter that does that. Then, we map php_code to that IF the php_code module isn’t installed. Then the data is there, but we don’t execute the php. Not sure how the sec team would feel about that though. |
wimleers (he/him) |
> At that point you need a developer to make it useful anyway, so you might as well have the developer do a custom migration if you want the data for something.
My proposal allows a non-developer to proceed now and enables a developer in the future to bring back this functionality. |
wimleers (he/him) |
> Could also do a core filter that strips the php code out without executing it. We would have to deal with the contrib php_code filter with the same name though.
No, we could make core implement a hook_migration_plugins_alter() that checks whether the contrib module is installed. If it’s installed, it returns early. If it’s not installed, it maps D7's `php_code` filter to a D8 php_code_no_op filter. |
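That alter-hook idea could look roughly like this. A sketch, not the final patch: the contrib module machine name, the no-op filter plugin ID, and the exact process-pipeline path inside `d7_filter_format` are all assumptions here.

```php
<?php

/**
 * Implements hook_migration_plugins_alter().
 *
 * Sketch: maps D7's php_code filter to a no-op filter when the
 * contrib PHP filter module is not installed. The array path below
 * depends on the real d7_filter_format plugin definition; verify it
 * against the actual migration YAML before relying on it.
 */
function mymodule_migration_plugins_alter(array &$migrations) {
  // If the contrib module providing php_code is installed, keep its
  // own mapping and bail out early.
  if (\Drupal::moduleHandler()->moduleExists('php')) {
    return;
  }
  if (isset($migrations['d7_filter_format'])) {
    // Map the D7 php_code filter onto a hypothetical no-op filter so
    // the text format still migrates without ever executing PHP.
    $migrations['d7_filter_format']['process']['filters']['process']['id']['map']['php_code'] = 'php_code_no_op';
  }
}
```

The early return is what lets sites that want to keep using the PHP filter keep using it, while everyone else gets the safe no-op mapping.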
wimleers (he/him) |
That would enable sites that want to keep using PHP filter to keep using it, and gets everyone else on Sanity Island. |
mikelutz |
Right, basically what I meant. |
wimleers (he/him) |
Oh ok :slightly_smiling_face: |
wimleers (he/him) |
Great! |
mikelutz |
I just meant we couldn’t call it php_code, we would have to call it something else and manage the use case that the contrib module was installed. |
wimleers (he/him) |
But let’s leave php_code out of the picture for a moment |
wimleers (he/him) |
That’s one very clear scenario. |
mikelutz |
Yeah, that’s totally a separate issue from this one. |
wimleers (he/him) |
The “I used a D7 contrib filter” scenario is much more widespread. |
mikelutz |
The only question here is really what other filters need to be on the list. |
wimleers (he/him) |
I will volunteer to go over ALL D7 modules providing filters. |
mikelutz |
@heddn any objection to the approach here? |
wimleers (he/him) |
I have two main questions: (1) would you RTBC this approach? (2) would you accept having a list of contrib module things in core? Because that’s what this needs. |
mikelutz |
I think I would prefer if the list could live somewhere in migrate_drupal as opposed to filter. |
mikelutz |
That seems like a more appropriate place to keep contrib related data lists in general. |
wimleers (he/him) |
WFM. So you’re saying that migrate_drupal should override the FilterID plugin implementation, to add contrib filter types to D7 filters. Right? |
heddn |
@mikelutz @wimleers (he/him) I'm down with the approach :slightly_smiling_face: |
wimleers (he/him) |
:partyfrog: |
heddn |
I really don't like alter hooks though with migrations. |
heddn |
any way we can do this more natively, would be nice. |
wimleers (he/him) |
Which alter hooks? There are no alter hooks in my proposal? |
mikelutz |
I don’t know exactly. I see why it makes sense in filter, and why putting it in migrate_drupal is tricky; maybe it’s not that important to move it out of filter. It just feels better to me to have static data relating to D7 contrib modules live in migrate_drupal. |
heddn |
@wimleers (he/him) hook_migration_plugins_alter is what I'm referring to. If it isn't currently there, fine. But in the scrollback, it was mentioned :slightly_smiling_face: |
wimleers (he/him) |
@heddn Oh, but that was only when talking about php_code, which is not what #3061571: If no Drupal 8 equivalent filter is found during migration, drop transformation-only filters, so that body fields show up is trying to solve :slightly_smiling_face: |
wimleers (he/him) |
@mikelutz Aight, I’m looking forward to pushing this further then :smile: |
gabesullice |
"thoughts and issues" is a perfect heading :slightly_smiling_face:
I'm sure that y'all have lots of thoughts about them and could give some really valuable high-level pointers |
mikelutz |
#3006750: Move memory management from MigrateExecutable to an event subscriber |
mikelutz |
#2688297: File migration slows down and eats more and more memory, eventually stops |
mikelutz |
Generally, given the amount of data it can process, the migration system does a really good job managing memory. The file issue is one anecdotal issue I’ve seen pop up that we haven’t been able to reliably reproduce. |
gabesullice |
yeah, reproducibility has got to be the biggest issue whenever memory issues arise |
mikelutz |
There are stories of people that hit a wall with 6-7 figure numbers of rows, but in my experience the memory usage stays constant no matter how many rows are being processed. I’ve always figured the culprit was likely poorly written custom plugins and hooks. |
gabesullice |
I bet you're right about the source of the issues. Do you have a concrete example of a common mistake made in that custom code?
I'm thinking that maybe an API change, documentation, or some magic code (or a combination of all 3) might make it easier to detect and fix those problems. |
gabesullice |
For example, maybe we get total memory usage before a plugin is invoked, then get it after, and do some kind of analysis to identify which plugin is most responsible for memory usage increases |
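One way to prototype that before/after measurement is a thin wrapper around a process plugin. A sketch only: the class is invented, and a real version would more likely hook into MigrateExecutable or an event subscriber than wrap plugins by hand.

```php
<?php

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\Plugin\MigrateProcessInterface;
use Drupal\migrate\Row;

/**
 * Wraps a process plugin and records how much memory each transform()
 * call retains. Sketch: a plugin whose delta keeps growing row after
 * row is a likely leak candidate.
 */
class MemoryProfilingProcessDecorator {

  public function __construct(
    private MigrateProcessInterface $inner,
    private string $pluginId,
  ) {}

  public function transform($value, MigrateExecutableInterface $executable, Row $row, $property) {
    $before = memory_get_usage();
    $result = $this->inner->transform($value, $executable, $row, $property);
    $delta = memory_get_usage() - $before;
    if ($delta > 0) {
      // Record the growth against the migration's message table so it
      // can be analyzed after the run.
      $executable->saveMessage(sprintf('%s retained %d bytes transforming %s', $this->pluginId, $delta, $property));
    }
    return $result;
  }

}
```

Aggregating the per-plugin deltas over a few thousand rows would point at whichever plugin is most responsible for the growth.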
heddn |
@gabesullice it has never been reproducible enough to even start reacting. And no one with enough know-how has seen this problem (probably those in the know aren't using the web UI and are using drush instead) |
mikelutz |
Yeah, I don’t have a concrete example, but something like a runaway static cache for some data set could potentially do it. |
mikelutz |
But I can’t say I’ve ever seen someone do that specifically. |
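The "runaway static cache" mistake being described might look like this. An invented example of the anti-pattern, not code from any real migration:

```php
<?php

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Anti-pattern: a process plugin whose static cache grows with every
 * distinct value and is never trimmed, so memory climbs for the whole
 * migration instead of staying constant.
 */
class LeakyLookup extends ProcessPluginBase {

  /**
   * Grows unboundedly: one entry per distinct source value.
   */
  private static array $cache = [];

  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    if (!isset(static::$cache[$value])) {
      // Imagine an expensive lookup here. Caching it is reasonable,
      // but with millions of distinct values the cache never stops
      // growing; a bounded (e.g. LRU) cache would avoid the leak.
      static::$cache[$value] = strtoupper((string) $value);
    }
    return static::$cache[$value];
  }

}
```

On a source with mostly repeated values this looks fine in testing, which is exactly why it only hits a wall at 6-7 figure row counts.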
chriscalip |
Also part of the mix is whether the target bundle has many fields, and the complexity of those fields' types. E.g. my node bundle, xxx, has many fields: not just text fields but address, geofields, etc. |
heddn |
we did a two-D6-plus-one-D7 => one consolidated D8 site migration with 800+ fields and loads of data. All via drush, and there were no memory issues. :shrug: |
chriscalip |
how many items in play? less than 10k? what types of fields? text fields? |
gabesullice |
> no one with enough know-how has seen this problem (probably those in the know aren't using the web UI and are using drush instead)
I'm trying to find ways to make it easier for developers with zero prior migrate experience to write their migrations. |
heddn |
exactly, really hard to pin down isn't it :slightly_smiling_face: |
gabesullice |
I feel like this is probably a sort of chicken-and-egg thing. If you're a dev who wrote code that leaks memory, you might not be a dev who's able to isolate the problem and create a reproducible scenario |
heddn |
how often does memory management become an issue? It seems rare. |
chriscalip |
above 10k of items |
gabesullice |
I've heard a few anecdotal comments that out-of-memory issues are a frequent problem with migrations and this is confirmed by some foggy memories of past migrations I've written. |
heddn |
@chriscalip if you have a db dump that we could play with that seems to systematically reproduce the issue, let's see if we can get an NDA in place and share that thing :slightly_smiling_face: |
gabesullice |
Maybe a Twitter poll could help us take a pulse on the problem (if one exists at all)? |
chriscalip |
@heddn i'll follow-up with you on that; my bet is given a big enough csv source with a diverse mix of fields on the target bundle I can reproduce troubles without actual db. just generated csv source and a target bundle config. |
heddn |
@chriscalip but that isn't the same thing. CSVs are notoriously bad at memory management as a source. Drupal-to-Drupal is SQL and much better with memory |
chriscalip |
@heddn I see; for the sake of limiting scope, fine. But I bet a Drupal source would also show a similar experience. |
gabesullice |
Proposed tweet:
> If you've run any migrations into Drupal 8 with the Migrate module, have you encountered an error like "Fatal error: Allowed memory size of NNNNN bytes exhausted"?
Poll options: Yes / No / N/A; show the numbers |
heddn |
@gabesullice can we word it to ask whether it was via the web UI? |
heddn |
@gabesullice and please paste the link in here and I'll retweet for us |
chriscalip |
haha exhaustion even happens on cli |
chriscalip |
well, remote CLI.. I believe Pantheon has a 250MB limit on most of their plans. |
gabesullice |
Yeah, I'm not sure that the web vs drush distinction is important. |
gabesullice |
Maybe I'm missing something. |
heddn |
@chriscalip but CLI uses a different executable. There are enough moving parts that I want to be sure we aren't finding a bug in contrib here |
chriscalip |
ahh, limiting scope, that's fair. |
gabesullice |
hmm |
heddn |
@gabesullice it is important. there's core's executable, then migrate_tools has a batch and a non-batch version. |
heddn |
And the wording should ask for folks to reach out with anecdotal stories so we can see if there is a trend |
gabesullice |
Good point about the stories. |
gabesullice |
the reason I'm hesitant to limit scope and you're not is because I think we're each trying to answer a different question. |
gabesullice |
I'm trying to test whether "memory issues are a hard problem in the migrate ecosystem" |
gabesullice |
And I think you're approaching it as "does the migrate module have a memory problem?" |
heddn |
k |
gabesullice |
If the answer to my question is "yes", then my feeling is that we could do things to help people in the ecosystem find their mistakes more easily |
gabesullice |
Updated tweet:
> If you've run migrations into Drupal 8 with the migrate module, have you encountered an error like "Fatal error: Allowed memory size of NNNNN bytes exhausted"?
> If yes, please reply with anecdotes :slightly_smiling_face: i.e. what happened? what did you change? were you using drush or the UI? etc.
Poll options: Yes / No / N/A; show the numbers |
gabesullice |
@heddn how's that? |
heddn |
@gabesullice nice |
gabesullice |
:thumbsup: |
gabesullice |
https://twitter.com/GabeSullice/status/1212860036428357632?s=20 |
kevinquillen |
Using drush only, I would get it on Pantheon a lot with 30,000 records, especially if they dealt with files and paragraphs. I had to use jq to break the source (a JSON file) into a dozen files to process them within Pantheon's memory limits. |
kevinquillen |
Ones that dealt with fields that are neither file nor paragraphs? Totally fine |
heddn |
@kevinquillen I'd expect that w/ a large JSON source. I doubt there is much that can automagically be done to shrink that source down, is there? |
gabesullice |
https://github.com/salsify/jsonstreamingparser |
gabesullice |
quick google, just out of curiosity |
gabesullice |
https://github.com/salsify/jsonstreamingparser/blob/master/src/Listener/... |
kevinquillen |
Hm, would have to give that a go; came up with the file chunking in a time crunch |
gabesullice |
yeah, if jq worked, by all means go for it!
I think I saw a couple of JSON-related OOM issues mentioned on the Twitter poll above. If that's a frequent pain point, maybe migrate_plus could have a source that handles large JSON files by streaming them. |
gabesullice |
just thinking out loud |
gabesullice |
@kevinquillen, was your JSON basically a huge array of objects? |
gabesullice |
I imagine that's usually the case |
kevinquillen |
Yeah it was about 30,000 items in two files for two content types |
gabesullice |
so like: `[ {itemId: 1}, {itemId: 2}, ... {itemId: 29123}, ... ]` |
gabesullice |
? |
kevinquillen |
Yep, maybe 12-15 fields each |
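For a source shaped like that (one huge top-level array of objects), a streaming reader can be sketched in plain PHP without any library: scan the stream byte by byte, track nesting depth and string state, and decode one element at a time. A minimal sketch; it assumes valid JSON whose root is an array.

```php
<?php

/**
 * Yield the top-level items of a JSON array from a stream one at a
 * time, so peak memory tracks the largest single item rather than the
 * whole file. Sketch only: assumes valid JSON with an array root.
 */
function iterJsonArray($stream): \Generator {
  $depth = 0;
  $buf = '';
  $inString = false;
  $escaped = false;
  while (($ch = fgetc($stream)) !== false) {
    if ($inString) {
      // Inside a string literal, keep every byte and track escapes so
      // that braces, brackets and commas inside strings are ignored.
      $buf .= $ch;
      if ($escaped) {
        $escaped = false;
      }
      elseif ($ch === '\\') {
        $escaped = true;
      }
      elseif ($ch === '"') {
        $inString = false;
      }
      continue;
    }
    switch ($ch) {
      case '"':
        $inString = true;
        $buf .= $ch;
        break;

      case '[':
      case '{':
        $depth++;
        // Skip the root array's own '['.
        if ($depth > 1) {
          $buf .= $ch;
        }
        break;

      case ']':
      case '}':
        if ($depth > 1) {
          $buf .= $ch;
        }
        $depth--;
        // An item just closed, or the root array ended with a
        // trailing scalar still buffered.
        if ($depth <= 1 && $buf !== '') {
          yield json_decode($buf, true);
          $buf = '';
        }
        break;

      case ',':
        if ($depth === 1) {
          // Comma between top-level items: flush any buffered scalar.
          if ($buf !== '') {
            yield json_decode($buf, true);
            $buf = '';
          }
        }
        else {
          $buf .= $ch;
        }
        break;

      default:
        // Keep everything else except whitespace between items.
        if ($depth >= 1 && trim($ch) !== '') {
          $buf .= $ch;
        }
    }
  }
}
```

Usage: `foreach (iterJsonArray(fopen('big.json', 'rb')) as $item) { /* process one record */ }`. Reading with fgetc per byte is slow; a production version would buffer reads in chunks (or use a streaming-parser library like the one linked above), but the shape of the approach is the same.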
heddn, Nick Wilde (he/him) - DC 2020 DevOps Session Team, webchick, wimleers (he/him), gabesullice, mikelutz, chriscalip, kevinquillen