Problem/Motivation
Following up from #2669964: Migrate Drupal 7 core node translations to Drupal 8, there are lots of related data to nodes that are now united in the migration to the same nodes. Say nodes 6, 7, 8, 9 and 10 all get migrated to node/6 (because 7, 8, 9, 10 are all translations of node 6), this leaves 404s at node/7, node/8, /node/9 and /node/10. Their respective path aliases are migrated to node 6 with #2850984: Fix path alias migration of translated nodes [D7] and #2827644: Fix path alias migration of translated nodes [D6].
However there is likely other related data (entity references, menu items, etc) that is not updated (yet?). This is arguably data integrity for dead links in content etc., so making this major and 'migrate critical' to at least document if not fix.
Proposed resolution
We don't have path_redirect in core, and aliases probably aren't the right fix here (to alias the node/9 path itself to node/6), but for example it would be possible to keep an nid->tnid map somewhere as part of the migration and have a listener that looks for 404s, consults the URL vs. the map and redirects if it finds a match (in an optional module).
Remaining tasks
Discuss. Implement.
User interface changes
None.
API changes
None anticipated.
Data model changes
None anticipated.
Comment | File | Size | Author |
---|---|---|---|
#107 | interdiff-2850085-103-107.txt | 2.48 KB | maxocub |
#107 | 2850085-107.patch | 17.04 KB | maxocub |
#103 | 2850085-103.patch | 17.02 KB | maxocub |
#103 | interdiff-2850085-96-103.txt | 2.12 KB | maxocub |
#96 | interdiff-2850085-95-96.txt | 919 bytes | maxocub |
Comments
Comment #2
mikeryanThis was addressed for D6 in #2827644: Fix path alias migration of translated nodes [D6] - it looks like all that's needed for D7 is to make the corresponding changes to d7_url_alias.yml and add tests. We can either do this as a follow-up to #2669964: Migrate Drupal 7 core node translations to Drupal 8, or incorporate it into that patch.
Comment #3
Gábor HojtsyIMHO do it as a followup, we don't do any other reference updates, eg. where comments were posted to nodes, or where menu items are pointing to the node, etc. All of those are also broken. Aliases are not anywhere more special than the rest of the data referencing the node that is no longer there.
Also we do not have a redirection solution in core that I know, so we could only do the aliasing updates.
Comment #4
Gábor HojtsyComment #5
catchWe don't have a generic redirect solution in core, but we could have an optional module that handles redirects for old translation set node IDs to the merged node. It could listen to 404s, check the request path, look up the map, send a RedirectResponse.
Comment #6
Gábor HojtsyIn that case this would reach back to Drupal 6's data too.
Comment #7
catchYes we'd need a k/v collection or similar that keeps the nid -> tnid mapping.
Comment #8
Gábor HojtsyI believe #2827644: Fix path alias migration of translated nodes [D6] is using migrate's ID map tracking, but for this we would need to somehow have a more permanent tracking data store(?)
Comment #9
maxocub CreditAttribution: maxocub commentedComment #10
Gábor HojtsyUpdated issue summary given #2850984: Fix path alias migration of translated nodes [D7] so we can focus on the redirect problem here.
Comment #11
Gábor HojtsyComment #12
Gábor HojtsyComment #13
maxocub CreditAttribution: maxocub commentedI'd like to work on this. Why is it postponed?
Comment #14
Gábor HojtsyIt was originally postponed on #2669964: Migrate Drupal 7 core node translations to Drupal 8. I think it may still be postponed on #2850984: Fix path alias migration of translated nodes [D7] unless there is no overlap in where they change things.
Comment #15
maxocub CreditAttribution: maxocub commentedI don't see any overlap with #2850984: Fix path alias migration of translated nodes [D7], so I put it back to Active.
I have a working proof of concept, but I'm wondering where to put the code, right now it's in a custom module.
The code consists only of two event subscribers, one for the migrate.post_row_save event (to populate a key/value collection) and the other for the kernel.exception event (to redirect 404s based on the key/value collection).
I can think of these three possible locations:
What would be the best approach here?
Comment #16
Gábor Hojtsy@maxocub: I think @catch meant a separate module in core that the migration enables but otherwise is hidden.
Comment #17
maxocub CreditAttribution: maxocub commented@Gábor Hojtsy: OK, thanks, I'll upload a patch when my tests are fixed.
Comment #18
maxocub CreditAttribution: maxocub commentedHere's a first draft.
I have a hard time trying to test the redirection, if anyone have a better idea, please tell me.
Comment #20
maxocub CreditAttribution: maxocub commentedTrying to fix the failing tests.
Comment #21
heddnCan we just use redirect.module instead? I really don't like the feel of this direction...
Comment #22
Gábor Hojtsy@heddn: is that in core? :)
Comment #23
catchredirect module, at least in Drupal 6 and 7 has a performance issue on large sites given it does a database lookup on every request to see if there's a redirect. I haven't looked to see how it does things in 8.x yet.
The approach here allows us to do a lookup only if the response would otherwise return a 404, so it's a lot more efficient, and it'll only be enabled on migrated sites anyway.
Comment #24
heddnI'm suggesting we add a sub-module to redirect.module that does just what we need. But a lot of the plumbing and test cases are pretty solid in that module and I'd be worried about conflicts between the two modules. And a UI and the UX of a similar but different module in core.
Comment #25
catchThere'd be no UI for this at all, it wouldn't even be visible on sites without migrate installed.
Comment #26
Gábor HojtsyAssigning to myself for review.
Comment #27
Gábor HojtsyHm, should node module handle this? I don't know how these subscribers are implemented elsewhere.
This seems unrelated?
Should this be a sub-module under migrate_drupal?
Should this be same as the module description? Don't think we need to be very creative here :)
This is always the resolved internal URL? I think it would be useful to highlight that we already migrate path aliases too.
Hm, how does this work if a site migrates data from different sources? Eg. a Drupal 6 and a Drupal 7 site? That may not be a 80% use case, but sounds like this would break there, given the different source node ids possibly overlapping.
Comment #28
quietone CreditAttribution: quietone as a volunteer commentedLooks like there are questions to be answered... Setting to NW
Comment #29
mikeryanComment #30
maxocub CreditAttribution: maxocub commentedI'm working on this in Baltimore.
Comment #31
maxocub CreditAttribution: maxocub commentedComment #33
maxocub CreditAttribution: maxocub commentedFixing coding standards and two tests.
Comment #34
maxocub CreditAttribution: maxocub commentedFixing the remaining failing tests.
Comment #36
heddn#27.1 I think this should be move into node module. We've tried pushing that type of stuff out to the module itself as much as possible.
#27.3 We split out sub-modules in contrib to bundle in the same package for drush, but in core, we don't need that. I don't think we need to switch from that and start creating sub-modules.
#27.6 I don't think this is a possibility that can be addressed. Unless I misunderstand your comment.
Moving back to NW for item #1.
Comment #37
maxocub CreditAttribution: maxocub commentedMoved the subscriber from migrate_drupal to the node module
Comment #39
maxocub CreditAttribution: maxocub commentedWe need to check if the MigrateEvents class exists before subscribing to migrate events now that the subscriber is in node.
Hope this fixes all the failing tests...
Comment #40
jofitz CreditAttribution: jofitz at ComputerMinds commentedMinor coding standards correction: use short array syntax, [].
Comment #41
mikeryanHaven't done a full code review, but a casual look gives me some concern here - I'm not thrilled with depending on magic ID names. Could we determine this instead from the source plugin having ['translations' => TRUE]?
If not, at the very least s/2, 17/-17/ would enable this to work with migrate_upgrade-generated migrations of the form 'upgrade_d6_node_translation'.
Comment #42
maxocub CreditAttribution: maxocub commented@mikeryan: You're right, that was not very strong or flexible.
Now I check if we are migrating node entities by looking at the destination plugin and if they are translatable by looking at the translations configuration.
Although, instead of looking at the 'translations' key in the source plugin, as you suggested, I look at the 'translations' key in the destination plugin, since its already loaded. I don't think it matters, but I might be wrong.
Comment #44
c.nish2k3 CreditAttribution: c.nish2k3 as a volunteer commented@maxocub - this needs to be done for onPostImport() as well.
Comment #45
maxocub CreditAttribution: maxocub commented@c.nish2k3: Oups, forgot about that one. To avoid duplicating the code there, I added a helper method.
Also, finally, I look at the 'translations' key on the source configuration instead of the destination, to avoid running that code if we are migrating sources other than Drupal. Let's see if it makes the 'MigrateExternalTranslatedTest' pass.
Comment #46
vasi CreditAttribution: vasi at Evolving Web commentedComment #47
heddnParent meta is critical. Moving that priority down here. And closing out parent as this was the last active task.
Comment #48
Gábor Hojtsy#2746527: [META] Handle data related to Drupal 6 and 7 node translations with different IDs has various points that did not yet have subissues.
Comment #49
maxocub CreditAttribution: maxocub commentedIt has been discussed during the weekly migrate hangout that we should use a dedicated table to store the redirection map instead of the key/value store, since it could become bloated on a big site. Back to NW.
Comment #50
Gábor HojtsyNot just the bloat. We discussed that you may be *done* with your migrations, but want to keep your site up including the redirection feature, which is useful way past when migrate was needed on the site.
Comment #51
maxocub CreditAttribution: maxocub commentedHere's a new version using a mapping table instead of the key/value store.
Comment #52
mikeryanComment #53
heddnAssigning to myself for review.
Comment #55
heddnI cannot see anything wrong in the code. I'm not sure I like the yucky feeling I get with the architecture implemented here, but I don't have any better ideas. So let's move along to RTBC.
Comment #56
catchWhy not make a new key/value collection for this rather than making a custom table? The k/v collection absolutely should be able to persist after migrations are finished, but putting it in a custom database table prevents for example moving the storage to a redis implementation or similar.
Comment #57
maxocub CreditAttribution: maxocub commentedI kind of prefer the key value collection too.
But since the node_translation_redirect module is essentially a single event subscriber, wouldn't it be better to just move it to the node module instead of adding a new module to Core?
Comment #58
heddnComment #59
heddnInventing a tag to mark this as pertaining to migrate_drupal and therefore blocking its stable release.
Comment #60
heddnBlocker for #2905736: Mark Migrate Drupal as stable
Comment #61
heddnFirst a re-roll.
Comment #62
heddnre #57 why a new module: I think the idea for a new module is that the K/V lookup is somewhat expensive and to turn it on/off, it is easier to do this with a new module. It is only needed for nodes that were migrated from an i18n source.
Nit: unused variable.
I've fixed the nit in the attached patch and marked this RTBC. I don't think removing an un-used variable makes me disqualified.
Comment #63
heddnShould we be worried about bloat of K/V if someone has a TON of translated nodes on their site?
Comment #64
larowlanI was concerned about having this in the global container/event dispatcher for all sites - and seeing the way the getSubscribedEvents method is implemented confirms my concerns.
We need to add a NodeServiceProvider class here that implements ServiceModifierInterface and only adds the event subscriber service if migrate module is enabled.
For more info see https://www.previousnext.com.au/blog/overriding-services-drupal-8-advanc... and for a concrete example of conditionally adding a service if a module is enabled - see https://github.com/drupal-media/file_entity/blob/8.x-2.x/src/FileEntityS...
nit: we can use === here
I'm also not convinced that a new module is the answer here. Will dig about with the service provider notion and see if we can conditionally add this in node too.
We can't match this on path - because paths can be modified. We need to match it on route name -
entity.node.canonical
Similarly, once we use the route match service, we can use the raw parameter here instead of the regex matches
I think we should be doing
$event->setResponse($response)
here. Sending the response immediately prevents further execution.Once we change the code above to not use
$response->send()
we can then go$response = $kernel->handle()
and then assert that the status code is 301 and the location is as expected - instead of asserting the raw output.Comment #65
larowlanWould anyone object to people needing this functionality having to add a container parameter to enable the redirect, instead of the hidden module?
so in
sites/default/services.yml
they need to addAnd then we use that in the service provider to enable the subscriber?
Comment #66
Gábor Hojtsy@larowlan: trick question: can migrate add that container parameter?
Comment #67
larowlanIf Drupal can write to sites/default/services.yml then yes it can.
I think we do that in tests...will confirm
Comment #68
larowlan\Drupal\Core\Test\FunctionalTestSetupTrait::setContainerParameter
is what I'm referring to.But it also relies on having this in settings.php
$settings['container_yamls'][] = $app_root . '/' . $site_path . '/services.yml';
Although I note that is the default in default.settings.php.
Can migrate check those two things first?
i.e. that services.yml exists and is writable and that
Settings::get('container_yamls', [])
includes the services.yml file?I will also ask around in case there is a better way
Comment #69
heddn64.1: where we not trying to do the redirect in another module outside of migrate and using k/v instead of the migrate mapping takes so we could disable migrate module?
Also, I'd like a confirmation on my concern with k/v bloat
Comment #70
catchThe k/v table could get huge, if we're worried about that we should consider either changing the k/v database storage to be table-per-collection or subclassing maybe. No-one's going to be doing any joins on that table though, so it's really only inserts/deletes/updates that could cause any issues.
Sites running into problems due to that much content may need to use alternative storage for things anyway though.
Comment #71
maxocub CreditAttribution: maxocub as a volunteer and commentedFor a start, I fixed #64.2, #64.6 & #64.7.
About #64.4 & #64.5, I am not able to retrieve the route name with RouteMatch in the ExceptionSubscriber, it's always returning NULL, anybody have a clue why? This is what I'm doing:
If the URL we are trying to reach does not exists, isn't it normal that we don't have a route name for it?
If so, how else can we check that the requested path is a node path, other than
$request->getPathInfo();
?Comment #72
maxocub CreditAttribution: maxocub as a volunteer and commentedAbout #64. & #64.5:
After a few shots at this, I don't think there's a way to use the route match service to replace the preg_match. If, for example, '/node/2' was a translation on D7 and so does not exists anymore on D8, then there's no way to match this path to a route. We will always end up with a NullRouteMacth. So for now, I don't see any way around a regex, unless someone has a better idea?
One thing though, the current regex is not OK, since it's not catching paths like '/fr/node/2'.
Comment #73
catchI'm looking at PathValidator, this is the closest to what we want to do:
In the case of an entity not existing, we should get ParamNotConvertedException back from the route matcher. At the point ParamConverterManager throws that exception, it puts the route path and route name in the exception message.
We could look at something like adding a route name property to ParamNotConvertedException, then it would be possible to take a path, try a route match, and if the path matches a route but the param can't be converted (i.e. entity doesn't exist) still determine what the route would have been.
The other approach would be to add a public version of Drupal\Core\Routing\Router::matchCollection() that gets back the possible routes for the path before processing, then could just check if any of them are entity.node.canonical.
Since this is tagged migrate critical and partially blocks #2746527: [META] Handle data related to Drupal 6 and 7 node translations with different IDs bumping to critical.
Comment #74
heddnComment #75
heddnTagging this for some in person brainstorming in Vienna.
Comment #76
heddnWhat would be nice is a list of issues that /need/ to be resolved for migrate drupal stability. Or at least as much as possible, a full list of what is remaining.
Comment #77
catchSo the migrate critical tag should be that list ideally, but there might be issues in it which aren't actually stable blockers and some not tagged that are - that's fixable though via triage.
Comment #78
maxocub CreditAttribution: maxocub for Acquia commentedWIP, am I on a good track?
Comment #79
maxocub CreditAttribution: maxocub for Acquia commentedStill a WIP. Got rid of the new module in favor of two new node services:
I now have to work on setting that container parameter after a multilingual migration.
Comment #81
maxocub CreditAttribution: maxocub for Acquia commentedThis should fix the tests.
Comment #82
maxocub CreditAttribution: maxocub for Acquia commentedThis is the part that needs work an discussion.
What was suggested between #65 and #68 is that we add a container parameter to services.yml if it exists, is writable and is included in settings.php.
Those conditions might not be met in the majority of cases, and in those case the user might not be familiar or have no access to the creation of a services.yml file.
Should we look for another way or is that an acceptable solution?
Comment #83
catchThis looks a lot better. Even though I suggested extending the exception, architecturally it seems wrong. However that's comes from the entire pattern we've inherited from Symfony of using exceptions for control structure - 404s and 403s aren't actually exceptional but the normal functioning of a website. Given that limitation, it's good to get rid of the regexp.
We could probably get the previous exception only if the instanceof check passes for the first with an early return to save an extra level of nesting?
Comment #84
heddnThis is tough, the writing to a new or existing services file during the migration. It won't work for example on platform.sh, because the disk will be read-only. Instead, we can enable/disable a module. Which gets us back to where we started.
Comment #85
maxocub CreditAttribution: maxocub for Acquia commentedWould a simple
$container->setParameter()
in the first if statement of NodeServiceProvider work? That seems too easy and I'm not familiar with container parameters.If this works, it means we are 'activating' the redirections only if we have the migrate & language modules enabled, but them we can disable the migrate module and still have our redirections.
Also, re #83:
I added the early return if it's not a NotFoundHttpException.
One thing though, I don't understand if you are saying that the added parameters/methods to ParamNotConvertedException are not a good idea and we need to find another solution, or if it's bad but we have to live with it?
Comment #86
maxocub CreditAttribution: maxocub for Acquia commentedRe #85:
No that won't work, because if we uninstall migrate then the container gets rebuilt and the parameter is lost. Bummer.
Comment #87
larowlanCan we:
Comment #88
quietone CreditAttribution: quietone as a volunteer commentedI have the read the issue, installed the patch and had a brief look at the code. Everything looks good, so far. I mean that all the comments have been addressed but one.
What remains to do is to get an answer to maxocub question in #82.
Discussed with phenaproxima, and he would like to have input from another framework manager on this.
Comment #89
phenaproximaI would like input from another framework manager, but if we can't get it:
This looks like a great suggestion to me.
Comment #90
phenaproximaI really like this patch a lot. I feel like it's a pretty elegant solution to this persistent problem, and I can't wait to see it land.
I could be wrong, but I think \Throwable is not a thing in PHP 5, so $previous needs to be \Exception.
Also, $routeName should be $route_name, $parameters should be type hinted, and if $routeName and $parameters are required, we should make them required constructor parameters :) PHP lets constructors have any signature you want, even if it's incompatible with the parent constructor.
Additionally, we need a full doc block (not @inheritdoc), since we're defining new parameters here.
@return needs documentation, even if it's just "The route name" (lame, I know).
@return needs documentation. It's probably worth mentioning if these are the raw parameters, or the so-far converted ones.
Nit: These variable names should be using $snake_case, not $camelCase.
Nit: Should be $key_value.
This is a bit vague, can it be expanded to explain why it's storing this mapping? A paragraph or two would be useful. A change like this is best documented extensively in the code.
Nit: This doc comment isn't really accurate, and it repeats the comment in register(), which is more useful because the code is right there. So I think this can just say something generic, like "Registers services in the container."
Nit: Can this use the ::class syntax instead?
In discussion, we agreed to ditch the container parameter in favor of setting a state or config key. Given the circumstances, a container parameter may be too volatile for what we're trying to accomplish.
As per the last comment, we don't need the if statement anymore.
Can we use ::class here?
Not a valid group anymore :) This should probably have:
Need to adjust this too. ;)
Comment #91
jofitz CreditAttribution: jofitz at ComputerMinds commentedI'll take a look at addressing @phenaproxima's review.
Comment #92
maxocub CreditAttribution: maxocub for Acquia commentedSorry Jo, I'm already on it at the migrate sprint in vienna, I just didn't thought of assigning it to myself
Comment #93
maxocub CreditAttribution: maxocub for Acquia commentedComment #94
jofitz CreditAttribution: jofitz at ComputerMinds commentedNo problem, @maxocub. Go for it!
I just wish I was with y'all in Vienna.
Comment #95
maxocub CreditAttribution: maxocub for Acquia commented@Jo Fitzgerald: Wish you were here to meet you!
Re #90:
throw new ParamNotConvertedException
calls everywhere and I think this will break BC.Comment #96
maxocub CreditAttribution: maxocub for Acquia commentedRemoved "string" & "int" type hints since they won't work on PHP5.
Comment #97
phenaproximaOh, I like it.
On the assumption that Drupal CI will pass it, I think this is RTBC. Thanks, @maxocub!
Comment #98
maxocub CreditAttribution: maxocub for Acquia commentedUnassigning.
Comment #99
catchMassive crosspost, one option that hasn't been discussed, haven't read comments from #88 onwards. If #87 stops the event subscriber running on every 404 then that seems like a better option than the below anyway:
Drupal\language\LanguageServiceProvider is probably the closest example and uses config - conditionally register the service provider in a compiler pass.
State is probably better than config here but principle should be the same - means it'll get checked on container rebuild as opposed to every 404.
One potential problem with State is that LanguageServiceProvider has to check ConfigBootstrapStorage - there's a @todo about the dependency issues with the compiler pass, not sure what that looks like for state. That's a general problem with compiler passes depending on config/state.
Comment #100
maxocub CreditAttribution: maxocub for Acquia commentedFollowing discussions with @phenaproxima and @catch, here are some conclusions to questions we had to improve this patch:
Could we merge the two subscriber into one?
What about checking for the state in the getSubscribedEvents()?
Comment #101
catchThis is looking very close. Found a few things though:
What about moving the state check after the ParamNotConvertedException check? Seems like $exception->getPrevious() should be cheaper than fetching from state. Possibly even after checking the route name.
Also $routeName should be $route_name and $previousException should be $previous_exception per coding standards.
If we do $collection->has() followed by $collection->get() does this issue two queries or one? Seems like we could probably do $collection->get() all the time and check the result.
Just to double check, is there a reason we can't do the state check here instead and avoid registering the listener altogether?
Comment #102
maxocub CreditAttribution: maxocub for Acquia commentedThanks for the review @catch, I'm on it.
Comment #103
maxocub CreditAttribution: maxocub for Acquia commented1. I moved the fetching from state after the route name check, and changed the variable names.
2. I removed the call to $collection->has(), $collection->get() already returns NULL if it's not set.
3. For the same reasons as why it can't be in the getSubscribedEvents() call, the container is not yet initialized.
Comment #104
phenaproxima#103 looks lovely to me.
And +1 to @maxocub's reply to #101.3 -- we can't use container services in a service provider, because the container is in an indeterminate state (i.e., still in the middle of being built). LanguageServiceProvider, it seems to me, realized this and had to hack around it.
Comment #105
catchOne more thing, I've had about five crossposts trying to post this one, thanks for the quick responses:
- maybe this should be "Redirect migrated node translations to the combined node." Or "Redirect node translations that have been consolidated by migration."
This is a difficult concept to describe in one sentence, but the issue isn't that that translations don't exist but that they've been combined, it's the old nodes that don't exist any more.
Comment #106
maxocub CreditAttribution: maxocub for Acquia commentedOn it.
Comment #107
maxocub CreditAttribution: maxocub for Acquia commentedHere's it is.
Comment #108
webchickLooks like @catch is all over this one.
Comment #111
catchI think this looks good now. I pondered whether we'd use it for anything other than redirections, but migration should handle the other cases per the meta-issue, so being clear what it's for seems better than leaving it open.
Fixed this on commit:
Committed/pushed to 8.5.x and cherry-picked to 8.4.x. Thanks!