Problem/Motivation

We have parent field revisions since #2904231: Parent fields are not revisionable

And now we have outdated data.
Thus, processes such as garbage collection don't work properly
#3016388: Manual cleanup process for obsolete composite entities

Proposed resolution

Rebuild the parent field data for all revisions.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

miro_dietiker created an issue. See original summary.

johnchque’s picture

Maybe we can take a look at the approach in: https://gitlab.com/chx_/nope

miro_dietiker’s picture

IMHO this issue should be about checking and fixing all the per-revision parent storage information so we can reliably build on top of it.
We should not try to find workarounds. If those would perfectly work, we wouldn't even need the parent field revisionned..

johnchque’s picture

Status: Active » Needs review
FileSize
2.45 KB

So I could get all paragraphs that need to be updated. We just need to set the parent fields now.

One thing missing before moving forward is that we need to get in the query the entity type of the parent so we can update the revision by using the same results of the query.

Berdir’s picture

  1. +++ b/paragraphs.install
    @@ -387,3 +388,45 @@ function paragraphs_update_8019() {
    +/**
    + * Rebuild the parent fields.
    + */
    +function paragraphs_update_8020() {
    +  $fieldsArray = \Drupal::service('entity_field.manager')->getFieldMapByFieldType('entity_reference_revisions');
    +  $database = \Drupal::database();
    +  foreach ($fieldsArray as $entity_type => $fields) {
    +    foreach ($fields as $field_name => $values) {
    +      $entity_type_definition = \Drupal::entityTypeManager()->getDefinition($entity_type);
    

    I'm not sure that this should be an update function, as it could run for quite some time, we'll need to test it a bit.

  2. +++ b/paragraphs.install
    @@ -387,3 +388,45 @@ function paragraphs_update_8019() {
    +      $parent_table = $entity_type . '_revision__' . $field_name;
    

    You shouldn't hardcode this, instead use $table_mapping->getDedicatedRevisionTableName($field_storage), see views_field_default_views_data(). Similar to that, we need to check that the storage of that entity type uses SqlContentEntityStorage

    And, if we keep it as an update function then it should be a post update.

Berdir’s picture

Status: Needs review » Needs work
johnchque’s picture

Status: Needs work » Needs review
FileSize
3.23 KB
5.5 KB

OK, making it post update. Addressing suggestions.

Tested by:
- Creating a text paragraph and a nested at top level, then saving
- Update the node with the drag and drop function by moving the text paragraph inside the nested one. Parent fields should be equal in both revisions of the text paragraph.
- Run updates. The parent fields are updated accordingly. Tried also Revert the initial revision and it works fine now. :)

[EDIT]
All the above was tested with Paragraphs created BEFORE the parent fields became revisionable. Paragraphs that used drag and drop before that had the parent fields matching the last revision only. This patch will fix this problem on those old paragraphs revisions.

miro_dietiker’s picture

Your description reads like we have a bug in the drag & drop code currently?

johnchque’s picture

Checking if the parent is revisionable. :)

johnchque’s picture

Tested with translations, it works nicely.

Also added the language check for making it work with translatable fields. If we are mainly working with revisions tables each translation will be taken individually and updated accordingly. I don't think that this is a problem. :)

mbovan’s picture

Status: Needs review » Needs work
+++ b/paragraphs.post_update.php
@@ -113,3 +115,65 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
+        $revision->set('parent_id', $result->revision_id);

We are here updating the parent_id property with the revision_id of the target entity.

From the parent_id property description:

The ID of the parent entity of which this entity is referenced.

Shouldn't this be the target entity ID instead?

Berdir’s picture

correct, the parent id is the ID, not the revision ID.

mbovan’s picture

I worked on the update on top of #10.

Summary:
- Fixed #11
- The patch checks the target type so we only update paragraph entities and not all entity_reference_revisions fields
- I added support for paragraph fields created as base fields (e.g. in paragraphs library module)
- Added a few more safeguards
- Improved queries and column names.
- Improved comments

I didn't notice any obvious issues with the updated patch.

I will test it more on a couple of bigger sites.

mbovan’s picture

I am getting interesting results with nested paragraphs where entity types fight between each other which is the correct parent.

The steps to reproduce are explained in #3056868: Drag & Drop: New revision is not created when parent gets changed. I am not sure yet if it belongs Paragraphs or Entity reference revisions issue queue.

+++ b/paragraphs.post_update.php
@@ -113,3 +116,94 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
+  // Nested paragraphs must be updated first.
+  if (isset($entity_reference_revisions_fields['paragraph'])) {
+    $entity_reference_revisions_fields = ['paragraph' => $entity_reference_revisions_fields['paragraph']] + $entity_reference_revisions_fields;
+  }

With this, we are telling the post-update hook to take nested paragraphs data into account first and then we consult the other entity types.

AFAICS, #3056868: Drag & Drop: New revision is not created when parent gets changed means that we lost one (at least one) revision to track here...

johnchque’s picture

AFAIK, losing one revision is kinda expected as of some implementation of some feature I don't remember would make two parents point to a child so it would overlap depending on what entity type is processed later.

miro_dietiker’s picture

@yongt9412 Triggering a double save and thus maybe loosing one revision is one thing. But we seem to be talking here about overwriting a previous clean revision that leads to big trouble. Thus a recovery of the latest revision after a bogus drag & drop is not possible.

I experienced such a thing already but couldn't reproduce consistently. A bug in drag & drop and thus maybe losing a Paragraph is one frustrating thing. Inability to restore the previous clean state is then the worst case.

mbovan’s picture

Status: Needs review » Needs work

Changing the status to Needs work to add a drush command that does what paragraphs_post_update_rebuild_parent_fields do.

Berdir’s picture

The drush command is not for this but the err orphan cleanup issue.

Or more correct, it would only be if we decide to make that a manual batch ui/drush thing without post update

mbovan’s picture

Status: Needs work » Needs review
FileSize
7.39 KB
10.62 KB

This patch moves the revisions updates into the batch so the post update function doesn't timeout when it's executed through update.php UI.

mbovan’s picture

#13 narrows this down to support Paragraphs fields only. Since #3016388: Manual cleanup process for obsolete composite entities was created in Entity Reference Revisions module I am wondering whether this issue should be moved to that project too so we can update all ERR fields?

Berdir’s picture

No, this is paragraph specific, only there we changed the storage from not being revisionable to being revisionable for those fields. That's why we have to fix this now.

mbovan’s picture

Assigned: Unassigned » mbovan
Status: Needs review » Needs work
+++ b/paragraphs.post_update.php
@@ -113,3 +116,161 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
+    $revision = $paragraph_storage->loadRevision($result->$revision_column);

As per @Berdir, we should be more cautious here and check if the revision entity exists.

mbovan’s picture

Assigned: mbovan » Unassigned
Status: Needs work » Needs review
FileSize
7.42 KB
947 bytes

Addressed #22.

Berdir’s picture

Status: Needs review » Needs work
  1. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +        $field_storage = FieldStorageConfig::loadByName($entity_type_id, $field_name);
    +
    +        // In case the field storage configuration entity do not exist the field
    +        // can be a base field.
    +        if (!$field_storage) {
    +          $base_field_definitions = $entity_field_manager->getBaseFieldDefinitions($entity_type_id);
    +          if (!isset($base_field_definitions[$field_name])) {
    +            continue;
    +          }
    +          $field_storage = $base_field_definitions[$field_name];
    +        }
    

    We should always use getFieldStorageDefinitions() which includes configurable and base fields.

  2. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +        // Process the fields that reference paragraphs.
    +        if ($field_storage && $field_storage->getSetting('target_type') === 'paragraph') {
    +          $definition = $entity_type_manager->getDefinition($entity_type_id, FALSE);
    +          // Skip non-revisionable entity types.
    +          if (!$definition || !$definition->isRevisionable()) {
    +            continue;
    +          }
    

    not having the entity type at all seems unlikely and should already fail earlier when getting the field storage definition, with the new code then at least. So either we can remove this or we need to check earlier with hasDefinition() and then remove the FALSE.

  3. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +
    +          // Skip non-SQL entity storage implementations.
    +          $storage = $entity_type_manager->getStorage($entity_type_id);
    +          if (!$storage || !$storage instanceof SqlEntityStorageInterface) {
    +            continue;
    +          }
    

    we could also check this before getting the field storage definitions.

  4. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +
    +  // Select the field values from the revision of the parent entity type.
    +  $query = $database->select($current_field['revision_table'], 'p');
    +
    +  // Join tables by paragraph revision IDs.
    +  $paragraphs_revision_key = $paragraph_definition->getKey('revision');
    +  $query->innerJoin($paragraph_revisions_data_table, 'i', "p.$revision_column = i.$paragraphs_revision_key");
    +  $query->fields('p', [$entity_id_column, $revision_column]);
    

    I find the p/i aliases confusing, I expected p to be the paragraph revision table. maybe f for field revision table and p for paragraph revision table?

  5. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +
    +  // Match the langcode so we can deal with revisions translations.
    +  $query->where('i.' . $paragraph_definition->getKey('langcode') . ' = ' . 'p.' . $current_field['langcode_column']);
    

    unecessary string concatentation: ' = ' . 'p.'

    Also, I'd just hardcode the name of the column on the paragraph revision table, that's not going to change, we know it is just langcocode.

  6. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +
    +  // Order the query by revision ID and limit the number of results.
    +  $query->orderBy("i.$paragraph_revision_id_key");
    +
    

    same here, we now what this is, no need to make it pseudo-generic. Looks like the variable is even set twice currently?

    Might not even need $paragraph_definition then in the end?

  7. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +  // Limit the number of processed paragraphs per run.
    +  $query->range(0, Settings::get('paragraph_limit', 100));
    +
    

    module name is paragraphs, so paragraphs_update_limit or so?

  8. +++ b/paragraphs.post_update.php
    @@ -113,3 +116,163 @@ function paragraphs_post_update_set_paragraphs_parent_fields(&$sandbox) {
    +    $sandbox['progress'] += Settings::get('paragraph_limit', 100);
    

    progress is set and set back to 0, but we don't seem to be using it? The finished progress just depends on the field index.

mbovan’s picture

Thanks for the review!

  1. ✔️Fixed.
  2. ✔️Improved.
  3. ✔️Fixed.
  4. ✔️Fixed.
  5. ✔️Fixed.
  6. ✔️I didn't change this since paragraph_limit was previously used in paragraphs_post_update_set_paragraphs_parent_fields()?
  7. ✔️Removed progress property.

  • Berdir committed d89fcea on 8.x-1.x authored by mbovan
    Issue #3041759 by mbovan, yongt9412, Berdir: Parent field revision...
Berdir’s picture

Status: Needs review » Fixed

Thanks, finally committed.

Berdir’s picture

Status: Fixed » Needs review
FileSize
2.34 KB

Small follow-up, we've seen in testing that if this is update is combined with the Drupal 8.7 update then it can get confused on whether taxonomy_term is already revisionable or not. Using the last installed definition now to check isRevisionable().

  • Berdir committed 9ad9718 on 8.x-1.x
    Issue #3041759 by Berdir: Improve parent revision rebuild post update to...
Berdir’s picture

Status: Needs review » Fixed

And committed that too.

John Pitcairn’s picture

Thanks, I'd like to a bit be more clear on what happens here. Is there:

A - an automatic hook_update_N() that removes unused paragraph entities with no current host entity? Or:

B - a site-builder triggerable process (drush?) that does the above? Or:

C - both, or

D - what does your average paragraphs user need to do?

Perhaps the issue summary needs to be updated to reflect what has been committed, and what will happen. It really isn't clear unless you have been actively involved as a developer on this issue.

Berdir’s picture

E: None of the other options ;)

This contains an automated update function that attempts to fix parent references in paragraphs, in case that got out of sync. That should only be an issue if you used advanced features like drag & drop or other things that might have resulted in the same paragraph having different parents in different revisions.

You don't have to do anything.

The deletion will happen in the related ERR issue #3016388: Manual cleanup process for obsolete composite entities, there it is a manual process, and there are also related issues of that to automatically delete unused paragraphs and paragraph revisions when their parents are getting deleted, we support some cases already but not all.

recrit’s picture

Status: Fixed » Needs review
FileSize
716 bytes

Re-opening this issue. The update does not take into account fields with a custom storage or computed values.
The attached patch excludes any field storage that "hasCustomStorage()".

recrit’s picture

Status: Needs review » Fixed

I did not realize that the $entity_field_manager->getFieldStorageDefinitions() already checks for isComputed(), so the patch #33 is not necessary.
My initial issue was that somehow some computed field names were stored in paragraphs_item_field_data as the parent_field_name. I'm still tracking that one down.

Berdir’s picture

@recrit: Yeah, that specific check shouldn't be necessary but I'm very open to adding more checks that table exists and stuff if you notice any problems with weird data.

Since we go through the fields and only query for those, it should be OK if you have weird values in the parent fields, if it used from one of the fields we loop over it should be updated, if not then we shoudn't touch it.

recrit’s picture

@Berdir, thanks. I'm still tracking down but the root cause is not the post update added in this issue.
I was able to reproduce the bug - the computed field paragraphs appear in the paragraphs_item_field_data and paragraphs_item_revision_field_data when un-publishing / archiving with the workbench moderation module.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.