Problem/Motivation

There is a set of situations where composite entities (such as Paragraphs) are not deleted from the database. This could lead to a huge amount of storage taken by revisions that may be no longer used.

Proposed resolution

Offer a Batch delete form and be careful with revisions that are still used.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

CommentFileSizeAuthor
#63 manual_cleanup_obsolote_composites-3016388-63-interdiff.txt4.07 KBBerdir
#63 manual_cleanup_obsolote_composites-3016388-63.patch37.38 KBBerdir
#60 interdiff-3016388-60-58.txt6.6 KBmbovan
#60 manual_cleanup_obsolote_composites-3016388-60.patch35.99 KBmbovan
#58 interdiff-3016388-58-57.txt930 bytesmbovan
#58 manual_cleanup_obsolote_composites-3016388-58.patch35.51 KBmbovan
#57 interdiff-3016388-57-53.txt5.06 KBmbovan
#57 manual_cleanup_obsolote_composites-3016388-57.patch35.38 KBmbovan
#56 interdiff-53.txt910 bytestbenice
#56 manual_cleanup_obsolote_composites-3016388-53.patch34.37 KBtbenice
#55 interdiff-53.diff910 bytestbenice
#55 manual_cleanup_obsolote_composites-3016388-53.patch34.37 KBtbenice
#53 manual_cleanup_obsolote_composites-3016388-53.patch34.37 KBtbenice
#43 interdiff-3016388-42-43.txt3.83 KBjohnchque
#43 manual_cleanup_obsolote_composites-3016388-43.patch34.06 KBjohnchque
#42 interdiff-3016388-40-42.txt7.61 KBjohnchque
#42 manual_cleanup_obsolote_composites-3016388-42.patch32.04 KBjohnchque
#41 interdiff-3016388-38-40.txt7.63 KBjohnchque
#40 interdiff-3016388-38-40.txt16.17 KBjohnchque
#40 manual_cleanup_obsolote_composites-3016388-40.patch31.28 KBjohnchque
#38 interdiff-3016388-37-38.txt14.74 KBjohnchque
#38 manual_cleanup_obsolote_composites-3016388-38.patch30.69 KBjohnchque
#37 interdiff-3016388-34-37.txt5.13 KBjohnchque
#37 manual_cleanup_obsolote_composites-3016388-37.patch28.63 KBjohnchque
#34 interdiff-3016388-32-34.txt7.12 KBjohnchque
#34 manual_cleanup_obsolote_composites-3016388-34.patch28.69 KBjohnchque
#32 interdiff-3016388-30-32.txt11.5 KBjohnchque
#32 manual_cleanup_obsolote_composites-3016388-32.patch28.64 KBjohnchque
#30 interdiff-3016388-25-30.txt2.29 KBjohnchque
#30 manual_cleanup_obsolote_composites-3016388-30.patch23.99 KBjohnchque
#26 manual_cleanup_obsolote_composites-3016388-25-interdiff.txt1.22 KBBerdir
#26 manual_cleanup_obsolote_composites-3016388-25.patch23.36 KBBerdir
#24 Selection_414.png12.32 KBBerdir
#24 manual_cleanup_obsolote_composites-3016388-24-interdiff.txt8.66 KBBerdir
#24 manual_cleanup_obsolote_composites-3016388-24.patch23.43 KBBerdir
#23 interdiff-3016388-21-23.txt15.94 KBjohnchque
#23 manual_cleanup_obsolote_composites-3016388-23.patch22.06 KBjohnchque
#21 interdiff-3016388-20-21.txt589 bytesjohnchque
#21 manual_cleanup_obsolote_composites-3016388-21.patch18.49 KBjohnchque
#20 interdiff-3016388-18-20.txt19.41 KBjohnchque
#20 manual_cleanup_obsolote_composites-3016388-20.patch18.53 KBjohnchque
#18 interdiff-3016388-17-18.txt8.97 KBjohnchque
#18 manual_cleanup_obsolote_composites-3016388-18.patch30.47 KBjohnchque
#17 interdiff-3016388-14-17.txt9.99 KBjohnchque
#17 manual_cleanup_obsolote_composites-3016388-17.patch27.77 KBjohnchque
#14 interdiff-3016388-13-14.txt16.98 KBjohnchque
#14 manual_cleanup_obsolote_composites-3016388-14.patch24.64 KBjohnchque
#13 interdiff-3016388-11-13.txt5.77 KBjohnchque
#13 manual_cleanup_obsolote_composites-3016388-13.patch23.06 KBjohnchque
#11 interdiff-3016388-7-11.txt8.39 KBjohnchque
#11 manual_cleanup_obsolote_composites-3016388-11.patch21.75 KBjohnchque
#7 interdiff-3016388-5-7.txt11.5 KBjohnchque
#7 manual_cleanup_obsolote_composites-3016388-7.patch23.06 KBjohnchque
#5 interdiff-3016388-3-5.txt658 bytesjohnchque
#5 manual_cleanup_obsolote_composites-3016388-5.patch21.37 KBjohnchque
#3 interdiff-3016388-2-3.txt16.82 KBjohnchque
#3 manual_cleanup_obsolote_composites-3016388-3.patch21.41 KBjohnchque
#2 2771531-16.patch275.6 KBjohnchque
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

yongt9412 created an issue. See original summary.

johnchque’s picture

Status: Active » Needs review
FileSize
275.6 KB

Moving patch from META issue.

johnchque’s picture

Updating patch to insert the entities manually. :)

WIP: Will continue tomorrow.

The last submitted patch, 2: 2771531-16.patch, failed testing. View results

johnchque’s picture

Strange, I guess this prevented the test to be run.

Status: Needs review » Needs work

The last submitted patch, 5: manual_cleanup_obsolote_composites-3016388-5.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

johnchque’s picture

OK, almost working. :) Half of the test passing. Will investigate a bit more about the others. :)

Status: Needs review » Needs work

The last submitted patch, 7: manual_cleanup_obsolote_composites-3016388-7.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

johnchque’s picture

OK, just checking, not sure how it is possible with the UI to get orphan revisions on non revisionable composite relationships. Thought we fix that in #2834314: Support deleting translations of a composite reference and #2834034: Deleting a translation deletes composite references on translatable fields. Is there something that I am missing?

Do I need to just create the entities without a parent intentionally?

Berdir’s picture

Mostly it's about cleaning up stuff from *before* we did those fixes, but there are still pretty easy ways, for example simply stop using a paragraph that you created previously ("delete" it) without saving as a new revision.

johnchque’s picture

Updating tests. Thanks for the suggestion.

Status: Needs review » Needs work

The last submitted patch, 11: manual_cleanup_obsolote_composites-3016388-11.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

johnchque’s picture

I could finally make tests pass locally, let's see what testbot has to say. :)

johnchque’s picture

Cleaning tests.

BTW I think that for make the tests more reliable, we should maybe set the parent id field of the composite entity with a value of one that no longer exists instead of deleting the ones that we already have. Otherwise, this test can collide with #2953650: Composite entities are not removed on host delete if reference was removed in prior revision.

What do you think?

Berdir’s picture

Yes, definitely, we should create orphaned entities explicitly by-hand and not rely on bugs.

miro_dietiker’s picture

Status: Needs review » Needs work
  1. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,243 @@
    +    $checks[] = $this->t('Is not used in the default revision of the host entity.');
    

    One of, or all of them at the same time? ;-)

  2. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,243 @@
    +    $checks[] = $this->t('Is not used in the any translation of the host entity.');
    

    in the any?

  3. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,243 @@
    +    $checks[] = $this->t('Is not used in earlier revisions of the host entity in the same entity reference revisions field.');
    +    $form['description'] = [
    

    This is not 100% clear - what if the host ever changed?

johnchque’s picture

Updating the tests, need to rework the tests for counting the existing revisions. :) Will continue tomorrow.
WIP: Not addressing yet #16.

johnchque’s picture

Adding new scenarios that are needed to be checked. :)

johnchque’s picture

So wondering the results of the following scenarios:
// Scenario 1: A composite with a default revision that is referenced and an
// old revision that is not. Result: Only the old revision is deleted.

// Scenario 2: A composite with an old revision that is used and a default
// revision that is not. Result: Nothing should be deleted.

// Scenario 3: A composite with an old revision and a default revision both
// that are not used. Result: Should first delete the old revision and then
// the default revision. Delete the entity too.

// Scenario 4: A composite used in a EN node and a DE node. If the composite field is translatable.
// Delete the EN node.
Result:

// Scenario 5: A composite used in a EN node and a DE node. If the composite field is not translatable.
// Delete the EN node.
Result:

// Scenario 6: A composite used in a EN node and a DE node with many revisions. If the composite field is translatable.
// Delete all revisions from the EN node but the last one. Does it make a difference if we remove the DE revisions?
Result:

// Scenario 7: A composite used in a EN node and a DE node with many revisions. If the composite field is not translatable.
// Delete all revisions from the EN node but the last one. Does it make a difference if we remove the DE revisions?
Result:

What other scenario should we also work on?

johnchque’s picture

OK, updating to have only the needed scenarios. :)

johnchque’s picture

Berdir’s picture

Status: Needs review » Needs work
  1. +++ b/entity_reference_revisions.routing.yml
    @@ -0,0 +1,7 @@
    +entity_reference_revisions.delete_orphans:
    +  path: '/admin/config/entity-reference-revisions/delete-orphans'
    +  defaults:
    +    _form: 'Drupal\entity_reference_revisions\Form\OrphanedCompositeEntitiesDeleteForm'
    +    _title: 'Delete orphaned composite entities'
    +  requirements:
    +    _permission: 'administer site configuration'
    

    this is a strange path, things below admin/config should be in one of the defined sections, this one is a bit tricky, maybe just put it in system for now?

    The page title is also not very clear yet, plus we are not actually exposing this anywhere I think, no menu link or so.

    We should have that, with a title and description that explains what it is about a bit better.

    also not sure about the permission. This is a pretty destructive and very technical action, maybe a separate permission would make sense.

  2. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +    $checks = [];
    +    $checks[] = $this->t('Is not used in the default revision of the host entity.');
    +    $checks[] = $this->t('Is not used in any translation of the host entity.');
    +    $checks[] = $this->t('Is not used in any revision of the host entity.');
    +    $form['description'] = [
    +      '#markup' => $this->t('This form starts a batch to remove orphaned entities. An orphaned entity:'),
    +    ];
    +    $form['checks'] = [
    +      '#theme' => 'item_list',
    +      '#items' => $checks,
    +    ];
    

    I'm not sure if we really need a list like this.

    An orphaned entity is simply not used at all :)

    And with the switch to revisions, we could maybe instead be a bit more specific about that and explain that it will remove all orphaned composite entity revisions that are not referenced by anything. And if no revisions are left, then the entity itself will be deleted too.

  3. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +      $context['results'][] = $entity_type_id . ':' . $revision->getRevisionId();
    

    this could get huge. we only need the *count* of the results, not each id.

    Instead, I would just store a count of deleted revisions and deleted entities and then report those at the end.

  4. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +      $parent_entity_type = \Drupal::entityTypeManager()->getDefinition($parent_type, FALSE);
    

    maybe store the entity type manager in $entity_type_manager, should make it a bit easier to read.

    Then you could also use hasDefinition() for the condition below instead of that instanceof check.

  5. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +      // Check if the revision is used in any revision of the parent.
    +      $query = \Drupal::entityTypeManager()->getStorage($parent_type)->getQuery()
    +        ->allRevisions()
    +        ->condition($parent_field_name . '.target_revision_id', $revision->getRevisionId())
    +        ->condition($parent_entity_type->getKey('id'), $parent_id)
    +        ->range(0, 1)
    +        ->accessCheck(FALSE);
    

    I think there is no need to check for just that specific parent id, with translations and so on, it is possible that maybe this entity doesn't reference it anymore but the parent still does.

    So we don't even need the parent_id, we just need the field name and type.

    => New test case, have a composite entity that claims that Parent1 references it, which does not (might not even exist), but instead, Parent2 does. Must not be deleted.

  6. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +      // If this is the default revision, check if there are other revisions. If
    +      // there are not, delete the entity.
    +      if ($revision->isDefaultRevision()) {
    

    explain a bit clearer that we are looking for other revisions *of the composite entity.

    Also, maybe rename $revision to $composite_revision to make it clearer what's what.

  7. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +        $count = $entity_storage->getQuery()
    +          ->allRevisions()
    +          ->condition($entity_type->getKey('id'), $revision->id())
    +          ->count()
    +          ->execute();
    

    this does not include the accessCheck(FALSE) condition.

  8. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +        // Skip the revision deletion if this is the default revision.
    +        continue;
    +      }
    +      $storage->deleteRevision($revision->getRevisionId());
    

    maybe put the deleteRevision() call explicitly into an else {} block instead of the continue.

  9. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +    $context['message'] = t('Checked orphans for @entity_type (id: @entity_id): @current of @total', [
    +      '@entity_type' => $entity_type_id,
    +      '@entity_id' => isset($entity) ? $entity->id() : '',
    +      '@current' => $context['sandbox']['progress'],
    +      '@total' => $context['sandbox']['total'],
    +    ]);
    

    $entity doesn't exist anymore., should be $revision.

  10. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,264 @@
    +  public static function batchFinished($success, array $results, array $operations) {
    +    if ($success) {
    +      $message = \Drupal::translation()->formatPlural(count($results),
    +        'Removed 1 orphaned composite revision.',
    +        'Removed @count orphaned composite revisions.');
    +      drupal_set_message($message);
    +    }
    

    drupal_set_message() is deprecated.

    I think results should be keyed by entity type as well and this would then loop over that.

    Something like
    * Paragraph: Deleted 124 revisions (12 entities).
    * Something else: Deleted 461 revisions (20 entities).

    As separate messages.

    (entity type label first to avoid language problems with lowercase, plural and so on)

johnchque’s picture

Status: Needs work » Needs review
FileSize
22.06 KB
15.94 KB

OK, addressing all changes from the previous comment. Not totally sure about descriptions though. :)

Berdir’s picture

Did some manual testing and found a lot of bugs ;)

* parent entity types that don't support revisions were broken, e.g. taxonomy_term
* batch was broken in about 4 different ways, including core bugs that prevent us from using an entity query, updating the wrong key, having the wrong total, and never finishing because it was lowering the actual total so never reached the end.
* Added better reporting including time and deletions, removed the id as that doesn't say anything about the progress.

Next steps:
* Fix tests, they probably fail now due to changed strings and so on.
* Improve test coverage, switch to Settings::get('entity_update_batch_size', 50) so we can set the batch size to 1 or 2 in the test.
* Cover non-revisionable parent entity types

Status: Needs review » Needs work

The last submitted patch, 24: manual_cleanup_obsolote_composites-3016388-24.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

Another small update that uses loadMultipleRevisions(), which makes it a lot faster.

Status: Needs review » Needs work

The last submitted patch, 26: manual_cleanup_obsolote_composites-3016388-25.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

A test run against 800k paragraphs took 20min, it reported 8700 revisions deleted and 15661 entities. I already added a revision count to deleting the entity if no revisions are left, sounds like this site hat paragraphs that referenced a parent type that didn't exist anymore. Or actually, had no parent_type set at all.

Also, that no-valid-parent case still calls delete() which deletes the whole entity, it should only delete the revision as there might be other revisions that have a parent.

moshe weitzman’s picture

Nice work folks ... Perhaps someone can make a command/script that doe deletion reporting. I'd be happy to run it on the mass.gov DB which has lots of paragraphs.

johnchque’s picture

Assigned: Unassigned » johnchque
Issue summary: View changes
Status: Needs work » Needs review
FileSize
23.99 KB
2.29 KB

Great, so fixing tests. What is missing now:
* Improve test coverage, switch to Settings::get('entity_update_batch_size', 50) so we can set the batch size to 1 or 2 in the test.
* Cover non-revisionable parent entity types
* Refactor code to avoid code duplication.

Working on that right now. :)

johnchque’s picture

Issue summary: View changes
johnchque’s picture

Adding tests for non revisionable parents.
Refactoring the code.

Will work on the batch size test. :)

Berdir’s picture

+++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
@@ -0,0 +1,299 @@
+      // The parent entity type may no longer exist.
+      if (!$entity_type_manager->hasDefinition($parent_type)) {
+        OrphanedCompositeEntitiesDeleteForm::deleteOrphans($composite_revision, $composite_storage, $context['results'][$entity_type_id]['entity_count'], $context['results'][$entity_type_id]['revision_count']);
+        continue;
+      }
+
+      // Check if the revision is used in any revision of the parent, if that
+      // entity type supports revisions.
+      $query = $entity_type_manager->getStorage($parent_type)->getQuery()
+        ->condition($parent_field_name . '.target_revision_id', $composite_revision->getRevisionId())
+        ->range(0, 1)
+        ->accessCheck(FALSE);
+
+      if ($entity_type_manager->getDefinition($parent_type)->isRevisionable()) {
+        $query = $query->allRevisions();
+      }
+
+      $revisions = $query->execute();
+      // If there are parent revisions where this revision is used, skip it.
+      if (!empty($revisions)) {
+        continue;
+      }

I would invert it. Put this part into a method, return FALSE if there are no parents (=safe to delete), TRUE if there are.

You should need fewer arguments for that, and then the delete only needs to exist once and you don't have to put that into a method for now.

johnchque’s picture

All addressed. Hope this works good.

Berdir’s picture

Status: Needs review » Needs work
  1. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,302 @@
    +  public static function hasParent($composite_revision, $parent_type, $parent_field_name) {
    

    I would just read the values again from the field in here, then you only need a single argument. That argument should have a type hint for ParagraphInterface, however.

  2. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -0,0 +1,302 @@
    +    // The parent entity type may no longer exist.
    +    if (!$entity_type_manager->hasDefinition($parent_type)) {
    +      return FALSE;
    +    }
    

    There are actually two cases here:

    There is a parent type but it doesn't exist anymore. Safe to delete.

    It's also possible that there is no parent_type at all. Which could for example mean that it has been created before 2016, when we added the parent_type field. And never updated, or only as a new revision.

    But we don't know and we have no idea who might still be using them. So they aren't safe to delete.

    which means maybe this function should be called isUsed() or so, and must return TRUE for parent_type = NULL. Including a test.

moshe weitzman’s picture

May I suggest a minor refactoring to submitForm() whereby the batch creation is separate from form handling. That way a Drush command can easily create the batch and then call drush_backend_batch_process()

johnchque’s picture

Thankfully I already had the scenario suggested, created. Updating tests for following that.

@moshe weitzman I think we are taking a bit longer than we should on this. Maybe that could be done in a follow-up? :)

johnchque’s picture

OK, so I created a service as suggested by @Berdir. Tests passing locally.

Not sure how to split the method deleteOrphansBatchOperation.

Berdir’s picture

Status: Needs review » Needs work
+++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
@@ -199,21 +195,28 @@
+
+    // If no parent type is set, the composite might be still used.
+    if ($parent_type == NULL) {
+      return TRUE;

the main point to document is that we are unable to test if it's still used because we don't know what to query for.

  1. +++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
    @@ -84,8 +84,7 @@
         $composite_entity_types = array_filter($form_state->getValue('composite_entity_types'));
         $operations = [];
         foreach ($composite_entity_types as $entity_type_id) {
    -      $callback = 'Drupal\entity_reference_revisions\Form\OrphanedCompositeEntitiesDeleteForm::deleteOrphansBatchOperation';
    -      $operations[] = [$callback, [$entity_type_id]];
    +      $operations[] = ['_entity_reference_revisions_orphan_manager_batch_dispatcher', ['entity_reference_revisions.orphan_manager:deleteOrphansBatchOperation', $entity_type_id]];
         }
    

    the finish callback also needs to go to the service I guess, not quite sure yet how that would work with drush but we'll see.

  2. +++ b/src/EntityReferenceRevisionsOrphanManager.php
    @@ -0,0 +1,171 @@
    +/**
    + * Manages orphan composite revision deletion.
    + */
    +class EntityReferenceRevisionsOrphanManager {
    

    Manager is a pretty boring class name, anything can be a "Manager", doesn't say much about what it does.

    Maybe a ..Purger?

  3. +++ b/src/EntityReferenceRevisionsOrphanManager.php
    @@ -0,0 +1,171 @@
    +    $entity_type_manager = \Drupal::entityTypeManager();
    

    this is already injected.

  4. +++ b/src/EntityReferenceRevisionsOrphanManager.php
    @@ -0,0 +1,171 @@
    +      $context['finished'] = 1;
    +      $context['results'][$entity_type_id]['end'] = \Drupal::time()->getRequestTime();
    

    this can be injected

  5. +++ b/src/EntityReferenceRevisionsOrphanManager.php
    @@ -0,0 +1,171 @@
    +    $interval = \Drupal::service('date.formatter')->formatInterval(\Drupal::time()->getRequestTime() - $context['results'][$entity_type_id]['start']);
    

    this too.

  6. +++ b/src/EntityReferenceRevisionsOrphanManager.php
    @@ -0,0 +1,171 @@
    +      '@entity_id' => isset($composite_revision) ? $composite_revision->id() : '',
    

    I removed this from the message, so we can remove it here as well.

  7. +++ b/src/EntityReferenceRevisionsOrphanManager.php
    @@ -0,0 +1,171 @@
    +   * @return bool
    +   *   Whether there are composite entity revisions used.
    

    description is a bit strange, mabe just "Whether the composite entity is still being used, FALSE if it is safe to delete.

  8. +++ b/src/EntityReferenceRevisionsOrphanManager.php
    @@ -0,0 +1,171 @@
    +    // If there are parent revisions where this revision is used, skip it.
    +    if (!empty($revisions)) {
    +      return TRUE;
    +    }
    +    return FALSE;
    

    can be simplified to just "return !empty($revisions);"

Looks pretty good, I think we want to get this in soon so we can start to reuse the API here in related issues.

One more thing: I would suggest we add a warning message to the UI that says this has the potential to delete composites that are still being used and it is recommended to create backup first.

johnchque’s picture

Status: Needs work » Needs review
FileSize
31.28 KB
16.17 KB

Addressing everything but 1. and adding the warning message.

johnchque’s picture

FileSize
7.63 KB

Updating the interdiff. :)

johnchque’s picture

OK, this addresses all comments above. :)

johnchque’s picture

So making the fields revisionable should help with this, had to update the test according to the change.

AMDandy’s picture

I ran this on a database with more than 8 million Paragraphs. It took about 80 hours and deleted 10% or so of the Paragraphs with no apparent ill effects (I checked older revisions, translations, and old translation revisions to make sure nothing was obviously missing.)

albertosilva’s picture

Thanks @yongt9412 for your patch!

Anyway, I'm facing a problem with paragraphs having parent_id set to NULL, and I think those paragraphs should be deleted:

mysql> select * from paragraphs_item_field_data where parent_id is NULL limit 10;
+-----+-------------+------------------+----------+------+--------+------------+-----------+-------------+-------------------+------------------+-------------------+-------------------------------+
| id  | revision_id | type             | langcode | uid  | status | created    | parent_id | parent_type | parent_field_name | default_langcode | behavior_settings | revision_translation_affected |
+-----+-------------+------------------+----------+------+--------+------------+-----------+-------------+-------------------+------------------+-------------------+-------------------------------+
| 551 |         551 | three_up_package | pt-br    |   56 |      1 | 1498144196 | NULL      | NULL        | NULL              |                1 | a:0:{}            |                             1 |
| 556 |         556 | three_up_package | pt-br    |   56 |      1 | 1498144352 | NULL      | NULL        | NULL              |                1 | a:0:{}            |                             1 |
| 561 |         561 | three_up_package | pt-br    |   66 |      1 | 1498144486 | NULL      | NULL        | NULL              |                1 | a:0:{}            |                             1 |
...

Do you think that modifying your patch to add a check for paragraphs without parent_id into isUsed() function (and then deleting them) could make sense?

Thank you!

miro_dietiker’s picture

We need to be very careful about those cases as there could be stale records where the parent_id was set wrong (or not at all). Thus best would be we provide a separate command to rebuild the parent_id data (optionally).

Making the process more aggressive is perfectly fine for a follow-up.

moshe weitzman’s picture

Status: Needs review » Needs work

@berdir let me know that this issue must wait for [#322354] and then adapt to it.

moshe weitzman’s picture

Using latest patch, I got an error after submitting the form

An AJAX HTTP error occurred.
HTTP Result Code: 200
Debugging information follows.
Path: /batch?id=3334149&op=do_nojs&op=do
StatusText: OK
ResponseText: Error: Call to a member function getColumns() on boolean in Drupal\Core\Entity\Query\Sql\Tables->addField() (line 228 of /var/www/mass.local/docroot/core/lib/Drupal/Core/Entity/Query/Sql/Tables.php).

Following the link for more info told me An error occurred while processing _entity_reference_revisions_orphan_purger_batch_dispatcher with arguments : _entity_reference_revisions_orphan_purger_batch_dispatcher

andypost’s picture

miro_dietiker’s picture

Didn't check in deep detail, but it seems chx proposes a completely different approach:
https://twitter.com/chx/status/1098624319624642560
https://gitlab.com/chx_/nope

Maybe worth double checking?

yobottehg’s picture

I checked both this patch and the module from chx with a real site:
- 5 languages
- 80.000 paragraph entities
- paragraphs in paragraphs
- symetrical translations
- node revisions get automatically cleaned up (node_revisions_autoclean) to at max 10 remaining.

This patch works but leaves a lot of trash still in the database as only 11% of the paragraph entities were deleted. (I assume that because of the deleted node revisions likely 60 to 70 percent of the paragraphs should be garbage.

The module from chx just threw an error (calling ->delete() on null) without any other explanation.

This issue is on "needs work". May i ask whats the current plan here?

johnchque’s picture

@yobottehg I think we need to focus on #2904231: Parent fields are not revisionable first and update the parent fields correctly so we can use this patch again.

tbenice’s picture

Hi, not sure where this issue is at this point but I needed to try it. I had trouble and so am modifying the patch a little.

The site I'm working with had a field_paragraph (in the past) that was and entity_reference_field but later was deleted and re-created as a entity_reference field. Of course there were then orphaned paragraphs with parent field of field_paragraph. So, in the isUsed() method the query failed with an 'invalid specifier target_revision_id' error since the field_paragraph was no longer an entity_reference_revisions field.

So...I just added a check if the field is a valid one before the query. I'm attaching the altered patch from #43 in case it is still needed.

Berdir’s picture

Status: Needs work » Needs review

HI, thanks for improving the patch.

It's extremely helpful to provide an interdiff when changing something in an existing patch, because that makes it so much easier to review what you did. see https://www.drupal.org/documentation/git/interdiff

The easiest way to do so is to apply a patch with git apply -3 (or another way that results in all changes being staged), then the full patch is git diff HEAD and the interdiff is git diff.

tbenice’s picture

Makes sense here is the improved patch with interdiff.

tbenice’s picture

again with correct extension for interdiff.

mbovan’s picture

This patch addresses a suggestion from #36.

#56 makes sense. Is it worth to write a test for this case?

mbovan’s picture

+++ b/src/EntityReferenceRevisionsOrphanPurger.php
@@ -0,0 +1,280 @@
+    if ($parent_field_config->get('type') != 'entity_reference_revisions') {
+      return FALSE;
+    }

Discussed #56 with @Berdir, and this would actually affect entities that do not use entity_reference_revisions field type such as the one in Entity Reference with Layout module.

Thus, we decided to add an explicit check for target_revision_id schema column. This means, the described use case in #53 will not be supported with the latest updates.

mbovan’s picture

Assigned: johnchque » Unassigned

Created a follow-up issue to implement a Drush command in #3081700: Implement a Drush command to clean obsolete composite entities .

mbovan’s picture

Created a follow-up #3081765: Allow users to remove composite revisions without parent id value in the manual cleanup process to cover #45, #46 and #53. I assume this will cover #51 until some degree as well.

@moshe weitzman I am not sure under what conditions #48 happens? Does it still happen on your project with the patch #60?

To summarize, I went through all the comments above and didn't notice anything remaining that we could work on here...

miro_dietiker’s picture

Priority: Normal » Major

Promoting for current relevancy and dependency for other progress.

Berdir’s picture

The schema check introduced a performance regression, refactored it a bit to store the result in a property per entity type/field.

Berdir’s picture

Status: Needs review » Fixed

Committed!

  • Berdir committed ac17b1c on 8.x-1.x
    Issue #3016388 by yongt9412, tbenice, Berdir, mbovan: Manual cleanup...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

arsn’s picture

Hello. For end user reference:

  • Documentation can be found here: admin/help/entity_reference_revisions
  • UI for v.1.7 new feature "Orphaned Paragraph cleanup": admin/config/system/delete-orphans

Thank you

dqd’s picture

+1 for this fix and hard work on it. Thanks to yongt9412, tbenice, Berdir, mbovan! Greetings from Berlin and please stay safe!

PS; should we start a documentation page on D.O. reflecting this?

awasson’s picture

I've been following this issue and the meta issues for a year or so and I wonder if I can pester you all for some clarification.

Is the addition of the new feature "Orphaned Paragraph cleanup": admin/config/system/delete-orphans supposed to resolve the issue that results from rendering a deleted paragraph entity in a view?

For instance if I have a view that's purpose is to render a paragraph field (one or more) and has a contextual filter on the node ID so it's limited to a per node arrangement. It seems that if I add one or more paragraphs from the node editing form they will show up in the view. As noted in the issue, the problem is that if I delete them or even edit them they remain in the view the way they did when they were originally created.

I thought that the new feature "Orphaned Paragraph cleanup" would resolve this but now I think I must be mistaken.

I'm using Entity Reference Revisions 8.x-1.8 and if I try to use "Orphaned Paragraph cleanup", even when I delete the paragraph instances that I don't want, no orphans are detected or deleted and they still show up in the view.

Thanks,
Andrew

gatorjoe’s picture

I confirm the same behavior in #69. I have removed a paragraph, run the cleanup but there are no paragraphs detected, therefore no removals. The deleted paragraph remains in the view as well.

Berdir’s picture

If you are using revisions then the old paragraphs aren't orphaned, they are still used by the old revision. Views support for ERR/paragraphs is incomplete and has bee since forever, see #2799479: Views doesn't recognize relationship to host, I don't understand why so many try to display paragraphs with views and despite that, nobody is spending time on properly fixing that issue or paying someone to do it :)

jigarius’s picture

Cool! Thanks for the solution. I just wanted to mention that for me, the batch stays stuck at 0.

Checked Paragraph revisions for orphans: 0 of 6132 in 29 min 28 sec (0 deleted)

For some reason, $composite_storage->loadMultipleRevisions($entity_revision_ids) always returns an empty array. I wonder if anyone else has run into this issue. If not, I guess I'll have to dig deeper.

Berdir’s picture

I'd suggest you create a new issue. Unsure why this would happen, sounds like some kind of strange edge case where it doesn't manage to count up. You'll have to debug through the batch process to find out what it's getting confused about. A field without paragraphs? A certain type of broken references? Not sure.