When a referenced entity is deleted, the references to that entity are not deleted.

I have a node with an entityreference field that references multiple users. If one of those user accounts is deleted the node still has a reference to the now non-existant user-id.
This causes Views to associate my node with Anonymous user, displayed reference counters being off by one, filters and access checks based on the existence of a reference failing.
The same problem occurs with all other reference types.

I see three possible solutions:
1. implement hook_entity_delete and entity_load all entities referencing the deleted entity, unset their referencing field and entity_save them.
2. clean up invalid references on cron runs, like http://drupal.org/sandbox/claudiucristea/1263038 does.
3. accept that references may be invalid and avoid using EntityFieldQuerys with reference fields, avoid counting references without joining in the referenced entities, add extra filter to the referenced entity fields in Views.

Option 1 has horrible performance implications on larger sites.
Option 2 still allows a window of inconsistent data.
Option 3 seems to be the one to use, but I am hoping for better suggestions.
All options also fail to prevent deletion of entities referenced by required fields.

This is not a new problem with referencing modules by the way:
References:
#1346696: On deletion of node/user, references to it are not removed
#1215070: Node references not removed when a node is deleted
#1212464: Reference data still resides in the database after deleting the node
#589714: Nodereference filtering: invalid reference should be same as no reference
Taxonomy:
#1281114: Database records not deleted for Term Reference Fields after Term is Deleted

Files: 
CommentFileSizeAuthor
#53 entityreference-n1368386-53.patch1.64 KBinghamn
PASSED: [[SimpleTest]]: [MySQL] 121 pass(es).
[ View ]
#47 entityreference-n1368386-47.patch6.86 KBDamienMcKenna
PASSED: [[SimpleTest]]: [MySQL] 119 pass(es).
[ View ]
#45 entityreference-cleanup-1368386-44.patch7.82 KBolofjohansson
PASSED: [[SimpleTest]]: [MySQL] 119 pass(es).
[ View ]
#43 entityreference-cleanup-1368386-43.patch10.34 KBolofjohansson
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in sites/default/modules/entityreference/entityreference.module.
[ View ]
#34 entityreference.tar_.gz30.69 KBarcane
#31 entityreference-cleanup-1368386-31.patch6.33 KBchriscohen
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]
#28 entityreference-validate-integrity.patch7.84 KBchriscohen
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch entityreference-validate-integrity.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]
#21 entityreference-cleanup-1368386-21.patch6.3 KBJvE
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]
#15 entityreference-cleanup-1368386-15.patch5.73 KBJvE
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]
#11 entityreference-cleanup-1368386-11.patch3.49 KBJvE
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]

Comments

Title:Delete references to deleted entities.Delete references to deleted entities
Category:bug» feature

The title suggests this is actually a feature request. Let's see what we can do.

How is this not a bug?

It would also be great if there was an option to cascade on delete, so that referenced entities are also deleted (perhaps also check so no other field-instances are referring to them). I need this functionality so will probably submit a patch in the near future (or if this should be a separate module).

Im having the same trouble with you.

I created a new thread for this issue (cascade on delete) to not pollute this one with OT. http://drupal.org/node/1417860

Thanks, that will be helpful :)

I've been keeping an eye on the other issues about references to deleted content and so far the most succesful "solution" seems to be to do cleanup of invalid references on cron runs.

Is this the way Entity reference should tackle this problem as well?
If so, then I'm going to work a bit on developing that. Maintaining solution 3 is getting harder and harder.

Still no idea on what to do with required references though. Should I make a separate issue for that?

Actually I don't agree, it's works perfectly fine, and is a better solution, to use EFQ to pick out the references on delete and purge them immediately. Doing it on cron is both more inefficient, and references will be left broken for a certain amount of time in between runs.

That would be solution 1 I mentioned. It is by far the cleanest way, but kills performance on large entity counts.

Unless you are proposing to remove the references from the database without going through entity_load and entity_save.
Wouldn't that deny modules the ability to react to the removal of the reference?
And prevent entities last_changed dates from being updated?
And cached versions of the entities being flushed?

Yes, one should of course go through the entity api for deletes etc, but for performance to become an issue it would have to be a very special user-case.

When I get some time (looks at boss) I will do some benchmarking of cleaning entity references.

I also came across #89181: Use queue API for node and comment, user, node multiple deletes (from comment #36 onwards) which seems to indicate that such bulk updates can indeed be a performance problem and that a solution is in progress (for D8).

Status:Active» Needs review
StatusFileSize
new3.49 KB
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]

I've gone and implemented hook_entity_delete() to clean up references.

After applying this patch you can enable this functionality in the field settings of each entityreference field.

Awesome, the patch worked for me!

Status:Needs review» Needs work

You definitely want to use a queue for this.

I'm all for using a queue if there is some way to block all other activity while the queue is being processed. Maintenance mode?

If you are willing to accept a time-period during which references may or may not be valid then you might as well not enable this functionality (which solves the problem of not knowing whether or not references are valid).

Ideas?

Status:Needs work» Needs review
StatusFileSize
new5.73 KB
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]

Here's a version of the patch with some simple queueing.

How to test this? Does this rely on cron to execute the queue items?

Yes, it processes entities it didn't do immediately on subsequent cron runs.
I'm thinking that the max number to process per content-type (currently 40) should be configurable at the field config. No time to change that today, maybe later this week.

So to test: delete an entity that is referenced by more than 40 entities of a certain content-type. It should leave invalid references for all entities that didn't fit in the first 40 until the next cron run.

Like I said, I'm not too happy with the queueing since it still allows a window in which the references are invalid and can crash stuff. So stick with the patch from #11 if you want all references to be claned immediately unless you run into memory or timeout problems.

I'm thinking that the max number to process per content-type (currently 40) should be configurable at the field config. No time to change that today, maybe later this week.

With 0 meaning "no limit"?

I am having this issue so I thought I might suggest something. To summarize the problems so far:

  1. You can't delete broken references on hook_entity_delete() because there might be so many that the page request times out.
  2. You can't queue the references for later deletion (on cron or similar) because until the cron run occurs, you have broken references, which is unacceptable.

The thing preventing #2 from being viable is that it leaves broken references until an unknown point in the future. If it was possible to deal with the broken references, it would be possible to stick things in a queue for processing later on.

What about changing the way that entityreference loads fields, so that it when it populates 'target_id', it checks to see if the target actually exists? This would mean that until the cleanup, it wouldn't matter if the fields are broken, since when the entityreferences are loaded, the broken ones are skipped.

Is that a viable solution? It's conceptual only I'm afraid, as I have no idea how this might be manipulated.

Yes, I think #19 is something we should definitely examine.

StatusFileSize
new6.3 KB
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]

On my site I'm using the patch from #11 because it handles hundreds of references in just seconds.
I'm glad chriscohen agrees with me on the queueing thing, but I don't think it's feasible to validate the references on loading them. You'll quickly run into problems with caching and performance.

Perhaps we can mark all the references as deleted (a relatively quick action) immediately and do the full deleting which triggers all the hooks and rules and such later?

I've attached a patch exploring that idea.

Queuing, combined with Batch API perhaps? I believe OG uses that tactic when deleting a group node for which its group items need to be reassigned.

The 'flag deleted' might be problematic when access control is in play and probably cached. Option 1 would be the best with respect to data integrity AND maintaining hook invocations AND purging/refreshing caches at the correct moment.

I would suggest option 1, with:
- Batch API
- Field configuration option for retaining updated timestamp on referenced entities

I am using #11 in a project. Can we get patch #11 applied to 7.x-1.0-rc3

I agree. The dereference functionality is optional, default off and comes with a warning. No known side effects.
Let's get that in first to solve the immediate reference integrity problem while we continue work on potential solutions for potential performance issues.

Any time frame on #24?

I have implemented something like this for myself and I agree this is a bug and has referential integrity implications. Now if this is ever committed, it could be an issue for people who have already implemented hook_enitty_delete in their custom modules, like myself. Just thought of bringing this up.

I'd imagine people that solve this in their custom modules will simply not enable it for fields their modules take care of until those modules are adjusted.

All these patches
#11 "immediately clean up invalid references"
#15 "do a bunch immediately and the rest on cron"
#21 "mark all deleted immediately and clean up properly on cron"
add an option to the field definition to enable this which is disabled by default.

So unless someone enables the cleanup functionality specifically for each field they want it on, nothing will happen with one of these patches applied.

StatusFileSize
new7.84 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch entityreference-validate-integrity.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

So here's my version of a fix. This will validate each field on load, using a simple EntityFieldQuery, which I don't believe will have a huge impact on performance, and this also includes the patch from #11, which removes deleted references on a cron run.

I thought I would throw this in there in the interim, despite the good work that has been occurring on a 'mark as deleted' solution in the meantime.

Status:Needs review» Needs work

The last submitted patch, entityreference-validate-integrity.patch, failed testing.

Status:Needs work» Needs review

Sorry, just noticed significant problems with the above patch in certain conditions. I'd avoid using it!

StatusFileSize
new6.33 KB
PASSED: [[SimpleTest]]: [MySQL] 66 pass(es).
[ View ]

#21 looks very good to me. I did have one issue where, in entityreference_entity_delete, $entity could be passed in without $entity->bundle being set, in the case where a particular entity only has one bundle. I have submitted a revised patch which checks and 'fixes' this.

However, I am not entirely up to speed with the entity system, and it might be that my custom entity is doing something wrong, which is causing the problem, and the original patch is not at fault. If someone knows, that'd be great.

Just noticed patch in #31 (and #21) does not correctly implement coding standards for the PHPDoc comment on entityreference_dereference() so that should be addressed at some point!

Can we get #21 committed? If not, can we outline the shortcomings of the solution in #21 and propose a better solution?

StatusFileSize
new30.69 KB

Here is a tgz file of patch #11 applied to the latest 7.x-1.0-rc3

Note that I posted a sandbox module, Field reference delete, which may be of interest here. It takes care of cleaning up the stale references in the case where the entity reference fields are stored in an SQL database (but doesn't work for fields stored elsewhere, so it isn't a 100% theoretically complete solution).

If I read the patch in this issue correctly, it seems to be taking the approach that entity_save() should always be called to record the changes. But it's actually not clear that's a good idea; saving a bunch of entities that didn't actually change (but that just had some stale references cleaned up) could actually lead to unwanted side effects! See @sun's point #1 at #556022-68: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do for discussion of a similar problem.

I agree. Simply getting rid of the references was my first approach but the consensus was that the API should be used so modules could react to entities losing relationships.
I guess it boils down to how you see it. Either as a cleanup of bug-causing cruft left behind by the entityreference module, or as the separation of two previously related entities.

"Field reference delete" does look interesting. I'll take it for a spin later.

I think since there are different scenarions (noSQL, using queue VS on the fly) ER should take the simple approach similar to @David_Rothstein implementation, only done as a behavior -- so site-admins can disable it and have their own.

As well as cleaning up references, I think there are other use cases too. (Possibly slightly off-topic for this particular thread, as I'm not sure whether these are things that should be in different modules from David_Rothstein's project.)

Suppose we have a reference field on A pointing to B: A --> B. I can think of the following configurations:

- B may not be deleted while the reference exists. Eg, don't delete taxonomy terms that are in use
- A may not be deleted while the ref exists. Eg, don't delete product nodes that point to active products
- deleting A deletes B. Eg, the reference is being used in a way that considers B a 'sub-entity' of A in some way.
- deleting B deletes A. Eg, when deleting a project, delete all issues that belong to that project

Furthermore, for entities that support a concept of status (nodes, users, products that I know of), it can be desirable to have changes in status cascade either one way or the other (or even both?!).

I'm using the code from #11, but it doesn't work well if its a user referencing a node (user -> company, then delete the company for example), as it gives a foreach error. It looks like this is because on line 1221 this uses the language from the entity (rather than the field) to grab the field values.

I think this would likely be best done either using either entity api wrappers so you don't have to deal with language (its done for you) or the function field_language() to get the language of the actual field instead of the language of the entity.

My users often don't have a language set on the entity, I believe because this is not set with the default language like a node, rather it is only set if the user actually selects their language, though I could be wrong on this part. I'll try to post a patch this afternoon if I have time.

Is there any Movement on this. We find it as an issue at our end as well.

I'm using this module on my side ! It work great for me...

http://drupal.org/project/ercd

After implemented several solutions that require a scalable solution (mass delete in OG, mass notifications send in message-subscribe) I now think that maybe should integrate with Advanced-Queue module.
See og/og-vocab/message-subscribe for examples.

StatusFileSize
new10.34 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in sites/default/modules/entityreference/entityreference.module.
[ View ]

For those who are interested, here's #31 patched agains 7.x-1.0.

Status:Needs review» Needs work

The last submitted patch, entityreference-cleanup-1368386-43.patch, failed testing.

StatusFileSize
new7.82 KB
PASSED: [[SimpleTest]]: [MySQL] 119 pass(es).
[ View ]

And here's a patch that should work. Again, #31 patched against 7.x-1.0.

Status:Needs work» Needs review

StatusFileSize
new6.86 KB
PASSED: [[SimpleTest]]: [MySQL] 119 pass(es).
[ View ]

Rerolled against -dev, haven't really tested it yet.

The patch in #47 works for me with an EntityReference field pointing to a Bean object - I used to get a colorful error, with the option enabled the reference is now removed from the parent object correctly.

One question - from a usability perspective, shouldn't the reference value be removed automatically? Are there any use cases where it should not remove the reference? Right now it gives a great big error, i.e. your site breaks.

@DamienMcKenna: I agree with you, but "the community" raised these points:
Case #1: Modules may want to react on an entity losing a reference. Especially if the field is marked as #required. This lead to using full entity_save().
Case #2: There may be many references to the deleted entity. It could be the performance hit of removing all those references is be too big to do them all immediately. This lead to queuing.

Patch in #47 worked nicely for me.

StatusFileSize
new1.64 KB
PASSED: [[SimpleTest]]: [MySQL] 121 pass(es).
[ View ]

Why not just delete the rows immediately, instead of queing? It shouldn't take too long, and would not bring in all the complexity of relying on yet other modules.

Because that will bypass all hooks and that FieldAPI will invoke. For instance, that won't work with an OG group field.