When a referenced entity is deleted, the references to that entity are not deleted.

I have a node with an entityreference field that references multiple users. If one of those user accounts is deleted the node still has a reference to the now non-existant user-id.
This causes Views to associate my node with Anonymous user, displayed reference counters being off by one, filters and access checks based on the existence of a reference failing.
The same problem occurs with all other reference types.

Already exiting contributed module that tries to handles this - https://www.drupal.org/project/field_reference_delete

I see three possible solutions:
1. implement hook_entity_delete and entity_load all entities referencing the deleted entity, unset their referencing field and entity_save them.
2. clean up invalid references on cron runs, like http://drupal.org/sandbox/claudiucristea/1263038 does.
3. accept that references may be invalid and avoid using EntityFieldQuerys with reference fields, avoid counting references without joining in the referenced entities, add extra filter to the referenced entity fields in Views.

Option 1 has horrible performance implications on larger sites.
Option 2 still allows a window of inconsistent data.
Option 3 seems to be the one to use, but I am hoping for better suggestions.
All options also fail to prevent deletion of entities referenced by required fields.

This is not a new problem with referencing modules by the way:
References:
#1346696: On deletion of node/user, references to it are not removed
#1215070: Node references not removed when a node is deleted
#1212464: Reference data still resides in the database after deleting the node
#589714: Nodereference filtering: invalid reference should be same as no reference
Taxonomy:
#1281114: Database records not deleted for Term Reference Fields after Term is Deleted

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Title: Delete references to deleted entities. » Delete references to deleted entities
Category: bug » feature

The title suggests this is actually a feature request. Let's see what we can do.

gnucifer’s picture

How is this not a bug?

It would also be great if there was an option to cascade on delete, so that referenced entities are also deleted (perhaps also check so no other field-instances are referring to them). I need this functionality so will probably submit a patch in the near future (or if this should be a separate module).

sunyu0072007’s picture

Im having the same trouble with you.

gnucifer’s picture

I created a new thread for this issue (cascade on delete) to not pollute this one with OT. http://drupal.org/node/1417860

sunyu0072007’s picture

Thanks, that will be helpful :)

JvE’s picture

I've been keeping an eye on the other issues about references to deleted content and so far the most succesful "solution" seems to be to do cleanup of invalid references on cron runs.

Is this the way Entity reference should tackle this problem as well?
If so, then I'm going to work a bit on developing that. Maintaining solution 3 is getting harder and harder.

Still no idea on what to do with required references though. Should I make a separate issue for that?

gnucifer’s picture

Actually I don't agree, it's works perfectly fine, and is a better solution, to use EFQ to pick out the references on delete and purge them immediately. Doing it on cron is both more inefficient, and references will be left broken for a certain amount of time in between runs.

JvE’s picture

That would be solution 1 I mentioned. It is by far the cleanest way, but kills performance on large entity counts.

Unless you are proposing to remove the references from the database without going through entity_load and entity_save.
Wouldn't that deny modules the ability to react to the removal of the reference?
And prevent entities last_changed dates from being updated?
And cached versions of the entities being flushed?

gnucifer’s picture

Yes, one should of course go through the entity api for deletes etc, but for performance to become an issue it would have to be a very special user-case.

JvE’s picture

When I get some time (looks at boss) I will do some benchmarking of cleaning entity references.

I also came across #89181: Use queue API for node and comment, user, node multiple deletes (from comment #36 onwards) which seems to indicate that such bulk updates can indeed be a performance problem and that a solution is in progress (for D8).

JvE’s picture

Status: Active » Needs review
FileSize
3.49 KB

I've gone and implemented hook_entity_delete() to clean up references.

After applying this patch you can enable this functionality in the field settings of each entityreference field.

arcane’s picture

Awesome, the patch worked for me!

bojanz’s picture

Status: Needs review » Needs work

You definitely want to use a queue for this.

JvE’s picture

I'm all for using a queue if there is some way to block all other activity while the queue is being processed. Maintenance mode?

If you are willing to accept a time-period during which references may or may not be valid then you might as well not enable this functionality (which solves the problem of not knowing whether or not references are valid).

Ideas?

JvE’s picture

Status: Needs work » Needs review
FileSize
5.73 KB

Here's a version of the patch with some simple queueing.

arcane’s picture

How to test this? Does this rely on cron to execute the queue items?

JvE’s picture

Yes, it processes entities it didn't do immediately on subsequent cron runs.
I'm thinking that the max number to process per content-type (currently 40) should be configurable at the field config. No time to change that today, maybe later this week.

So to test: delete an entity that is referenced by more than 40 entities of a certain content-type. It should leave invalid references for all entities that didn't fit in the first 40 until the next cron run.

Like I said, I'm not too happy with the queueing since it still allows a window in which the references are invalid and can crash stuff. So stick with the patch from #11 if you want all references to be claned immediately unless you run into memory or timeout problems.

John Pitcairn’s picture

I'm thinking that the max number to process per content-type (currently 40) should be configurable at the field config. No time to change that today, maybe later this week.

With 0 meaning "no limit"?

chriscohen’s picture

I am having this issue so I thought I might suggest something. To summarize the problems so far:

  1. You can't delete broken references on hook_entity_delete() because there might be so many that the page request times out.
  2. You can't queue the references for later deletion (on cron or similar) because until the cron run occurs, you have broken references, which is unacceptable.

The thing preventing #2 from being viable is that it leaves broken references until an unknown point in the future. If it was possible to deal with the broken references, it would be possible to stick things in a queue for processing later on.

What about changing the way that entityreference loads fields, so that it when it populates 'target_id', it checks to see if the target actually exists? This would mean that until the cleanup, it wouldn't matter if the fields are broken, since when the entityreferences are loaded, the broken ones are skipped.

Is that a viable solution? It's conceptual only I'm afraid, as I have no idea how this might be manipulated.

bojanz’s picture

Yes, I think #19 is something we should definitely examine.

JvE’s picture

On my site I'm using the patch from #11 because it handles hundreds of references in just seconds.
I'm glad chriscohen agrees with me on the queueing thing, but I don't think it's feasible to validate the references on loading them. You'll quickly run into problems with caching and performance.

Perhaps we can mark all the references as deleted (a relatively quick action) immediately and do the full deleting which triggers all the hooks and rules and such later?

I've attached a patch exploring that idea.

teezee’s picture

Queuing, combined with Batch API perhaps? I believe OG uses that tactic when deleting a group node for which its group items need to be reassigned.

The 'flag deleted' might be problematic when access control is in play and probably cached. Option 1 would be the best with respect to data integrity AND maintaining hook invocations AND purging/refreshing caches at the correct moment.

I would suggest option 1, with:
- Batch API
- Field configuration option for retaining updated timestamp on referenced entities

arcane’s picture

I am using #11 in a project. Can we get patch #11 applied to 7.x-1.0-rc3

JvE’s picture

I agree. The dereference functionality is optional, default off and comes with a warning. No known side effects.
Let's get that in first to solve the immediate reference integrity problem while we continue work on potential solutions for potential performance issues.

arcane’s picture

Any time frame on #24?

awm’s picture

I have implemented something like this for myself and I agree this is a bug and has referential integrity implications. Now if this is ever committed, it could be an issue for people who have already implemented hook_enitty_delete in their custom modules, like myself. Just thought of bringing this up.

JvE’s picture

I'd imagine people that solve this in their custom modules will simply not enable it for fields their modules take care of until those modules are adjusted.

All these patches
#11 "immediately clean up invalid references"
#15 "do a bunch immediately and the rest on cron"
#21 "mark all deleted immediately and clean up properly on cron"
add an option to the field definition to enable this which is disabled by default.

So unless someone enables the cleanup functionality specifically for each field they want it on, nothing will happen with one of these patches applied.

chriscohen’s picture

So here's my version of a fix. This will validate each field on load, using a simple EntityFieldQuery, which I don't believe will have a huge impact on performance, and this also includes the patch from #11, which removes deleted references on a cron run.

I thought I would throw this in there in the interim, despite the good work that has been occurring on a 'mark as deleted' solution in the meantime.

Status: Needs review » Needs work

The last submitted patch, entityreference-validate-integrity.patch, failed testing.

chriscohen’s picture

Status: Needs work » Needs review

Sorry, just noticed significant problems with the above patch in certain conditions. I'd avoid using it!

chriscohen’s picture

#21 looks very good to me. I did have one issue where, in entityreference_entity_delete, $entity could be passed in without $entity->bundle being set, in the case where a particular entity only has one bundle. I have submitted a revised patch which checks and 'fixes' this.

However, I am not entirely up to speed with the entity system, and it might be that my custom entity is doing something wrong, which is causing the problem, and the original patch is not at fault. If someone knows, that'd be great.

chriscohen’s picture

Just noticed patch in #31 (and #21) does not correctly implement coding standards for the PHPDoc comment on entityreference_dereference() so that should be addressed at some point!

chriscohen’s picture

Can we get #21 committed? If not, can we outline the shortcomings of the solution in #21 and propose a better solution?

arcane’s picture

FileSize
30.69 KB

Here is a tgz file of patch #11 applied to the latest 7.x-1.0-rc3

David_Rothstein’s picture

Note that I posted a sandbox module, Field reference delete, which may be of interest here. It takes care of cleaning up the stale references in the case where the entity reference fields are stored in an SQL database (but doesn't work for fields stored elsewhere, so it isn't a 100% theoretically complete solution).

If I read the patch in this issue correctly, it seems to be taking the approach that entity_save() should always be called to record the changes. But it's actually not clear that's a good idea; saving a bunch of entities that didn't actually change (but that just had some stale references cleaned up) could actually lead to unwanted side effects! See @sun's point #1 at #556022-68: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do for discussion of a similar problem.

JvE’s picture

I agree. Simply getting rid of the references was my first approach but the consensus was that the API should be used so modules could react to entities losing relationships.
I guess it boils down to how you see it. Either as a cleanup of bug-causing cruft left behind by the entityreference module, or as the separation of two previously related entities.

"Field reference delete" does look interesting. I'll take it for a spin later.

amitaibu’s picture

I think since there are different scenarions (noSQL, using queue VS on the fly) ER should take the simple approach similar to @David_Rothstein implementation, only done as a behavior -- so site-admins can disable it and have their own.

joachim’s picture

As well as cleaning up references, I think there are other use cases too. (Possibly slightly off-topic for this particular thread, as I'm not sure whether these are things that should be in different modules from David_Rothstein's project.)

Suppose we have a reference field on A pointing to B: A --> B. I can think of the following configurations:

- B may not be deleted while the reference exists. Eg, don't delete taxonomy terms that are in use
- A may not be deleted while the ref exists. Eg, don't delete product nodes that point to active products
- deleting A deletes B. Eg, the reference is being used in a way that considers B a 'sub-entity' of A in some way.
- deleting B deletes A. Eg, when deleting a project, delete all issues that belong to that project

Furthermore, for entities that support a concept of status (nodes, users, products that I know of), it can be desirable to have changes in status cascade either one way or the other (or even both?!).

kevinob11’s picture

I'm using the code from #11, but it doesn't work well if its a user referencing a node (user -> company, then delete the company for example), as it gives a foreach error. It looks like this is because on line 1221 this uses the language from the entity (rather than the field) to grab the field values.

I think this would likely be best done either using either entity api wrappers so you don't have to deal with language (its done for you) or the function field_language() to get the language of the actual field instead of the language of the entity.

My users often don't have a language set on the entity, I believe because this is not set with the default language like a node, rather it is only set if the user actually selects their language, though I could be wrong on this part. I'll try to post a patch this afternoon if I have time.

pravin.katragadda’s picture

Is there any Movement on this. We find it as an issue at our end as well.

sw3b’s picture

I'm using this module on my side ! It work great for me...

http://drupal.org/project/ercd

amitaibu’s picture

After implemented several solutions that require a scalable solution (mass delete in OG, mass notifications send in message-subscribe) I now think that maybe should integrate with Advanced-Queue module.
See og/og-vocab/message-subscribe for examples.

olofbokedal’s picture

For those who are interested, here's #31 patched agains 7.x-1.0.

Status: Needs review » Needs work

The last submitted patch, entityreference-cleanup-1368386-43.patch, failed testing.

olofbokedal’s picture

And here's a patch that should work. Again, #31 patched against 7.x-1.0.

DamienMcKenna’s picture

Status: Needs work » Needs review
DamienMcKenna’s picture

Rerolled against -dev, haven't really tested it yet.

DamienMcKenna’s picture

The patch in #47 works for me with an EntityReference field pointing to a Bean object - I used to get a colorful error, with the option enabled the reference is now removed from the parent object correctly.

DamienMcKenna’s picture

One question - from a usability perspective, shouldn't the reference value be removed automatically? Are there any use cases where it should not remove the reference? Right now it gives a great big error, i.e. your site breaks.

JvE’s picture

@DamienMcKenna: I agree with you, but "the community" raised these points:
Case #1: Modules may want to react on an entity losing a reference. Especially if the field is marked as #required. This lead to using full entity_save().
Case #2: There may be many references to the deleted entity. It could be the performance hit of removing all those references is be too big to do them all immediately. This lead to queuing.

kostajh’s picture

Patch in #47 worked nicely for me.

SocialNicheGuru’s picture

inghamn’s picture

Why not just delete the rows immediately, instead of queing? It shouldn't take too long, and would not bring in all the complexity of relying on yet other modules.

joachim’s picture

Because that will bypass all hooks and that FieldAPI will invoke. For instance, that won't work with an OG group field.

jojonaloha’s picture

Putting my vote in for the patch in #47. It looks good and works for me.

podarok’s picture

Status: Needs review » Needs work
+++ b/entityreference.module
@@ -348,6 +348,39 @@ function entityreference_entity_update($entity, $entity_type) {
+  $cacheDirty = false;

Please, do not mix variable naming styles. For D7 we have to not use camelCase format
https://www.drupal.org/coding-standards#naming

jojonaloha’s picture

Status: Needs work » Needs review
FileSize
6.86 KB

@podarok I went to work on your suggestions, but I think your review only applies to the patch in #53. Based on @joachim's comment in #54 and the previous comments, I don't think the patch in #53 is the currently accepted patch.

Attached is a re-roll of #47 (based on the comments and what I can tell of the diffs, is just a re-roll of #45 and #31)

partdigital’s picture

There's a bug in #57 where the reference table isn't being updated correctly. The column field is still marked as 0.

Note the code below, these conditions aren't always true even when they are supposed to be.

 db_update($table)
    ->condition('entity_type', $entity_type)
    ->condition('bundle', isset($entity->bundle) ? $entity->bundle : $entity_type)
    ->condition('entity_id', $entity_id)
    ->fields(array('deleted' => 1))
    ->execute();

The patch also checks entity type but not bundle type, so these processes are being needlessly run on entities that don't even have field references. This makes it slower than necessary.
For example:
There are three content types:

  • Basic page
  • Article
  • Event

The basic page references an event.

An article is deleted triggering the entity_delete hook.

The code below would be true. However the condition shouldn't be true because article DOESN'T have an entity reference.

// $target_type = 'node';
// $entity_type = 'node'; 
if ($cleanup_enabled && $target_type == $entity_type) {
     $query = new EntityFieldQuery();

We need to do another check.

// $bundle = 'event'
// $target_bundles = array('event'); 
 if ($cleanup_enabled && $target_type == $entity_type && in_array($bundle, $target_bundles)) {

I've rerolled the patch from #57 with fixes for these two issues as well as adding some formatting for clarity.

fullerja’s picture

Patch in #58 applied and work for me. I was able to remove an item from entityqueue.

lpeabody’s picture

@fullerja, if you've tested it can you mark as RTBC?

fullerja’s picture

I think there are several other entity reference use cases, the primary of which is as a field, which I have not tested. I would like to see that tested before it goes to RTBC.

podarok’s picture

Status: Needs review » Reviewed & tested by the community

#58 looks good

lhridley’s picture

I just tested patch #58 and it's not working on my end.

My case involves the Message module, which has a fieldable custom entity type not derived from a node. I am creating an activity log for a node to log all activity on that node. My message instance has two entity reference fields referencing the user acting on the node, and the node itself.

I would expect that deleting the node would delete any messages in the activity log that are tied to that node, but instead the messages are remaining in the activity log, only the referenced node itself no longer exists, resulting in a token that no longer gets replace by information on the deleted node (in effect, a broken message).

I have not tried deleting a user yet.

lhridley’s picture

Status: Reviewed & tested by the community » Needs work
SocialNicheGuru’s picture

@lhridely, I don't think the patch deletes the messages. I think it is supposed to modify the entityreference in the message. I believe it will delete the reference to the node that was deleted. I don't believe it was meant to delete the message itself.

JvE’s picture

This issue is about deleting the reference from the message to the no-longer-existing node.
For deleting the message itself you may want to take a look at https://www.drupal.org/project/ercd and/or https://www.drupal.org/project/field_referenced_delete

lhridley’s picture

I would expect that, since, the message contained a field that requires the reference be present to save the record, that deleting the entity referenced in that field would do one of two things:

(1) not allow you to delete the entity until the message had been deleted or the reference remove (similar to the way the Field API behaves when you attempt to remove a module that provides a custom field type and that field type is being used in a content type or fieldable entity), or
(2) delete the entity AND any records with required references to that entity.

Are my expectations counter to best practices, because I would assume that the behavior I described would be a best practice. Please correct me if my assumption is in error.

And, to clarify, the reference to the deleted entity remains on the message instance when you delete the referenced entity, even though it no longer exists.

partdigital’s picture

HI lhridley,

This issue is about deleting the reference not the message itself.
Deleting the message might be appropriate for your use case but deleting entities by default will cause problems for others who are using the entity reference field.
For example, if I have an "Athlete" node referencing a "country" node, if I delete the country that would delete the athlete. Not what I want necessarily.

(1) not allow you to delete the entity until the message had been deleted or the reference remove (similar to the way the Field API behaves when you attempt to remove a module that provides a custom field type and that field type is being used in a content type or fieldable entity), or

This would cause a frustrating user experience.
In my scenario above.

  • A country has 100 athletes in it.
  • I can’t delete the country until I reassign (or delete) all athletes referencing that country.

(2) delete the entity AND any records with required references to that entity.

If you look at my scenario above.

  • An athlete node requires a country reference
  • I delete the country
  • That athlete gets deleted. (not what we want)

So your use case is the exception, not the rule. For the purposes of this issue it’s only about deleting the entity reference. For what you're looking for I'd suggest checking out the links that JvE recommended.

As an aside, I set up the message module and tried some tests with it. It deleted all the references as expected so that patch still works as designed. I’m going to set this issue back to “needs review”

partdigital’s picture

Status: Needs work » Needs review
fullerja’s picture

Status: Needs review » Reviewed & tested by the community

Tested with Entity reference field, referencing nodes. Works for me

hideaway’s picture

Sorry to post this, but the provided patches are working only for one specific use case and that's when your entity reference field has Entity selection mode set to Simple, where you can select target bundles. But if you are using mode "Views: filter by an entity reference views" you're done, since in the entity_reference_delete hook you get empty target bundles in condition:

$target_bundles = (isset($field['settings']['handler_settings']['target_bundles'])) ? array_keys($field['settings']['handler_settings']['target_bundles']) : array();
partdigital’s picture

Great find hieaway!

I've revised the patch with this code. Now it will only do a check for target bundles if they exist. Otherwise it will run for all nodes. It will affect performance when using the Views Reference Filter module (or other modules) but it's an acceptable compromise in my opinion.

$bundle_valid = TRUE;

    // Target bundles won't always be defined, only do check when they are.
    if(!empty($target_bundles)){
      $bundle_valid = (in_array($bundle, $target_bundles)) ? TRUE : FALSE;
    }

    // Check if cleanup is enabled and the field references the same type as the referenced entity.
    if ($cleanup_enabled && $target_type == $entity_type && $bundle_valid) {
elephant.jim’s picture

I thought if $field['settings']['handler_settings']['target_bundles'] is empty (or missing), then it means use all bundles for that entity type. See EntityReference_SelectionHandler_Generic.class.php. Don't you need to call entity_get_info($entity_type); when setting $target_bundles above?

See attached patch. There was also a random whitespace change that I undid.

elephant.jim’s picture

Status: Reviewed & tested by the community » Needs review
elephant.jim’s picture

Ah, shoot. Changing isset on the field target_bundles to not empty.

quotesBro’s picture

about #75. It worked for 'message' entity with comment reference field.

But there is a problem (I believe the same as in #63): Message module implements hook_entity_delete() and handles messages deletion when referenced entities are being deleted. Afer applying #75, message_entity_delete() stopped doing its job because when it gets invoked, references to deleted entities have been already deleted.

Increasing weight of entityreference module in {system} solved this problem.

I added entityreference_update_7003() in this patch

elephant.jim’s picture

Status: Needs review » Needs work

I'm setting this to "needs work" because of comment #76 above along with this comment.

I believe there's a problem where we delete one reference among many, and the deltas are not rekeyed. That is, this code:

          foreach($items as $delta => $value) {
            if ($value['target_id'] == $entity_id) {
              unset($referencing_entity->{$field_name}[$language][$delta]);
            }
          }

should be:

          foreach($items as $delta => $value) {
            if ($value['target_id'] == $entity_id) {
              unset($referencing_entity->{$field_name}[$language][$delta]);
            }
          }
          $referencing_entity->{$field_name}[$language] = array_values($referencing_entity->{$field_name}[$language]);

See http://drupal.stackexchange.com/questions/31794/programmatically-delete-...

quotesBro’s picture

$referencing_entity->{$field_name}[$language] = array_values($referencing_entity->{$field_name}[$language]);

looks good

imre.horjan’s picture

#76: Increasing weight of entityreference module in {system} solved this problem.

I added entityreference_update_7003() in this patch.

Hi,
I think changing the module weight is out of the scope of this issue, since it can lead to unexpected compatibility issues with any other modules.
Don't you think that changing the weight of message module would be a better approach?

imre.horjan’s picture

Here's the patch for latest dev version fixing deltas mentioned in #77.

I've reviewed the code manually, and tested with the user badges 7.x-4.x-dev version (which is based on entities), and it worked for me.

Dave Reid’s picture

This should be done in a queue process, not on entity delete. This will slow down mass delete operations significantly. I still argue I don't think this should even be added to this module.

imre.horjan’s picture

Hi Dave,
I can't see any use-case when leaving references for deleted entities in the database is a good choice.
There's a queue worker already implemented for the deletion. Do you think it's not implemented in the right way?
JvE (and others) did a great job.
What do you think, how should this be implemented if not in entityreference module?

azinck’s picture

@imre.horjan: I'm not speaking for Dave, but I'd suggest this needs to be solved in a more general fashion as Entity Reference isn't the only type of field that suffers this problem. Something more like David Rothstein's https://www.drupal.org/project/field_reference_delete is a better direction, if you ask me.

imre.horjan’s picture

Thank you azinck,
it wasn't clear me until now that field_reference_delete is a general solution what works for all kind of fields.

ndobromirov’s picture

Issue summary: View changes
Anybody’s picture

Hey and thanks a lot for your work and discussion so far. What's the status of this issue? I think the patch from #80 is a great idea. What's your suggestion how to proceed?

Anybody’s picture

I agree with #82... again and again I'm running into the problem of broken references. In contrast to #81 and #82 I think that if a module like entity reference introduces a new kind of relation, that module also has to take care of clean removing this on delete again. Otherwise the database is just getting messed up and I can't see a good reason for such a decision?
If a target node is deleted, why should we ever point to that.
As you can see in#76 other modules already implement their own logic because this is missing in the module.
Other references like node reference, taxonomy reference, etc. should also handle the same point (on their own) for a finally clean solution.

Patch from #80 works great for me and we should really fix this in the module.

Please let's discuss this important issue.

jonathanshaw’s picture

The issue rolls on in D8: #2723323: [META] Clean up references to deleted entities. Given there is no consensus here, but there is an interest in fixing it for D7, it probably makes sense to have the separate (linked) D8 issue.

sam.spinoy@gmail.com’s picture

Patch in #80 doesn't seem to apply anymore to the latest dev? Or it doesn't apply cleanly for me anyway.
Created a new one.

This approach works for our setup, but I am seeing long loading time after deleting an entity, due to the db updates and such. Don't know if this approach would work for sites with a large amount of data.

Anybody’s picture

samspinoy is right. Should this be a batch API job instead perhaps (or at least as configurable option?)

ndobromirov’s picture

Or a queue that will be deleting them on cron asynchronously. There were issues in core like that.

sam.spinoy@gmail.com’s picture

Or a queue that will be deleting them on cron asynchronously

Actually that already is the case: current code puts in queue in chunks of 40, but it takes the first 40 results right away, causing the heavy load right after deletion of entity.

I ended up not using the patch but writing my own entity_delete hook based on the latest patch here, so I could optimize the code. I found that just marking the data as deleted and running field_cache_clear() right after removed the incorrect references from the db. However I'm guessing that bypasses the entity_save hooks etc.

eelkeblok’s picture

  1. +++ b/entityreference.install
    @@ -163,3 +163,13 @@ function entityreference_update_7002() {
    +
    +/**
    + * Set weight of the module to 1.
    + */
    +function entityreference_update_7003() {
    +  db_update('system')
    +    ->fields(array('weight' => 1))
    +    ->condition('name', 'entityreference')
    +    ->execute();
    +}
    

    I don't think this is the right approach (IIRC, this was added because a specific module (messages) had problems reacting to its reference field being empty). That is first and foremost a problem for that module to solve. Couldn't these use cases be caught by simply listening to hook_entity_save and checking for whatever reference field you're interested in being empty?

  2. +++ b/entityreference.module
    @@ -427,6 +482,26 @@ function _entityreference_field_settings_process($form, $form_state) {
    +    0 => t('Leave invalid references around when entities are deleted.'),
    

    I think this could use some rewording, it's quite informal as it is (mostly due to "around"). Maybe "Leave invalid references when entities are deleted."

  3. +++ b/entityreference.module
    @@ -427,6 +482,26 @@ function _entityreference_field_settings_process($form, $form_state) {
    +    '#description' => t('What to do when a referenced entity is deleted. Note that the second option may cause timeouts or memory issues when deleting large amounts of entities or references.'),
    

    I wonder whether this is still true with excess removals being handled by a queue. One thing that would worry me that this might trigger some sort of cascade (e.g. messages module that would like to remove an entire entity when there are no references anymore).

  4. +++ b/entityreference.module
    @@ -427,6 +482,26 @@ function _entityreference_field_settings_process($form, $form_state) {
    +    '#default_value' => isset($field['settings']['dereference_on_delete']) ? $field['settings']['dereference_on_delete'] : 0,
    

    Should this be set to enabled by default, i.e. for new fields? In that case, we would need an update hook to set it disabled for existing fields.

sdsheridan’s picture

I've just stumbled upon this thread as a result of narrowing down a failed search indexing to a referential integrity (RI) problem with entity references. The specific case was raising the error message

EntityMalformedException: Missing bundle property on entity of type file. in entity_extract_ids() (line 7922 of .../includes/common.inc).

After doing much hunting around, I discovered an entity reference field with a reference to a file (the fid) that no longer existed in the file_managed table.

The relational model is pretty clear on what to do in terms of RI for relationships, and this module certainly allows a site builder to create those relationships. So, this module it seems to me should be taking care of RI too, given that we're not using declarative or procedural (i.e. triggers) RI in the database to enforce RI (in which case the application / business logic / controller would handle any returned RI error messages in a user-friendly fashion).

The relational model says you do one of three things upon deletion of an entity-type instance (the parent) where the entity-type's primary key (PK) is the foreign key (FK) of another entity-type (the dependent):

  1. Restrict – Don't allow the deletion where instances of the dependent exist with the FK value the PK value of the parent instance you're attempting to delete. This is the default case where the relationship is 'required' (i.e., the FK cannot be NULL). One must relate the dependent instances to another instance of the parent before the parent instance in question can be deleted.
  2. Cascade – Delete all dependent instances (and their potential dependents in other entity-types) where their FK value is the PK value of the parent instance in question.
  3. Set Null – Simply set the FK value to NULL in the dependent, potentially orphaning them. Note that this is the only option the current patches implement.

I think in the case of entity reference, all three options should be offered if we're to maintain referential integrity within a site, which to me is the only way to ensure things keep functioning as expected (like search indexing per my particular case).

I'd design it as follows:

  1. Where a field is required, then restrict or cascade should be the choices, set by the site builder (so a setting on the field settings form).
  2. Where a field is optional, then set null or cascade should be the choices, again set by the site builder.
  3. If there is a need to mass-reassign dependents to a new parent, that could be built into the interface for deleting the parent instance. This might arise when a number of dependent instances were mistakenly assigned to a parent instance that is a duplicate of another parent instance, and a correction / consolidation under one parent needed to be made to delete the other parent. In this case, a confirmation screen on the entity delete permitting the mass re-assign to a new parent should be shown to the user, with a drop-down list of eligible parents. This could also be a setting on the field settings to show this option at time of entity deletion.
  4. The entity deletion screen should be modified vis-a-vis the message displayed to show the user what will happen to dependents should they choose to delete this instance.

That's my two-bits on it. All that said, though, I think we do absolutely need to take care of referential integrity to ensure we have properly functioning sites.

Shawn

JvE’s picture

Sad opinion:
It's been 5 years. This module is not going to be fixed.
Use a patch, another contrib module or custom code to clean up leftover references, entityreference is never going to do it for you.

Anybody’s picture

I absolutely agree with #94.
Is #95 an official statement or a sad opinion?

Could we get some feedback from a module developer?

bburg’s picture

Seeing that the last comment by a maintainer was in 2013, and that the module is listed as "Minimally Maintained." My guess is that the only way this is getting fixed is if someone volunteers as a new maintainer.

spotzero’s picture

Hey. One of the "new" maintainer here. We're focusing bug fixes (though we're mostly still going through the backlog of RTBCd patchs).

We don't have time to design and write a solution to this, but if we get this issue to needs review (and get some tests for it), I'll test it and roll it in.

Alina Basarabeanu’s picture

Update the patch for the newest version of the module (1.4). The only difference is the update_hook number

Alina Basarabeanu’s picture

Testing this pacth I came across an issue where looping through the fields for which references were found not always all the fields exists on the referencing entity, so I wrapped the foreach in a check if the field is not empty delete the specific value.

iMiksu’s picture

Interdiffs would be useful.

handkerchief’s picture

This is also a Drupal 8 (8.4.0) problem...

https://www.drupal.org/node/2827177#comment-12323089

JvE’s picture

This is also a Drupal 8 (8.4.0) problem...

Yes, see https://www.drupal.org/node/1281114#comment-11965237

sdsheridan’s picture

OK, after all this time, I needed a solution. So I wrote one. It's a separate module, and it only implements the "Restrict" option (for now), but here it is: https://www.drupal.org/project/erri . Please review / take a look.

Shawn

sdsheridan’s picture

Entity Reference Referential Integrity now also does "Set Null" for the case where the referring field is not a required field.