Problem/Motivation

When a partial entity is saved, _field_invoke() tries to invoke hooks for ALL the fields that the entity has, not just the ones included in the partial entity.

e.g., if you try to update a node that has a value in a file field and don't explicitly include the file field's current value in the object, file_field_update() gets invoked with the incorrect input that the file field is empty. Hooks think that the field is empty, but the entity itself still thinks the field is populated, and so there's a data integrity problem.

Proposed resolution

We do require that field storage modules treat a missing $entity->$field_name as 'leave the field untouched', to support this exact use.

However, _field invoke() does call field-type's hook_field_*() methods on all fields present in the bundle, indistinctly passing an empty array (whether $entity->$field_name was actually set to an empty array, or just missing).

From _field_invoke() :
$items = isset($entity->{$field_name}[$langcode]) ? $entity->{$field_name}[$langcode] : array();
// then call [module]_field_[op]($items);

As a result, taxonomy_field_update() acts as if hitting an empty field, and wipes the taxonomy_index records for the node. Similarly, file_field_update() wipes file usage records.

Not calling hook_field_update() on fields absent from the entity would not solve the taxonomy case, BTW - the 1st actual call to taxonomy_field_update() (if *one* taxo field is present) would still start by wiping the taxonomy_index records. Since the taxonomy_index table currently munges data from all taxo fields on the entity, I'm not sure how this would be fixed.

Moving to entity-level hooks #1050466-2: The taxonomy index should be maintained in a node hook, not a field hook

Remaining tasks

User interface changes

TBC

API changes

TBC

#1050466-2: The taxonomy index should be maintained in a node hook, not a field hook

Original report by tcmug

// Text of original report here.
(for legacy issues whose initial post was not the issue summary. Use rarely.)
field_attach_update screws up fields that are not present in the entity. Comments in the code in the function field_attach_update() at field.attach.inc lets you assume that the function should ignore all fields when they are not in the entity object (i.e. not set):

$entity = new stdClass();
$entity->delete_me = null; // would be deleted
// $entity->dont_delete; wont be deleted since it wont be set

Basically what happens is that taxonomy fields lose taxonomies from taxonomy_index and file fields lose files from file_managed when the fields are not present in the entity.

Clearly a documentation/comment or code failure:

// Collect the storage backends used by the remaining fields in the entities.
$storages = array();
foreach (field_info_instances($entity_type, $bundle) as $instance) {
$field = field_info_field_by_id($instance['field_id']);
$field_id = $field['id'];
$field_name = $field['field_name'];
// Leave the field untouched if $entity comes with no $field_name property,
// but empty the field if it comes as a NULL value or an empty array.
// Function property_exists() is slower, so we catch the more frequent
// cases where it's an empty array with the faster isset().
if (isset($entity->$field_name) || property_exists($entity, $field_name)) {
// Collect the storage backend if the field has not been written yet.
if (!isset($skip_fields[$field_id])) {
$storages[$field['storage']['type']][$field_id] = $field_id;
}
}
}

Code used:
// $entity is an entity loaded with entity_load($type, $entity_id)

$temp = new stdClass();

$temp->{$id} = $entity_id;
if ($bundle) {
$temp->{$bundle} = $entity->{$bundle};
}

$l = $entity->language;

// add radioactiveness
$value = $entity->{$this->field_name}[$l][0]['radioactivity_energy'];
$value += $energy;
$temp->{$this->field_name}[$l][0]['radioactivity_energy'] = $value;

field_attach_update($entity_type, $temp);

When testing - check caches :)

Comments

yched’s picture

Title: field_attach_update clears fields » hook_field_*() called on fields not set in the $entity object (partial updates)

Indeed.

We do require that field storage modules treat a missing $entity->$field_name as 'leave the field untouched', to support this exact use
However, _field invoke() does call field-type's hook_field_*() methods on all fields present in the bundle, indistinctly passing an empty array (whether $entity->$field_name was actually set to an empty array, or just missing). From _field_invoke() :

$items = isset($entity->{$field_name}[$langcode]) ? $entity->{$field_name}[$langcode] : array();
// then call [module]_field_[op]($items);

As a result, taxonomy_field_update() acts as if hitting an empty field, and wipes the taxonomy_index records for the node. Similarly, file_field_update() wipes file usage records.

Not calling hook_field_update() on fields absent from the entity would not solve the taxonomy case, BTW - the 1st actual call to taxonomy_field_update() (if *one* taxo field is present) would still start by wiping the taxonomy_index records. Since the taxonomy_index table currently munges data from all taxo fields on the entity, I'm not sure how this would be fixed.

imiksu’s picture

subscribing

damien tournoud’s picture

Version: 7.x-dev » 8.x-dev
Priority: Normal » Critical

Data loss = critical in my book.

About the taxonomy case, I think moving to entity-level hooks as suggested before (see #1050466-2: The taxonomy index should be maintained in a node hook, not a field hook) would fix this.

tcmug’s picture

Agree with the critical, also quite limiting now that I can't just update that one specific field. The performance hit for bigger entities might end up being noticeable on some setups & modules, and this is especially true for Radioactivity on large sites, but I guess theres always stuff you can do to bypass these kind of limitations.

catch’s picture

Issue tags: +Needs backport to D7

Tagging.

yched’s picture

Status: Active » Needs review
StatusFileSize
new4 KB

Let's just see what the tests say if we don't run field hooks on missing $entity->field_name entries.

Status: Needs review » Needs work

The last submitted patch, no_field_invoke_missing_fields-1126000.patch, failed testing.

yched’s picture

Heh, stupid me. If _field_invoke() doesn't call the default 'form' callback on missing field entries, all entity creation forms fail, of course.

Let's try this one : don't invoke field-type 'presave' and 'update' hooks on missing field entries.
Just to see what happens - code is awfull, and I'm really not sure I like the idea to begin with.

yched’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, no_field_invoke_missing_fields-1126000.patch, failed testing.

yched’s picture

Status: Needs work » Needs review
StatusFileSize
new3.02 KB

Should fix last warnings.

yched’s picture

Status: Needs review » Needs work

Green, back to needs work.

agi.novanta’s picture

subscribing

ankur’s picture

I applied the patch in #12 and it seems to work for me.

I had both the issue of a node's records disappearing from {taxonomy_index} as well as a node losing all of its file field values when calling field_attach_update for other fields.

I tested this w/ 7.9

tim.plunkett’s picture

Tagging.

andypost’s picture

Issue tags: +Needs tests

Is there any steps to reproduce and write a tests?

swentel’s picture

Status: Needs work » Needs review
StatusFileSize
new4.66 KB
new1.62 KB

Reroll of the patch + also a test on file usage which proves the bug is still there.

moshe weitzman’s picture

#18: 1255696-26.patch queued for re-testing.

yesct’s picture

was that enough tests? (removing needs tests tag since it was added)

steps to reproduce would help people review this (steps to reproduce contributor task doc: http://drupal.org/node/1468198)

I think this is also still looking to get it's issue summary updated. adding link to contributor task for that in hopes of helping someone tackle that task: http://drupal.org/node/1427826

xjm’s picture

Status: Needs review » Needs work

Thanks @YesCT! The closest thing to STR here are in the test @swentel provided, but I doubt it's reproducible entirely through the UI in HEAD.

What's happening here (I think, after lots of staring at this code) is that when you save a partial entity, _field_invoke() tries to invoke hooks for ALL the fields that the entity has, not just the ones included in your partial entity. For example, if you try to update a node that has a value in a file field and don't explicitly include the file field's current value in the object, file_field_update() gets invoked with the incorrect input that the file field is empty. Hooks think that the field is empty, but the entity itself still thinks the field is populated, and so there's a data integrity problem.

Alright, so now that I understand the bug, onto a patch review:

  1. +++ b/core/modules/field/field.attach.incundefined
    @@ -409,23 +409,42 @@ function _field_invoke($op, $entity_type, EntityInterface $entity, &$a = NULL, &
    +        $items = NULL;
    ...
    +              $items = isset($entity->{$field_name}[$langcode]) ? $entity->{$field_name}[$langcode] : array();
    ...
    +          $items = isset($entity->{$field_name}[$langcode]) ? $entity->{$field_name}[$langcode] : array();
    ...
    +        if (!is_null($items)) {
    +          $result = $function($entity_type, $entity, $field, $instance, $langcode, $items, $a, $b);
    +          if (isset($result)) {
    +            // For hooks with array results, we merge results together.
    +            // For hooks with scalar results, we collect results in an array.
    +            if (is_array($result)) {
    +              $return = array_merge($return, $result);
    +            }
    +            else {
    +              $return[] = $result;
    +            }
    +          }
    

    So, as far as I can see, we're doing some typing magic here. We initialize $items to NULL, and if the field on the entity has no values, we then set $items to be an empty array in order to explicitly indicate it's empty? And then we check against that when deciding whether or not to invoke the hooks (because hooks should not be invoked for fields that nothing is being done to). And then the pre-existing behavior at the end, where we again do some divination with different varieties of emptiness to decide whether to update the entity's field value with $items.

    Um. That's confusing as all flapjacks. Can we refactor this somehow? Use a separate Boolean flag rather than relying on different variants of empty, or reorganize the logic a little?

  2. +++ b/core/modules/field/field.attach.incundefined
    @@ -409,23 +409,42 @@ function _field_invoke($op, $entity_type, EntityInterface $entity, &$a = NULL, &
    +        if (!$options['default'] && in_array($op, array('update', 'preseave'))) {
    

    I'm pretty sure there is no "preseave" op. :) That also means we don't have complete test coverage. Based on the information in the issue, this applies to taxonomy and file fields (at least, but probably any field that has hooks that take action when a field is empty!), and based on this line of code, it applies to presave and update operations. So, we'd need tests for all four combinations of those conditions... (Ideally, though, I think we'd have unit tests for all the different code paths instead, but given the fact that this whole API is on the cusp of two major conversions, I'm not going to recommend going crazy refactoring this to be more testable.)

  3. +++ b/core/modules/field/field.attach.incundefined
    @@ -409,23 +409,42 @@ function _field_invoke($op, $entity_type, EntityInterface $entity, &$a = NULL, &
    +        else {
    +          $items = isset($entity->{$field_name}[$langcode]) ? $entity->{$field_name}[$langcode] : array();
    +        }
    

    I had to read a few times before I understood what was going on. Maybe just // Handle other operations normally. above the else. Edit: I take that back; "normally" isn't very helpful. I think the correct clarification is to refactor it.

  4. +++ b/core/modules/field/field.attach.incundefined
    @@ -409,23 +409,42 @@ function _field_invoke($op, $entity_type, EntityInterface $entity, &$a = NULL, &
    +          // Populate $items back in the field values, but avoid replacing missing
    +          // fields with an empty array (those are not equivalent on update).
    +          if ($items !== array() || isset($entity->{$field_name}[$langcode])) {
    +            $entity->{$field_name}[$langcode] = $items;
    +          }
    

    This is the pre-existing hunk, just moved into the conditional, but now I'm bending my brain trying to figure out how $items is going to be other than array() or the field values. It can't be NULL or array() here, and those are the only other two things we set it to beside the field values. So I think this hunk can be removed at least with the current logic.

  5. +++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldWidgetTest.phpundefined
    @@ -20,6 +20,36 @@ public static function getInfo() {
    +   * Test that field data is preserved when field keys are not present
    +   * on an entity during a save call.
    

    This should be a single line of 80 characters or fewer, beginning with (e.g.) "Tests" rather than "test". http://drupal.org/node/1354#functions

  6. +++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldWidgetTest.phpundefined
    @@ -20,6 +20,36 @@ public static function getInfo() {
    +    // Create filefield.
    

    Nitpick: "Create a file field." (With an article.)

  7. +++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldWidgetTest.phpundefined
    @@ -20,6 +20,36 @@ public static function getInfo() {
    +    $this->assertFileExists($node_file, t('New file saved to disk on node creation.'));
    

    We can remove the t() from the assertion message text. http://drupal.org/simpletest-tutorial-drupal7#t

xjm’s picture

no_angel’s picture

Status: Needs work » Needs review
Issue tags: -Needs issue summary update +Needs steps to reproduce

Tagging: removed needs issue summary, added needs steps to reproduce.
Status: needs mentor review (my issue summary).

xjm’s picture

Status: Needs review » Needs work
Issue tags: -Needs steps to reproduce

Thanks @no_angel! You can indicate that on the core mentoring site by marking your task attempt complete. :) On the Drupal.org issue, the needs work/needs review status refer only to the patch, so it's still NW for my post in #21.

A couple corrections: As I indicated in #21, this is not reproducible with HEAD through the UI, and @swentel's patch in #18 contains a test that reproduces it. One thing you could add to the summary is exactly what needs tests (my point 2 from #21). You could also add another point to the remaining tasks that the patch needs to be updated based on the rest of my feedback in that comment. Thanks for helping on this. :)

swentel’s picture

Even we change the code in _field_invoke(), we still have a problem with files in file_field_update:

  // On new revisions, all files are considered to be a new usage and no
  // deletion of previous file usages are necessary.
  if (!empty($entity->original) && $entity->getRevisionId() != $entity->original->getRevisionId()) {
    foreach ($items as $item) {
      file_usage()->add(file_load($item['fid']), 'file', $entity->entityType(), $entity->id());
    }
    return;
  }

This means that to make this work for files there are 2 options:

  • Pass on $items in field_invoke from the original entity: this is probably the safest option as we ensure then that contrib modules get the data they expect.
  • Or use the items directly from the original entity in file_field_update if the field is not set on the incoming entity.

I honestly don't like both options, and the code is smelly in the patch (even if we refactor it).

There's 3rd option imho: document that calling node_save() - or $entity->save() in D8 which should (probably will in 99.99999% of all cases) be a full entity anyway - of a partial entity is potentially dangerous because not all properties are on the object.

And with full Field API merging into the new Field Entity API, at some point field_attach_update will/should die anyway (if that merge happens of course).

swentel’s picture

Issue tags: +Field API

Adding tag

catch’s picture

There's 3rd option imho: document that calling node_save() - or $entity->save() in D8 which should (probably will in 99.99999% of all cases) be a full entity anyway - of a partial entity is potentially dangerous because not all properties are on the object.

That seems reasonable to me. Another thing with the new entity API is we were looking at tracking which properties changed before saving - so that this information can be passed to hook implementations etc. If we really wanted to allow partial updates that might be a way to force modules to skip acting on missing properties - only mark the ones being updated as changed.

Not sure where this leaves things for 7.x though.

catch’s picture

Would be good to get this out of the 8.x critical list, and I still think just documenting that saving partial entities is unsupported is enough for 8.x.

Not sure where to put those docs though.

See also #1992010: Reverting to revisions prior to addition of field translations is broken which is very similar.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new588 bytes

Patch.

swentel’s picture

Status: Needs review » Reviewed & tested by the community

Talked this through with catch on IRC, chx also confirmed that partial entities in D8 simply don't even exist anymore. And if you'd try, you're in trouble.

yched’s picture

True that. RTBC +1.

webchick’s picture

Title: hook_field_*() called on fields not set in the $entity object (partial updates) » Document that we don't support partial entity updates
Component: field system » documentation
Status: Reviewed & tested by the community » Fixed

Oooookay, I guess we can do this.

Committed and pushed to 8.x. Thanks!

catch’s picture

Title: Document that we don't support partial entity updates » hook_field_*() called on fields not set in the $entity object (partial updates)
Version: 8.x-dev » 7.x-dev
Component: documentation » field system
Status: Fixed » Active

Moving back to 7.x. The docs fix won't work there so this is more 'active' than 'to be ported'.

catch’s picture

Issue summary: View changes

issue summary update attempt.

fabianx’s picture

Title: hook_field_*() called on fields not set in the $entity object (partial updates) » [meta] hook_field_*() called on fields not set in the $entity object (partial updates)

Making a meta issue as there is a patch for the file issue.

  • webchick committed f770a28 on 8.3.x
    Issue #1126000 by yched, swentel, catch: Fixed Document that we don't...

  • webchick committed f770a28 on 8.3.x
    Issue #1126000 by yched, swentel, catch: Fixed Document that we don't...

  • webchick committed f770a28 on 8.4.x
    Issue #1126000 by yched, swentel, catch: Fixed Document that we don't...

  • webchick committed f770a28 on 8.4.x
    Issue #1126000 by yched, swentel, catch: Fixed Document that we don't...
stephencamilo’s picture

Status: Active » Closed (won't fix)
hestenet’s picture

Status: Closed (won't fix) » Active

Reset issue status.

Status: Active » Closed (outdated)

Automatically closed because Drupal 7 security and bugfix support has ended as of 5 January 2025. If the issue verifiably applies to later versions, please reopen with details and update the version.