Problem/Motivation
When a partial entity is saved, _field_invoke() tries to invoke hooks for ALL the fields that the entity has, not just the ones included in the partial entity.
e.g., if you try to update a node that has a value in a file field and don't explicitly include the file field's current value in the object, file_field_update() gets invoked with the incorrect input that the file field is empty. Hooks think that the field is empty, but the entity itself still thinks the field is populated, and so there's a data integrity problem.
Proposed resolution
We do require that field storage modules treat a missing $entity->$field_name as 'leave the field untouched', to support this exact use.
However, _field invoke() does call field-type's hook_field_*() methods on all fields present in the bundle, indistinctly passing an empty array (whether $entity->$field_name was actually set to an empty array, or just missing).
From _field_invoke() :
$items = isset($entity->{$field_name}[$langcode]) ? $entity->{$field_name}[$langcode] : array();
// then call [module]_field_[op]($items);
As a result, taxonomy_field_update() acts as if hitting an empty field, and wipes the taxonomy_index records for the node. Similarly, file_field_update() wipes file usage records.
Not calling hook_field_update() on fields absent from the entity would not solve the taxonomy case, BTW - the 1st actual call to taxonomy_field_update() (if *one* taxo field is present) would still start by wiping the taxonomy_index records. Since the taxonomy_index table currently munges data from all taxo fields on the entity, I'm not sure how this would be fixed.
Moving to entity-level hooks #1050466-2: The taxonomy index should be maintained in a node hook, not a field hook
Remaining tasks
- Steps to reproduce http://drupal.org/node/1468198)
- Needs Tests
User interface changes
TBC
API changes
TBC
Related Issues
#1050466-2: The taxonomy index should be maintained in a node hook, not a field hook
Original report by tcmug
// Text of original report here.
(for legacy issues whose initial post was not the issue summary. Use rarely.)
field_attach_update screws up fields that are not present in the entity. Comments in the code in the function field_attach_update() at field.attach.inc lets you assume that the function should ignore all fields when they are not in the entity object (i.e. not set):
$entity = new stdClass();
$entity->delete_me = null; // would be deleted
// $entity->dont_delete; wont be deleted since it wont be set
Basically what happens is that taxonomy fields lose taxonomies from taxonomy_index and file fields lose files from file_managed when the fields are not present in the entity.
Clearly a documentation/comment or code failure:
// Collect the storage backends used by the remaining fields in the entities.
$storages = array();
foreach (field_info_instances($entity_type, $bundle) as $instance) {
$field = field_info_field_by_id($instance['field_id']);
$field_id = $field['id'];
$field_name = $field['field_name'];
// Leave the field untouched if $entity comes with no $field_name property,
// but empty the field if it comes as a NULL value or an empty array.
// Function property_exists() is slower, so we catch the more frequent
// cases where it's an empty array with the faster isset().
if (isset($entity->$field_name) || property_exists($entity, $field_name)) {
// Collect the storage backend if the field has not been written yet.
if (!isset($skip_fields[$field_id])) {
$storages[$field['storage']['type']][$field_id] = $field_id;
}
}
}
Code used:
// $entity is an entity loaded with entity_load($type, $entity_id)
$temp = new stdClass();
$temp->{$id} = $entity_id;
if ($bundle) {
$temp->{$bundle} = $entity->{$bundle};
}
$l = $entity->language;
// add radioactiveness
$value = $entity->{$this->field_name}[$l][0]['radioactivity_energy'];
$value += $energy;
$temp->{$this->field_name}[$l][0]['radioactivity_energy'] = $value;
field_attach_update($entity_type, $temp);
When testing - check caches :)
| Comment | File | Size | Author |
|---|---|---|---|
| #29 | 1126000.patch | 588 bytes | catch |
| #18 | 1255696-fail-26.patch | 1.62 KB | swentel |
| #18 | 1255696-26.patch | 4.66 KB | swentel |
| #12 | no_field_invoke_missing_fields-1126000-12.patch | 3.02 KB | yched |
| #9 | no_field_invoke_missing_fields-1126000.patch | 2.54 KB | yched |
Comments
Comment #1
yched commentedIndeed.
We do require that field storage modules treat a missing $entity->$field_name as 'leave the field untouched', to support this exact use
However, _field invoke() does call field-type's hook_field_*() methods on all fields present in the bundle, indistinctly passing an empty array (whether $entity->$field_name was actually set to an empty array, or just missing). From _field_invoke() :
As a result, taxonomy_field_update() acts as if hitting an empty field, and wipes the taxonomy_index records for the node. Similarly, file_field_update() wipes file usage records.
Not calling hook_field_update() on fields absent from the entity would not solve the taxonomy case, BTW - the 1st actual call to taxonomy_field_update() (if *one* taxo field is present) would still start by wiping the taxonomy_index records. Since the taxonomy_index table currently munges data from all taxo fields on the entity, I'm not sure how this would be fixed.
Comment #2
imiksusubscribing
Comment #3
damien tournoud commentedData loss = critical in my book.
About the taxonomy case, I think moving to entity-level hooks as suggested before (see #1050466-2: The taxonomy index should be maintained in a node hook, not a field hook) would fix this.
Comment #4
tcmug commentedAgree with the critical, also quite limiting now that I can't just update that one specific field. The performance hit for bigger entities might end up being noticeable on some setups & modules, and this is especially true for Radioactivity on large sites, but I guess theres always stuff you can do to bypass these kind of limitations.
Comment #5
catchTagging.
Comment #7
yched commentedLet's just see what the tests say if we don't run field hooks on missing $entity->field_name entries.
Comment #9
yched commentedHeh, stupid me. If _field_invoke() doesn't call the default 'form' callback on missing field entries, all entity creation forms fail, of course.
Let's try this one : don't invoke field-type 'presave' and 'update' hooks on missing field entries.
Just to see what happens - code is awfull, and I'm really not sure I like the idea to begin with.
Comment #10
yched commentedComment #12
yched commentedShould fix last warnings.
Comment #13
yched commentedGreen, back to needs work.
Comment #14
agi.novanta commentedsubscribing
Comment #15
ankur commentedI applied the patch in #12 and it seems to work for me.
I had both the issue of a node's records disappearing from {taxonomy_index} as well as a node losing all of its file field values when calling field_attach_update for other fields.
I tested this w/ 7.9
Comment #16
tim.plunkettTagging.
Comment #17
andypostIs there any steps to reproduce and write a tests?
Comment #18
swentel commentedReroll of the patch + also a test on file usage which proves the bug is still there.
Comment #19
moshe weitzman commented#18: 1255696-26.patch queued for re-testing.
Comment #20
yesct commentedwas that enough tests? (removing needs tests tag since it was added)
steps to reproduce would help people review this (steps to reproduce contributor task doc: http://drupal.org/node/1468198)
I think this is also still looking to get it's issue summary updated. adding link to contributor task for that in hopes of helping someone tackle that task: http://drupal.org/node/1427826
Comment #21
xjmThanks @YesCT! The closest thing to STR here are in the test @swentel provided, but I doubt it's reproducible entirely through the UI in HEAD.
What's happening here (I think, after lots of staring at this code) is that when you save a partial entity,
_field_invoke()tries to invoke hooks for ALL the fields that the entity has, not just the ones included in your partial entity. For example, if you try to update a node that has a value in a file field and don't explicitly include the file field's current value in the object, file_field_update() gets invoked with the incorrect input that the file field is empty. Hooks think that the field is empty, but the entity itself still thinks the field is populated, and so there's a data integrity problem.Alright, so now that I understand the bug, onto a patch review:
So, as far as I can see, we're doing some typing magic here. We initialize
$itemstoNULL, and if the field on the entity has no values, we then set$itemsto be an empty array in order to explicitly indicate it's empty? And then we check against that when deciding whether or not to invoke the hooks (because hooks should not be invoked for fields that nothing is being done to). And then the pre-existing behavior at the end, where we again do some divination with different varieties of emptiness to decide whether to update the entity's field value with$items.Um. That's confusing as all flapjacks. Can we refactor this somehow? Use a separate Boolean flag rather than relying on different variants of empty, or reorganize the logic a little?
I'm pretty sure there is no "preseave" op. :) That also means we don't have complete test coverage. Based on the information in the issue, this applies to taxonomy and file fields (at least, but probably any field that has hooks that take action when a field is empty!), and based on this line of code, it applies to presave and update operations. So, we'd need tests for all four combinations of those conditions... (Ideally, though, I think we'd have unit tests for all the different code paths instead, but given the fact that this whole API is on the cusp of two major conversions, I'm not going to recommend going crazy refactoring this to be more testable.)
I had to read a few times before I understood what was going on. Maybe just
// Handle other operations normally.above theelse. Edit: I take that back; "normally" isn't very helpful. I think the correct clarification is to refactor it.This is the pre-existing hunk, just moved into the conditional, but now I'm bending my brain trying to figure out how
$itemsis going to be other thanarray()or the field values. It can't beNULLorarray()here, and those are the only other two things we set it to beside the field values. So I think this hunk can be removed at least with the current logic.This should be a single line of 80 characters or fewer, beginning with (e.g.) "Tests" rather than "test". http://drupal.org/node/1354#functions
Nitpick: "Create a file field." (With an article.)
We can remove the
t()from the assertion message text. http://drupal.org/simpletest-tutorial-drupal7#tComment #22
xjmComment #23
no_angel commentedTagging: removed needs issue summary, added needs steps to reproduce.
Status: needs mentor review (my issue summary).
Comment #24
xjmThanks @no_angel! You can indicate that on the core mentoring site by marking your task attempt complete. :) On the Drupal.org issue, the needs work/needs review status refer only to the patch, so it's still NW for my post in #21.
A couple corrections: As I indicated in #21, this is not reproducible with HEAD through the UI, and @swentel's patch in #18 contains a test that reproduces it. One thing you could add to the summary is exactly what needs tests (my point 2 from #21). You could also add another point to the remaining tasks that the patch needs to be updated based on the rest of my feedback in that comment. Thanks for helping on this. :)
Comment #25
swentel commentedEven we change the code in
_field_invoke(), we still have a problem with files infile_field_update:This means that to make this work for files there are 2 options:
file_field_updateif the field is not set on the incoming entity.I honestly don't like both options, and the code is smelly in the patch (even if we refactor it).
There's 3rd option imho: document that calling node_save() - or $entity->save() in D8 which should (probably will in 99.99999% of all cases) be a full entity anyway - of a partial entity is potentially dangerous because not all properties are on the object.
And with full Field API merging into the new Field Entity API, at some point
field_attach_updatewill/should die anyway (if that merge happens of course).Comment #26
swentel commentedAdding tag
Comment #27
catchThat seems reasonable to me. Another thing with the new entity API is we were looking at tracking which properties changed before saving - so that this information can be passed to hook implementations etc. If we really wanted to allow partial updates that might be a way to force modules to skip acting on missing properties - only mark the ones being updated as changed.
Not sure where this leaves things for 7.x though.
Comment #28
catchWould be good to get this out of the 8.x critical list, and I still think just documenting that saving partial entities is unsupported is enough for 8.x.
Not sure where to put those docs though.
See also #1992010: Reverting to revisions prior to addition of field translations is broken which is very similar.
Comment #29
catchPatch.
Comment #30
swentel commentedTalked this through with catch on IRC, chx also confirmed that partial entities in D8 simply don't even exist anymore. And if you'd try, you're in trouble.
Comment #31
yched commentedTrue that. RTBC +1.
Comment #32
webchickOooookay, I guess we can do this.
Committed and pushed to 8.x. Thanks!
Comment #33
catchMoving back to 7.x. The docs fix won't work there so this is more 'active' than 'to be ported'.
Comment #33.0
catchissue summary update attempt.
Comment #34
fabianx commentedMaking a meta issue as there is a patch for the file issue.
Comment #39
stephencamilo commentedComment #40
hestenetReset issue status.