This issue is part of #2027583: [meta] Behavior when importing blank values

Original Report

My csv-import file does not contain all columns that are declared in the Node Feeds importer.
When I feeds the file to the Feeds importer (updating a previously created/imported node), all fields that are not in the import file are cleared from the Node, except for the taxonomy terms, which stay intact.
Is this expected behaviour?

My use case is this: My node type is populated from 3 different sources. I did not combine the 3 files to 1 new file, but I created a Feeds importer with 10 fields. Now I feed the 3 files to the importer. Each file contains 7+2+1 fields.

An alternative would be to create 3 importers for the same node type, but this is not supported, until Attach multiple importers to one content type is committed.

Comments

Title:Behavior when fields/columns are not in import file: do not clear field, but leave field untouched.Fields not present in import files are cleared.
Status:Needs work» Needs review
StatusFileSize
new1.94 KB

This patch only clears/initializes node fields if an appropriate source/column is provided in the feed/file. If a field is not in the source file, the field will be skipped (= not cleared) by Feeds.

Status:Active» Needs review
StatusFileSize
new1.94 KB

New attempt to upload the patch.

StatusFileSize
new1.99 KB

Patches #1 and #2 are invalid. Try this one.

changed patch to avoid warnings.

StatusFileSize
new3.57 KB

New version to support multiple-value fields.

N.B. If you use Feeds Tamper together with this patch, you need to apply #1145678: Too many calls to callback function in feeds_tamper_feeds_after_parse(), too.

Perhaps this patch should consider the option 'Replace/Update existing nodes'. Replacing nodes does mean clearing ALL fields.
My use case/tests only consider the option Update existing nodes.

This is a great feature and something I've had to hack my way around as well.

Thanks for the patch. Attached you will find an updated patch that works with the latest dev (August 2011)

Recent feeds version requires a new, attached patch.

This new patch only affects 1 file instead of 2.

Thanks, the patch works perfectly for me. Now I can update my nodes from multiple sources.

Category:bug» feature

Can this patch be ported to D6? So badly needed!!!

Category:feature» bug

oops...change back to bug report.. Sorry.

Status:Needs review» Closed (duplicate)

In some cases the replace/clear behavior is desired, so we should try to make this flexible:

#1107522: Framework for expected behavior when importing empty/blank values + text field fix has a good summary on this, and the plan so far is to add a config for each mapper (configs for mappers are almost ready now).

I'm re-opening this as a sub-task of #1107522: Framework for expected behavior when importing empty/blank values + text field fix, because each mapper needs to implement it's own options and logics.

Status:Closed (duplicate)» Needs work

Title:Fields not present in import files are cleared.Behavior when importing empty/404 files

Title:Behavior when importing empty/404 filesBehavior when importing empty fields

Status:Needs work» Closed (duplicate)

Indeed, I read the initial report, which was addressing the specific file problem, but the patch merged into the problem we face on #1107522.

Title:Behavior when importing empty fieldsBehavior when fields/columns are not in import file: do not clear field, but leave field untouched.
Status:Closed (duplicate)» Needs review

Let's keep this open.

The last patch from #11 makes it possible to import 'incomplete' files (e.g. not all columns are in the csv file / you DO NOT provide a field in the importer).
This offers the possibility to fill a content/entity type from 2 or more import files.
- Without this patch: missing field in importer is cleared in Entity.
- With this patch: missing field in importer is untouched in Entity.

#1107522: Framework for expected behavior when importing empty/blank values + text field fix considers the case when you DO provide a field, but the value is empty.
Indeed, that is a more general case, but requires a modification in each mapper.

Mmm, I'm still thinking we need to merge the efforts there. I know the work is a little stalled, I could use some help on the issues needing work on #1107522: Framework for expected behavior when importing empty/blank values + text field fix. I really wish to get a consistent behavior on Feeds on all those edge cases.

Franz, I'll test the other one when I upgrade my Feeds again.

Status:Needs review» Reviewed & tested by the community

#11 is working for me. See also #12, so RTBC.

Status:Reviewed & tested by the community» Needs work

Only now did I understand it properly.

+++ b/plugins/FeedsProcessor.inc
@@ -398,16 +386,29 @@ abstract class FeedsProcessor extends FeedsPlugin {
+      if (!empty($value)) {

One question. Doesn't this affect the *existing* field with empty value as well?

+++ b/plugins/FeedsProcessor.inc
@@ -398,16 +386,29 @@ abstract class FeedsProcessor extends FeedsPlugin {
+        // But we do this only for provided mappers, so $value was checked first.

This comment is confusing, shouldn't it be "provided sources"? Maybe a little more explanation to make it clear we're talking about a missing field and not an empty field. Still, I don't see how this code discerns one from the other.

Title:Fields not present in import files are cleared.Behavior when fields/columns are not in import file: do not clear field, but leave field untouched.
Status:Needs review» Needs work

I'll review the patches soon. The patch from this issue might be obsolete when #860748: Config for mappers? gets in.
It seems the most striking (only?) difference is that (!empty($value)) is centralized here, whereas other patches do this in each mapper.

On a side note: IMO the each mapper must be as clean a possible. For example, the following line could be done by Feeds, removing it from the mappers:

if (!is_array($value)) {
    $value = array($value);
  }

@johnv, I completely agree, and it has bother me a lot that mappers have to deal with multiple values, empty values, etc. I think a mapper should be able to do it, but not needed to. I already thought about turning mappers into a class that can be extended, maybe it's time to work on something along that way.

Hi,
#11 work for me, but reference field in field collection dont update :(

I haven't read this issue extensively yet, so sorry if I'm repeating things, or missing things. I do, however, have some thoughts on the issue.

The current behavior is like this:

  • Replace existing*: Entities are are saved using only the data in the feed. (Possible performance improvements, but the current implementation makes that irrelevant.) *All* fields on entity will be removed/replaced with data from the feed.
  • Update existing: Fields managed by Feeds are updated using data in the feed. If a field is managed by Feeds, but the data does not exist in the feed, then the field is deleted.

*Replace existing: Currently only for nodes, that's dumb.

What is desired is a new behavior:

  • Update existing, but if the field is not present in the feed, do not update the field.

From what I can gather, the existing behavior is the default expected behavior. The proposed behavior is useful, but tricky. Each parser has to take very special care to set its items correctly. I currently jump through a lot of hoops with the XPath parser to get this right.

Re: set target callbacks repeating themselves,

I totally agree. However, it needs to be tackled differently. Rather than make *any* assumptions about how a target expects its data, we should supply some generic target callbacks that work for, entity properties, single column fields, and multi-column fields. On top of this, we would need a validate callback that a target can implement, (we can supply some generic ones) that filters the values prior to mapping.

Unlike my promises in june/july, I haven't been able to update feeds and test all outstanding patches regarding 'missing fields in feeds sources'.

@Twistor, the (possibly outdated) patch from #11 does just that: "Update existing, but if the field is not present in the feed, do not update the field".
The complete node/entity is read, but only fields in the feeds are updated. I've been using that for a long time, having several feeds updating the same node - it seems to be working fine.
How would that affect the rest of the parser?

Right, the problem is expected behavior. #11 makes sense if you remove a column from your CSV file, and re-import. But if you have an RSS feed that you update, and the author tag is removed, I think it would make sense to remove it from the node. I could be wrong, but this is a pretty big change in behavior.

Your patch it seems, would also make it impossible to delete a field wouldn't it? How would you set it to NULL? Or a numeric field to 0? Just a note.

It only affects parsers in that they have to be careful to handle falsy values correctly.

Ad "patch it seems, would also make it impossible to delete a field."
You can delete a field by adding an empty column. Then the field should be reset.

Ad "I think it would make sense to remove [Author] it from the node?"
I am not sure about this use case. Why should you remove the author?

I see, with csv you have control over the columns you have/supply. (or, you can choose one of multiple importers for the same csv, each importer containing another subset of field mappers.)
But for RSS feeds, you have no control over the supplied fields, ofcourse.

I haven't seen the new 'config for mappers' yet. Does/Should it contain an option 'behaviour when not in source' ?

Err, doesn't the !empty() remove any chance of a NULL value being set in a field?

Let me put it this way, the expected behavior is that the data in your entities reflects the data in the feed. Anything outside of that is unexpected/extra.

Well... I've been leaning against using it for settings that could apply to every field. In this case I think that a processor level setting would suffice. But, I am also not very excited about adding yet another checkbox.

"doesn't the !empty() remove any chance of a NULL value.
Needs explicit testing. You might have a point. An empty column/field should empty the field, NO column should leave it untouched.

Version:7.x-2.x-dev» 7.x-2.0-alpha6
Status:Needs work» Needs review
StatusFileSize
new2.95 KB

Here is a re-rolled version of the patch from #11 that applies to the latest release of feeds. Not sure where this issue is headed, but I know that I use this functionality on a live site daily. Hopefully I didn't miss something in the conversation of this thread that indicates this patch is superfluous. If so, at least I got my "exercise" today :)

@TyrelDenison Thank you, this is an essential patch, for my purposes at least! I was just pulling out my hair, after installing feeds without this path, when I remembered this patch and reapplied.

One problem I have with it is that if I set a numeric value to anything other than 0, and then attempt to set the value to 0, the value will not change. I can set to anything else, but not zero. In fact, I assume it would probably not allow me to change the value to null either (I think it should). Any hints on where I can tweak the code to allow zero values?

That's because the patch use empty(), which returns TRUE for 0.

Issue summary:View changes

Linking to meta-issue

Version:7.x-2.0-alpha6» 7.x-2.x-dev
Category:bug» feature
Status:Needs review» Closed (works as designed)

Sorry.

All of this is getting a little carried away. This is not expected behavior, and would be very surprising if it were to change.

I have some ideas for a contrib module to handle this use case, but it does not belong in Feeds.

Issue summary:View changes

Updated issue summary.

Patch #36 tested with the currently official release 7.x-2.0-alpha8: works for me.

I tried a few patches found in the Feeds support threads. Only this could do both of the following:
- Not mess up values for existing items when the corresponding imported item, matched via GUID, does not have a value for a file field.
- Not mess up values for existing items when the corresponding imported item, matched via GUID, is skipped because of a filter set in Feeds Tamper.

I understand that Twistor is following a different approach, but a solution is not there yet so I believe it's ok to underline successful patches.