Updated: Comment #120

Problem/Motivation

The Feeds importer adds nodes, but there is no way to get rid of nodes that used to be available from the importer source but aren't available anymore.

Proposed resolution

Keep a list of all nodes imported by the importer and optionally unpublish those that were not updated.

Remaining tasks

  • Tests

User interface changes

Added an 'Update entities missing in the feed' radio element to the processor configuration form.

API changes

A new clean() method was added to FeedsProcessor.
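
For illustration, the new method might be shaped roughly like this (a sketch based on the discussion below, not the committed implementation; names such as FEEDS_DELETE_NON_EXISTENT, $state->removeList and entityDeleteMultiple() come from patches in this thread and the final signature may differ):

```php
<?php
// Sketch only; not the committed implementation.
// Removes entities that were imported earlier but no longer appear in the feed.
protected function clean(FeedsState $state) {
  // Do nothing unless the importer is configured to act on missing items.
  if (empty($this->config['update_non_existent'])
      || $this->config['update_non_existent'] == FEEDS_SKIP_NON_EXISTENT) {
    return;
  }
  if (!empty($state->removeList)
      && $this->config['update_non_existent'] == FEEDS_DELETE_NON_EXISTENT) {
    $this->entityDeleteMultiple($state->removeList);
    $state->deleted += count($state->removeList);
  }
}
```

Subclasses such as the node processor can override this to unpublish instead of delete.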

Original report by PsycleInteractive

I have re-created the patch to work with 6.x-1.0-beta11; this comes from another issue, see http://drupal.org/node/1240366.

I have tried to clean up some of the code style issues that the original patch had, and will be working on a D7 version when I get some time.

Comment  File  Size  Author
#237  feeds-remove_nodes_not_in_feed-1470530-237-D6.patch  6.92 KB  fuerst
#204  interdiff-1470530-203-204.txt  3.76 KB  MegaChriz
#204  feeds-unpublish-delete-entities-not-in-feed-1470530-204.patch  43.49 KB  MegaChriz
#203  interdiff.txt  2.76 KB  stefan.r
#203  feeds-unpublish-delete-entities-not-in-feed-1470530-203.patch  43.28 KB  stefan.r
#202  feeds-unpublish-delete-entities-not-in-feed-1470530-202.patch  40.44 KB  stefan.r
#202  interdiff.txt  426 bytes  stefan.r
#196  interdiff.txt  2.48 KB  stefan.r
#196  feeds-unpublish-delete-entities-not-in-feed-1470530-196.patch  40.43 KB  stefan.r
#178  feeds-unpublish-delete-entities-not-in-feed-1470530-178.patch  40.21 KB  stefan.r
#178  interdiff.txt  6.96 KB  stefan.r
#168  interdiff-1470530-161-168.txt  1.87 KB  vinmassaro
#168  feeds-unpublish-delete-entities-not-in-feed-1470530-168.patch  44.26 KB  vinmassaro
#161  interdiff.txt  497 bytes  stefan.r
#161  feeds-unpublish-delete-entities-not-in-feed-1470530-161.patch  44.26 KB  stefan.r
#160  feeds-unpublish-delete-entities-not-in-feed-1470530-160.patch  44.27 KB  stefan.r
#160  interdiff.txt  522 bytes  stefan.r
#159  feeds-unpublish-delete-entities-not-in-feed-1470530-157.patch  44.18 KB  stefan.r
#158  interdiff.txt  535 bytes  stefan.r
#157  unpublish-delete-entities-not-in-feed-1470530-157.patch  44.18 KB  stefan.r
#154  unpublish-delete-entities-not-in-feed-1470530-153.patch  44.24 KB  stefan.r
#154  interdiff.txt  38.28 KB  stefan.r
#145  unpublish-delete-entities-not-in-feed-1470530-145.patch  38.16 KB  stefan.r
#145  interdiff.txt  1.97 KB  stefan.r
#143  unpublish-delete-entities-not-in-feed-1470530-143.patch  38.59 KB  stefan.r
#143  interdiff.txt  5.32 KB  stefan.r
#134  interdiff.txt  1.65 KB  kostajh
#134  unpublish-delete-entities-not-in-feed-1470530-134.patch  37.98 KB  kostajh
#130  nodes_entities.png  89.78 KB  mikran
#125  tests_added-1470530-125.patch  37.65 KB  mikran
#125  interdiff.txt  30.63 KB  mikran
#123  feeds-unpublish_delete_nodes_not_included_in_feed-1470530-123.patch  9.53 KB  byronveale
#114  feeds_remove_entities_not_in_feed-1470530-114.patch  11.25 KB  gnucifer
#113  feeds_remove_entities_not_in_feed-1470530-113.patch  8.5 KB  gnucifer
#109  feeds_remove_entities_not_in_feed-1470530-109.patch  8.37 KB  mikran
#109  interdiff.txt  1.35 KB  mikran
#95  feeds_remove_entities_not_in_feed-1470530-94.patch  7.02 KB  imclean
#92  feeds_remove_entities_not_in_feed-1470530-91.patch  7.02 KB  imclean
#89  feeds_remove_entities_not_in_feed-1470530-89.patch  7.1 KB  imclean
#87  feeds_remove_entities_not_in_feed-1470530-87.patch  7.26 KB  imclean
#86  feeds_remove_entities_not_in_feed-1470530-86.patch  7.27 KB  imclean
#65  feeds-1470530-65.patch  7.39 KB  Mithrandir
#53  feeds-1470530-53.patch  7.24 KB  star-szr
#53  interdiff.txt  4.63 KB  star-szr
#52  feeds-1470530-52.patch  7.24 KB  star-szr
#45  remove-entities-not-in-feed-1470530-38.patch  7.4 KB  janchojnacki
#37  remove-entities-not-in-feed-1470530-37.patch  7.26 KB  GaëlG
#36  remove-entities-not-in-feed-1470530-36.patch  5.46 KB  GaëlG
#33  feeds-remove-nodes-not-in-feed-1470530-33.patch  6.07 KB  nrambeck
#32  feeds-remove-not-included-nodes-1470530-32.patch  6.42 KB  GaëlG
#19  feeds-remove_nodes_not_in_feed-1470530-19.patch  7.02 KB  dbassendine
#2  feeds_D7_unpublish_delete_nodes_removed_from_feed.patch  6.52 KB  riho
(original report)  delete_unpublish_nodes_nolonger_in_feed.patch  6.11 KB  Psycle Interactive

Comments

arrubiu’s picture

Subscribe :)

riho’s picture

Version: 6.x-1.0-beta11 » 7.x-2.x-dev
FileSize
6.52 KB

I ported your code to 7.x-2x-dev version, but it might need some fine tuning here and there.

katherinecory’s picture

I've patched my copy of the latest 7.x dev version, re-imported my content, and received no errors. As far as I know the feed didn't change, so no nodes were deleted.

Psycle Interactive’s picture

Thanks riho, I have not had the time to create the port myself. I will do a test with your patch, just to clarify whether the delete is working, based on the comment from katherinecory.

yannickoo’s picture

Version: 7.x-2.x-dev » 6.x-1.0-beta11
Status: Needs review » Reviewed & tested by the community

I can confirm that the patch for Drupal 6 works. Now any Drupal 7 user should review the patch from #2.

yannickoo’s picture

Status: Reviewed & tested by the community » Needs review

Strange behavior... I tested the module with the patch included today and this is what I saw:

Created 3 News nodes.
News uehfowehfuo has been deleted.
News Testnews has been deleted.
News Irgendwas has been deleted.
3 node(s) removed

Why does the module remove the nodes right after importing them for the first time?!

yannickoo’s picture

Status: Needs review » Needs work

Ah okay, I found the problem: when you select "Remove non existant nodes" and import a feed, all new feed items will be deleted.

Anyone know how to fix that?

riho’s picture

Does that happen in the 6.x-1.0-beta11 version or 7.x-2.x-dev or both?
I also discovered that in the case of a "cURL error (52) Empty reply from server" error, the patch deletes all nodes. I'm working on a fix for that in the 7.x-2.x-dev version.

yannickoo’s picture

I didn't test the 7.x-2.x-dev version, but this problem happens in 6.x-1.0-beta11 and I don't know how to fix it.

balintk’s picture

The patch in #2 works great.

DannyPfeiffer’s picture

I can confirm the deletion of new nodes problem in the patch for 6.x-1.0-beta11 version.

The problem is that newly imported nodes are *not* present in the array $importer->processor->processed_nids but *are* present in the array returned by $importer->processor->getFeedNids($source).

That triggers them for deletion, which is incorrect. Unfortunately I'm not familiar enough with feeds to identify an immediate solution - anyone?
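
A simplified sketch of the faulty comparison (illustrative only, using the property and method names mentioned above):

```php
<?php
// Illustration of the bug, not actual patch code.
// All nids recorded for this source, queried *after* the import ran,
// so it already contains the freshly created nodes:
$feed_nids = $importer->processor->getFeedNids($source);
// Nids the processor tracked during this run; newly created nodes are absent:
$processed = $importer->processor->processed_nids;
// The difference therefore wrongly includes the brand-new nodes,
// and they get deleted or unpublished:
$stale = array_diff($feed_nids, $processed);
```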

alCHE’s picture

Has anyone tested this on D7? Can someone confirm whether it works?

aweizd’s picture

Patch #2 works with 7.x-2.0-alpha5. I've only tested it for a couple of minutes, but it seems to work. No problems so far. Hope this ends up in the module's code soon.

bsevere’s picture

I have installed the patch from #2 on version 7.x-2.0-alpha5. So far all processor settings are working as expected. I tried to recreate the problem mentioned in #6 and #7 but could not.

This patch solves a big problem for me. Thanks!

yannickoo’s picture

Hey smoobv, thanks for testing, but my problem appeared in the Drupal 6 version.

alCHE’s picture

Thank you very much for this patch!
It would be great to add it into module code.

yannickoo’s picture

So we should set the issue status to RTBC, but we need to fix this for Drupal 6 first.

aweizd’s picture

After a couple days of testing I did find a strange behaviour in 7.

My .csv has 153 entries. When I import it, feeds only fetches the first 50 items and displays "Importing... 33%". This doesn't change until the next cron run. So I run cron and go back to the import page; now it says "Importing... 64%", so the next 50 nodes were processed. After the next cron run I get "Importing... 98%" and another 50 items processed. So far so good, I have 150 nodes imported. Now this is where the "magic" happens. I still have 3 unimported items in my csv. I run cron, it fetches and imports the 3 entries, but at the same time deletes the 150 items imported earlier as if they weren't in the csv at all.

I'll try increasing the import limit for feeds, so it fetches the entire csv file, not just 50-item chunks. I'll let you know if it helps.

EDIT: So I increased my import limit to 200 and feeds imports the entire file at once. I also have the feed importer set up to import every hour.
When I import via the import form everything works fine, but when the import is triggered by cron, every entry is deleted. It seems patch #2 doesn't get a proper list of nids when triggered by cron rather than via the import form. I don't know much about the feeds API, so it might take a while for me to figure this out. Any help would be greatly appreciated.

dbassendine’s picture

Version: 6.x-1.0-beta11 » 6.x-1.x-dev
FileSize
7.02 KB

I've worked on the "deleting new nodes" behaviour described in #7 and #11.

This patch is against 6.x-1.x-dev, and makes a couple of changes:
* In FeedsNodeProcessor.inc, getFeedNids - I added an extra condition for where the feed being used has an "id" (i.e. the import form) rather than a "feed_nid" (i.e. a feed node).
* In feeds.module - I added a hook_feeds_after_parse implementation to pull out the existing feed nodes *before* the import takes place. Previously, this list of pre-existing nodes was drawn from the database *after* the import had occurred, by which time the database had already been updated.
* In feeds.module, hook_feeds_after_import - I made the logic for forming the list of nodes to remove clearer. I run through the "pre-existing nodes" list and if the nid isn't in the "incoming nodes" list, I mark it for removal (or unpublishing).
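
The second change can be sketched roughly as follows (a hypothetical hook body; the actual patch differs in detail, and the property name used here for the snapshot is made up for this sketch):

```php
<?php
/**
 * Implements hook_feeds_after_parse().
 *
 * Sketch of the idea: snapshot the nids that already exist for this source
 * *before* processing runs, so the removal list is not polluted by nodes
 * created during the current import.
 */
function feeds_feeds_after_parse($importer, $source) {
  if (!isset($source->feeds_existing_nids)) {
    // feeds_existing_nids is a hypothetical property name for this sketch.
    $source->feeds_existing_nids = $importer->processor->getFeedNids($source);
  }
}
```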

It would be great if people could test this. I have tested: new rows added, rows deleted, and a combination of both. I have tested using the import form but not using a feed node, although there shouldn't be any difference.

Thanks, David

celstonvml’s picture

Patch from post #2 works just fine for me against version 7.x-2.0-alpha5.

DannyPfeiffer’s picture

David,

This seems to resolve the problem I was having with the earlier patch. Thanks so much for your work ! I'll keep on eye on this for a while, but it seems to have done the trick.

-danny

stylesuxx’s picture

Patch from #2 works great. I would also love to see this implemented.

yannickoo’s picture

Status: Needs work » Reviewed & tested by the community

It seems like the patch from comment #2 works fine for the current 7.x-2.0-alpha5 version, and the patch from #19 for 6.x-1.x-dev.

Let's mark this as RTBC.

bjalford’s picture

Tested #2 as well and works as expected.

twistor’s picture

Version: 6.x-1.x-dev » 7.x-2.x-dev
Assigned: Psycle Interactive » Unassigned
Status: Reviewed & tested by the community » Needs work

This will need to go into 7.x first. It will need tests.

It is a commonly requested feature that I would happily accept. I would appreciate it if someone could consolidate the various issues into one. Whichever one is furthest along.

clifmo’s picture

I had to split the patch from #2 into separate patches for the files in different directories; for whatever reason they couldn't be found on their own.

I get the options to skip, unpublish, or delete nodes missing from the feed, but adding a dummy node and then running the import does not alter the extraneous node. Using 7.x-2.0-alpha5

jos_s’s picture

I used the patch from #2 on 7.x-2.0-alpha5 and then got this error for every page I tried to retrieve:

PHP Fatal error: Cannot redeclare feeds_feeds_after_import() (previously declared in /sites/all/modules/feeds/feeds.module:763) in /sites/all/modules/feeds/feeds.module on line 893

Should it work on 7.x-2.0-alpha5 at all? Or should I try the dev branch?

steven.be’s picture

Hi,

Will this work on 7.x-2.0-alpha6 as well?

Thanks,

filip.hoeven’s picture

Applied the changes from the patch in #2 to "7.x-2.0-alpha7" of "feeds" and it works perfectly.

Please note that I made these changes manually by looking at the patch (because I'm not familiar with how to apply patches automatically and it didn't work for me in Eclipse).
I noticed one difference: the original source has changed where the extra code should be placed inside "FeedsProcessor.inc":

      // Check if this item already exists.
      $entity_id = $this->existingEntityId($source, $parser_result);

      if ($entity_id != 0) {
        $this->processed_nids[] = $entity_id;
      }

      $skip_existing = $this->config['update_existing'] == FEEDS_SKIP_EXISTING;

As I read a lot of people saying it works, it hope it is added to the standard feeds code very soon.

GaëlG’s picture

@filip.hoeven A patch against latest 7.x-dev version is needed for this to get included in the next Feeds release. Can you submit it?
See http://drupal.org/project/feeds/git-instructions for more info on how to create patches.

Erik.Johansson’s picture

I have run into some issues while testing the patch from #2. When I start an import with the "Remove non existant nodes" option, all my nodes get deleted. When all nodes have been deleted, feeds starts to import them all over again. I have multiple importers that trigger in a specific order, so my scenario is quite special.
Has anyone else noticed this behavior?

GaëlG’s picture

Status: Needs work » Needs review
FileSize
6.42 KB

Here's the patch, created according to the instructions in #29. Works for me.

nrambeck’s picture

Patch #32 works for me, but created a few PHP notices. I've re-rolled the patch to define the 3 new constants, remove some debug code, and standardize code formatting.

GaëlG’s picture

Status: Needs review » Needs work

Actually I face the same problem as Erik.Johansson. I suppose #19 is the answer.

dbassendine’s picture

I've been testing my patch from #19 for 6.x-dev quite extensively for some client work, against some quite large imports (~3000 rows), and this has now moved into production. Thanks, everyone, for testing; glad it's working for you too.

If I find some time, I'll check if the changes I made in 6.x can be ported over to 7.x too, as the bug reports for 7.x look quite similar.

GaëlG’s picture

Status: Needs work » Needs review
FileSize
5.46 KB

I don't know about D6, but in D7 the global variable $existingNodes will be lost between batches. I tried to create a new patch from scratch, using pieces of different previous patches and the logic of #19, i.e. storing a list of previously existing entities before the actual import.
It works for any entity type, not just nodes, and deletes items that are not found. Unpublishing is specific to the node entity type; I'll post a new patch soon that includes an override of FeedsProcessor::clean() in the node processor.
I tested it in different use cases and it seems to work well.

GaëlG’s picture

As promised, this one includes unpublish feature for nodes. Untested but should work.

FreeFox’s picture

I tested the patch in #37 against current dev. Works great !!

Many thanks to all contributors.

kingandy’s picture

Likewise, tested #37 against current dev; applies smoothly, intended behaviour appears to be implemented (our nodes are now unpublishing themselves).

yannickoo’s picture

Status: Needs review » Reviewed & tested by the community

Okay I think that is enough for RTBC. After the patch is applied we have to do the 6.x stuff.

kingandy’s picture

Since installing the patch in #37 I'm occasionally experiencing items being imported as unpublished - I suspect these are items with a GUID matching items which were previously imported, then absent from the import file (therefore unpublished), and then reintroduced.

It's probably an edge case, and in any event I'm not sure this is something that it would be Feeds' responsibility to do anything about. Certainly it makes sense that when "updating" an existing node, Feeds would not touch the published status unless explicitly instructed to ... it's just ever-so-slightly counter-intuitive to have things automatically unpublished when absent, but not republished when found.

Possibly there could be another setting to control this, but it may be best to just leave it up to admins to set up a field mapping.

joeysantiago’s picture

I agree with #41. It would be nice to have a feature like republishing nodes. But, kingandy, what do you mean by "leave it up to admins to set up a field mapping"? Is there a workaround to publish nodes back by setting up some field mapping?

Plus, I have a problem. Sometimes I can't reach my feed, and then it marks all of the nodes to be unpublished (I guess its assumption is "no feed" = "no nodes" = "unpublish all"). It would instead be cool to catch the error and do nothing in this case :)

I am using an old version (7.x-2.0-alpha5), and I just found out the current version has changed quite a lot, so it's quite useless for me to create a patch. I patched my code for the "unreachable feed bug" by adding some lines to the function feeds_feeds_after_import(FeedsSource $source). The lines are:

  // importer does not exist, probably http error so drop out
  if (!isset($source->importer)){
    return;
  } 

I hope I'll have time to create a patch for the current version soon.

kingandy’s picture

I was thinking that since Published status is available as a mapping destination for nodes, it should be possible to force updated items to be published. Looks like it might not be as straightforward as I thought at first, though. It might still be possible with Feeds tamper or similar, but I'm definitely leaning towards the notion of including "publish on import" as an option in the node processor.

stuwat’s picture

I haven't road-tested it extensively, but patch #37 applied to 7.x-2.0-alpha7, appears to successfully delete nodes from the destination that no longer exist in the source feed. Thanks.

janchojnacki’s picture

Patch #37 failed to apply to 7.x-2.0-alpha7.

yannickoo’s picture

Use the development version for applying the patches, not the current stable release...

twistor’s picture

Status: Reviewed & tested by the community » Needs work

Parts of this need to be moved into FeedsProcessor so that it is more generic.

Also, and this is a biggie, the delete operation needs to batch. The current implementation will not scale.
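
One possible shape for a batched delete, sketched under the assumption that Feeds keeps re-invoking the processor while FeedsState reports partial progress (all names here follow conventions used elsewhere in this thread; note the thread later concludes that batching is blocked by another Feeds issue):

```php
<?php
// Sketch of a batched clean() pass; not working patch code.
protected function clean(FeedsState $state) {
  // Delete at most 50 entities per invocation.
  $batch = array_splice($state->removeList, 0, 50);
  if ($batch) {
    $this->entityDeleteMultiple($batch);
    $state->deleted += count($batch);
  }
  // Report progress; once removeList is drained, progress equals total
  // and Feeds considers the stage finished.
  $state->progress($state->deleted + count($state->removeList), $state->deleted);
}
```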

nedjo’s picture

dudde’s picture

I applied the patch #38 on 7.x-2.0-alpha7 and it's working. Thank you!

star-szr’s picture

This still needs tests as well, per #25.

@twistor/#47 - what parts need to be moved into FeedsProcessor? At a glance, the only code added to FeedsNodeProcessor is the unpublish functionality.

Edit: Looking at it again the FeedsNodeProcessor perhaps should only handle unpublishing, and hand the rest off to the parent method.

star-szr’s picture

Title: Unpublish/Delete nodes not included in feed, updated for beta11 » Unpublish/Delete nodes not included in feed
star-szr’s picture

Status: Needs work » Needs review
FileSize
7.24 KB

First of all, rerolled to fix a conflict with #1711648: Abstract bundle handling.

star-szr’s picture

FileSize
4.63 KB
7.24 KB

…and corrected spelling of 'existant' to 'existent' throughout.

star-szr’s picture

Status: Needs review » Needs work

And regardless of the test results, back to 'needs work'.

cdmo’s picture

I tested #53 on 7.x-2.x-dev just now in a sandbox and it worked.

abdul.muiz.meer’s picture

This patch looks like it works fine, but what's actually happening is that it creates all the nodes again and deletes all previous nodes created by the feed. That does remove the nodes which are no longer present in the CSV, but it also changes every nid on each import, which is not good.
I think this is because the following check in "FeedsProcessor.inc", in "public function process(FeedsSource $source, FeedsParserResult $parser_result)", is not working:

// If it's included in the feed, it must not be removed on clean.
      if ($entity_id) {
        unset($state->to_be_removed[$entity_id]);
      }
abdul.muiz.meer’s picture

yannickoo’s picture

CSV? Do you mean in the Git repository?

cdmo’s picture

I think #56 is referring to a feed where the source is a comma separated values document.

yannickoo’s picture

Oh, confused CSV with CVS :D

gordon’s picture

I have created a contrib module which does this; see http://drupal.org/node/1939256.

There are 3 patches left to get into Feeds, and then I will make it a real project.

However, one of the patches will not get into Feeds, so I am investigating ways to get around this.

abdul.muiz.meer’s picture

This patch works fine with the "user processor", but making one of the fields unique is necessary for that.
The problem mentioned in #56 occurs in the "node processor" only.

Mithrandir’s picture

I'll try the patch from #53, but will also +1 the notion that if the feed retrieval fails or is empty, no nodes should be unpublished or deleted. I'll see if I can create a patch for this, but wouldn't mind if someone beat me to it :)

Mithrandir’s picture

As of today, April 25th, the patch from #53 cannot be applied directly to 7.x-2.x-dev, since it will not update plugins/FeedsProcessor.inc.

I'll make a manual merge and re-package the patch. I also think I have found a solution to the "only remove nodes if there are items in the feed" issue. As I see it, it is just a matter of checking the count of items in the parser result before marking nodes for removal; no solution to the "fetch in several requests" challenge, though.

// FeedsProcessor::process()
if (!isset($state->to_be_removed) && count($parser_result->items) > 0 ) {
  $this->initEntitiesToBeRemoved($source, $state);
}

Comments?

Mithrandir’s picture

FileSize
7.39 KB

And here's the combined patch of #53 applied to the current 7.x-2.x with my above check for items in the result before marking entities to be removed.

DamienMcKenna’s picture

Status: Needs work » Needs review
aleada’s picture

Is the #65 patch included in the latest 7.x-2.x branch?
If not, when will it be part of the latest released version?
Thanks

aleada’s picture

Version: 7.x-2.x-dev » 7.x-2.0-alpha8

I tried to apply patch #65 to 7.x-2.0-alpha8 and the output is the following:

amatech@ubuntu:~/Desktop/feeds$ patch -p1 < feeds-1470530-65.patch 
patching file includes/FeedsSource.inc
Hunk #1 FAILED at 90.
1 out of 1 hunk FAILED -- saving rejects to file includes/FeedsSource.inc.rej
patching file plugins/FeedsNodeProcessor.inc
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 204.
Hunk #3 FAILED at 378.
3 out of 3 hunks FAILED -- saving rejects to file plugins/FeedsNodeProcessor.inc.rej
patching file plugins/FeedsProcessor.inc
Hunk #1 FAILED at 9.
Hunk #2 FAILED at 176.
Hunk #3 FAILED at 264.
Hunk #4 FAILED at 290.
Hunk #5 FAILED at 311.
Hunk #6 FAILED at 538.
Hunk #7 FAILED at 597.
7 out of 7 hunks FAILED -- saving rejects to file plugins/FeedsProcessor.inc.rej

What am I doing wrong?

star-szr’s picture

Version: 7.x-2.0-alpha8 » 7.x-2.x-dev

@amatech - you should apply patches against the latest dev version.

yannickoo’s picture

"latest dev version" – I would clone the git repository. That is the best way.

git clone --branch 7.x-2.x http://git.drupal.org/project/feeds.git

dgastudio’s picture

yannickoo’s picture

Status: Needs review » Needs work
+++ b/plugins/FeedsProcessor.inc
@@ -311,6 +334,56 @@ abstract class FeedsProcessor extends FeedsPlugin {
+  ¶

Oh, whitespaces.

+++ b/plugins/FeedsProcessor.inc
@@ -597,6 +671,17 @@ abstract class FeedsProcessor extends FeedsPlugin {
+    ¶

Whitespaces again

Mithrandir’s picture

Seriously? No comments on the logic or working code or anything, because of two whitespaces?

yannickoo’s picture

Sorry, but I had no time to test it, and I marked it as needs work because with these two whitespace issues the patch cannot get committed :/

yannickoo’s picture

A small thing which I saw right now:

+++ b/plugins/FeedsProcessor.inc
@@ -176,11 +179,18 @@ abstract class FeedsProcessor extends FeedsPlugin {
+    if (!isset($state->to_be_removed) && count($parser_result->items) > 0 ) {

Why do you check for greater than zero? Why not only count($parser_result->items)?

Mithrandir’s picture

It's fair to mention the whitespace; I just felt "they won't even LOOK at the code or test it, because the automated test showed two non-blocking issues", which was a little disheartening :) especially considering that I just wanted feedback on the logic.

I think it is an old habit of mine to check for greater than zero when using count. Now that I think about it, I can't place where it comes from... perhaps it is just that when I "read the code aloud" in my head, I feel that count($something) should BE something and to highlight that it is a numeric value. Perhaps it is a little too explicit for what is necessary.

twistor’s picture

Heh, you don't need the count at all.

More importantly, this needs to batch the deletes. Also, isn't there a use-case for importing an empty feed and having all the items unpublished/deleted?

[Fixed typo]

Mithrandir’s picture

Also, isn't there a use-case for importing and empty feed and having all the items unpublished/deleted?

Am I reading the typo correctly? Should it be "isn't there a use-case for importing an empty feed" ?

In that case, I guess there would be a possible use-case for removing all published items. But if so, we'll need a very clear way of separating faulty feeds from empty feeds. I quite often see that a faulty feed is correct XML and all, but just a different structure with an error message in it. I don't know if there is an easy way to detect that with the usual parsing methods (XPath etc.) when you don't know the error structure; I very rarely know what a faulty feed from a partner looks like until it happens, so I would quite often not be able to write e.g. an XPath statement to fetch an error message.

DamienMcKenna’s picture

Reading through the patch, I agree with the current option, but would recommend improving the option's description to note that selecting the 'delete' option could have unexpected results, and that unpublishing is much safer. I would also suggest that the system should fail if the import file is blank: there's no way for the system to know whether it's blank on purpose or by accident, and the last thing you want is to wipe out records because of a mistake elsewhere.

twistor’s picture

Issue tags: +Needs tests

Poking around at this: we can't batch at this point because of #1363088: Feeds batch process not fully implemented - needed for large imports. So I'm willing to take this as-is. But it does need a couple of tests.

+++ b/includes/FeedsSource.inc
@@ -90,6 +90,11 @@ class FeedsState {
   /**
+   * IDs of entities to be removed.
+   */
+   public $to_be_removed;
+

to_be_removed should be removeList or something camelCase.

+++ b/plugins/FeedsNodeProcessor.inc
@@ -204,6 +206,7 @@ class FeedsNodeProcessor extends FeedsProcessor {
+    $form['update_non_existent']['#options'][FEEDS_UNPUBLISH_NON_EXISTENT] = 'Unpublish non existent nodes';

Missing t().

+++ b/plugins/FeedsNodeProcessor.inc
@@ -378,4 +381,35 @@ class FeedsNodeProcessor extends FeedsProcessor {
+/**
+   * Overwrites FeedsProcessor::clean() to allow unpublish instead of delete.

Overrides*

Overrides FeedsProcessor::clean(). Should be the first line.

The comment should come after.

+++ b/plugins/FeedsNodeProcessor.inc
@@ -378,4 +381,35 @@ class FeedsNodeProcessor extends FeedsProcessor {
+    $info = $this->entityInfo();

Unused.

+++ b/plugins/FeedsNodeProcessor.inc
@@ -378,4 +381,35 @@ class FeedsNodeProcessor extends FeedsProcessor {
+    // We clean only if needed.
+    if (!isset($this->config['update_non_existent']) || $this->config['update_non_existent'] == FEEDS_SKIP_NON_EXISTENT) {
+      return;
+    }
+
+    $total = count($state->to_be_removed);
+    if ($total) {
+      if ($this->config['update_non_existent'] == FEEDS_DELETE_NON_EXISTENT) {
+        $this->entityDeleteMultiple($state->to_be_removed);
+        $state->deleted += $total;
+      }
+      else if ($this->config['update_non_existent'] == FEEDS_UNPUBLISH_NON_EXISTENT) {
+        foreach ($state->to_be_removed as $nid) {
+          $node = node_load($nid);
+          if ($node->status == 0) {
+            continue;
+          }
+          node_unpublish_action($node);
+          node_save($node);
+          $state->updated ++;
+        }
+      }
+    }

This should check for the unpublish setting, and if it's not there, pass to the parent::clean() method.

Also, the nodes should be bulk loaded with node_load_multiple().

+++ b/plugins/FeedsProcessor.inc
@@ -176,11 +179,18 @@ abstract class FeedsProcessor extends FeedsPlugin {
+    if (!isset($state->to_be_removed) && count($parser_result->items) > 0 ) {

No need for the count(). An empty array will be falsy.

+++ b/plugins/FeedsProcessor.inc
@@ -311,6 +334,56 @@ abstract class FeedsProcessor extends FeedsPlugin {
+  /**
+   * Initialize the array of entities to remove with all existing entities
+   * previously imported from the source.
+   * Partially copied from clear() function.
+   * @todo Pool this?

This needs to be cleaned up. I don't know what pooling is. Also, the method needs type checking.

+++ b/plugins/FeedsProcessor.inc
@@ -311,6 +334,56 @@ abstract class FeedsProcessor extends FeedsPlugin {
+  protected function initEntitiesToBeRemoved($source, $state) {
+    $state->to_be_removed = array();

Missing type in arguments.

+++ b/plugins/FeedsProcessor.inc
@@ -311,6 +334,56 @@ abstract class FeedsProcessor extends FeedsPlugin {
+  /**
+   * Delete entities which were not found on process.
+   * @todo batch delete?

Deletes* and a newline before the todo.

+++ b/plugins/FeedsProcessor.inc
@@ -311,6 +334,56 @@ abstract class FeedsProcessor extends FeedsPlugin {
+  protected function clean($state) {
+    $info = $this->entityInfo();

Argument type and unused entityInfo().
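
Taken together, the review points above suggest a node-processor clean() shaped roughly like this (a sketch under the reviewer's naming suggestions, e.g. a camelCase removeList; not the final patch):

```php
<?php
// Sketch incorporating the review feedback: parent::clean() fallback,
// bulk node_load_multiple() instead of node_load() in a loop.
protected function clean(FeedsState $state) {
  // Delegate to the parent (delete) behavior unless we are unpublishing.
  if (!isset($this->config['update_non_existent'])
      || $this->config['update_non_existent'] != FEEDS_UNPUBLISH_NON_EXISTENT) {
    return parent::clean($state);
  }
  if (!empty($state->removeList)) {
    // Bulk-load the nodes, then unpublish only those still published.
    foreach (node_load_multiple($state->removeList) as $node) {
      if ($node->status) {
        node_unpublish_action($node);
        node_save($node);
        $state->updated++;
      }
    }
  }
}
```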

hyperidler’s picture

Subscribing

hyperidler’s picture

I've been trying out the #65 patch provided by Mithrandir for several days, importing a test Google calendar I created. So far no problems, and it deletes nodes that are no longer in the feed.

thanks,

dan2k3k4’s picture

Using the patch in #65 by Mithrandir; I had to clone the git repo, so I'm on 7.x-2.x, and it seems to work well.

I have 2 importers that use the JSON Path parser to import JSON feeds into 2 different content types, set to check and parse the feeds every hour (with errors reported back by email), so I'll let you know if any errors occur, but so far so good.

g089h515r806’s picture

Applied #65 successfully, with 2 warnings:
"2 line add whitespace errors".
"plugins/FeedsProcessor.inc has type 100644, expected 100755".

Have not tested it yet.

yannickoo’s picture

That is what I noticed in #72.

imclean’s picture

Status: Needs work » Needs review
FileSize
7.27 KB

This addresses most of the issues raised by twistor in #80. It still needs tests and possibly some tidying up of comments.

imclean’s picture

Removed some whitespace.

Status: Needs review » Needs work

The last submitted patch, feeds_remove_entities_not_in_feed-1470530-87.patch, failed testing.

imclean’s picture

imclean’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, feeds_remove_entities_not_in_feed-1470530-89.patch, failed testing.

imclean’s picture

Status: Needs work » Needs review
FileSize
7.02 KB

Missed one. How do I run these tests locally before uploading?

Status: Needs review » Needs work

The last submitted patch, feeds_remove_entities_not_in_feed-1470530-91.patch, failed testing.

twistor’s picture

To run the tests locally, enable the "Testing" module ("simpletest" via Drush), then go to admin/config/development/testing.

imclean’s picture

Status: Needs work » Needs review
FileSize
7.02 KB
yannickoo’s picture

Status: Needs review » Needs work

Thank you imclean, the patch looks fine to me. I found two minor things. I would like to set the status to "needs work" because the batch deleting is missing.

+++ b/plugins/FeedsNodeProcessor.inc
@@ -378,4 +381,30 @@ class FeedsNodeProcessor extends FeedsProcessor {
+      foreach ($nodes as &$node) {

We should add a batch here.

+++ b/plugins/FeedsNodeProcessor.incundefined
@@ -378,4 +381,30 @@ class FeedsNodeProcessor extends FeedsProcessor {
+    // Delegate to parent if not unpublishing or option not set

Full stop is missing.

+++ b/plugins/FeedsNodeProcessor.incundefined
@@ -378,4 +381,30 @@ class FeedsNodeProcessor extends FeedsProcessor {
+      foreach ($nodes as &$node) {

I don't think it's necessary to take $node by reference, because you are passing the whole node object to node_unpublish_action() and node_save().

+++ b/plugins/FeedsProcessor.incundefined
@@ -313,6 +336,59 @@ abstract class FeedsProcessor extends FeedsPlugin {
+   * @todo batch delete?

Currently all entities are deleted at once, right? So we should figure this out.

zuernBernhard’s picture

I'd love to see this in the module... Can I assume it will be added if I do the points mentioned above for the last patch?

-> Add batch processing
-> Fix comments
-> ... ??

vinmassaro’s picture

@zuernBernhard: if you can add what needs to be done in #96 then we can test the patch and get it moving. I would also like to see support for this.

Pol’s picture

I will try to provide a new patch with batch support tomorrow because we need this feature.

Pol’s picture

Status: Needs work » Needs review

Hello Yannickoo,

Could you comment on this? These are the clean() and unpublishNode() methods from plugins/FeedsNodeProcessor.inc:

  /**
   * Overrides FeedsProcessor::clean().
   *
   * Allow unpublish instead of delete.
   *
   * @param FeedsState $state
   */
  protected function clean(FeedsState $state) {
    // Delegate to parent if not unpublishing or option not set.
    if (!isset($this->config['update_non_existent']) || $this->config['update_non_existent'] != FEEDS_UNPUBLISH_NON_EXISTENT) {
      return parent::clean($state);
    }

    $total = count($state->removeList);
    if ($total) {
      $nodes = node_load_multiple($state->removeList);
      foreach ($nodes as $node) {
        // Use a proper callable; a bare $this->unpublishNode property
        // reference is not valid as a batch operation callback.
        $operations[] = array(
          array($this, 'unpublishNode'),
          array(
            $node,
            t('(Unpublishing node #@nid)', array('@nid' => $node->nid)),
          ),
        );
        $state->updated++;
      }

      $batch = array(
        'title' => t('Unpublishing nodes'),
        'operations' => $operations,
        'progress_message' => '',
      );
      batch_set($batch);
    }
  }

  protected function unpublishNode($node) {
    // Should we split this function into 2 operations?
    if ($node->status == 0) {
      return;
    }
    node_unpublish_action($node);
    node_save($node);
  }

I'm not used to working with batches, so bear with me :)

If it's ok, I'll polish it and do a patch.

Thanks !

Pol’s picture

Hi all,

I woke up this morning and saw that all the nodes were unpublished using patch #95, so I suppose there is an inconsistency in the patch.
EDIT: Sorry about that, I did further testing; the patch is perfectly OK.

Are there people using this patch who have problems and/or success stories?

Another question comes to mind: what should we do if one of the unpublished nodes appears again in the imported stream? Should we change its state to 'published' again?
Edit: This is easy to fix by mapping a field to the published-status field of the node, and the job is done.

Thanks!

imclean’s picture

Status: Needs review » Needs work

Hi Pol,

Thanks for your work.

Please don't add or change information by editing your post, as it makes the thread hard to follow (especially for people who receive email notifications). Just post again with the updated or new information.

Are there people using this patch who have problems and/or success stories?

Please create a patch against the version in git for us to test.

Also see Making a Drupal patch with Git.

byronveale’s picture

Status: Needs work » Needs review

I just tried the patch in #95, and it worked a treat! We currently have 882 items in our feeds_item table; with this last test, it correctly identified and unpublished the 121 entries that were no longer in the feed.

And having the option to unpublish instead of deleting the nodes totally saves our content manager from extra work; otherwise, he would have lost the additional info entered on older feed nodes, nodes that have been superseded by newer feed entries (changing feed source, long story; can provide details if you _really_ want them...).

Thanks for everyone's efforts on this.

Is there anything I can do to help get this into code? At least the dev version? We have a client that this feature is just perfect for, and IMHO, really makes Drupal shine...

(Note: yes, I did edit this comment, sorry, but only to clarify what it was I tried...)

byronveale’s picture

Status: Needs review » Needs work

Further apologies -- I didn't mean to change the status when I posted my comment; I hadn't refreshed my browser since yesterday afternoon, and so missed imclean's comment #102.

imclean’s picture

#96 lists some work to do. There are 3 minor issues + the batching to solve.

eugene.ilyin’s picture

Hmm, if a node does not exist in the next import file, it will be unpublished. But what if the same node becomes available again in a later import? Maybe we need to set

$node->status = 1;

for all updated nodes in process()?

Pol’s picture

Hi @Eugene,

I had the same problem, and I fixed it by adding a new mapping to the status field.
I also had to modify the feed to include a new 'status' value too.

You can see the result here: http://jobs.trasys.be/

All the jobs you see there are imported by feeds. They are retrieved using a callback (Feeds Callback Fetcher), the callback is using the Taleo PHP Library to get the vacancies from the company I work for.

For a reason I cannot understand, sometimes all the jobs are unpublished. Maybe the returned feed is empty, but even then it should publish them again automatically at the next cron run (every 15 min), and it doesn't.

I'm still looking...

gnucifer’s picture

@imclean I can't see why the join on line 356 is necessary.

    $info = $this->entityInfo();
    $select = db_select($info['base table'], 'e');
    $select->addField('e', $info['entity keys']['id'], 'entity_id');
    $select->join(
      'feeds_item',
      'fi',
      "e.{$info['entity keys']['id']} = fi.entity_id AND fi.entity_type = '{$this->entityType()}'");
    $select->condition('fi.id', $this->id);
    $select->condition('fi.feed_nid', $source->feed_nid);
    $entities = $select->execute();

As far as I can see, it would be enough to select the entity IDs directly from the feeds_item table.
Something like this:

    $info = $this->entityInfo();
    // Use fields() rather than chaining addField(); in Drupal 7,
    // addField() returns the field alias, not the query object.
    $select = db_select('feeds_item', 'fi')
      ->fields('fi', array('entity_id', 'entity_type'))
      ->condition('fi.id', $this->id)
      ->condition('fi.feed_nid', $source->feed_nid);
    $entities = $select->execute();

mikran’s picture

The issue description is not very descriptive, so I'm not sure about the scope here, but I've added a 'block users' action to the user processor.

Pol’s picture

I had an idea in my mind today when reading the thread...

What do you think about a system that would unpublish (or delete) an entity only if it hasn't been found in the feed more than X times?
For example, say we configure the importer to unpublish nodes that are missing from the feed more than 3 times. Only when a node has been missing more than 3 times does it get unpublished.

What do you think about this idea ?

gnucifer’s picture

@Pol It seems like very convoluted logic to me. I can't imagine a situation where this would be useful? It would make more sense to implement this custom behavior in a separate processor extending the node-processor for that specific use case.

gnucifer’s picture

I have a suggestion concerning the variable name $removeList. Since (for example) there is an option to unpublish in the node-processor this name is a bit misleading. Also, as it's initialized to all entities for the current feed the name could cause confusion since the variable name implies they are all to be removed (even if there is a comment describing the variable's intended purpose). I think for example "remaining" is more correct, since it's just state, not necessarily associated with a specific purpose/action.

gnucifer’s picture

Here is the patch updated with the variable name change.

gnucifer’s picture

I myself have a use case where I need to keep nodes no longer found in the feed and flag them in a certain way. I added a feature to trigger an event for the missing items (that can be used to set a value for a field in my case). This rule could in theory also be used to delete/unpublish items, duplicating the functionality in the clean-method, but that would be a very obfuscated way of providing this feature. The current quick and easy way of doing this is much better.

Summit’s picture

Hi @gnucifer,
Could you elaborate more on your solution. Did you make special rule for it?
My usecase are products (entities) which are not in the feed anymore, but are connected to nodes or just in the product list. How can I get some kind of logging those products are not in the feed. And may be which are new in the feed?
Greetings, Martijn

imclean’s picture

Hmm...Let's keep this issue concise.

@pol, that would be a separate feature, perhaps added externally. You could also use the existing expire logic.

@gnucifer, "remaining" is the opposite of what removeList is. Items in removeList are deleted or unpublished; items which are not in the list remain.

I'm not sure about the rules addition either at this stage. A hook may be more flexible anyway, which could then be used to create a rule somewhere else.

Between this issue and the other one, which was started 4 years ago, it's been a long time getting the feature request this far. The important things to work on right now are:

1. Tidying up comments and code
2. Batch support
3. Tests
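
For reference, a hook-based version might look something like the sketch below. Everything here is hypothetical: neither the hook name nor the mymodule module exists in Feeds; this is just the shape such an integration could take.

```php
<?php

/**
 * Implements hook_feeds_items_not_in_feed() (a hypothetical hook).
 *
 * If Feeds invoked a hook like this from clean(), site modules (or a
 * thin Rules bridge module) could react to items missing from the feed.
 *
 * @param FeedsSource $source
 *   The feed source being imported.
 * @param array $entity_ids
 *   IDs of entities that were not present in the current import.
 */
function mymodule_feeds_items_not_in_feed(FeedsSource $source, array $entity_ids) {
  foreach ($entity_ids as $entity_id) {
    // Flag, unpublish, or otherwise handle the missing item here.
  }
}
```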

gnucifer’s picture

@imclean I was thinking more in terms of "remains to be processed". Could also be called "unprocessed_items", "unprocessed" or some other perhaps better name. "RemoveList" is misleading primarily because items might not be removed at all, but blocked in the case of users, or unpublished if nodes. It's a pretty small detail, but still important since a change to a more appropriate name would make the code more readable.

EDIT: This also applies to the method name "initEntitiesToBeRemoved".

You are correct that a hook is probably a better choice, and sure, there are more important tasks to work on. I mostly posted the rules patch because I need that functionality myself, and if someone else does too, it's nice to have a patch for it. I will probably rework the code to use a hook instead, and perhaps place the rule integration in a separate module instead of a conditional invocation.

@Summit An event available in rules is triggered for each item not found in the feed. I'm not 100% sure of what you are trying to do, but you might be able to achieve it by creating a custom rule for the "not found" event.

byronveale’s picture

Just wanted to clarify before attempting a new patch:

- imcelan's patch in #95 needs some minor updating, as outlined in yannickoo's comment #96

- Pol's comment #100 includes an attempt at fixing batch processing

- creating a patch incorporating the above items against the version in Git, and submitting for testing, will help get the unpublish/delete functionality working in the dev version of Feeds

- items in #106, #107, and #108 aren't critical, as the patch in #95 is working (clarification here would be greatly appreciated!)

- items brought up in comments #109 to #117 (and beyond?) aren't directly addressing this issue, and so should be addressed elsewhere

Sound correct?

gnucifer’s picture

@byronveale #117 addresses the patch in this issue, so creating a new issue would be confusing.

@Pol regarding #100. I have thought about the batch-processing, and will later try to elaborate further on the problems implementing this for the "clean" operation, but your approach will not work since batch operations will not be processed if the feed is updated on cron. It's a really hairy issue, and I think it will be difficult to implement batch-processing, at least in a synchronous way.

mikran’s picture

Poking around at this, and we can't batch at this point because of #1363088: Feeds batch process not fully implemented - needed for large imports. So I'm willing to take this as-is. But, it does need a couple of tests.

Quote from twistor, comment #80.

So that leaves us with tests only.

Issue title states nodes; so users, rules, batch etc can follow as follow-ups.

gnucifer’s picture

Regarding batch processing, here are my thoughts:

As far as I understand it, Feeds will invoke FeedsSource::import() either as a background job through the Job Scheduler module, or using the Drupal Batch API.

Job Scheduler will run its tasks on cron, calling this worker function:

/**
 * Scheduler callback for importing from a source.
 */
function feeds_source_import($job) {
  $source = feeds_source($job['type'], $job['id']);
  try {
    $source->existing()->import();
  }
  catch (FeedsNotExistingException $e) {
    // Do nothing.
  }
  catch (Exception $e) {
    $source->log('import', $e->getMessage(), array(), WATCHDOG_ERROR);
  }
  $source->scheduleImport();
}

The Batch API job is set in FeedsSource (or a Job Scheduler job, if the importer is configured to process in the background):

  public function startImport() {
    module_invoke_all('feeds_before_import', $this);
    $config = $this->importer->getConfig();
    if ($config['process_in_background']) {
      $this->startBackgroundJob('import');
    }
    else {
      $this->startBatchAPIJob(t('Importing'), 'import');
    }
  }

Whether the job is done or not is determined in FeedsSource::progressImporting():

   /**
   * Report progress as float between 0 and 1. 1 = FEEDS_BATCH_COMPLETE.
   */
  public function progressImporting() {
    $fetcher = $this->state(FEEDS_FETCH);
    $parser = $this->state(FEEDS_PARSE);
    if ($fetcher->progress == FEEDS_BATCH_COMPLETE && $parser->progress == FEEDS_BATCH_COMPLETE) {
      return FEEDS_BATCH_COMPLETE;
    }
    // Fetching envelops parsing.
    // @todo: this assumes all fetchers neatly use total. May not be the case.
    $fetcher_fraction = $fetcher->total ? 1.0 / $fetcher->total : 1.0;
    $parser_progress = $parser->progress * $fetcher_fraction;
    $result = $fetcher->progress - $fetcher_fraction + $parser_progress;
    if ($result == FEEDS_BATCH_COMPLETE) {
      return 0.99;
    }
    return $result;
  }

It's used both for the batch-job, and the job-scheduler-job.

If we want to batch the deletion of items, we probably need to make FeedsSource::progressImporting() take this into account. But that is really difficult to do, since we only know how many items need to be deleted/processed after the import is done. So this needs to be done in another job. In that case we cannot call clean() inside process(), since we do not know if we are invoked through cron or a browser; clean() must be able to be invoked independently of process() or state, which makes things more complicated. We cannot use state for storage, but must find another solution.

An alternative option would be to always schedule a background job in process() and pass on the $removeList. There might be other approaches, and I will try to figure out some other way of dealing with this, but to me it seems pretty hard to solve this in a "nice" way.

gnucifer’s picture

@mikran I agree batch processing might not be worth looking at right now. It must be pretty rare to have such a large variance among items that deletion causes a problem. Feeds will probably choke for other reasons before this becomes an issue.

I also had another idea that would allow us to get rid of the dependence on state. If we assign each import a unique ID, save the latest import ID, and mark all items with the current import ID, we can at any time determine which items are missing from the feed with a simple SQL query:

SELECT * FROM feeds_item WHERE import_id <> $latest_import_id AND feed_nid = $feed_nid AND id = $id

There are some problems with this, though. If the import crashes in the middle, it will seem like half of the items were removed. Therefore we must save all processed entity IDs during the import and commit them to the database when the import is done. Issues also arise if another import is running at the same time as the "garbage collecting". This could perhaps be solved through locking.

But if we can avoid all the pitfalls, this will allow us to "garbage collect" at any point, making batching/job scheduling easier to implement.
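
A rough sketch of that idea in code (all hypothetical: the import_id column on feeds_item and this helper function do not exist in Feeds; shown only to illustrate the query):

```php
<?php

/**
 * Hypothetical garbage-collection helper for the "import ID" idea.
 *
 * Assumes feeds_item has gained an import_id column that each import
 * stamps on the items it touches.
 */
function example_feeds_collect_missing(FeedsSource $source, $latest_import_id) {
  // Items not stamped with the latest import ID were missing from the
  // most recent import and are candidates for unpublish/delete.
  $result = db_query(
    "SELECT entity_id FROM {feeds_item}
     WHERE import_id <> :import_id AND feed_nid = :feed_nid AND id = :id",
    array(
      ':import_id' => $latest_import_id,
      ':feed_nid' => $source->feed_nid,
      ':id' => $source->id,
    )
  );
  foreach ($result as $row) {
    // Unpublish, delete, or flag $row->entity_id here.
  }
}
```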

byronveale’s picture

Status: Needs work » Needs review
FileSize
9.53 KB

I've created a patch against the 7.x-2.x version in git, incorporating Pol's work in #100.

I tested it in a local Acquia dev instance: Drupal 7.23, MySQL 5.1.66, PHP 5.3.18, Mac OS X 10.8.4. There are 767 items in my feed (coming from LDAP), and I tested first manually, from the feed's import page, and then using the periodic import option with a cron job.

Seems to be working fine; please try for yourselves.

Any other information needed, do let me know...

gnucifer’s picture

@byronveale Please consider what I wrote in the previous posts about batching. It's working fine because #100 only implements batching for the unpublish action, not delete. If you were to choose the unpublish option, nothing would happen (when running on cron).

mikran’s picture

Issue tags: -Needs tests
FileSize
30.63 KB
37.65 KB

Some tests added to patch from comment #95.

byronveale’s picture

@gnucifer: I would consider them, if I actually could; but while I managed to get a patch uploaded (first one), I don't have the chops to really follow what you guys are doing...

And actually my patch is not working fine -- after leaving it to run every fifteen minutes through cron for the day, I have found that it's deleting nodes on me, despite being set to "unpublish", and despite the content not being removed from the feed.

So to try and get the delete/unpublish-items-no-longer-in-the-feed functionality working (the goal of this issue, right? and also what our client needs!), and following on from twistor's and mikran's comments, I'll try mikran's patch from #125 (tests added to imclean's patch from #95). I'll test running manually and with cron, and both unpublishing and deleting expired nodes.

Yes, batching would be great; gnucifer, Pol, perhaps you could direct your efforts towards issue #1363088?

Thanks for everyone's efforts!

twistor’s picture

Assigned: Unassigned » twistor

Need to take a look.

byronveale’s picture

One final (for the moment) question, what about gnucifer's comment #108 -- that wasn't just about batch processing, right? Is that something we should try to incorporate?

Thanks again everyone...

gnucifer’s picture

@byronveale Yes, that comment had nothing to do with batch processing; it was just about the join being potentially unnecessary.

gnucifer’s picture

Issue summary: View changes

issue summary updated

mikran’s picture

Status: Needs review » Needs work
FileSize
89.78 KB

The config form isn't consistent in what it calls things:

nodes_entities.png

yannickoo’s picture

Should be "nodes" because it is the node processor.

arrubiu’s picture

A question, sorry, but the thread is very long.
Which is the latest patch that works? I need this feature, and I have to apply it to the stable version of Feeds.

Thanks.

pollegie’s picture

@arrubiu Did you get it working?

Same thoughts here..

pollegie’s picture

Issue summary: View changes

#2092895 added to the list of related issues

kostajh’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
37.98 KB
1.65 KB

I updated the patch from #125 to address the wording concerns noted in #130. If you are using the Node Processor, you will see "nodes" instead of "entities". I also tried to make the descriptive text in the form a little clearer.

ket qua bong da’s picture

Did someone test this on D7? Can someone confirm it works?

It's OK.

Summit’s picture

Status: Needs review » Reviewed & tested by the community

zuernBernhard’s picture

When does this patch get merged into the module?

Pol’s picture

The patch doesn't apply anymore.

Sorry, my mistake.

twistor’s picture

Assigned: twistor » Unassigned
Status: Reviewed & tested by the community » Needs work

It seems an anonymous function snuck into this patch.

MegaChriz’s picture

+++ b/plugins/FeedsNodeProcessor.inc
@@ -204,6 +206,13 @@ class FeedsNodeProcessor extends FeedsProcessor {
+      array_walk_recursive($form['update_non_existent'], function(&$value, &$key) {
+        $value = str_replace('entities', 'nodes', $value);
+      });

The anonymous function from the patch in #134.

@twistor I think you mean anonymous functions should not be used because they are only available since PHP 5.3, and Feeds should work with PHP 5.2.5 (the minimum requirement for Drupal 7)?

mikran’s picture

Whatever the case is with the PHP requirements, the anonymous function added in #134 is the wrong solution to the issue noted in #130.

MegaChriz’s picture

+++ b/plugins/FeedsProcessor.inc
@@ -598,6 +675,17 @@ abstract class FeedsProcessor extends FeedsPlugin {
+        FEEDS_SKIP_NON_EXISTENT => 'Skip non-existent entities',
+        FEEDS_DELETE_NON_EXISTENT => 'Delete non-existent entities',

It looks like there are some untranslatable strings in the patch as well. Ideally, this should be something like t('Skip non-existent @entity_type_plural_label'); then you don't have to override this string for each entity type.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
5.32 KB
38.59 KB

Re-roll, incorporated feedback from #140, #142.

mikran’s picture

Status: Needs review » Needs work
+++ b/plugins/FeedsNodeProcessor.inc
@@ -185,13 +187,26 @@ class FeedsNodeProcessor extends FeedsProcessor {
+      array_walk_recursive($form['update_non_existent'], array($this, 'replaceEntitiesByNodes'));
...
+  protected function replaceEntitiesByNodes(&$value, &$key) {
+    $value = str_replace('entities', 'nodes', $value);
+  }

This isn't a multilingual solution at all; str_replace() won't work here. The whole string needs to be re-entered inside t(), or some kind of substitution used, like t('Delete non-existent @entities', ...

+++ b/plugins/FeedsProcessor.inc
@@ -290,6 +303,16 @@ abstract class FeedsProcessor extends FeedsPlugin {
+      $messages[] = array(
+       'message' => format_plural(
+          $state->deleted,
+          'Removed @number @entity.',
+          'Removed @number @entities.',
+          array('@number' => $state->deleted) + $tokens
+        ),
+      );

This is a good example of how those strings could be constructed.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.97 KB
38.16 KB
johnnydarkko’s picture

#145 is working great for me.

One thing to take note of is that when the feed item's entity gets unpublished, the feeds_after_save hook doesn't get invoked. I haven't really dug deep into the patch to see what other hooks don't get invoked, but as the patch stands in #145, only the node API hooks are available to those who need to react to Feeds unpublishing nodes.

stefan.r’s picture

johnnydarkko, for items that are actually included in the feed, where an entity is being updated/created, this code runs:

        // Allow modules to alter the entity before saving.
        module_invoke_all('feeds_presave', $source, $entity, $item, $entity_id);
        if (module_exists('rules')) {
          rules_invoke_event('feeds_import_'. $source->importer()->id, $entity);
        }

        // Enable modules to skip saving at all.
        if (!empty($entity->feeds_item->skip)) {
          continue;
        }

        // This will throw an exception on failure.
        $this->entitySaveAccess($entity);
        $this->entitySave($entity);

        // Allow modules to perform operations using the saved entity data.
        // $entity contains the updated entity after saving.
        module_invoke_all('feeds_after_save', $source, $entity, $item, $entity_id);

The patch just does a node_save() instead. So in terms of hooks other than hook_feeds_after_save(), it's also hook_feeds_presave(), a Rules invocation, and a "skip" flag that are not running. Invoking any of those seems like too much of an API change ($item would be NULL, and until now it was implied those hooks only ran for existing feed items where a node was being updated/created, so existing code implementing these hooks may break).

Maybe a hook_feeds_after_unpublish_non_existent_node() hook could be added if people need the $source FeedsSource object, or if they need to be able to distinguish between nodes that are (a) unpublished because they are not in the feed or (b) unpublished for other reasons. It seems like an edge case, though.

tostinni’s picture

Thanks for the work you guys are doing. Just a quick question: is someone taking care of republishing nodes that have been previously unpublished, as mentioned in #41?

stefan.r’s picture

tostinni, that is actually relevant to the point raised in #147, i.e. right now we can't distinguish between nodes that are (a) unpublished because they are not in the feed or (b) unpublished for other reasons.

Perhaps a "non_existing" property on feeds_item may help solve this problem. It would cost us an extra database field, some extra logic, and an update hook :)

That way, if unpublishing feed items when they disappear and republishing them when they reappear is something that happens often enough to warrant #148 going into the Feeds module, we can do something like this in FeedsNodeProcessor::entitySave():

if ($this->config['republish_previously_non_existing'] && $entity->feeds_item->non_existing == 1 && $entity->status == 0) {
  $entity->status = 1;
}

johnnydarkko’s picture

Thanks for the clarification, stefan.r. That edge case is actually what I ran into immediately, and it turns out implementing hook_node_update() works just fine for my particular use case. Additional hooks would definitely be a "would be nice" feature to consider for future development. Thanks for your work!
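
For anyone landing here with the same need, a minimal sketch of that approach. hook_node_update() is standard Drupal 7 node API; the mymodule name and the my_imported_type bundle are placeholders, and note this fires for any unpublish, not only those done by the importer:

```php
<?php

/**
 * Implements hook_node_update().
 *
 * Detects a published-to-unpublished transition on imported nodes,
 * e.g. when the Feeds "unpublish non-existent" option kicks in.
 */
function mymodule_node_update($node) {
  // node_save() populates $node->original with the pre-save node.
  $was_published = !empty($node->original->status);
  if ($node->type == 'my_imported_type' && $was_published && empty($node->status)) {
    watchdog('mymodule', 'Imported node @nid was unpublished.', array('@nid' => $node->nid));
  }
}
```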

stefan.r’s picture

Thanks. What I meant is that for most purposes the node API hooks should be enough (as in your case). The edge case would be if anything beyond that were needed. :)

littledynamo’s picture

+1 for #149

This is causing us problems with nodes not being re-published. We are using this patch as part of a staff import from FileMaker, and our website should reflect the status of the staff member in the FileMaker database. It's quite common for people to be deactivated and then re-activated in FileMaker, so the staff member would disappear from the website in this scenario. We're handling the re-publishing manually at the minute, but we're relying on people telling us that a staff member is missing from the website, so it's far from ideal.

Abelito’s picture

Another +1 for #149

I guess it is not really an "edge case".
Similar functionality would be used for the User processor's "user blocking", mentioned in #109.

stefan.r’s picture

Edit: this patch is broken because of a typo, please refer to #157 instead.

Status: Needs review » Needs work

The last submitted patch, 154: unpublish-delete-entities-not-in-feed-1470530-153.patch, failed testing.

Abelito’s picture

There is a work-around that may work for some people who have the problem mentioned in #149.
- For Node processor, add a "Published" column to your import file with "1" (meaning it will be published) in all the rows, and map that to the "Published status". (This will explicitly set the node to be published, and therefore will republish an unpublished node. Good for most cases, but bad if some imported items should remain unpublished [like if they were unpublished manually instead of by the feed importer].)
- For User processor, add a "Status" column to your import file with "1" (meaning it will be active) in all the rows, and map that to the "Account status".

Also note, if there are no rows in the import file (just headers), it will NOT set your nodes to "Unpublished", or your users' account status to "Blocked". This could be a problem for some, or a failsafe for others.

stefan.r’s picture

This is an implementation of #149 that differentiates between manually unpublished nodes and nodes that were automatically unpublished for being non-existent. In the latter case it will republish the node if it shows up again in the feed.

There is now a configurable option to set the action to take when nodes that were previously unpublished for being non-existent show up again in the feed ("republish" or "leave unpublished").

Any nodes that are unpublished automatically will be flagged for republishing as soon as they show up again (unless the option is set to "leave unpublished", in which case the flag will be ignored).

The default for this option right now is "leave unpublished". Does that seem right?

stefan.r’s picture

Status: Needs work » Needs review
FileSize
535 bytes

Setting to Needs review so the testbot will pick up patch #157.

stefan.r’s picture

Seems like that wasn't picked up; reposting to be sure.

Just to clarify, this is exactly the same file as in #157, and #157 was just correcting a typo that broke the patch in #154.

#154 still has the relevant interdiff for the new republishing feature (except the typo).

stefan.r’s picture

Small correction to maintain compatibility with another feeds patch.

stefan.r’s picture

And once more, as the previous patch was incomplete (sorry for the noise).

jamiroconca’s picture

bdanin’s picture

patch in #161 appears to be working great for me

Honza Pobořil’s picture

Could you add similar feature for User Processor? (optional deactivate or delete)

mikran’s picture

Could you add similar feature for User Processor? (optional deactivate or delete)

I've already posted separate issue for that #2092895: Block users not included in feed.

newtoid’s picture

The patch was effective but missed one bit; here is the terminal output:

patching file feeds.install
Hunk #1 succeeded at 205 (offset -1 lines).
Hunk #2 succeeded at 223 (offset -1 lines).
Hunk #3 succeeded at 674 with fuzz 2 (offset -16 lines).
patching file includes/FeedsSource.inc
Hunk #1 succeeded at 90 (offset -1 lines).
Hunk #2 succeeded at 103 (offset -1 lines).
patching file plugins/FeedsNodeProcessor.inc
Hunk #2 succeeded at 203 (offset 16 lines).
Hunk #3 FAILED at 396.
1 out of 3 hunks FAILED -- saving rejects to file plugins/FeedsNodeProcessor.inc.rej
patching file plugins/FeedsProcessor.inc
Hunk #7 succeeded at 646 (offset -71 lines).
Hunk #8 succeeded at 711 (offset -71 lines).
Hunk #9 succeeded at 869 (offset -71 lines).
patching file tests/feeds/developmentseed_missing.rss2
patching file tests/feeds_processor_node.test

vinmassaro’s picture

@david newton: Patch applies against 7.x-2.x, not 7.x-2.0-alpha8:

/tmp $ git clone --branch 7.x-2.x http://git.drupal.org/project/feeds.git
Cloning into 'feeds'...
remote: Counting objects: 10288, done.
remote: Compressing objects: 100% (5700/5700), done.
remote: Total 10288 (delta 7095), reused 5447 (delta 3748)
Receiving objects: 100% (10288/10288), 2.64 MiB | 1.66 MiB/s, done.
Resolving deltas: 100% (7095/7095), done.
Checking connectivity... done.
/tmp $ cd feeds
/tmp/feeds $ patch -p1 < ~/Downloads/feeds-unpublish-delete-entities-not-in-feed-1470530-161.patch
patching file feeds.install
patching file includes/FeedsSource.inc
patching file plugins/FeedsNodeProcessor.inc
patching file plugins/FeedsProcessor.inc
patching file tests/feeds/developmentseed_missing.rss2
patching file tests/feeds_processor_node.test

vinmassaro’s picture

Here is an update to the patch that makes the spelling of "non-existent" consistent. I've included an interdiff to show there are no functional changes.

I also tested the patch against a feed that removes items when they are no longer valid, and they were correctly deleted from my site.

vinmassaro’s picture

Status: Needs review » Reviewed & tested by the community
JKingsnorth’s picture

Status: Reviewed & tested by the community » Needs review

The patch will need a couple more positive reviews before it is RTBC

fdefeyter@gmail.com’s picture

For me, it's not working: an SQL error about a missing column related to republishing. Thanks anyway.

stefan.r’s picture

#171: did you run update.php?

LauriKoobas’s picture

#168 worked for me.

mErilainen’s picture

Status: Needs review » Reviewed & tested by the community

This is golden, works like a charm!

dkre’s picture

#168

Beautiful work.

twistor’s picture

Status: Reviewed & tested by the community » Needs work

Where did the republishing feature come from? I haven't read back through the whole issue. Republishing can be controlled simply through whether or not there is a value mapped to the published status.

Either way, it's outside the scope of this feature.

stefan.r’s picture

@twistor - the republishing was asked for in comments #41, #42, #148, #152, #153

If this issue is committed without any republishing feature, I am not sure there is a way to let nodes that were manually unpublished (instead of by the feed importer) remain unpublished. I don't think mapping a value to published solves that...

Even if we commit it without a republishing feature, can you think of a way for us to make the distinction between "manually unpublished" (must not be republished) and "unpublished by the feed importer" (must be republished)?

stefan.r’s picture

Status: Needs work » Needs review
FileSize
6.96 KB
40.21 KB

- Republishing feature taken out per maintainer request

- Added $config['update_non_existent'] to FeedsEntityProcessor as it was causing FeedsEntityProcessorTest to fail

stefan.r’s picture

Please can a few people review so this can go back to RTBC?

JKingsnorth’s picture

@twistor: Without meaning to prolong this issue, if the republishing feature was working OK in #168, then why not run with it? It seems like a very useful thing for the reasons mentioned by stefan.r in #177.

MegaChriz’s picture

Status: Needs review » Needs work

UI

I tested the patch from #178 and it works great! These were the scenarios that I tested:

  • Nodes (Node processor):
    • Update existing nodes, unpublish non-existent
    • Do not update existing nodes, unpublish non-existent
    • Update existing nodes, delete non-existent
    • Do not update existing nodes, delete non-existent
    • Update existing nodes, skip non-existent
    • Do not update existing nodes, skip non-existent
  • Entities (Generic entity processor):
    • Update existing entities, delete non-existent
    • Do not update existing entities, delete non-existent
    • Update existing entities, skip non-existent
    • Do not update existing entities, skip non-existent

All scenarios were tested with File upload fetcher, CSV parser and with "Skip hash check" on.
One thing I noticed was that when importing a "blank" CSV file (with a header, but no rows), no nodes/entities got deleted. I think this behaviour is good, as one usually does not intend to import a blank source.

Code

I found a few minor issues in the code. Most are about coding standards. One is about a potential, but minor, security issue: in the method initEntitiesToBeRemoved() of the class FeedsProcessor, some variables are passed unescaped to a database query.

I also noticed that while tests for unpublish/delete nodes were added, there is no equivalent test for the generic entity processor. I think this should be added too.

  1. +++ b/includes/FeedsSource.inc
    @@ -87,10 +87,16 @@ class FeedsState {
    +   public $removeList;
    

    Extra whitespace before "public".

  2. +++ b/plugins/FeedsNodeProcessor.inc
    @@ -1,5 +1,7 @@
    +define('FEEDS_UNPUBLISH_NON_EXISTENT', 'unpublish');
    

    No doc block.

  3. +++ b/plugins/FeedsNodeProcessor.inc
    @@ -370,4 +376,29 @@ class FeedsNodeProcessor extends FeedsProcessor {
    +    // Delegate to parent if not unpublishing or option not set
    

    Missing period.

  4. +++ b/plugins/FeedsNodeProcessor.inc
    @@ -370,4 +376,29 @@ class FeedsNodeProcessor extends FeedsProcessor {
    +   * @param FeedsState $state
    
    +++ b/plugins/FeedsProcessor.inc
    @@ -313,6 +346,59 @@ abstract class FeedsProcessor extends FeedsPlugin {
    +   * @param FeedsState $state
    ...
    +   * @param FeedsState $state
    

    Missing comments for param "$state".

  5. +++ b/plugins/FeedsProcessor.inc
    @@ -190,17 +200,17 @@ abstract class FeedsProcessor extends FeedsPlugin {
    -      // Do not proceed if the item exists, has not changed, and we're not
    -      // forcing the update.
    ...
    +        // Do not proceed if the item exists, has not changed, we're not
    +        // forcing the update, and was not previously unpublished
    

    This change sounds like it has something to do with the now removed "republish" feature.

  6. +++ b/plugins/FeedsProcessor.inc
    @@ -313,6 +346,59 @@ abstract class FeedsProcessor extends FeedsPlugin {
    +      "e.{$info['entity keys']['id']} = fi.entity_id AND fi.entity_type = '{$this->entityType()}'");
    

    I think $info['entity keys']['id'] and $this->entityType() should be escaped with db_escape_field(). See the method FeedsProcessor::entityLoad() where escaping is also done.
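A minimal sketch of the escaping suggested here, assuming Drupal 7's db_escape_field() and a join fragment along the lines of the patch (the variable names and query object are illustrative; a placeholder would be an alternative for the entity type value):

```php
// Escape the dynamic identifiers before interpolating them into the join
// condition, mirroring what FeedsProcessor::entityLoad() already does.
$id_key = db_escape_field($info['entity keys']['id']);
$entity_type = db_escape_field($this->entityType());
$query->join('feeds_item', 'fi',
  "e.{$id_key} = fi.entity_id AND fi.entity_type = '{$entity_type}'");
```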

Summary

Tested the patch from #178.

  1. Patch works great! :) Previously imported nodes/entities get deleted/unpublished when they are not in the source with the appropriate setting enabled.
  2. When the source is empty, no nodes/entities are deleted/unpublished, but I think this behaviour is good.
  3. Potential security issue: non-escaped variable passed to database query.
  4. Test for generic entity processor is missing.
  5. Minor code style issues.

Setting to "Needs work" because of point #3 and #4.

a.vakulenko’s picture

#181 nice job, we are getting closer to releasing this feature!

jenlampton’s picture

Also tested the patch in #128, as I hoped the batching in this patch would help with #1363088: Feeds batch process not fully implemented - needed for large imports

Unfortunately, this patch doesn't work with the iCal importer, specifically all the repeating dates have been lost.

MegaChriz’s picture

@jenlampton
Thanks for testing! I don't get what you mean with "tested the patch in #128" because there is no patch in #128 in this issue. Did you test the latest patch, from #178?

Unfortunately, this patch doesn't work with the iCal importer, specifically all the repeating dates have been lost.

This sounds like it is caused by other recent changes in Feeds, specifically the change from #2093651: Simplify target callbacks. Have you tested with the latest dev of Feeds with and without the patch from #178 applied, and only experienced this issue with the patch applied? If so, can you provide the steps to reproduce the issue with the iCal importer or - even better - point out which piece of the patch in #178 causes the "repeating dates" to get lost? By the way, does "repeating dates have been lost" mean that some values get cleared out on import, or does it mean something else?

@stefan.r
Are you willing to provide a new patch that addresses the issues noted in #181?

manuelBS’s picture

Nice patch, it also works for me, thanks a lot!

Status: Needs work » Needs review
Charles Belov’s picture

I'm trying to wrap my head around the scope of this change, which would be a welcome one. I have some questions, speaking as someone who has not yet tried the patch and wants to know what to expect before I do:

1. Does the patch for this issue work with nodes that existed prior to the patch for this issue being implemented?

2. If so, how does the patch determine which nodes to delete when they don't have corresponding items in the first feed to run after implementation?

3. If not, do I need to delete all of the pre-existing nodes that were created by a particular feed immediately prior to the first time the patch runs for that feed?

4. If not, is there a reasonable way to detect which pre-existing nodes are no longer relevant, so that I can do a one-time clean-up?

5. How does this patch interact with the Update Existing Nodes setting?

MegaChriz’s picture

Status: Needs review » Needs work

@Charles Belov

  1. Yes, there were no database updates with the patch, so if you adjust your importer's settings after applying the patch, the "unpublish/delete" setting will also be active on items that were imported prior to the existence of the patch.
  2. Feeds keeps track of all items that were imported in the feeds_item table. If you do a new import with the "unpublish/delete" option enabled, all tracked items for that particular importer that are no longer in the source will be deleted or unpublished.
  3. No, you don't need to delete pre-existing items for this patch to work.
  4. You can check the feeds_item table.
  5. The settings "unpublish/delete" and "update existing" work in parallel. If "unpublish/delete" is enabled but "update existing" is not, items not in the source will still be unpublished/deleted. These settings don't influence each other.
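To illustrate points 2 and 4, a query along these lines (assuming the default Feeds 7.x feeds_item schema; the importer machine name 'my_importer' is hypothetical) shows which entities an importer is tracking:

```sql
-- Nodes tracked by a hypothetical importer with machine name 'my_importer'.
-- Anything listed here but absent from the next source run is a candidate
-- for the unpublish/delete handling.
SELECT entity_id, url, imported
FROM feeds_item
WHERE id = 'my_importer'
  AND entity_type = 'node';
```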

As per #181, setting this issue back to "Needs work".

dpico’s picture

Nice patch, thank you very much! I need something like this for a Drupal Commerce site with a large number of products, so site owners can manage products through Excel sheets and removed products from the Excel sheets are removed from the site.

Do you think the patch would work with Commerce Feeds? I couldn't test it yet.

barryvdh’s picture

So how much work is needed to implement this? I'm also looking for this feature.

MegaChriz’s picture

@Barryvdh

  1. Variables need to be escaped before they are passed to a database query.
  2. A test must be added for the generic entity processor: the test should assert that entities that are no longer in the source are deleted and that entities that are still in the source remain.
  3. There are some minor code style issues that need to be fixed.

See #181 for details.

kristygislason’s picture

This patch works great for me and I am already using for nodes. Thank you!

When I use this patch with user entities, I am only given one option - to delete users not included in the list. It would be helpful if there was also the option to set the user to "blocked" rather than deleting the accounts (Similar to unpublishing nodes vs. deleting nodes).

Also, any content the user created is deleted when the user is deleted. It would be useful if, when a user is removed from the imported list, deleting the account respected the user account settings set at 'admin/config/people/accounts'. It currently deletes the associated content regardless of the setting below:

"When cancelling a user account:

  • Disable the account and keep its content.
  • Disable the account and unpublish its content.
  • Delete the account and make its content belong to the Anonymous user.
  • Delete the account and its content."
imclean’s picture

@kgislason, there's a separate issue for blocking users: #2092895: Block users not included in feed

MrDaleSmith’s picture

Using this patch on a site at the moment, and whilst it adds republish_when_exists as a key in the schema defined in feeds.install, it doesn't also add the republish_when_exists field definition to the schema. If you reinstall the module or add a new subsite using the module, you get a fatal error when trying to create the DB tables.

rogerpfaff’s picture

The implementation of republishing content that is again available in the feed is greatly appreciated. :)

stefan.r’s picture

Status: Needs work » Needs review
FileSize
40.43 KB
2.48 KB


imclean’s picture

@stefan.r, what works for node only at the moment? It should work for unpublishing nodes, deleting nodes and deleting other entities. And re-publishing etc.

Status: Needs review » Needs work

JKingsnorth’s picture

@imclean Looks like the republishing feature was removed in #178?

stefan.r’s picture

Status: Needs work » Needs review
FileSize
426 bytes
40.44 KB

Removed stray quote

stefan.r’s picture

Added entity processor tests.

@MegaChriz, look good now?

MegaChriz’s picture

Looks very good. I did the same manual tests as in #181 to ensure the feature still worked as intended and noticed nothing different, which is good.

I found a few minor things in the code, such as the constant "FEEDS_UNPUBLISH_NON_EXISTENT" being defined before the @file docs (which I apparently didn't notice before). I fixed this one and changed a few code comments in the attached patch.

Looks ready to me. Thanks for your hard work, stefan.r! :)

stefan.r’s picture

Status: Needs review » Reviewed & tested by the community

Thanks!

mark_schaal’s picture

Has this been adopted into the dev branch, or is it still a patch?

stefan.r’s picture

We are still waiting for this to be committed, the issue would be marked fixed otherwise :)

mark_schaal’s picture

Touché, I just now saw the status [sheepish]. Thanks for your hard work, stefan!

AndrackOSnack’s picture

Hi,

thank you so much for your work, stefan.r. But there is still one little problem on my instance (maybe only me, heh?).

I have a feed importer that checks a remote folder via an API request. Long story short: if the XML that I retrieve is empty but my importer contains nodes, it bypasses the removal and just tells me that "there is no new node". I've worked around this by commenting this out (quick & dirty, I need this asap :)):

    // if (!isset($state->removeList) && $parser_result->items) {
    $this->initEntitiesToBeRemoved($source, $state);
    // }

It appears that $state->removeList is empty, with the result that initEntitiesToBeRemoved() is never called.

If I understood correctly, the nodes contained in my feed importer will be erased by the clean() method.
Once again, I don't know if this only applies to me or to anyone else but, you know, just to be sure :)

stefan.r’s picture

I am not sure I understand. If you receive an empty feed, you want all items to be deleted? In that case you'd want:

    if (!isset($state->removeList)) {
      $this->initEntitiesToBeRemoved($source, $state);
    }

Maybe when this patch is committed we can turn that into an option, in a separate issue?

MegaChriz’s picture

@AndrackOSnack
I noticed back in #181 too that no items are deleted/unpublished when the source is completely empty, but I would say this is intentional. I think it is rare to want to import a blank source. For example, something could go wrong during fetching, ultimately resulting in a blank source. You don't want all items to be deleted in such a case.

  • twistor committed 3e2bbd1 on 7.x-2.x authored by stefan.r
    Issue #1470530 by stefan.r, GaëlG, mikran, Cottser, gnucifer, MegaChriz...
twistor’s picture

Status: Reviewed & tested by the community » Fixed

Congratulations everybody! That was a long time coming.

JKingsnorth’s picture

Fantastic! Thanks especially to stefan.r.

Should we now open a follow-up issue for the re-publishing feature that was removed in comment #178?

yannickoo’s picture

Yeah after 2,5 years this issue is finally fixed <3 Thanks to everyone who made this possible!

mxwright’s picture

This is great news. Congrats to everyone!

manuelBS’s picture

Yeah, that's really great news for the weekend ;-) Thanks to everybody who helped get this done.

bendev’s picture

This is very good news.
I am looking forward to test this.

Just for my understanding, what is the best approach? Use the dev version, wait for the next alpha release (any idea when that would be available?), or create a patch for the current alpha8 version?

thanks anyway for the good work!

Pol’s picture

@bendev: it will be ready when it's ready :)

For the moment, the best is to use the dev version which is quite stable already.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

jenlampton’s picture

okay, so I see that this is in now, but thought I'd post my results here anyway.
I'm now running 7.x-2.x-dev https://www.drupal.org/node/927032

I updated the module and ran drush updb.
I then deleted all the items that are imported with feeds (in my case, calendar events).

Then I went back to run the import, and in the UI I see the system crank for about 2 minutes, then I get this error:
The website encountered an unexpected error. Please try again later.

In the error log, I get this error:
Recoverable fatal error: Object of class FeedsSource could not be converted to string in DatabaseStatementBase->execute() (line 2171 of drupal-7/includes/database/database.inc).

Anyway, I'm still suffering from the problem over at #1363088: Feeds batch process not fully implemented - needed for large imports
I have a site I'm supporting that needs its feeds_process_limit increased by 200 each quarter so that the import will complete on cron as they add more data. I'd love to see a suggestion / recommendation on how to solve that problem over there, if anyone has one.

JKingsnorth’s picture

Hi jenlampton, it's probably best to create a new issue as a bug report for this.

At over 220 comments I think it's best we put this issue to bed, since the feature request is now in the module.

Shane Birley’s picture

Since this could be used to delete users (entities) -- is UID1 protected if it isn't included in a CSV?

JKingsnorth’s picture

Hi Shane, can you open a new support request (or bug if you can demonstrate the issue) for this, it may not get picked up on the end of a closed issue. Thanks.

andrea.brogi’s picture

Great,
thanks to everybody.
Using the dev version, the new option "Delete non-existent nodes" is working correctly.

jenlampton’s picture

Tabestan’s picture

I need the patch for 6.x (yes I know). Where can I find it? All of these are for D7.

ucscholar’s picture

Would I have to apply the patch from #204 since I'm using Feeds 7.x-2.0-alpha9? I'm unsure because I can't find this commit in CHANGELOG.txt or the release notes.

JKingsnorth’s picture

From the release notes for the latest alpha: ' This release only contains security fixes, no additional bug fixes or features.'
So yes, the patch needs to be applied, or use the latest dev release.

shaneod’s picture

Looks like this has been added to the latest release, Feeds 7.x-2.0-beta1:

https://www.drupal.org/node/2531428

Thanks and well done to everyone!

rudetrue’s picture

This functionality isn't working for me in 7.x-2.0-beta1. Can anyone else confirm?

rudetrue’s picture

I found out I'm only having issues when the feed is empty. I've created a new issue: https://www.drupal.org/node/2553331

andreboc’s picture

Does anyone have a fix for the unpublishing of nodes with 7.x-2.0-beta1?
Mine isn't working either, @rudetrue.

alinouman’s picture

Yes, not working with 7.x-2.0-beta1. Has anyone been able to make it work with 7.x-2.0-beta1?

MegaChriz’s picture

This feature is backed by automated tests and should still be working (else the tests would report a failure). If it is not, please open a new issue with steps provided to reproduce it. The only case where unpublishing or deleting items does not happen is when the source is completely empty. This is by design, as in most cases you don't want to wipe out everything when an empty source is provided. There is an issue open with a request to make that behaviour configurable: #2333667: Add an option to delete all nodes when the feed source is empty.

alinouman’s picture

@MegaChriz, sorry, I was using Commerce Feeds https://www.drupal.org/node/2270797, which was using the above patch but is not working with the Feeds beta now. So this issue is related to Commerce Feeds. Feeds 7.x-2.0-beta1 is fine; I have tested it. Thanks.

fuerst’s picture

For anyone who still uses Feeds 6.x - attached patch is an updated version of #19 which applies to Feeds 6.x-1.0-beta13.

nielsvoo’s picture

After trying for hours and hours using all kinds of approaches, I need some help with this.

This is my goal:

After importing a feed containing users, I want to block all the accounts on the site that are missing from the imported feed. Here I don't want to make any distinction between previously imported and not previously imported accounts.

In my case, users are created not by the feed itself but by users logging in to the site via LDAP.

Thanks in advance
Nielsvoo

dmitrii’s picture

@fuerst
I created a D6 sandbox module based on your patch.
https://www.drupal.org/sandbox/dmitrii/2787943