Updated: Comment #120
Problem/Motivation
The Feeds importer adds nodes, but there is no way to get rid of nodes that used to be available from the importer source but aren't available anymore.
Proposed resolution
Keep a list of all nodes imported by the importer and optionally unpublish those that were not updated.
Remaining tasks
- Tests
User interface changes
Added an 'Update entities missing in the feed' radio element to the processor configuration form.
API changes
A new clean() method has been added to FeedsProcessor.
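The cleanup step can be sketched as follows. This is an illustrative sketch only, based on the discussion in this thread rather than the committed code; the 'update_non_existent' config key and the $state->removeList property are names taken from later comments and should be treated as assumptions.

```php
<?php
// Illustrative sketch (not the committed code) of the new cleanup step.
// Assumes the Feeds 7.x plugin API.
abstract class SketchProcessor extends FeedsProcessor {

  /**
   * Delete entities that were imported previously but are missing from
   * the current import. $state->removeList is assumed to have been
   * initialized to all previously imported ids, with process() removing
   * each id it created or updated along the way.
   */
  public function clean(FeedsState $state) {
    if (!empty($state->removeList) && $this->config['update_non_existent'] === 'delete') {
      // Deletion stays generic here; unpublishing is node-specific and
      // belongs in an override in the node processor.
      $this->entityDeleteMultiple($state->removeList);
    }
  }
}
```
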
Related Issues
- #2092895: Block users not included in feed
- #1240366: Unpublish/Delete nodes not included in feed
- #1363088: Feeds batch process not fully implemented - needed for large imports
Original report by PsycleInteractive
I have re-created the patch for this to work with 6.x-1.0-beta11; it comes from another issue, see http://drupal.org/node/1240366.
I have tried to clean up some of the code-style issues that the original patch had, and will be working on a D7 version when I get some time to do it.
Comments
Comment #1
arrubiu CreditAttribution: arrubiu commented:
Subscribe :)
Comment #2
riho CreditAttribution: riho commented:
I ported your code to the 7.x-2.x-dev version, but it might need some fine-tuning here and there.
Comment #3
katherinecory CreditAttribution: katherinecory commented:
I've patched my copy of the latest 7.x dev version, re-imported my content and received no errors. As far as I know the feed didn't change, so no nodes were deleted.
Comment #4
Psycle Interactive CreditAttribution: Psycle Interactive commented:
Thanks riho, I have not had the time to create the port myself. I will do a test with your patch, just to clarify that the delete is working, based on the comment from katherinecory.
Comment #5
yannickoo commented:
I can confirm that the patch for Drupal 6 works. Now any Drupal 7 user should review the patch from #2.
Comment #6
yannickoo commented:
Strange behavior... I tested the module with the patch included today.
Why does the module remove the nodes after importing them the first time?!
Comment #7
yannickoo commented:
Ah okay, I found the problem: when you select "Remove non existant nodes" and import a feed, all new feed items will be deleted.
Does anyone know how to fix that?
Comment #8
riho CreditAttribution: riho commented:
Does that happen in the 6.x-1.0-beta11 version, 7.x-2.x-dev, or both?
I also discovered that in case of a "cURL error (52) Empty reply from server" error, the patch deletes all nodes. Working on a fix for that in the 7.x-2.x-dev version.
Comment #9
yannickoo commented:
I didn't test the 7.x-2.x-dev version, but this problem happens in 6.x-1.0-beta11 and I don't know how to fix it.
Comment #10
balintk CreditAttribution: balintk commented:
The patch in #2 works great.
Comment #11
DannyPfeiffer CreditAttribution: DannyPfeiffer commented:
I can confirm the deletion-of-new-nodes problem in the patch for the 6.x-1.0-beta11 version.
The problem is that newly imported nodes are *not* present in the array $importer->processor->processed_nids, but *are* present in the array $importer->processor->getFeedNids($source).
That triggers them for deletion, which is incorrect. Unfortunately I'm not familiar enough with Feeds to identify an immediate solution - anyone?
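The faulty selection can be demonstrated with plain array math. This is a hypothetical reduction of the patch's logic, not the actual patch code: diffing the post-import database list against a list that never receives newly created nids marks the brand-new nodes for deletion.

```php
<?php
// Hypothetical reduction of the buggy selection in the early patch.
// $db_nids: nids attached to the feed *after* import (includes new nodes).
// $processed_nids: nids the processor recorded - newly created nodes
// never make it in, so the diff wrongly flags them for deletion.
$db_nids = array(10, 11, 12, 13);   // 12 and 13 were just created
$processed_nids = array(10, 11);    // new nids never got recorded

$to_remove = array_values(array_diff($db_nids, $processed_nids));
print_r($to_remove); // the brand-new nodes 12 and 13 - wrong

// The later fix (#19) snapshots the pre-existing nids *before* import
// and diffs that snapshot against everything touched during this run.
$pre_existing = array(10, 11);      // snapshot taken before import
$touched = array(10, 12, 13);       // created or updated this run
$to_remove_fixed = array_values(array_diff($pre_existing, $touched));
print_r($to_remove_fixed); // only 11, which is truly missing
```
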
Comment #12
alCHE CreditAttribution: alCHE commented:
Has anyone tested this on D7? Can someone confirm whether it works?
Comment #13
aweizd CreditAttribution: aweizd commented:
The patch in #2 works with 7.x-2.0-alpha5. I've only tested it for a couple of minutes, but it seems to work. No problems so far. I hope this ends up in the module's code soon.
Comment #14
bsevere CreditAttribution: bsevere commented:
I have installed the patch from #2 on version 7.x-2.0-alpha5. So far all processor settings are working as expected. I tried to recreate the problem mentioned in #6 and #7 but could not.
This patch solves a big problem for me. Thanks!
Comment #15
yannickoo commented:
Hey smoobv, thanks for testing, but my problem appeared in the Drupal 6 version.
Comment #16
alCHE CreditAttribution: alCHE commented:
Thank you very much for this patch!
It would be great to add it to the module's code.
Comment #17
yannickoo commented:
So we should set the issue status to RTBC, but we need to fix this for Drupal 6 first.
Comment #18
aweizd CreditAttribution: aweizd commented:
After a couple of days of testing I found some strange behaviour in 7.
My .csv has 153 entries. When I import it, Feeds only fetches the first 50 items and displays "Importing... 33%". This doesn't change until the next cron run. So I run cron and go back to the import page; now it says "Importing... 64%", so the next 50 nodes were processed. Next cron run I get "Importing... 98%" and another 50 items processed. So far so good, I have 150 nodes imported. Now this is where the "magic" happens. I still have 3 unimported items in my csv. I run cron, it fetches and imports the 3 entries, but at the same time deletes the 150 items imported earlier, as if they weren't in the csv at all.
I'll try increasing the import limit for Feeds, so it fetches the entire csv file, not just 50-item chunks. I'll let you know if it helps.
EDIT: So I increased my import limit to 200 and Feeds imports the entire file at once. I also have the feed importer set up to import every hour.
When I import via the import form everything works fine, but when the import is triggered by cron, every entry is deleted. It seems patch #2 doesn't get a proper list of nids when triggered by cron, as opposed to the import form. I don't know much about the Feeds API, so it might take a while for me to figure this out. Any help would be greatly appreciated.
Comment #19
dbassendine CreditAttribution: dbassendine commented:
I've worked on the "deleting new nodes" behaviour described in #7 and #11.
This patch is against 6.x-1.x-dev, and makes a couple of changes:
* In FeedsNodeProcessor.inc, getFeedNids - I have added an extra condition for the case where the feed being used has an "id" (i.e. the import form) rather than a "feed_nid" (i.e. a feed node).
* In feeds.module - I added a hook_feeds_after_parse implementation to pull out the existing feed nodes *before* the import takes place. Previously, this list of pre-existing nodes was drawn from the database *after* the import had occurred, by which time the database had already been updated.
* In feeds.module, hook_feeds_after_import - I made the logic for forming the list of nodes to remove clearer. I run through the "pre-existing nodes" list and if the nid isn't in the "incoming nodes" list, then I mark it for removal (or unpublishing).
It would be great if people could test this. I have tested: new rows added, rows deleted, and a combination of both. I have tested using the import form but not using a feed node, although there shouldn't be any difference.
Thanks, David
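The two-hook approach described above can be sketched roughly like this. It is a sketch, not David's actual patch; the variable used to carry the snapshot between hooks, and the processed_nids/getFeedNids names, are taken from earlier comments and are assumptions.

```php
<?php
// Sketch of the two-hook approach (not the patch itself).
// hook_feeds_after_parse() runs before processing, so the snapshot of
// already-imported nids is taken before the database is modified.
function feeds_feeds_after_parse(FeedsSource $source, FeedsParserResult $result) {
  $GLOBALS['feeds_pre_existing_nids'] = $source->importer->processor->getFeedNids($source);
}

// After import, anything in the snapshot that was not (re)imported is
// missing from the feed and gets unpublished or deleted per config.
function feeds_feeds_after_import(FeedsSource $source) {
  $incoming = $source->importer->processor->processed_nids;
  $missing = array_diff($GLOBALS['feeds_pre_existing_nids'], $incoming);
  foreach ($missing as $nid) {
    // node_delete($nid); or unpublish, depending on the setting.
  }
}
```

Note that carrying the snapshot in a global only works within a single request; comment #36 later points out that it is lost between batches in D7, which motivated the rework into FeedsProcessor::clean().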
Comment #20
celstonvml CreditAttribution: celstonvml commented:
The patch from post #2 works just fine for me against version 7.x-2.0-alpha5.
Comment #21
DannyPfeiffer CreditAttribution: DannyPfeiffer commented:
David,
This seems to resolve the problem I was having with the earlier patch. Thanks so much for your work! I'll keep an eye on this for a while, but it seems to have done the trick.
-danny
Comment #22
stylesuxx CreditAttribution: stylesuxx commented:
Patch from #2 works great. I would also love to see this implemented.
Comment #23
yannickoo commented:
It seems like the patch from comment #2 works fine for the current 7.x-2.0-alpha5 version, and the patch from #19 for 6.x-1.x-dev.
Let's mark this as RTBC.
Comment #24
bjalford CreditAttribution: bjalford commented:
Tested #2 as well and it works as expected.
Comment #25
twistor CreditAttribution: twistor commented:
This will need to go into 7.x first. It will need tests.
It is a commonly requested feature that I would happily accept. I would appreciate it if someone could consolidate the various issues into one - whichever one is furthest along.
Comment #26
clifmo CreditAttribution: clifmo commented:
I had to split the patch from #2 into separate patches for the files in different directories; for whatever reason the files couldn't be found otherwise.
I get the options to skip, unpublish, or delete nodes missing from the feed, but adding a dummy node and then running the import does not alter the extraneous node. Using 7.x-2.0-alpha5.
Comment #27
jos_s CreditAttribution: jos_s commented:
I used the patch from #2 on 7.x-2.0-alpha5 and then got this error for any page I tried to retrieve:
PHP Fatal error: Cannot redeclare feeds_feeds_after_import() (previously declared in /sites/all/modules/feeds/feeds.module:763) in /sites/all/modules/feeds/feeds.module on line 893
Should it work on 7.x-2.0-alpha5 at all? Or should I try the dev branch?
Comment #28
steven.be CreditAttribution: steven.be commented:
Hi,
Will this work on 7.x-2.0-alpha6 as well?
Thanks,
Comment #29
filip.hoeven CreditAttribution: filip.hoeven commented:
Applied the changes from the patch in #2 to "7.x-2.0-alpha7" of "feeds" and it works perfectly.
Please note that I did these changes manually by looking at the patch (because I'm not familiar with how to do this automatically and it didn't work for me in Eclipse).
I noticed one difference: the original source has changed where the extra code should be placed inside "FeedsProcessor.inc":
As I read a lot of people saying it works, I hope it is added to the standard Feeds code very soon.
Comment #30
GaëlG commented:
@filip.hoeven A patch against the latest 7.x-dev version is needed for this to get included in the next Feeds release. Can you submit it?
See http://drupal.org/project/feeds/git-instructions for more info on how to create patches.
Comment #31
Erik.Johansson CreditAttribution: Erik.Johansson commented:
I have run into some issues while testing the patch from #2. When I start an import with the "Remove non existant nodes" option, all my nodes get deleted. Once all nodes have been deleted, Feeds will start to import them all over again. I have multiple importers that trigger in a specific order, so my scenario is quite special.
Has anyone else noticed this behavior?
Comment #32
GaëlG commented:
Here's the patch according to the #29 instructions. Works for me.
Comment #33
nrambeck CreditAttribution: nrambeck commented:
Patch #32 works for me, but created a few PHP notices. I've re-rolled the patch to define the 3 new constants, remove some debug code and standardize code formatting.
Comment #34
GaëlG commented:
Actually I face the same problem as Erik.Johansson. I suppose #19 is the answer.
Comment #35
dbassendine CreditAttribution: dbassendine commented:
I've been testing my patch from #19 for 6.x-dev quite extensively for some client work, against some quite large imports (~3000 rows), and this has now moved into production. Thanks everyone for testing; glad it's working for you too.
If I find some time, I'll check whether the changes I made in 6.x can be ported over to 7.x too, as the bug reports for 7.x look quite similar.
Comment #36
GaëlG commented:
I don't know about D6, but in D7 the global variable $existingNodes will be lost between batches. I tried to create a new patch from scratch, using pieces of different previous patches and the logic of #19, i.e. store a list of previously existing entities before the actual import.
It works for any entity type, not just nodes, and deletes items that were not found. Unpublishing is something specific to the node entity type; I'll post a new patch soon including an override of FeedsProcessor::clean() in the node processor.
I tested it in different use cases and it seems to work well.
Comment #37
GaëlG commented:
As promised, this one includes the unpublish feature for nodes. Untested, but it should work.
Comment #38
FreeFox CreditAttribution: FreeFox commented:
I tested the patch in #37 against the current dev. Works great!
Many thanks to all contributors.
Comment #39
kingandy CreditAttribution: kingandy commented:
Likewise, tested #37 against current dev; it applies smoothly and the intended behaviour appears to be implemented (our nodes are now unpublishing themselves).
Comment #40
yannickoo commented:
Okay, I think that is enough for RTBC. After the patch is applied we have to do the 6.x stuff.
Comment #41
kingandy CreditAttribution: kingandy commented:
Since installing the patch in #37 I'm occasionally experiencing items being imported as unpublished - I suspect these are items with a GUID matching items which were previously imported, then absent from the import file (therefore unpublished), and then reintroduced.
It's probably an edge case, and in any event I'm not sure this is something that it would be Feeds' responsibility to do anything about. Certainly it makes sense that when "updating" an existing node, Feeds would not touch the published status unless explicitly instructed to ... it's just ever-so-slightly counter-intuitive to have things automatically unpublished when absent, but not republished when found.
Possibly there could be another setting to control this, but it may be best to just leave it up to admins to set up a field mapping.
Comment #42
joeysantiago CreditAttribution: joeysantiago commented:
I agree with #41. It would be nice to have a feature like republishing nodes. But, kingandy, what do you mean by "leave it up to admins to set up a field mapping"? Is there a workaround by setting some field mapping in order to publish nodes back?
Plus, I have a problem. Sometimes I can't reach my feed, so it marks all of the nodes to be unpublished (I guess its assumption is "no feed" = "no nodes" = "unpublish all"). It would instead be cool to catch the error and do nothing in this case :)
I am using an old version (7.x-2.0-alpha5), so it's quite useless to create a patch, since I just found out the current version has changed quite a lot. I patched my code for the "unreachable feed" bug by adding some lines to the function feeds_feeds_after_import(FeedsSource $source). The lines are:
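(The actual lines weren't preserved in this comment; a guard along these lines would have the described effect. This is a hypothetical reconstruction, assuming the Feeds 7.x FeedsSource::state() API; the stage constant and counter properties are assumptions.)

```php
<?php
// Hypothetical reconstruction of the guard described above: if the fetch
// failed or parsed zero items, skip the unpublish/delete pass entirely
// rather than treating every existing node as missing.
function feeds_feeds_after_import(FeedsSource $source) {
  $state = $source->state(FEEDS_PROCESS);
  // Nothing was created or updated this run - an unreachable feed must
  // not be mistaken for an intentionally empty one.
  if (empty($state->created) && empty($state->updated)) {
    return;
  }
  // ... existing removal logic continues here ...
}
```
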
I hope I'll have time to create a patch against the current version soon.
Comment #43
kingandy CreditAttribution: kingandy commented:
I was thinking that since Published status is available as a mapping destination for nodes, it should be possible to force updated items to be published. Looks like it might not be as straightforward as I thought at first, though. It might still be possible with Feeds Tamper or similar, but I'm definitely leaning towards the notion of including "publish on import" as an option in the node processor.
Comment #44
stuwat CreditAttribution: stuwat commented:
I haven't road-tested it extensively, but patch #37, applied to 7.x-2.0-alpha7, appears to successfully delete nodes from the destination that no longer exist in the source feed. Thanks.
Comment #45
janchojnacki CreditAttribution: janchojnacki commented:
Patch #37 failed to apply to 7.x-2.0-alpha7.
Comment #46
yannickoo commented:
Use the development version for applying the patches, not the current stable release...
Comment #47
twistor CreditAttribution: twistor commented:
Parts of this need to be moved into FeedsProcessor so that it is more generic.
Also, and this is a biggie, the delete operation needs to be batched. The current implementation will not scale.
Comment #48
nedjo commented:
Marked #1240366: Unpublish/Delete nodes not included in feed and #661314: "Sync" or "cache" mode as duplicates.
Comment #49
dudde CreditAttribution: dudde commented:
I applied the patch from #38 on 7.x-2.0-alpha7 and it's working. Thank you!
Comment #50
star-szr commented:
This still needs tests as well, per #25.
@twistor/#47 - what parts need to be moved into FeedsProcessor? At a glance, the only code added to FeedsNodeProcessor is the unpublish functionality.
Edit: Looking at it again, FeedsNodeProcessor should perhaps only handle unpublishing and hand the rest off to the parent method.
Comment #51
star-szr
Comment #52
star-szr commented:
First of all, rerolled to fix a conflict with #1711648: Abstract bundle handling.
Comment #53
star-szr commented:
…and corrected the spelling of 'existant' to 'existent' throughout.
Comment #54
star-szr commented:
And regardless of the test results, back to 'needs work'.
Comment #55
cdmo CreditAttribution: cdmo commented:
I tested #53 on 7.x-2.x-dev just now in a sandbox and it worked.
Comment #56
abdul.muiz.meer CreditAttribution: abdul.muiz.meer commented:
This patch looks to work fine, but what's actually happening is that it creates all the nodes again and deletes all previous nodes created by the feed. That does remove all nodes which are not present in the CSV, but it changes the nids on every import, which is not good.
I think this is because the following check is not working in "FeedsProcessor.inc", in the function process(FeedsSource $source, FeedsParserResult $parser_result).
Comment #57
abdul.muiz.meer CreditAttribution: abdul.muiz.meer commented
Comment #58
yannickoo commented:
CSV? Do you mean in the Git repository?
Comment #59
cdmo CreditAttribution: cdmo commented:
I think #56 is referring to a feed where the source is a comma-separated values document.
Comment #60
yannickoo commented:
Oh, I confused CSV with CVS :D
Comment #61
gordon CreditAttribution: gordon commented:
I have created a contrib module which will do this, see http://drupal.org/node/1939256
There are 3 patches left to get into Feeds, and then I will make it a real project.
However, one of the patches will not get into Feeds, so I am investigating ways to get around this.
Comment #62
abdul.muiz.meer CreditAttribution: abdul.muiz.meer commented:
This patch works fine with the "user processor", but making one of the fields unique is necessary for that.
The problem mentioned in #56 is in the "node processor" only.
Comment #63
Mithrandir CreditAttribution: Mithrandir commented:
I'll try the patch from #53, but will also +1 the notion that if the feed retrieval fails or is empty, no nodes should be unpublished or deleted. I'll see if I can create a patch for this, but wouldn't mind if someone beat me to it :)
Comment #64
Mithrandir CreditAttribution: Mithrandir commented:
As of today, April 25th, the patch from #53 cannot be applied directly to 7.x-2.x-dev, since it will not update plugins/FeedsProcessor.php.
I'll make a manual merge and re-package a patch. I also think I have found a solution to the "only remove nodes if there are items in the feed" problem. As I see it, it is just a matter of checking the count of items in the parser result before marking nodes for removal - no solution to the "fetch in several requests" challenge, though.
Comments?
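The check described above boils down to one early test; a minimal sketch, where the function name is hypothetical and the parser items are assumed to be available at cleanup time:

```php
<?php
// Minimal sketch of the guard: mark entities for removal only when the
// parser actually produced items, so an empty fetch (or a failed one
// that parsed to nothing) does not wipe previously imported content.
function feeds_should_mark_for_removal(array $parser_items) {
  return count($parser_items) > 0;
}

var_dump(feeds_should_mark_for_removal(array()));        // bool(false)
var_dump(feeds_should_mark_for_removal(array('item1'))); // bool(true)
```

As comments #75-#77 later note, the explicit `> 0` comparison is redundant in a boolean context, since both `0` and an empty array are already falsy in PHP.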
Comment #65
Mithrandir CreditAttribution: Mithrandir commented:
And here's the combined patch of #53 applied to the current 7.x-2.x, with my above check for items in the result before marking entities to be removed.
Comment #66
DamienMcKenna
Comment #67
aleada CreditAttribution: aleada commented:
Is the #65 patch included in the latest 7.x-2.x branch?
If not, when will it be part of the latest released version?
Thanks
Comment #68
aleada CreditAttribution: aleada commented:
I tried to apply patch #65 to 7.x-2.0-alpha8 and the output is the following:
What am I doing wrong?
Comment #69
star-szr commented:
@amatech - you should apply patches against the latest dev version.
Comment #70
yannickoo commented:
"latest dev version" - I would clone the git repository. That is the best way.
git clone --branch 7.x-2.x http://git.drupal.org/project/feeds.git
Comment #71
dgastudio CreditAttribution: dgastudio commented:
http://clip2net.com/clip/m207331/1367771527-clip-7kb.png?nocache=1
Comment #72
yannickoo commented:
Oh, whitespace errors.
Whitespace errors again.
Comment #73
Mithrandir CreditAttribution: Mithrandir commented:
Seriously? No comments on the logic or the working code or anything, because of two whitespace errors?
Comment #74
yannickoo commented:
Sorry, but I had no time to test it, and I marked it as needs work because with these two whitespace errors the patch cannot get committed :/
Comment #75
yannickoo commented:
A small thing which I saw just now:
Why do you check for greater than zero? Why not just
count($parser_result->items)
?
Comment #76
Mithrandir CreditAttribution: Mithrandir commented:
It's fair to mention the whitespace; I just felt "they won't even LOOK at the code or test it, because the automated test showed two non-blocking issues", which was a little disheartening :) especially considering that I just wanted feedback on the logic.
I think it is an old habit of mine to check for greater than zero when using count. Now that I think about it, I can't place where it comes from... perhaps it is just that when I "read the code aloud" in my head, I feel that count($something) should BE something and to highlight that it is a numeric value. Perhaps it is a little too explicit for what is necessary.
Comment #77
twistor CreditAttribution: twistor commented:
Heh, you don't need the count at all.
More importantly, this needs to batch the deletes. Also, isn't there a use-case for importing an empty feed and having all the items unpublished/deleted?
[Fixed typo]
Comment #78
Mithrandir CreditAttribution: Mithrandir commentedAm I reading the typo correctly? Should it be "isn't there a use-case for importing an empty feed" ?
In that case, I guess there would be a possible use-case for removing all published items... but if so, we'll need a very clear way of separating faulty feeds from empty feeds. I quite often see that a faulty feed is correct XML and all, but just a different structure with an error message in it. I don't know if there is an easy way to detect that with the usual methods of parsing (XPath etc.) when you don't know the error structure - I very rarely know what a faulty feed from a partner looks like until it happens, so I would quite often not be able to write e.g. an XPath statement to fetch an error message.
Comment #79
DamienMcKenna commented:
Reading through the patch, I agree with the current option but would recommend improving the option's description to note that selecting the 'delete' option could have unexpected results, and that unpublishing would be much safer. I would also suggest that the system should fail if the import file is blank; there's no way for the system to know whether it's blank on purpose or by accident, and the last thing you want to do is wipe out records because of a mistake elsewhere.
Comment #80
twistor CreditAttribution: twistor commented:
Poking around at this: we can't batch at this point because of #1363088: Feeds batch process not fully implemented - needed for large imports, so I'm willing to take this as-is. But it does need a couple of tests. Review notes on the patch (each referring to a hunk in the patch):
- to_be_removed should be removeList or something camelCase.
- Missing t().
- Overrides*
- "Overrides FeedsProcessor::clean()." should be the first line; the comment should come after.
- Unused.
- This should check for the unpublish setting, and if it's not there, pass to the parent::clean() method. Also, the nodes should be bulk loaded with node_load_multiple().
- No need for the count(). An empty array will be falsy.
- This needs to be cleaned up. I don't know what pooling is. Also, the method needs type checking.
- Missing type in arguments.
- Deletes*, and a newline before the todo.
- Argument type, and unused entityInfo().
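Putting those review points together, the node processor's override would look roughly like this. This is a sketch of the requested shape, not the final patch; it assumes the Feeds 7.x API, and the 'update_non_existent' config key and removeList property are names taken from this thread.

```php
<?php
// Sketch shaped by the review points above (not the final patch).
class FeedsNodeProcessor extends FeedsProcessor {

  /**
   * Overrides FeedsProcessor::clean().
   *
   * Handle the node-specific unpublish option here; defer deletion to
   * the generic parent implementation.
   */
  public function clean(FeedsState $state) {
    if ($this->config['update_non_existent'] === 'unpublish' && !empty($state->removeList)) {
      // Bulk load instead of one node_load() per id, per the review.
      foreach (node_load_multiple($state->removeList) as $node) {
        $node->status = NODE_NOT_PUBLISHED;
        node_save($node);
      }
    }
    else {
      parent::clean($state);
    }
  }
}
```
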
Comment #81
hyperidler CreditAttribution: hyperidler commented:
Subscribing
Comment #82
hyperidler CreditAttribution: hyperidler commented:
I've been trying out the #65 patch provided by Mithrandir for several days, importing a test Google calendar I created; so far no problems, and it works to delete nodes that are no longer in the feed.
Thanks.
Comment #83
dan2k3k4 CreditAttribution: dan2k3k4 commented:
Using the patch in #65 by Mithrandir; I had to clone the git repo, so I'm using 7.x-2.x, and it seems to work well.
I have 2 importers that use the JSON Path parser to import JSON feeds into 2 different content types, set to check and parse the feeds every hour (with errors reported back by email), so I will let you know if any errors occur, but so far so good.
Comment #84
g089h515r806 CreditAttribution: g089h515r806 commented:
Applied #65 successfully, with 2 warnings:
"2 lines add whitespace errors."
"plugins/FeedsProcessor.inc has type 100644, expected 100755."
I have not tested it yet.
Comment #85
yannickoo commented:
That is what I noticed in #72.
Comment #86
imclean CreditAttribution: imclean commented:
This addresses most of the issues raised by twistor in #80. It still needs tests, and possibly some tidying up of comments.
Comment #87
imclean CreditAttribution: imclean commented:
Removed some whitespace.
Comment #89
imclean CreditAttribution: imclean commented:
Oops.
Comment #90
imclean CreditAttribution: imclean commented
Comment #92
imclean CreditAttribution: imclean commented:
Missed one. How do I run these tests locally before uploading?
Comment #94
twistor CreditAttribution: twistor commented:
To run the tests locally, enable the "Testing" module ("simpletest" from Drush), then go to admin/config/development/testing.
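With Drush available, the same steps can be done from the command line. This is a sketch; the `test-run` command is assumed to be available for the Drush/Drupal 7 version in use, and "Feeds" as the test group name is an assumption.

```shell
# Enable the core Testing (simpletest) module, then run the Feeds group
# of tests from the command line instead of the admin UI.
drush en simpletest -y
drush test-run Feeds
```
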
Comment #95
imclean CreditAttribution: imclean commented
Comment #96
yannickoo commented:
Thank you imclean, the patch looks fine to me. I found two minor things. I would like to set the status to "Needs work" because the batch deleting is missing.
We should add a batch here.
A full stop is missing.
I don't think it's necessary to reference $node, because you are passing the whole node object to node_unpublish_action() and node_save().
Currently all entities are deleted at once, right? So we should figure this out.
Comment #97
zuernBernhard CreditAttribution: zuernBernhard commented:
I'd love to see this in the module... can I assume that it will be added if I do the points mentioned above for the last patch?
-> Add batch processing
-> Fix comments
-> ... ??
Comment #98
vinmassaro CreditAttribution: vinmassaro commented:
@zuernBernhard: if you can add what needs to be done in #96, then we can test the patch and get it moving. I would also like to see support for this.
Comment #99
Pol commented:
I will try to provide a new patch with batch support tomorrow because we need this feature.
Comment #100
Pol commented:
Hello yannickoo,
Could you comment on this? These are the clean() and unpublishNode() methods from plugins/FeedsNodeProcessor.inc.
I'm not used to working with batches, so bear with me :)
If it's OK, I'll polish it and make a patch.
Thanks!
Comment #101
Pol commented:
Hi all,
I woke up this morning and saw that all the nodes were unpublished using patch #95, so I supposed there was an inconsistency in the patch. EDIT: Sorry about that; I did further testing and the patch is perfectly OK.
Are there people using this patch who have problems and/or success stories?
Another question comes to mind: what to do if one of the unpublished nodes appears again in the imported stream? Should we change its state to 'published' again?
Edit: This is easy to fix by mapping a field to the published state field of the node, and the job is done. Thanks!
Comment #102
imclean CreditAttribution: imclean commented:
Hi Pol,
Thanks for your work.
Please don't add or change information by editing your post, as it makes the thread hard to follow (especially for people who receive email notifications). Just post again with the updated or new information.
Please create a patch against the version in git for us to test.
Also see Making a Drupal patch with Git.
Comment #103
byronveale CreditAttribution: byronveale commented:
I just tried the patch in #95, and it worked a treat! We currently have 882 items in our feeds_item table; with this last test, it correctly identified and unpublished the 121 entries that were no longer in the feed.
And having the option to un-publish instead of deleting the nodes totally saves our content manager from extra work; otherwise, he would have lost the additional info entered on older feeds nodes, nodes that have been superseded by newer feed entries (changing feed source, long story, can provide details if you _really_ want them...).
Thanks for everyone's efforts on this.
Is there anything I can do to help get this into code? At least the dev version? We have a client that this feature is just perfect for, and IMHO, really makes Drupal shine...
(Note: yes, I did edit this comment, sorry, but only to clarify what it was I tried...)
Comment #104
byronveale CreditAttribution: byronveale commented:
Further apologies - I didn't mean to change the status when I posted my comment; I hadn't refreshed my browser since yesterday afternoon, and so missed imclean's comment #102.
Comment #105
imclean CreditAttribution: imclean commented:
#96 lists some work to do. There are 3 minor issues plus the batching to solve.
Comment #106
eugene.ilyin CreditAttribution: eugene.ilyin commented:
Hm, if a node does not exist in the next file for import, it will be unpublished. But what if the same node is available again in the next import? Maybe we need to set
for all updated nodes in the process() function?
Comment #107
Pol commented:
Hi @Eugene,
I had the same problem and I got it fixed by adding a new mapping to the status field.
I also had to modify the feed and include a new data 'status' too.
You can see the result here: http://jobs.trasys.be/
All the jobs you see there are imported by feeds. They are retrieved using a callback (Feeds Callback Fetcher), the callback is using the Taleo PHP Library to get the vacancies from the company I work for.
For a reason that I cannot understand, sometimes all the jobs are unpublished. Maybe the returned feed is empty, but still, it should publish them automatically at the next cron run (every 15 min), and it doesn't.
I'm still looking...
Comment #108
gnucifer CreditAttribution: gnucifer commented:
@imclean I can't see why the join on line 356 is necessary.
As far as I can see, it would be enough to select the entity ids directly from the feeds_item table.
Something like this:
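(The actual query wasn't preserved here; a join-free select along the lines described would look like this. This is a hypothetical reconstruction, assuming the Feeds 7.x feeds_item schema and the processor/source properties named elsewhere in the thread.)

```php
<?php
// Hypothetical reconstruction of the suggested join-free query: read the
// previously imported entity ids straight from the feeds_item table for
// this importer and source.
$entity_ids = db_query(
  "SELECT entity_id FROM {feeds_item} WHERE entity_type = :type AND id = :id AND feed_nid = :feed_nid",
  array(
    ':type' => $this->entityType(),
    ':id' => $source->id,
    ':feed_nid' => $source->feed_nid,
  )
)->fetchCol();
```
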
Comment #109
mikran CreditAttribution: mikran commented:
The issue description is not very descriptive, so I'm not sure about the issue scope here, but I've added a 'block users' action to the user processor.
Comment #110
Pol commented:
I had an idea today when reading this thread...
What do you think about a system that would unpublish (or delete) an entity only if it hasn't been found in the feed more than X times?
As an example, let's say we configure the importer to unpublish nodes that are not in the feed more than 3 times. Only when a node has not been found more than 3 times does it get unpublished.
What do you think about this idea?
Comment #111
gnucifer CreditAttribution: gnucifer commented:
@Pol It seems like very convoluted logic to me; I can't imagine a situation where this would be useful. It would make more sense to implement this custom behavior in a separate processor extending the node processor for that specific use case.
Comment #112
gnucifer CreditAttribution: gnucifer commented:
I have a suggestion concerning the variable name $removeList. Since (for example) there is an option to unpublish in the node processor, this name is a bit misleading. Also, as it's initialized to all entities for the current feed, the name could cause confusion, since it implies they are all to be removed (even if there is a comment describing the variable's intended purpose). I think, for example, "remaining" is more correct, since it's just state, not necessarily associated with a specific purpose/action.
Comment #113
gnucifer CreditAttribution: gnucifer commented:
Here is the patch updated with the variable name change.
Comment #114
gnucifer CreditAttribution: gnucifer commented:
I myself have a use case where I need to keep nodes no longer found in the feed and flag them in a certain way. I added a feature to trigger an event for the missing items (which can be used, in my case, to set a value for a field). This rule could in theory also be used to delete/unpublish items, duplicating the functionality in the clean() method, but that would be a very obfuscated way of providing this feature. The current quick and easy way of doing it is much better.
Comment #115
Summit CreditAttribution: Summit commented:
Hi @gnucifer,
Could you elaborate on your solution? Did you make a special rule for it?
My use case is products (entities) which are no longer in the feed, but are connected to nodes or are just in the product list. How can I get some kind of logging that those products are not in the feed anymore? And maybe which ones are new in the feed?
Greetings, Martijn
Comment #116
imclean CreditAttribution: imclean commentedHmm...Let's keep this issue concise.
@pol, that would be a separate feature, perhaps added externally. You could also use the existing expire logic.
@gnuficer, "remaining" is the opposite of what removeList is. Items in removeList are deleted or unpublished, items which are not in the list remain.
I'm not sure about the rules addition either at this stage. A hook may be more flexible anyway, and could then be used to create a rule somewhere else.
Between this issue and the other one, which was started 4 years ago, it's been a long time getting the feature request this far. The important things to work on right now are:
1. Tidying up comments and code
2. Batch support
3. Tests
Comment #117
gnucifer CreditAttribution: gnucifer commented@imclean I was thinking more in terms of "remains to be processed". It could also be called "unprocessed_items", "unprocessed" or some other, perhaps better, name. "removeList" is misleading primarily because items might not be removed at all: they may be blocked in the case of users, or unpublished in the case of nodes. It's a pretty small detail, but still important, since a more appropriate name would make the code more readable.
EDIT: This also applies to the method name "initEntitiesToBeRemoved".
You are correct that a hook is probably a better choice, and sure, there are more important tasks to work on. I mostly posted the rules patch because I need that functionality myself, and if someone else does too, it's nice to have a patch available. I will probably rework the code to use a hook instead, and perhaps place the rules integration in a separate module instead of behind a conditional invocation.
@Summit An event available in rules is triggered for each item not found in the feed. I'm not 100% sure of what you are trying to do, but you might be able to achieve it by creating a custom rule for the "not found" event.
Comment #118
byronveale CreditAttribution: byronveale commentedJust wanted to clarify before attempting a new patch:
- imclean's patch in #95 needs some minor updating, as outlined in yannickoo's comment #96
- Pol's comment #100 includes an attempt at fixing batch processing
- creating a patch incorporating the above items against the version in Git, and submitting for testing, will help get the unpublish/delete functionality working in the dev version of Feeds
- items in #106, #107, and #108 aren't critical, as the patch in #95 is working (clarification here would be greatly appreciated!)
- items brought up in comments #109 to #117 (and beyond?) aren't directly addressing this issue, and so should be addressed elsewhere
Sound correct?
Comment #119
gnucifer CreditAttribution: gnucifer commented@byronveale #117 addresses the patch in this issue, so creating a new issue would be confusing.
@Pol regarding #100. I have thought about the batch-processing, and will later try to elaborate further on the problems implementing this for the "clean" operation, but your approach will not work since batch operations will not be processed if the feed is updated on cron. It's a really hairy issue, and I think it will be difficult to implement batch-processing, at least in a synchronous way.
Comment #120
mikran CreditAttribution: mikran commentedQuote from twistor, comment #80.
So that leaves us with tests only.
Issue title states nodes; so users, rules, batch etc can follow as follow-ups.
Comment #121
gnucifer CreditAttribution: gnucifer commentedRegarding batch processing, here are my thoughts:
As far as I understand it, Feeds will invoke FeedsSource::import() either as a background job through the Job Scheduler module, or using the Drupal Batch API.
Job Scheduler will run its tasks on cron, calling this worker function:
The Batch API job is set in FeedsSource (or a Job Scheduler job if the importer is configured to process in the background):
Whether the job is done is determined in FeedsSource::progressImporting():
It is used both for the Batch API job and the Job Scheduler job.
If we want to batch the deletion of items, we probably need to make FeedsSource::progressImporting() take this into account. But that is really difficult, since we only know how many items need to be deleted/processed after the import is done. So the deletion needs to happen in another job. In that case we cannot call clean() inside process(): we do not know whether we are invoked through cron or a browser, so clean() must be invokable independently of process() or state, which makes things more complicated. We cannot use state for storage, but must find another solution.
An alternative option would be to always schedule a background job in process() and pass on the $removeList. There might be other approaches, and I will try to figure out some other way of dealing with this, but to me it seems pretty hard to solve this in a "nice" way.
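To make the set-difference idea behind clean() concrete, here is a minimal plain-PHP simulation: start from every previously imported ID, cross off each item seen during process(), and whatever is left over is missing from the feed. The array logic is illustrative only, not the actual FeedsProcessor code.

```php
<?php
// Illustrative simulation only; not the actual FeedsProcessor code.

// All entity IDs previously imported for this feed, keyed by ID.
$removeList = array(1 => TRUE, 2 => TRUE, 3 => TRUE, 4 => TRUE);

// IDs seen during the current import; each one is crossed off the list.
$seenInFeed = array(2, 4);
foreach ($seenInFeed as $id) {
  unset($removeList[$id]);
}

// Whatever remains was missing from the feed and would be
// unpublished/deleted by clean().
$missing = array_keys($removeList);
print implode(',', $missing) . "\n";
```

The catch is that $removeList only becomes final once the whole import has run, which is why batching the clean step is hard.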
Comment #122
gnucifer CreditAttribution: gnucifer commented@mikran I agree batch processing might not be worth looking at right now. It must be pretty rare to have so many items to delete that the deletion causes a problem. Feeds will probably choke for other reasons before this becomes an issue.
I also had another idea that would let us get rid of the dependency on state. If we assign each import a unique ID, save the latest import ID, and mark all items with the current import ID, we can at any time determine which items are missing from the feed with a simple SQL query:
SELECT * FROM feeds_items WHERE import_id <> $latest_import_id AND feed_nid = $feed_nid AND id = $id
There are some problems with this though. If the import crashes in the middle, it will seem like half of the items were removed. Therefore we must save all processed entity IDs during the import and commit those to the database when the import is done. Issues also arise if another import is running at the same time as the "garbage collecting". This could perhaps be solved through locking.
But if one can avoid all the pitfalls, this will allow us to "garbage collect" at any point, making batching/job scheduling easier to implement.
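A runnable sketch of this import-ID idea, with arrays standing in for the feeds_items table (the table and column names come from the query above, but the code is only a simulation):

```php
<?php
// Simulation of the proposed garbage collection; arrays stand in for
// the feeds_items table described above.

$latest_import_id = 7;

// entity_id => import_id of the last import that saw this item.
$feeds_items = array(
  10 => 7,
  11 => 6,  // not seen by import 7: garbage-collection candidate
  12 => 7,
  13 => 5,  // not seen either
);

// Equivalent of: SELECT ... WHERE import_id <> $latest_import_id.
$missing = array();
foreach ($feeds_items as $entity_id => $import_id) {
  if ($import_id !== $latest_import_id) {
    $missing[] = $entity_id;
  }
}
print implode(',', $missing) . "\n";
```

As noted, a crashed import would leave many rows with a stale import_id, so the latest ID must only be committed once the import finishes.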
Comment #123
byronveale CreditAttribution: byronveale commentedI've created a patch against the 7.x-2.x version in git incorporating Pol's work in #100.
I tested it in a local Acquia dev instance: Drupal 7.23, MySQL 5.1.66, PHP 5.3.18, Mac OS X 10.8.4. There are 767 items in my feed (coming from LDAP), and I tested first manually, from the feed's import page, and then using the periodic import option with a cron job.
Seems to be working fine; please try for yourselves.
Any other information needed, do let me know...
Comment #124
gnucifer CreditAttribution: gnucifer commented@byronveale Please consider what I wrote in the previous posts about batching. It's working fine because #100 only implements batching for the unpublish action, not delete. If you were to choose the unpublish option, nothing would happen (when running on cron).
Comment #125
mikran CreditAttribution: mikran at Mediamaisteri Oy commentedSome tests added to patch from comment #95.
Comment #126
byronveale CreditAttribution: byronveale commented@gnucifer: I would consider them, if I actually could; but while I managed to get a patch uploaded (first one), I don't have the chops to really follow what you guys are doing...
And actually my patch is not working fine -- after leaving it to run every fifteen minutes through cron for the day, I have found that it's deleting nodes on me, despite being set to "unpublish", and despite the content not being removed from the feed.
So to try and get the delete/unpublish-items-no-longer-in-the-feed functionality working (the goal of this issue, right? and also what our client needs!), and following on from twistor's and mikran's comments, I'll try mikran's patch from #125 (tests added to imclean's patch from #95). I'll test running manually and with cron, and both unpublishing and deleting expired nodes.
Yes, batching would be great; gnucifer, Pol, perhaps you could direct your efforts towards issue #1363088?
Thanks for everyone's efforts!
Comment #127
twistor CreditAttribution: twistor commentedNeed to take a look.
Comment #128
byronveale CreditAttribution: byronveale commentedOne final (for the moment) question, what about gnucifer's comment #108 -- that wasn't just about batch processing, right? Is that something we should try to incorporate?
Thanks again everyone...
Comment #129
gnucifer CreditAttribution: gnucifer commented@byronveale Yes, that comment had nothing to do with batch processing; it was just about the join being potentially unnecessary.
Comment #129.0
gnucifer CreditAttribution: gnucifer commentedissue summary updated
Comment #130
mikran CreditAttribution: mikran commentedThe config form isn't consistent in what it calls things:
Comment #131
yannickooShould be "nodes" because it is the node processor.
Comment #132
arrubiu CreditAttribution: arrubiu commentedA question, sorry but the thread is very long.
Which is the latest patch that works? I need this feature and I've to apply to the stable version of feeds.
Thanks.
Comment #133
pollegie CreditAttribution: pollegie commented@arrubiu Got it working?
Same thoughts here..
Comment #133.0
pollegie CreditAttribution: pollegie commented#2092895 added to the list of related issues
Comment #134
kostajh CreditAttribution: kostajh commentedI updated the patch from #125 to address the wording concerns noted in #130. If you are using the Node Processor, you will see "nodes" instead of "entities". I also tried to make the descriptive text in the form a little clearer.
Comment #135
ket qua bong da CreditAttribution: ket qua bong da commentedIt's ok
Comment #136
Summit CreditAttribution: Summit commented
Comment #137
zuernBernhard CreditAttribution: zuernBernhard commentedWhen does this patch get merged into the module?
Comment #138
PolThe patch doesn't apply anymore. Sorry, my mistake.
Comment #139
twistor CreditAttribution: twistor commentedIt seems an anonymous function snuck into this patch.
Comment #140
MegaChriz CreditAttribution: MegaChriz commentedThe anonymous function from the patch in #134.
@twistor I think you mean anonymous functions should not be used because they are only available since PHP 5.3, and Feeds should work with PHP 5.2.5 (the minimum requirement for Drupal 7)?
Comment #141
mikran CreditAttribution: mikran commentedWhatever the case is with the PHP requirements, the anonymous function added in #134 is the wrong solution to the issue noted in #130.
Comment #142
MegaChriz CreditAttribution: MegaChriz commentedIt looks like there are some untranslatable strings in the patch as well. Ideally, this should be something like t('Skip non-existent @entity_type_plural_label'), so that you don't have to override this string for each entity type.
Comment #143
stefan.r CreditAttribution: stefan.r commentedRe-roll, incorporated feedback from #140, #142.
Comment #144
mikran CreditAttribution: mikran commentedThis isn't a multilingual solution at all; str_replace() won't work here. The whole string needs to be re-entered inside t(), or else some kind of substitution used, like t('Delete non-existent @entities', ...
This is a good example of how those strings could be constructed.
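For anyone following along, the substitution mikran means works roughly like this. Drupal's t() hands its replacements array to strtr() (via format_string()); the fake_t() stand-in below only mimics that mechanism so the example runs outside Drupal, and it skips the sanitization the real t() applies to @-placeholders.

```php
<?php
// Stand-in for Drupal's t(); the real t() also sanitizes @-placeholders.
function fake_t($string, array $args = array()) {
  return strtr($string, $args);
}

// One translatable string, reused for every entity type.
$nodes = fake_t('Delete non-existent @entities', array('@entities' => 'nodes'));
$users = fake_t('Delete non-existent @entities', array('@entities' => 'users'));
print $nodes . "\n";
print $users . "\n";
```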
Comment #145
stefan.r CreditAttribution: stefan.r commented
Comment #146
johnnydarkko CreditAttribution: johnnydarkko commented#145 is working great for me.
One thing to take note of is that when the feed item's entity gets unpublished, the feeds_after_save hook doesn't get invoked. I haven't really dug deep into the patch to see what other hooks don't get invoked, but as the patch stands in #145, only the node api hooks are available to those who need to implement hooks on feeds unpublishing the nodes.
Comment #147
stefan.r CreditAttribution: stefan.r commentedjohnnydarkko, on items that are actually included in the feed, where an entity is being updated/created, this code runs:
The patch just does a node_save() instead. So in terms of hooks other than hook_feeds_after_save, it's also hook_feeds_presave, a rules invocation and a "skip" flag that are not running. Invoking any of those seems like too much of an API change ($item would be NULL, and until now it was implied those hooks only ran for existing feed items where a node was being updated/created, so things may break in existing code implementing these hooks).
Maybe a hook_feeds_after_unpublish_non_existent_node() hook could be added if people need the $source FeedsSource object, or if they need to be able to distinguish between nodes that are (a) unpublished because they are not in the feed or (b) unpublished for other reasons. It seems like an edge case though.
Comment #148
tostinni CreditAttribution: tostinni commentedThanks for the work you guys are doing. Just a quick question: is someone taking care of republishing nodes that have been previously unpublished, as mentioned in #41?
Comment #149
stefan.r CreditAttribution: stefan.r commentedtostinni, that is actually relevant to the point raised in #147, i.e. right now we can't distinguish between nodes that are (a) unpublished because they are not in the feed or (b) unpublished for other reasons.
Perhaps a "non_existing" property on the feed item may help solve this problem. It would cost us an extra database field, some extra logic and an update hook :)
That way, if unpublishing feed items when they disappear and republishing them when they reappear is something that happens often enough to warrant #148 going into the Feeds module, we can do something like this in FeedsNodeProcessor::entitySave()
Comment #150
johnnydarkko CreditAttribution: johnnydarkko commentedThanks for the clarification, stefan.r. That edge case is actually what I ran into immediately and turns out to be just fine implementing hook_node_update() for my particular use case. Additional hooks would definitely be a 'would be nice' feature to consider for future development. Thanks for your work!
Comment #151
stefan.r CreditAttribution: stefan.r commentedThanks. What I meant is for most purposes the node API hooks should be enough (as in your case). The edge case would be if anything beyond that were needed. :)
Comment #152
littledynamo CreditAttribution: littledynamo commented+1 for #149
This is causing us problems with nodes not being re-published. We are using this patch as part of a staff import from FileMaker, and our website should reflect the status of the staff member in the FileMaker database. It's quite common for people to be de-activated and then re-activated again in FileMaker, so the staff member would disappear from the website in this scenario. We're handling the re-publishing manually at the minute, but we're relying on people telling us that a staff member is missing from the website, which is far from ideal.
Comment #153
Abelito CreditAttribution: Abelito commentedAnother +1 for #149
I guess it is not really an "edge case".
Similar functionality would be used for the User Processor "user blocking", mentioned in #109
Comment #154
stefan.r CreditAttribution: stefan.r commentedEdit: this patch is broken because of a typo, please refer to #157 instead.
Comment #156
Abelito CreditAttribution: Abelito commentedThere is a work-around that may work for some people who have the problem mentioned in #149.
- For Node processor, add a "Published" column to your import file with "1" (meaning it will be published) in all the rows, and map that to the "Published status". (This will explicitly set the node to be published, and therefore will republish an unpublished node. Good for most cases, but bad if some imported items should remain unpublished [like if they were unpublished manually instead of by the feed importer].)
- For User processor, add a "Status" column to your import file with "1" (meaning it will be active) in all the rows, and map that to the "Account status".
Also note, if there are no rows in the import file (just headers), it will NOT set your nodes to "Unpublished", or your users' account status to "Blocked". This could be a problem for some, or a failsafe for others.
Comment #157
stefan.r CreditAttribution: stefan.r commentedThis is an implementation of #149 that differentiates between manually unpublished nodes and nodes that were automatically unpublished for being non-existent. In the latter case it will republish the node if it shows up again in the feed.
There is now a configurable option to set the action to take when nodes that were previously unpublished for being non-existent show up again in the feed ("republish" or "leave unpublished").
Any nodes that are unpublished automatically will be flagged for republishing as soon as they show up again (unless the option is set to "leave unpublished", in which case the flag will be ignored).
The default for this option right now is "leave unpublished". Does that seem right?
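A minimal sketch of the decision rule described above, assuming a per-node flag that records whether the importer (rather than an editor) did the unpublishing. The constant and function names here are made up for illustration and do not match the patch's actual schema or option names.

```php
<?php
// Hypothetical names; the patch's actual field/option names differ.
define('UNPUBLISHED_MANUALLY', 0);
define('UNPUBLISHED_BY_FEED', 1);

function should_republish($unpublish_source, $option_republish) {
  // Republish only when the feed did the unpublishing AND the importer
  // is configured to republish reappearing items.
  return $option_republish && $unpublish_source === UNPUBLISHED_BY_FEED;
}

// Manually unpublished nodes stay unpublished even when they reappear.
var_dump(should_republish(UNPUBLISHED_BY_FEED, TRUE));
var_dump(should_republish(UNPUBLISHED_MANUALLY, TRUE));
```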
Comment #158
stefan.r CreditAttribution: stefan.r commentedSetting to Needs review so testbot will pick patch #157 up
Comment #159
stefan.r CreditAttribution: stefan.r commentedSeems like that wasn't picked up, reposting to be sure.
Just to clarify, this is exactly the same file as in #157, and #157 was just correcting a typo that broke the patch in #154.
#154 still has the relevant interdiff for the new republishing feature (except the typo).
Comment #160
stefan.r CreditAttribution: stefan.r commentedSmall correction to maintain compatibility with another feeds patch.
Comment #161
stefan.r CreditAttribution: stefan.r commentedAnd once more as previous patch was incomplete (sorry for the noise)
Comment #162
jamiroconca CreditAttribution: jamiroconca commented161: feeds-unpublish-delete-entities-not-in-feed-1470530-161.patch queued for re-testing.
Comment #163
bdanin CreditAttribution: bdanin commentedpatch in #161 appears to be working great for me
Comment #164
Honza Pobořil CreditAttribution: Honza Pobořil commentedCould you add similar feature for User Processor? (optional deactivate or delete)
Comment #165
mikran CreditAttribution: mikran commentedI've already posted separate issue for that #2092895: Block users not included in feed.
Comment #166
newtoidThe patch is effective but missed one bit; here is the terminal output:
Comment #167
vinmassaro CreditAttribution: vinmassaro commented@david newton: Patch applies against 7.x-2.x, not 7.x-2.0-alpha8:
Comment #168
vinmassaro CreditAttribution: vinmassaro commentedHere is an update to the patch to add consistency to the spelling of "non-existent". I've included an interdiff to show there are no functional changes.
I also tested the patch against a feed that removes items when they are no longer valid, and they were correctly deleted from my site.
Comment #169
vinmassaro CreditAttribution: vinmassaro commented
Comment #170
JKingsnorth CreditAttribution: JKingsnorth commentedThe patch will need a couple more positive reviews before it is RTBC
Comment #171
fdefeyter@gmail.com CreditAttribution: fdefeyter@gmail.com commentedFor me, it's not working: an SQL error says it can't find a column related to republishing. Thanks anyway.
Comment #172
stefan.r CreditAttribution: stefan.r commented#171: did you run update.php?
Comment #173
LauriKoobas CreditAttribution: LauriKoobas commented#168 worked for me.
Comment #174
mErilainen CreditAttribution: mErilainen commentedThis is golden, works like a charm!
Comment #175
dkre CreditAttribution: dkre commented#168
Beautiful work.
Comment #176
twistor CreditAttribution: twistor commentedWhere did the republishing feature come from? I haven't read back through the whole issue. Republishing can be controlled simply through whether or not there is a value mapped to the published status.
Either way, it's outside the scope of this feature.
Comment #177
stefan.r CreditAttribution: stefan.r commented@twistor - the republishing was asked for in comments #41, #42, #148, #152, #153
If this issue is committed without any republishing feature, I am not sure there is a way to let nodes that were manually unpublished (instead of by the feed importer) remain unpublished? I don't think mapping a value to published solves that...
Even if we commit it without a republishing feature, can you think of a way for us to make the distinction between "manually unpublished" (must not be republished) and "unpublished by the feed importer" (must be republished)?
Comment #178
stefan.r CreditAttribution: stefan.r commented- Republishing feature taken out per maintainer request
- Added $config['update_non_existent'] to FeedsEntityProcessor as it was causing FeedsEntityProcessorTest to fail
Comment #179
stefan.r CreditAttribution: stefan.r commentedPlease can a few people review so this can go back to RTBC?
Comment #180
JKingsnorth CreditAttribution: JKingsnorth commented@twistor: Without meaning to prolong this issue, if the republishing feature was working OK in #168 then why not run with it? Seems like a very useful thing for the reasons mentioned by stefan.r in #177
Comment #181
MegaChriz CreditAttribution: MegaChriz commentedUI
I tested the patch from #178 and it works great! These were the scenarios I tested:
All scenarios were tested with File upload fetcher, CSV parser and with "Skip hash check" on.
One thing I noticed was that when importing a "blank" CSV file (with a header, but no rows), no nodes/entities got deleted. I think this behaviour is good, as one usually does not intend to import a blank source.
Code
I found a few minor issues in the code. Most are about coding standards. One is about a potential but minor security issue: in the method initEntitiesToBeRemoved() from the class FeedsProcessor, some variables are passed unescaped to a database query.
I also noticed that while tests for unpublishing/deleting nodes were added, there is no equivalent test for the generic entity processor. I think this should be added too.
Extra whitespace before "public".
No doc block.
Missing period.
Missing comments for param "$state".
This change sounds like it has something to do with the now removed "republish" feature.
I think $info['entity keys']['id'] and $this->entityType() should be escaped with db_escape_field(). See the method FeedsProcessor::entityLoad() where escaping is also done.
Summary
Tested the patch from #178.
Setting to "Needs work" because of points #3 and #4.
Comment #182
a.vakulenko CreditAttribution: a.vakulenko commented#181 nice job, we are getting closer to releasing this feature!
Comment #183
jenlamptonAlso tested the patch in #128, as I hoped the batching in this patch would help with #1363088: Feeds batch process not fully implemented - needed for large imports
Unfortunately, this patch doesn't work with the iCal importer, specifically all the repeating dates have been lost.
Comment #184
MegaChriz CreditAttribution: MegaChriz commented@jenlampton
Thanks for testing! I don't get what you mean with "tested the patch in #128" because there is no patch in #128 in this issue. Did you test the latest patch, from #178?
This sounds like it is caused by other recent changes in Feeds, specifically the change from #2093651: Simplify target callbacks. Have you tested with the latest dev of Feeds, with and without the patch from #178 applied, and only experienced this issue with the patch applied? If so, can you provide the steps to reproduce the issue with the iCal importer or - even better - point out which piece of the patch in #178 causes the "repeating dates" to get lost? By the way, does "repeating dates have been lost" mean that some values get cleared out on import, or does it mean something else?
@stefan.r
Are you willing to provide a new patch that addresses the issues noted in #181?
Comment #185
manuelBS CreditAttribution: manuelBS commentedNice patch, it also works for me, thanks a lot!
Comment #187
Charles BelovI'm trying to wrap my head around the scope of this change, which would be a welcome one. I have some questions, speaking as someone who has not yet tried the patch and wants to know what to expect before I do:
1. Does the patch for this issue work with nodes that existed prior to the patch for this issue being implemented?
2. If so, how does the patch determine which nodes to delete when they don't have corresponding items in the first feed to run after implementation?
3. If not, do I need to delete all of the pre-existing nodes that were created by a particular feed immediately prior to the first time the patch runs for that feed?
4. If not, is there a reasonable way to detect which pre-existing nodes are no longer relevant, so that I can do a one-time clean-up?
5. How does this patch interact with the Update Existing Nodes setting?
Comment #188
MegaChriz CreditAttribution: MegaChriz commented@Charles Belov
As per #181, setting this issue back to "Needs work".
Comment #189
dpico CreditAttribution: dpico commentedNice patch, thank you very much! I need something like this for a Drupal Commerce site with a large number of products, so site owners can manage products through Excel sheets and removed products from the Excel sheets are removed from the site.
Do you think the patch would work with Commerce Feeds? I couldn't test it yet.
Comment #190
barryvdh CreditAttribution: barryvdh commentedSo how much work is needed to implement this? I'm also looking for this feature.
Comment #191
MegaChriz CreditAttribution: MegaChriz commented@Barryvdh
See #181 for details.
Comment #192
kristygislason CreditAttribution: kristygislason commentedThis patch works great for me and I am already using for nodes. Thank you!
When I use this patch with user entities, I am only given one option - to delete users not included in the list. It would be helpful if there was also the option to set the user to "blocked" rather than deleting the accounts (Similar to unpublishing nodes vs. deleting nodes).
Also, any content the user created is deleted when the user is deleted. It would also be useful if, when the user was removed from the imported list, the account were deleted and any associated content respected the user account settings set at 'admin/config/people/accounts'. It currently deletes the associated content regardless of the setting below:
"When cancelling a user account:
Comment #193
imclean CreditAttribution: imclean commented@kgislason, there's a separate issue for blocking users: #2092895: Block users not included in feed
Comment #194
MrDaleSmith CreditAttribution: MrDaleSmith commentedUsing this patch on a site at the moment, and whilst it adds republish_when_exists as a key in feeds.install, it doesn't also add the republish_when_exists field to the schema definition itself. If you reinstall the module or add a new subsite using the module, you get a fatal error when trying to create the DB tables.
Comment #195
rogerpfaffThe implementation of republishing content that is again available in the feed is greatly appreciated. :)
Comment #196
stefan.r CreditAttribution: stefan.r commented
Comment #197
imclean CreditAttribution: imclean commented@stefan.r, what works for node only at the moment? It should work for unpublishing nodes, deleting nodes and deleting other entities. And re-publishing etc.
Comment #200
JKingsnorth CreditAttribution: JKingsnorth commented@imclean Looks like the republishing feature was removed in #178?
Comment #202
stefan.r CreditAttribution: stefan.r commentedRemoved stray quote
Comment #203
stefan.r CreditAttribution: stefan.r commentedAdded entity processor tests.
@MegaChriz, look good now?
Comment #204
MegaChriz CreditAttribution: MegaChriz commentedLooks very good. I did the same manual tests as in #181 to ensure the feature still worked as intended and noticed nothing different, which is good.
I found a few minor things in the code, such as the constant "FEEDS_UNPUBLISH_NON_EXISTENT" being defined before the @file docs (which I apparently hadn't noticed before). I fixed this one and changed a few code comments in the attached patch.
Looks ready to me. Thanks for your hard work, stefan.r! :)
Comment #205
stefan.r CreditAttribution: stefan.r commentedThanks!
Comment #206
mark_schaal CreditAttribution: mark_schaal commentedHas this been adopted into the dev branch, or is it still a patch?
Comment #207
stefan.r CreditAttribution: stefan.r commentedWe are still waiting for this to be committed; the issue would be marked fixed otherwise :)
Comment #208
mark_schaal CreditAttribution: mark_schaal commentedTouché, I just now saw the status [sheepish]. Thanks for your hard work, stefan!
Comment #209
AndrackOSnack CreditAttribution: AndrackOSnack commentedHi,
thank you so much for your work stefan.r. But there is still one little problem on my instance (maybe only me, heh ?).
I have a feed importer that checks a distant folder via an API request. Long story short: if the XML that I retrieve is empty but my importer contains nodes, it will bypass the removal and just tell me that "there is no new node". I've worked around this by commenting this out (quick & dirty, I need this asap :)):
It appears that $state->removeList is empty, with the result that initEntitiesToBeRemoved() is never called.
If I understood correctly, the nodes contained in my feed importer will then be erased by the clean() method.
Once again, I don't know if this applies only to me or to anyone else, but, you know, just to be sure :)
Comment #210
stefan.r CreditAttribution: stefan.r commentedI am not sure I understand. If you receive an empty feed, you want all items to be deleted? In that case you'd want:
Maybe when this patch is committed we can turn that into an option, in a separate issue?
Comment #211
MegaChriz CreditAttribution: MegaChriz commented@AndrackOSnack
I noticed back in #181 too that no items are deleted/unpublished when the source is completely empty, but I would say this is intentional. I think it is rare to want to import a blank source. For example, something could go wrong during fetching, ultimately resulting in a blank source. You don't want all items to be deleted in such a case.
Comment #213
twistor CreditAttribution: twistor commentedCongratulations everybody! That was a long time coming.
Comment #214
JKingsnorth CreditAttribution: JKingsnorth commentedFantastic! Thanks especially to stefan.r.
Should we now open a follow-up issue for the re-publishing feature that was removed in comment #178?
Comment #215
yannickooYeah, after 2.5 years this issue is finally fixed <3 Thanks to everyone who made this possible!
Comment #216
mxwright CreditAttribution: mxwright commentedThis is great news. Congrats to everyone!
Comment #217
manuelBS CreditAttribution: manuelBS commentedYeah, that's really great news for the weekend ;-) Thanks to everybody who helped get this done.
Comment #218
bendev CreditAttribution: bendev commentedThis is very good news.
I am looking forward to test this.
Just for my understanding, what is the best approach? Use the dev version, or wait for the next alpha release (any idea when this would be available)? Or create a patch for the current alpha8 version?
thanks anyway for the good work!
Comment #219
Pol@bendev: it will be ready when it's ready :)
For the moment, the best is to use the dev version which is quite stable already.
Comment #221
jenlamptonokay, so I see that this is in now, but thought I'd post my results here anyway.
I'm now running 7.x-2.x-dev https://www.drupal.org/node/927032
I updated the module and ran drush updb.
I then deleted all the items that are imported with feeds (in my case, calendar events).
Then I went back to run the import, and in the UI I see the system crank for about 2 minutes, then I get this error:
The website encountered an unexpected error. Please try again later.
In the error log, I get this error:
Recoverable fatal error: Object of class FeedsSource could not be converted to string in DatabaseStatementBase->execute() (line 2171 of drupal-7/includes/database/database.inc).
Anyway, I'm still suffering from the problem over at #1363088: Feeds batch process not fully implemented - needed for large imports
I have a site I'm supporting that needs its feeds_process_limit increased by 200 each quarter so that the import will complete on cron as they add more data. I'd love to see a suggestion / recommendation on how to solve that problem over there, if anyone has one.
Comment #222
JKingsnorth CreditAttribution: JKingsnorth commentedHi jenlampton, it's probably best to create a new issue as a bug report for this.
At over 220 comments I think it's best we put this issue to bed, since the feature request is now in the module.
Comment #223
Shane Birley CreditAttribution: Shane Birley commentedSince this could be used to delete users (entities) -- is UID1 protected if it isn't included in a CSV?
Comment #224
JKingsnorth CreditAttribution: JKingsnorth commentedHi Shane, can you open a new support request (or bug if you can demonstrate the issue) for this, it may not get picked up on the end of a closed issue. Thanks.
Comment #225
andrea.brogi CreditAttribution: andrea.brogi commentedGreat,
thanks to everybody.
Using the dev version, the new "Delete non-existent nodes" option is working correctly.
Comment #226
jenlamptonThanks @John.K
New issue opened over at #2445533: Fatal error: Object of class FeedsSource could not be converted to string
Comment #227
Tabestan CreditAttribution: Tabestan commentedI need the patch for 6.x (yes I know). Where can I find it? All of these are for D7.
Comment #228
ucscholar CreditAttribution: ucscholar commentedWould I have to apply the patch from #204, since I'm using Feeds 7.x-2.0-alpha9? I'm unsure because I can't find this commit in the CHANGELOG.txt or the release notes.
Comment #229
JKingsnorth CreditAttribution: JKingsnorth commentedFrom the release notes for the latest alpha: 'This release only contains security fixes, no additional bug fixes or features.'
So yes, the patch needs to be applied, or use the latest dev release.
Comment #230
shaneod CreditAttribution: shaneod commentedLooks like this has been added to the latest release, Feeds 7.x-2.0-beta1:
https://www.drupal.org/node/2531428
Thanks and well done to everyone!
Comment #231
rudetrue CreditAttribution: rudetrue commentedThis functionality isn't working for me in 7.x-2.0-beta1. Can anyone else confirm?
Comment #232
rudetrue CreditAttribution: rudetrue commentedI found out I'm only having issues when the feed is empty. I've created a new issue: https://www.drupal.org/node/2553331
Comment #233
andreboc CreditAttribution: andreboc commentedDoes anyone have a fix for unpublishing nodes with 7.x-2.0-beta1?
Mine isn't working either, @rudetrue.
Comment #234
alinouman CreditAttribution: alinouman commentedYes, it's not working with 7.x-2.0-beta1. Has anyone been able to make it work with 7.x-2.0-beta1?
Comment #235
MegaChriz CreditAttribution: MegaChriz at WebCoo commentedThis feature is backed by automated tests and should still be working (otherwise the tests would report a failure). If it is not, please open a new issue with steps to reproduce it. The only case where unpublishing or deleting items does not happen is when the source is completely empty. This is by design: in most cases you don't want to wipe out everything when an empty source is provided. There is an issue open with a request to make that behaviour configurable: #2333667: Add an option to delete all nodes when the feed source is empty.
Comment #236
alinouman CreditAttribution: alinouman commented@MegaChriz, sorry, I was using Commerce Feeds https://www.drupal.org/node/2270797, which was using the above patch but is not working with the Feeds beta now. So this issue is related to Commerce Feeds. Feeds 7.x-2.0-beta1 itself is fine; I have tested it. Thanks.
Comment #237
fuerst CreditAttribution: fuerst commentedFor anyone who still uses Feeds 6.x: the attached patch is an updated version of #19 which applies to Feeds 6.x-1.0-beta13.
Comment #238
nielsvoo CreditAttribution: nielsvoo commentedAfter trying for hours and hours using all kinds of approaches, I need some help with this.
This is my goal:
After importing a feed containing users, I want to block all the accounts on the site that are missing from the imported feed. Here I don't want to make any distinction between previously imported and not previously imported accounts.
In my case, users are created not by the feed itself but by users opening the site via LDAP connections.
Thanks in advance
Nielsvoo
Comment #239
dmitrii CreditAttribution: dmitrii at DrupalSquad commented@fuerst
I created a D6 sandbox module based on your patch.
https://www.drupal.org/sandbox/dmitrii/2787943