Can FeedAPI automatically delete items when they are removed from the feed?
| Project: | FeedAPI |
| Version: | 6.x-1.7-beta2 |
| Component: | Code feedapi_node |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
I'm looking for a way for FeedAPI to delete items as soon as they fall off the feed, so that the stored items are exactly the same as what the feed was when last read.
We use Vocus, which lets our media officers arbitrarily pick news stories to be published on an RSS feed that we post to our site (in a block), so we need old stories to immediately drop off whenever they choose something new. On the other hand, we can't really use "Delete Items Older Than," because some stories may stay on the feed for a week or two.
But the same might be true of a Flickr group or the feed for a del.icio.us tag.
Here's a solution, but it's not pretty, so perhaps you can think of a better way in a future version. The problem is the "unique" functionality is designed to compare a new item to previously read items in the database, but not the other way around. So at a high level (above the processor), it's impossible to identify items that aren't on the current feed and delete them. That's why this is built into feedapi_aggregator.module:
1) Extend _feedapi_aggregator_unique to optionally return the matching fiid (for internal use)
2) Extend _feedapi_aggregator_expire to pre-scan items from the feed and delete ones in the database that don't match.
--- feedapi_aggregator.module 30 Jul 2008 15:39:37 -0000 1.2
+++ feedapi_aggregator.module 30 Jul 2008 16:01:14 -0000
@@ -1,5 +1,5 @@
<?php
-// $Id: feedapi_aggregator.module,v 1.2 2008/07/30 15:39:37 cvsroot Exp $
+// $Id: feedapi_aggregator.module,v 1.1 2008/05/27 18:04:23 devseed Exp $
/**
* @abstract This module emulates aggregator module with the feedapi framework.
@@ -194,6 +194,12 @@
'#default_value' => 3,
'#options' => drupal_map_assoc(array(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)),
);
+ $form['delete_missing'] = array(
+ '#type' => 'checkbox',
+ '#title' => t('Delete missing feed items'),
+ '#description' => t('If checked, previously read feed items will be removed when the feed is refreshed, if they are no longer in the feed (even if they haven\'t expired).'),
+ '#default_value' => 1,
+ );
$categories_result = db_query('SELECT cid, title FROM {feedapi_aggregator_category}');
$categories = array();
while ($category = db_fetch_object($categories_result)) {
@@ -510,7 +516,7 @@
/**
* Is this feed item created?
*/
-function _feedapi_aggregator_unique($feed_item, $feed_nid, $settings = array()) {
+function _feedapi_aggregator_unique($feed_item, $feed_nid, $settings = array(), $return_id=FALSE) {
$entry = FALSE;
if ($feed_item->options->guid) {
$entry = db_fetch_object(db_query("SELECT iid FROM {feedapi_aggregator_item} WHERE feed_nid = %d AND guid = '%s'", $feed_nid, $feed_item->options->guid));
@@ -522,6 +528,11 @@
else {
$entry = db_fetch_object(db_query("SELECT iid FROM {feedapi_aggregator_item} WHERE feed_nid = %d AND title = '%s'", $feed_nid, $feed_item->title));
}
+
+ if( $return_id ) {
+ return is_object($entry) ? $entry->iid : null;
+ }
+
return is_object($entry) ? FALSE : TRUE;
}
@@ -552,6 +563,25 @@
$count++;
}
}
+
+ $processor_settings = $settings['processors']['feedapi_aggregator'];
+ if( $processor_settings['delete_missing'] ) {
+ $items_to_keep = array();
+ foreach( $feed->items as $index => $item) {
+ if( $iid = module_invoke('feedapi_aggregator', 'feedapi_item', 'unique', $item, $feed->nid, $processor_settings, TRUE) ) {
+ $items_to_keep[] = $iid;
+ }
+ }
+
+ if( $items_to_keep ) {
+ $result = db_query('SELECT * FROM {feedapi_aggregator_item} WHERE feed_nid=%d AND iid NOT IN (%s)', $feed->nid, implode(',', $items_to_keep));
+ while( $item = db_fetch_object($result) ) {
+ $item->fiid = $item->iid;
+ feedapi_expire_item($feed, $item);
+ $count++;
+ }
+ }
+ }
return $count;
}
I also had to change feedapi.module so that new items are read before the call to feedapi_expire, so the latter can see new items:
--- feedapi.module 27 May 2008 18:04:23 -0000 1.1
+++ feedapi.module 30 Jul 2008 16:00:30 -0000
@@ -1125,16 +1125,17 @@
}
$settings = feedapi_get_settings(NULL, $feed->nid);
- // Step 1: Force processors to delete old items and determine the max. create elements.
- $counter['expired'] = feedapi_expire($feed);
-
- // Step 2: Get feed.
+ // Step 1: Get feed.
$nid = $feed->nid;
$hash_old = $feed->hash;
$feed = _feedapi_call_parsers($feed, $feed->parsers, $feed->half_done);
if (is_object($feed)) {
$feed->hash = md5(serialize($feed->items));
}
+
+ // Step 2: Force processors to delete old items and determine the max. create elements.
+ $counter['expired'] = feedapi_expire($feed);
+
// Step 3: See, whether feed has been modified.
if ($feed === FALSE || $hash_old == $feed->hash) {
// Updated the checked field in any case.
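Stripped of the Drupal specifics, the idea behind both patches is a set difference: anything stored for the feed that is not in the freshly parsed feed gets expired. Here is a minimal standalone sketch of that comparison. The function name `example_missing_item_ids` and the array shapes are assumptions made for illustration and are not part of FeedAPI.

```php
<?php
/**
 * Hypothetical helper, not part of FeedAPI: returns the IDs of stored
 * items whose GUID no longer appears in the current feed.
 *
 * @param array $stored  Map of stored item ID => GUID (e.g. read from the database).
 * @param array $current List of GUID strings found in the freshly parsed feed.
 * @return array Item IDs that are candidates for expiry.
 */
function example_missing_item_ids(array $stored, array $current) {
  // Flip the GUID list so membership checks are simple key lookups.
  $current_lookup = array_flip($current);
  $missing = array();
  foreach ($stored as $iid => $guid) {
    if (!isset($current_lookup[$guid])) {
      $missing[] = $iid;
    }
  }
  return $missing;
}

// Illustrative data only: items 10 and 12 have dropped off the feed.
$stored = array(10 => 'guid-a', 11 => 'guid-b', 12 => 'guid-c');
$current = array('guid-b', 'guid-d');
print_r(example_missing_item_ids($stored, $current));
```

Note that, unlike the patch above, this sketch returns every stored item when the current feed is empty; the patch skips the cleanup entirely in that case because of its `if( $items_to_keep )` check.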
#1
Subscribing. This would be extremely useful, as we want to use RSS to mirror content from another site. I haven't yet had time to see if the patch applies against 1.4...
#2
This would be great. I'd love a way to set a max number of feed items for a given feed. Or even just a way to set a max number to be fetched at cron or on refresh.
Possible already???
#3
This is not possible at the moment. Unfortunately, I also don't plan to add new features to the 5.x branch. If someone steps up with a patch, I'll happily review it.
#4
Are maxes and/or throttling planned for V6?
I'm about to start on a feed aggregation project that might enable me to spend time on this! :)
#5
Subscribing
#6
I've created a patch for the 6.x branch that I think provides the functionality that the original poster mentioned (thanks to therzog for some of the code).
Basically, there is a new option under the FeedAPI processor settings to delete old nodes that drop off the feed (see attachment for screenshot). If that box is checked, then when a feed is refreshed (manually or through cron), any items that have fallen off the feed will be deleted.
It's working well for a couple of feeds I'm using at the moment, but definitely needs some testing and review.
#7
Noticed that the changes to feedapi.module in the last patch only apply to the node processor, which might not be used in all cases. I don't like the idea of storing feed items as nodes and am working on a lighter weight processor, and am having the same "dropped" feeds issue.
#8
This is the code I ended up writing within my own processor to make this work. It's called during the expire operation. Works well in preliminary testing.
if ($settings['processors']['feedapi_toolbox']['delete_missing']) {
  // Collect the GUIDs of every item currently on the feed.
  $guids_to_keep = array();
  foreach ($feed->parsers as $parser) {
    $result = module_invoke($parser, 'feedapi_feed', 'parse', $feed);
    foreach ($result->items as $item) {
      if (!empty($item->options->guid)) {
        $guids_to_keep[] = $item->options->guid;
      }
    }
  }
  if (!empty($guids_to_keep)) {
    // Use one '%s' placeholder per GUID so db_query() escapes the values.
    $placeholders = db_placeholders($guids_to_keep, 'varchar');
    $args = array_merge(array($feed->nid), $guids_to_keep);
    $result = db_query("SELECT * FROM {feedapi_toolbox_item} WHERE nid = %d AND guid NOT IN ($placeholders)", $args);
    while ($item = db_fetch_object($result)) {
      // Call back into FeedAPI to delete the item.
      feedapi_expire_item($feed, $item);
    }
  }
}
NOTE: Code snippet updated since original post.
#9
Has any of this been committed to the dev branch? It seems like mirroring the RSS feed should be a relatively trivial operation...
#10
@chrism2671 - I do not believe so
#11
Subscribe
#12
We've recently had a similar requirement. I'm way too paranoid to actually delete items when they are not found on the feed (what happens if the feed blips?) - so I opted for unpublishing them. I wrote it as a separate processor and called it FeedAPI garbage collector :)
Here's the module. If somebody wants to run with it and maintain it on d.o. as a separate project, I'm all for it.
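To make the "feed blips" concern concrete, here is a small sketch of the kind of guard described above: skip the cleanup when the fetch failed or came back empty, and unpublish by default rather than delete. The function `example_cleanup_action` and its parameters are hypothetical, written for illustration only, and do not reflect the actual code in the attached module.

```php
<?php
/**
 * Hypothetical policy helper, not part of FeedAPI or the attached module.
 * Decides what to do with stored items that are missing from the feed.
 *
 * @param bool $parse_ok      Whether the feed was fetched and parsed successfully.
 * @param int  $current_count Number of items in the freshly parsed feed.
 * @param bool $allow_delete  Assumed site setting: permanently delete instead of unpublish.
 * @return string One of 'skip', 'unpublish', or 'delete'.
 */
function example_cleanup_action($parse_ok, $current_count, $allow_delete = FALSE) {
  // A failed or empty fetch is treated as a blip: leave stored items alone.
  if (!$parse_ok || $current_count === 0) {
    return 'skip';
  }
  // Unpublishing is reversible, so it is the default.
  return $allow_delete ? 'delete' : 'unpublish';
}
```

With this policy, a feed that temporarily returns nothing leaves the stored items untouched, and an item that is unpublished by mistake can still be republished later.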
#13
I have another problem: when I hit refresh on the feed to start creating nodes from a Google Calendar, I get a blank white screen.
#14
#12 works for me. Simple and clean.
#15
Sweet! It works. And if I change line 45 in "feedapi_gc.module" from
<?php
db_query('UPDATE {node} SET status = 0 WHERE nid = %d', $feed_item->nid);
?>
to
<?php
node_delete($feed_item->nid);
?>
it deletes those offending nodes. Thanks for this module.