Status: Active
Project: FeedAPI
Version: 5.x-1.2
Component: Code
Priority: Normal
Category: Feature request
Assigned: Unassigned
Reporter:
Created: 19 Jul 2007 at 22:45 UTC
Updated: 28 May 2008 at 18:22 UTC
Hi
Please implement an option to add nodes based on selected keywords.
For example, I would subscribe to an RSS feed from e.g. Computer & Media, but only the feed items that contain the selected/predefined keyword(s) (e.g. "AMD" or "Pentium", or "Intel, Pentium, AMD"...) would be converted to nodes and added to my site.
I am sure a feature like this is really missing in Drupal. With it, you could watch various RSS feeds more closely and add only the items that are important/relevant to you.
Thanks for reading my post; I hope something like this will be implemented.
Igor
www.somvprahe.sk
| Comment | File | Size | Author |
|---|---|---|---|
| #13 | feedapi_keyword_filter.zip | 2.15 KB | milosh |
Comments
Comment #1
aron novak commented
The FeedAPI project won't implement this feature (because I would like to keep it as simple as possible), but because it's an API, you're encouraged to create another feed processor with this functionality.
A detailed FeedAPI developer's guide will be created within my SoC project, so stay tuned!
Comment #2
aron novak commented
Here are the developer docs; they cover exactly this issue. Take a look especially at the feedapi_after_parse hook.
http://groups.drupal.org/node/5301
Comment #3
djsdjs commented
I am also very interested in this functionality and would like to add "negative keywords" to the feature request.
So if I define the keywords "AMD" or "Intel" and a negative keyword "-motorola", then even if an article contained both "AMD" and "Motorola", it would NOT be imported.
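The positive/negative rule described here reduces to a simple predicate: import an item only if it mentions at least one positive keyword and none of the negative ones. A minimal sketch in Python (chosen only for illustration; a real FeedAPI add-on would be PHP), assuming whole-word, case-insensitive matching and a hypothetical helper name `matches_keywords`:

```python
import re

def matches_keywords(text, positive, negative):
    """Return True if text contains at least one positive keyword and
    none of the negative keywords (whole-word, case-insensitive).
    Hypothetical helper, not part of FeedAPI."""
    def contains(word):
        pattern = r"\b" + re.escape(word) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None

    # A single negative keyword vetoes the item.
    if any(contains(w) for w in negative):
        return False
    # Otherwise at least one positive keyword must be present.
    return any(contains(w) for w in positive)


positive = ["AMD", "Intel"]
negative = ["motorola"]
print(matches_keywords("AMD ships new chip", positive, negative))       # True
print(matches_keywords("AMD and Motorola team up", positive, negative))  # False
print(matches_keywords("Apple news", positive, negative))                # False
```

With an empty positive list this sketch imports nothing; treating an empty list as "accept everything" would be an equally reasonable design choice.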
This is an important area because:
Comment #4
djsdjs commented
I would like to add that it would be nice to have the option to populate a field in the node instead of simply including or excluding it. I am thinking specifically of node moderation fields, so that content can be reviewed by a moderator before being released.
Maybe this is more of a "Positive and Negative Keyword Filter" followed by an "Action" such as "Populate Field (moderation)", "Do Not Import" or "Add keywords to taxonomy", with the ability to define multiple filter/action configurations.
In a way, the existing yahoo_terms functionality already implements this, since it does keyword filtering, but it performs only one action ("add terms to taxonomy"). I'm also not sure it is implemented in a way that would let a module plugged in like this tell FeedAPI not to add the item.
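The "filter followed by an action" idea can be modelled as an ordered list of rules where the first matching rule decides what happens to an item. The sketch below is a hypothetical illustration in Python; the `Rule` type, the action names and the first-match-wins policy are assumptions, not anything FeedAPI or yahoo_terms provides:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    # Hypothetical structure: a predicate over the item text plus an action name.
    matches: Callable[[str], bool]
    action: str  # e.g. "do_not_import", "moderate"

def choose_action(text: str, rules: List[Rule], default: str = "import") -> str:
    """Return the action of the first rule whose predicate matches, else the default."""
    for rule in rules:
        if rule.matches(text):
            return rule.action
    return default

rules = [
    Rule(lambda t: "motorola" in t.lower(), "do_not_import"),
    Rule(lambda t: "rumor" in t.lower(), "moderate"),
]
print(choose_action("Motorola phone review", rules))  # do_not_import
print(choose_action("Intel rumor mill", rules))       # moderate
print(choose_action("AMD launches CPU", rules))       # import
```

First-match-wins keeps the outcome predictable when several filters overlap; applying every matching action is the obvious alternative if actions such as tagging should combine.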
Comment #5
alex_b commented
I have seen this functionality requested over and over again: in the leech issue queue, in the simplefeed issue queue and during the early FeedAPI coding phase.
To get this, we would need some conditional logic in _feedapi_item_save(), and we should use the node API:
In the last lines of _feedapi_item_save(), the node is created after calling node_object_prepare():
If we added a test on the node object returned by node_object_prepare(), we could keep it from being saved:
This could be a way to go.
I've got a problem with this approach, though:
hook_prepare (http://api.drupal.org/api/function/hook_prepare/5) is meant to be called before the node is presented on the add/edit form, so we would be misusing this hook. I am not quite sure why this hook is called here at all. What's the best practice for creating nodes programmatically?
Comment #6
alex_b commented
Better title.
Comment #7
eyecon-1 commented
Perhaps another approach is to provide the necessary tokens so the work can be handed off to workflow_ng, which can compare a token against a regular expression.
The feature really is essential in order to provide feed content consistent with the site's purpose.
Comment #8
alex_b commented
Yet another approach would be to use a FeedAPI secondary parser to remove the feed items that shouldn't be created from the parsed feed. Performance-wise, this might be the most effective method.
On a more general note: up to a certain number of items to pull down, it doesn't make sense to worry about conditional aggregation; just show only the nodes with the features you are interested in (e.g. taxonomy term A is present and taxonomy term B is NOT present). Or am I missing something here?
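The display-time alternative amounts to filtering already-imported nodes by their terms. A minimal Python sketch of that condition, assuming each item carries a hypothetical `terms` set (on a real site this would be a Views filter or a SQL query against the taxonomy tables):

```python
def visible_items(items, required_term, excluded_term):
    """Keep items tagged with required_term and not tagged with excluded_term."""
    return [
        item for item in items
        if required_term in item["terms"] and excluded_term not in item["terms"]
    ]

# Hypothetical imported items with their taxonomy terms.
items = [
    {"title": "Chip news", "terms": {"A"}},
    {"title": "Mixed", "terms": {"A", "B"}},
    {"title": "Other", "terms": {"C"}},
]
print([i["title"] for i in visible_items(items, "A", "B")])  # ['Chip news']
```

The trade-off is the one raised in the comment: every item is still stored, but the filter can be changed later without re-aggregating.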
Comment #9
eyecon-1 commented
Does anyone have this working with the feedapi_after_parse hook? Any code to share?
Thanks!
Comment #10
mustafau commented
@alex_b
The following comment appears in node.module:
The function to call is http://api.drupal.org/api/function/drupal_execute/5
Comment #11
alex_b commented
@mustafau: Unfortunately, going through FAPI is pretty slow. I also recommend reading the recent discussion on the devel list about this topic. There is actually quite a bit of confusion around it, but that's an aside and belongs to another issue elsewhere in this queue.
I would like to add to this discussion that lazy node instantiation is a very similar feature and possibly something that some people on this thread are looking for: http://drupal.org/node/232587
Comment #12
elgreg commented
I hacked a little at feedapi_node.module to get my own keyword filter working. First, I added a keyword field to the UI in feedapi_node_feedapi_settings_form; second, I altered feedapi_node_save to check the configured filters and loop through them. I doubt it works perfectly, but it does what I need. Hacking the module is bad, so I suppose my next step is to copy the whole feedapi_node module, rename a lot of things and use that module instead, right? I have yet to investigate how this would behave if one added a keyword to a feed and then re-queried it (my guess is that, with the feed item already created, it would be treated as a duplicate).
Also, this only searches the feed description (really $node->body, which is set to the feed's description).
Here's the code; it's definitely not a patch :)
then down to the node_save
Comment #13
milosh commented
I have just created a separate module that does the filtering, following the developer's suggestion to use the feedapi_after_parse hook (see attachment). No changes to the core code are required. Unwanted items are dropped from the item list, so the method should be independent of the processors/parsers used.
The settings appear on every feed's settings page, and both negative keywords and phrases can be used.
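Dropping unwanted entries from the parsed item list, with support for phrases and negative keywords, can be sketched as follows. The query syntax (double-quoted phrases, a leading `-` for negatives) and the item fields `title` and `description` are assumptions made for illustration; they are not taken from the attached module, and this is plain Python rather than the feedapi_after_parse hook itself:

```python
import re

# Hypothetical syntax: words, "quoted phrases", and a leading "-" for negatives.
TOKEN = re.compile(r'(-?)"([^"]*)"|(-?)(\S+)')

def parse_keywords(query):
    """Split a keyword string into (positive, negative) lists of lowercase terms."""
    positive, negative = [], []
    for m in TOKEN.finditer(query):
        if m.group(2) is not None:      # quoted phrase
            sign, term = m.group(1), m.group(2)
        else:                           # bare word
            sign, term = m.group(3), m.group(4)
        term = term.lower()
        if not term:
            continue                    # ignore empty phrases such as ""
        (negative if sign == "-" else positive).append(term)
    return positive, negative

def filter_items(items, query):
    """Return only the items that survive the keyword filter."""
    positive, negative = parse_keywords(query)
    kept = []
    for item in items:
        text = (item["title"] + " " + item["description"]).lower()
        if any(term in text for term in negative):
            continue                    # a negative keyword vetoes the item
        if positive and not any(term in text for term in positive):
            continue                    # no positive keyword matched
        kept.append(item)
    return kept

items = [
    {"title": "AMD dual core launch", "description": "New CPUs"},
    {"title": "Motorola and AMD", "description": ""},
    {"title": "Gardening tips", "description": ""},
]
print(parse_keywords('AMD "dual core" -motorola'))  # (['amd', 'dual core'], ['motorola'])
print([i["title"] for i in filter_items(items, 'AMD "dual core" -motorola')])  # ['AMD dual core launch']
```

Plain substring matching is used for brevity, so "amd" would also match inside longer words; a word-boundary regular expression would avoid that.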
A known, unresolved bug is that the settings also appear on the FeedAPI settings page, but this does not seem to affect the module's functionality.
This module has not been tested very well, so expect surprises.
Please also leave comments here on whether it should be a separate project instead; if so, I would try to upload it as an official module.
EDIT: I have moved the discussion about a separate module to http://groups.drupal.org/node/11220
Possible to-do for the future: storing keyword sets together with the feed settings was the easiest solution. Ideally, keyword sets should be stored separately so that the same set of keywords can be reused on several feeds. Separately stored keyword sets would also allow a plugin-like structure for the filtering keywords, i.e. each feed could apply one or several keyword sets at the same time. (Imagine that you have defined one general keyword set of negative words and several fine-tuned sets of positive words, and could then apply set no. 1 + set no. 3 + set no. 7 to your feed.)
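Storing keyword sets independently and letting each feed reference several of them could look roughly like this. Everything here (the `KEYWORD_SETS` registry, the set numbers, the feed names) is hypothetical, and combining sets by union is just one possible policy:

```python
# Hypothetical registry of reusable keyword sets, keyed by set number.
KEYWORD_SETS = {
    1: {"positive": set(), "negative": {"motorola"}},   # general negative set
    3: {"positive": {"amd"}, "negative": set()},
    7: {"positive": {"intel", "pentium"}, "negative": set()},
}

# Each feed lists the set numbers it applies; set 1 is shared by both feeds.
FEED_SETS = {
    "hardware-news": [1, 3, 7],
    "amd-only": [1, 3],
}

def combined_keywords(feed, feed_sets=FEED_SETS, registry=KEYWORD_SETS):
    """Union the positive and negative keywords of every set a feed applies."""
    positive, negative = set(), set()
    for set_id in feed_sets[feed]:
        positive |= registry[set_id]["positive"]
        negative |= registry[set_id]["negative"]
    return positive, negative

pos, neg = combined_keywords("hardware-news")
print(sorted(pos), sorted(neg))  # ['amd', 'intel', 'pentium'] ['motorola']
```

Because the sets live outside the feed settings, editing set 1 once would change the filtering of every feed that references it.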
Comment #14
milosh commented
I have uploaded a FeedAPI Item Filter module to drupal.org that solves the problem.
See the module here: http://drupal.org/project/feedapi_itemfilter
Comment #15
eyecon-1 commented
This looks like a great idea. I have a few questions and concerns; the answers will help me make some decisions.
Currently, I am filtering with an external script that runs after each cron run completes. All feed items are created unpublished, and the script publishes the items that match. I am passing a regular expression directly to MySQL, and I have updating turned off. My assumption is that MySQL is faster than PHP.
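The publish-on-match step can be reproduced in a self-contained way with SQLite standing in for MySQL. SQLite has no built-in regular-expression function, so the sketch registers one; the `node` table is a simplified, hypothetical stand-in for Drupal's real schema, and the case-insensitive matching is a choice of this sketch:

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), i.e. (pattern, value).
    return value is not None and re.search(pattern, value, re.IGNORECASE) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, status INTEGER)")

# Every imported item starts unpublished (status = 0).
conn.executemany(
    "INSERT INTO node (title, status) VALUES (?, 0)",
    [("AMD unveils CPU",), ("Gardening tips",), ("New Intel board",)],
)

# The post-cron step: publish only the items whose title matches.
conn.execute("UPDATE node SET status = 1 WHERE title REGEXP ?", (r"\b(amd|intel)\b",))
print(conn.execute("SELECT title FROM node WHERE status = 1 ORDER BY nid").fetchall())
# [('AMD unveils CPU',), ('New Intel board',)]
```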
The downside is obvious. With a large number of feeds, I am storing a large quantity of unpublished items that are of no interest. The upside (I think) is that I am only downloading the items once. Would I be correct that the filtering module will continue to download, parse and filter the same unwanted items until they expire from the feed?
Have you tested this against a large number of feeds to determine if it will cause the cron run to time out?
Comment #16
milosh commented
I moved this discussion here:
http://groups.drupal.org/node/11220#comment-38370