Hi

Please implement an option to add nodes based on selected keywords.
Say I have an RSS feed from e.g. Computer & Media: only those feed items that contain the selected/predefined keyword(s) (e.g. "AMD" or "Pentium", or "Intel, Pentium, AMD"...) would be converted to nodes and added to my site.

I am sure a feature like this is really missing in Drupal. With it you could watch various RSS feeds in more detail and add just those items that are important/relevant to you.

Thanks for reading my post; I hope that something like this will be implemented.
Igor
www.somvprahe.sk

Comment  File                         Size     Author
#13      feedapi_keyword_filter.zip   2.15 KB  milosh

Comments

aron novak’s picture

Status: Active » Postponed

The FeedAPI project won't implement this feature (because I would like to keep it as simple as possible), but because it's an API, you're encouraged to create another feed processor with this functionality.
A detailed FeedAPI developer's guide will be created within my SoC project, so stay tuned!

aron novak’s picture

Status: Postponed » Active

Here are the developer docs, which cover this issue. Take a look especially at the feedapi_after_parse hook.
http://groups.drupal.org/node/5301

djsdjs’s picture

I am also very interested in this functionality and would like to add to the feature request "negative keywords"

So if I define the keywords "AMD" or "Intel" and a negative keyword "-motorola" then even if an article contained "AMD" and "Motorola" it would NOT be imported.
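The positive/negative rule described above can be sketched as a small standalone function. This is only an illustration: the function name and parameters are hypothetical and not part of FeedAPI, matching is a plain case-insensitive substring test, and keywords are assumed to be non-empty strings.

```php
<?php
/**
 * Hypothetical helper (not a FeedAPI function): decide whether an item's
 * text passes a positive/negative keyword filter.
 */
function example_item_passes($text, $positive, $negative) {
  // Any negative keyword vetoes the item, even if a positive one matches.
  foreach ($negative as $word) {
    if (stripos($text, $word) !== FALSE) {
      return FALSE;
    }
  }
  // With no positive keywords configured, accept everything that is left.
  if (empty($positive)) {
    return TRUE;
  }
  // Otherwise at least one positive keyword must be present.
  foreach ($positive as $word) {
    if (stripos($text, $word) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}
```

With the keywords "AMD" and "Intel" and the negative keyword "motorola", an article mentioning both "AMD" and "Motorola" would be rejected, as requested.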

This is an important area because:

  • Many blogs written by individuals cover personal and non-personal information. An aggregation site does not want stories about someone's dog or latest vacation running alongside its core content focus (like the latest microprocessors example above).
  • Making such selections at import time is more efficient for storage, indexing, etc. than importing everything and filtering afterwards.
  • But probably the most relevant reason is that this allows an aggregation site to do much deeper plumbing of its focus area. If I had this feature I could include 20-30 feeds that might have occasional (but extremely relevant) individual feed items, and then my aggregation site is seen as high value add because it seems to be very smart about finding content that is off the beaten path.
    djsdjs’s picture

    Would like to add to this that it would be nice to have the option to populate a field in the node instead of just include or not include the node. I think specifically of node moderation fields - so that content can be reviewed by a moderator before being released.

    Maybe this is more of a "Positive and Negative Keyword Filter" followed by an "Action" such as "Populate Field (moderation)", "Do Not Import", "add keywords to taxonomy" with multiple filters / action configurations being able to be defined.
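A rough sketch of the "filter followed by an action" idea, assuming a simple ordered list of rules where the first matching rule wins. The rule structure, the action names ('skip', 'moderate', 'import') and the function name are all hypothetical; nothing here is an existing FeedAPI interface.

```php
<?php
/**
 * Hypothetical rule evaluator: return the action of the first rule whose
 * keywords match the text, or a default action if no rule matches.
 */
function example_pick_action($text, $rules, $default = 'import') {
  foreach ($rules as $rule) {
    foreach ($rule['keywords'] as $word) {
      if (stripos($text, $word) !== FALSE) {
        return $rule['action'];
      }
    }
  }
  return $default;
}

// Example configuration: negative keywords first, so they take precedence.
$rules = array(
  array('keywords' => array('motorola'), 'action' => 'skip'),
  array('keywords' => array('AMD', 'Intel'), 'action' => 'moderate'),
);
```

Ordering the rules is one possible design choice; it lets a "Do Not Import" rule override a later "Populate Field (moderation)" rule without extra configuration.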

    In a way the existing yahoo_terms functionality is an implementation of this since it does keyword filtering, but only does one action "add terms to taxonomy" - but I'm not sure that this support is implemented in a way where a module plugged in this way could tell feedapi to not add the item.

    alex_b’s picture

    Version: 5.x-0.x-dev » 5.x-1.x-dev

    I saw this functionality being requested over and over again: on the leech issue queue, on the simplefeed issue queue and during the early feedapi coding phase.

    For getting this, we would need some conditionality in _feedapi_item_save() - we should use node api:

    In the last lines of _feedapi_item_save() the node is being created after calling node_object_prepare():

    function _feedapi_item_save($feed_item, $feed_nid, $settings = array()) {
      // ...
      node_object_prepare($node);
      if (!isset($feed_item->nid)) {
        node_save($node);
        $feed_item->nid = $node->nid;
        db_query("INSERT INTO {feedapi_node_item} (feed_nid, nid, url, timestamp, arrived, guid) VALUES (%d, %d, '%s', %d, %d, '%s')", $feed_nid, $feed_item->nid, $feed_item->options->original_url, $feed_item->options->timestamp, time(), $feed_item->options->guid);
      }
      else {
        $node->nid = $feed_item->nid;
        node_save($node);
        db_query("UPDATE {feedapi_node_item} SET url = '%s', timestamp = %d, guid = '%s' WHERE fiid = %d", $feed_item->options->original_url, $feed_item->options->timestamp, $feed_item->options->guid, $feed_item->fiid);
      }
      return $feed_item;
    }

    If we added a test on the node object returned by node_object_prepare(), we could keep it from being saved:

    function _feedapi_item_save($feed_item, $feed_nid, $settings = array()) {
      // ...
      node_object_prepare($node);
      if (!isset($feed_item->nid)) {
        node_save($node);
        $feed_item->nid = $node->nid;
        db_query("INSERT INTO {feedapi_node_item} (feed_nid, nid, url, timestamp, arrived, guid) VALUES (%d, %d, '%s', %d, %d, '%s')", $feed_nid, $feed_item->nid, $feed_item->options->original_url, $feed_item->options->timestamp, time(), $feed_item->options->guid);
      }
      // Here we add an additional condition. Note that, as written, it only
      // guards this update branch; the insert branch above would need the
      // same check in order to skip new items.
      else if (!isset($node->feedapi_item->skip)) {
        $node->nid = $feed_item->nid;
        node_save($node);
        db_query("UPDATE {feedapi_node_item} SET url = '%s', timestamp = %d, guid = '%s' WHERE fiid = %d", $feed_item->options->original_url, $feed_item->options->timestamp, $feed_item->options->guid, $feed_item->fiid);
      }
      return $feed_item;
    }

    This could be a way to go.

    I've got a problem with this approach though:

    hook_prepare http://api.drupal.org/api/function/hook_prepare/5 is meant to be called before the node is presented on the add/edit form. We would be misusing this hook. I am not quite sure why this hook is called here anyway. What's best-practice for creating nodes programmatically anyway?

    alex_b’s picture

    Title: add feed as node only if inside it is predefined keyword(s) » Add feed item node conditionally (e. g. if matches keyword(s))

    Better title.

    eyecon-1’s picture

    Perhaps another approach is to provide the necessary tokenization to pass the work off to workflow_ng which can compare the token to RegEx.

    The feature really is essential in order to provide feed content consistent with the site's purpose.

    alex_b’s picture

    Yet another approach would be to use a feedapi secondary parser to remove feed items that shouldn't be created from the parsed feed. Performance-wise, this might be the most effective method.
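A minimal sketch of removing unwanted items from a parsed feed before any nodes are created. It assumes the parsed feed is an object with an items array whose entries have title and description properties; that structure, and the function name, are assumptions for illustration and should be checked against the FeedAPI developer docs before being wired into a real hook.

```php
<?php
/**
 * Hypothetical filter step: keep only the items whose title or description
 * contains at least one keyword. $feed->items is an assumed structure.
 */
function example_filter_feed_items($feed, $keywords) {
  $kept = array();
  foreach ($feed->items as $item) {
    $text = $item->title . ' ' . $item->description;
    foreach ($keywords as $word) {
      if (stripos($text, $word) !== FALSE) {
        $kept[] = $item;
        // One match is enough; move on to the next item.
        break;
      }
    }
  }
  // Replace the item list so later processors never see the dropped items.
  $feed->items = $kept;
  return $feed;
}
```

Because the dropped items never reach a processor, no node_save() call or database write happens for them.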

    On a more general note: up to a certain number of items to pull down, it doesn't make sense to worry about conditional aggregation - just show only the nodes with the features you are interested in (e. g. taxonomy term A is present, taxonomy term B is NOT present). Or am I missing something here?

    eyecon-1’s picture

    Version: 5.x-1.x-dev » 5.x-1.0-beta3

    Does anyone have this working with the feedapi_after_parse hook? Any code to share?

    Thanks!

    mustafau’s picture

    @alex_b

    What's best-practice for creating nodes programmatically anyway?

    The following comment appears in the node.module file:

     /**
     * @file
     * The core that allows content to be submitted to the site. Modules and scripts may
     * programmatically submit nodes using the usual form API pattern.
     */ 

    The function to call is http://api.drupal.org/api/function/drupal_execute/5

    alex_b’s picture

    @mustafau: Unfortunately, going through FAPI is pretty slow. I also recommend reading up on the recent discussion on the devel list about this topic. There is actually quite a bit of confusion around it - but that's an aside and related to another issue somewhere here on the queue.

    I would like to add to this discussion that lazy node instantiation is a very similar feature and possibly something that some people on this thread are looking for: http://drupal.org/node/232587

    elgreg’s picture

    Version: 5.x-1.0-beta3 » 5.x-1.2

    I hacked a little bit at feedapi_node.module to get my own keyword filter working. First, I added a keyword field to the UI in feedapi_node_feedapi_settings_form; second, I altered _feedapi_node_save to check for the filter settings and loop through them. I doubt it works perfectly, but it's working for what I need to do. Hacking the module is bad, so I suppose my next step will be to copy the whole feedapi_node module, rename a lot of stuff and then use that module instead, right? I have yet to investigate how this would work if one were to add a keyword to a feed and then try to requery (my guess is that with the feed item already created, it would think that it was a duplicate).

    Also, this is only searching the feed description (well, really the $node->body, which is set to the feed's description).

    Here's the code - definitely not a patch :)

    /**
     * Implementation of hook_feedapi_settings_form().
     * If a module provides parsers and processors it MUST evaluate the $type variable
     * to return different forms for parsers and processors.
     * There might be a better term for parsers and processors than $type.
     */
    function feedapi_node_feedapi_settings_form($type) {
      switch ($type) {
        case 'processors':
    /*... down to line 145ish */
          $form['filters'] = array(
            '#type' => 'textfield',
            '#title' => t('Filters'),
            '#description' => t('A list of words (comma separated) from the main body of the RSS feed that will be searched for. For example: #blog'),
            '#default_value' => '',
          );
    /*...*/
    }
    

    then down to the node_save

    /**
     * Create a node from the feed item
     * Store the relationship between the node and the feed item
     */
    function _feedapi_node_save($feed_item, $feed_nid, $settings = array()) {  
    /* ... down to about line 277 */
      if (isset($feed_item->feedapi_node->duplicates)) {
        foreach ($feed_item->feedapi_node->duplicates as $fi_nid => $f_nids) {
          $feed_item_node = node_load($fi_nid);
          $feed_item_node->feedapi_node->feed_nids[$feed_nid] = $feed_nid;
          node_object_prepare($feed_item_node);
          node_save($feed_item_node);
        }
      }
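    /* this is the new part */
      elseif (!empty($settings['filters'])) {
        // Save the node only if its body contains at least one keyword.
        $filters = explode(',', $settings['filters']);
        foreach ($filters as $filter) {
          $filter = trim($filter);
          // stripos() returns 0 for a match at the very start of the body,
          // so compare against FALSE instead of relying on truthiness.
          if ($filter !== '' && stripos($node->body, $filter) !== FALSE) {
            node_save($node);
            // Stop at the first match so the node is saved only once.
            break;
          }
        }
      }
      else {
        node_save($node);
      }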
      return $feed_item;
    }
    
    
    milosh’s picture

    Status: new
    File size: 2.15 KB

    I have just created a separate module that does the filtering, based on the developer's suggestion to use the feedapi_after_parse hook (see attachment). No changes to the core code are required. Unwanted items are dropped from the item list, so the method should be independent of the processors/parsers used.

    The settings appear on every feed's settings page; negative keywords as well as phrases can be used.
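To illustrate what a settings string with negative keywords and phrases could look like, here is a small parsing sketch. The syntax (comma-separated terms, a leading "-" for negative keywords, optional double quotes around phrases) and the function name are assumptions for this example; the attached module may use a different format.

```php
<?php
/**
 * Hypothetical parser for a keyword setting such as:
 *   AMD, "dual core", -motorola
 * Returns an array with 'positive' and 'negative' keyword lists.
 */
function example_parse_keywords($setting) {
  $result = array('positive' => array(), 'negative' => array());
  foreach (explode(',', $setting) as $raw) {
    $term = trim($raw);
    $bucket = 'positive';
    // A leading minus sign marks a negative keyword.
    if (strlen($term) > 0 && $term[0] === '-') {
      $bucket = 'negative';
      $term = trim(substr($term, 1));
    }
    // Strip optional double quotes around phrases.
    $term = trim($term, '"');
    if ($term !== '') {
      $result[$bucket][] = $term;
    }
  }
  return $result;
}
```

Empty entries (for example from a trailing comma) are ignored, so a sloppy settings string does not produce an empty keyword that would match every item.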

    One unresolved known bug is that the settings form also appears on the FeedAPI settings page, but this does not seem to affect the module's functionality.

    This module has not been tested very well, so expect surprises.

    Please also leave a comment here on whether this should be a separate project instead -- if so, I would try to upload it as an official module.

    EDIT: I have moved the discussion about separate module to http://groups.drupal.org/node/11220

    Possible ToDo for the future: Storing keyword sets together with the feed settings was the easiest solution. Ideally, keyword sets should be stored separately so that the same set of keywords can be reused on several feeds. Separately stored keyword sets would also allow a plugin-like structure for the filtering keywords, i.e. each feed could apply one or several keyword sets at the same time. (Imagine that you have defined one general keyword set for negative words and several fine-tuned sets for positive words, and can then apply set no 1 + set no 3 + set no 7 to your feed.)

    milosh’s picture

    I have uploaded a FeedAPI Item Filter module to drupal.org that solves the problem.
    See the module here: http://drupal.org/project/feedapi_itemfilter

    eyecon-1’s picture

    This looks like a great idea. I have a few questions or concerns. These will help me to make some decisions.

    Currently, I am filtering with an external script that runs after each cron run is completed. All feed items are created unpublished. The script publishes the items that match. I am passing RegEx directly to MySQL. I have update turned off. My assumption is that MySQL is faster than PHP.

    The downside is obvious. With a large number of feeds, I am storing a large quantity of unpublished items that are of no interest. The upside (I think) is that I am only downloading the items once. Would I be correct that the filtering module will continue to download, parse and filter the same unwanted items until they expire from the feed?

    Have you tested this against a large number of feeds to determine if it will cause the cron run to time out?

    milosh’s picture

    I moved this discussion here:

    http://groups.drupal.org/node/11220#comment-38370