import rss feeds as nodes

bb - October 24, 2002 - 13:15

hi,
i'm looking for a way to import rss feeds as nodes (linked to specific terms of the taxonomy) into my community. i've come across several posts discussing this and was wondering what the current status is and whether someone is already working on it.
one post mentioned a module called rss_html, but i wasn't able to find any further information about it.

thanx,
bb

Needs to be done

kika - October 24, 2002 - 14:38

If nobody beats me to it

Kjartan - October 25, 2002 - 09:53

If nobody beats me to it, improving the aggregation module will be my next project after the node api, download page update, project module upgrade and phpdoc integration into Drupal. Too bad the code just doesn't write itself.

--
Kjartan

I've done a bit of work on RS

rowanboy - February 28, 2003 - 22:15

I've done a bit of work on RSS into nodes in the LiveJournal import module I've written. Email me. tom@rowan.me.uk

done - needs more review

moshe weitzman - March 1, 2003 - 02:18

see breyten's wonderful import module. please post your comments/bug fixes as this module is nearing CVS readiness. the idea is to replace the current import module with these modules.

no luck here

Anonymous - May 3, 2003 - 08:34

I've been trying to get this to work, but for some reason it won't add a feed correctly. It just never completes the task.

I've been working with the feed.module for about an hour now, and I can't figure out why it's not adding the feed I submit. No errors anywhere.

Any suggestions are welcome, though I realize this is still very much in development.

John

One Solution

Robert Castelo - September 5, 2003 - 00:34

Here's how I've added RSS feed bundles as nodes:

WARNING - This is not a pretty solution....

1.) Create RSS bundle of feeds

2.) Find the bundle's number: go to "news by topic", click "more" on the bundle you just created, and note the number in the URL (e.g. import/bundle/3)

3.) Create Taxonomy for your news pages

4.) Create book page node for news page

5.) Set this to PHP instead of HTML

6.) Paste the following code into the Body (change the $bid to the number of your bundle)

$bid = 3; // change this to the number of your bundle

// Load the bundle; its attributes are a comma-separated list of keywords.
$bundle = db_fetch_object(db_query("SELECT * FROM bundle WHERE bid = %d", $bid));

echo "<h1>News</h1><h2>$bundle->title</h2>";

$keys = explode(",", $bundle->attributes);

// Match any news item whose attributes contain one of the bundle's keywords.
$where = array();
foreach ($keys as $key) $where[] = "i.attributes LIKE '%". trim($key) ."%'";

$result = db_query_range("SELECT i.*, f.title AS ftitle, f.link AS flink FROM item i, feed f WHERE (". implode(" OR ", $where) .") AND i.fid = f.fid ORDER BY iid DESC", 0, variable_get("import_page_limit", 75));

$output = "";

while ($item = db_fetch_object($result)) {

  // Reset for every item so a "blog it" link never carries over.
  $blog_icon = "";

  if (module_exist("blog") && user_access("maintain personal blog")) {
    $blog_icon = " ". l("<img src=\"". theme("image", "blog.gif") ."\" alt=\"". t("blog it") ."\" title=\"". t("blog it") ."\" />", "node/add/blog&iid=$item->iid", array("title" => t("Comment on this news item in your personal blog."), "class" => "blog-it")) ." ";
  }

  if ($item->title) {
    $output .= "<h3>$item->title</h3>";
  }

  $output .= "<p>";

  if ($item->description) {
    $output .= "$item->description";
  }

  // Link to the original post and to the feed it came from.
  $output .= "<br><a href=\"$item->link\">Read More</a> | <a href=\"$item->flink\">$item->ftitle</a> $blog_icon</p>";

}

// Note: $header is never assigned above, so this first box has an empty body.
theme("box", $bundle->title, $header);
theme("box", t("Latest news"), $output);

---- I post therefore I am ----

some eval of items as nodes module in contrib

cel4145 - September 11, 2003 - 06:08

For the last few days, I ran the import module in contrib, which stores RSS
items as nodes. Great concept. And I'm glad that I installed it. But I'm going
to have to remove it until it's more mature. Still, I'm really looking forward
to using it in the future.

With that in mind, below is some feedback which I hope will be of help in its
further development. One thought that informs this analysis is that a news aggregator
which can promote feed items into the regular posts is a novel concept; it's
important to consider that to successfully avoid confusion among less frequent
site users, these nodes are going to have to be very clearly demarcated.

  • The first big issue I had was getting the module to run with cron.php. I
    never successfully got the login process to work correctly as described in
    the install. As a workaround, I created a user with the correct permissions
    and hard coded the user's name and password into the code where feed.module
    checks the user. At this point, the module updated the feeds regularly according
    to the cron settings.
  • Once the cron started processing regularly, I found a few issues with the
    effect updates were having on site content. The cron run overrides any changes
    that I might have made to an individual node, such as adding a teaser break
    comment to change the way that the item displays or tagging the item manually
    with taxonomy terms. This is an interesting issue since on the other hand,
    if an incoming item has changed on the originating site, then Drupal should
    update it. And it's good that site admins cannot easily change text incoming
    via RSS; those texts are copyrighted and controlled by the originating site
    and should, I think, be controlled by them. This avoids potential intellectual
    property and misrepresentation problems.
  • When an author is specified in the RSS, Drupal applies it to the node, using
    it in place of where a site user is normally specified as the author with
    each post. Otherwise, Drupal uses "Anonymous." From a user's standpoint,
    since the node doesn't clearly indicate the originating site, this could be
    confusing, especially when the incoming feed is a community site where many
    different authors could be included in the feed items. I think it would be
    better to use the RSS feed title (which can be modified by the admin). If
    an author is specified in the feed, that is additional information which should
    be included somehow either preceding or following the node content.
  • When the RSS item is long enough that it sets off the automatic teaser
    implementation, the node gets two "read more" links, one going to
    the full node view and one back to the post on the original site (this is
    mentioned in BUGS in contrib). One quick solution in the short term could
    be to change the link to the original site to "original post."
  • Depending on the type of site, the node listing display of feed items may
    actually be more cumbersome than the current box display for the core import
    module. When comparing both, I much prefer the more compact version in the
    standard news aggregation display for my daily news reading. Perhaps this
    should be the default still for import.module, then there should be some other
    way of accessing the node listing view. Easy enough for a site admin to do
    as it is now by merely tagging all items with a taxonomy term and then using
    the taxonomy url.
  • Kairosnews is subscribing to 19 RSS
    feeds, some of which are pretty active. All of the item nodes are picked up
    by tracker module. View recent posts is fairly useless with incoming RSS items
    greatly outnumbering the updates to the site from site users. Seems that there
    might need to be a way of specifying which node types tracker module tracks.
  • Notify.module is also picking up the RSS items. Probably needs similar configuration
    option.
  • Now, being able to remove items from tracker and notify creates a particular
    problem for feeds whose items are promoted to the front page, whether
    automatically or by the admin. One would assume that front page promoted items
    are deemed important by the site admin and should probably still be included
    in tracker displays and notifications. One solution might be to convert items
    that are promoted to stories, rather than listing them as items. Of course,
    then they would no longer show up in item displays. Or the tracker and notification
    modules could be configured to include posts on the front page and any other
    node types specified. Sort of a difficult issue whichever way that it's looked
    at.
  • On drupal-devel, I suggested that feeds should be stored as users, not
    nodes. The obvious advantage is that permissions
    could then be applied to feeds. This potentially allows permissions to be
    set which would allow Drupal sites to exchange nodes and have them posted
    as the specific node type: forums, stories, events, images, etc. Or allow
    taxonomy terms to be applied. Meanwhile, it would also automate the process
    of having the feed name listed in the author field link directly to a feed
    information page.
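
To make the tracker and notify points above a bit more concrete, here is a minimal sketch in plain PHP of the filtering rule I have in mind: show a node if its type is on a configurable list, or if it has been promoted to the front page. Everything here is hypothetical. The function name, the array layout and the 'item' type name are assumptions for illustration, and none of it is existing tracker.module or notify.module code.

```php
<?php
// Hypothetical sketch: these names do not exist in tracker.module.
// $nodes is a plain array standing in for rows loaded from the node table.

/**
 * Keep nodes whose type is in $tracked_types, plus any node that has been
 * promoted to the front page, whatever its type.
 */
function example_filter_tracked_nodes(array $nodes, array $tracked_types) {
  $kept = array();
  foreach ($nodes as $node) {
    $type_is_tracked = in_array($node['type'], $tracked_types, true);
    $is_promoted = !empty($node['promote']);
    if ($type_is_tracked || $is_promoted) {
      $kept[] = $node;
    }
  }
  return $kept;
}

// Sample data: 'item' is assumed to be the node type used for RSS items.
$nodes = array(
  array('nid' => 1, 'type' => 'story', 'promote' => 0),
  array('nid' => 2, 'type' => 'item',  'promote' => 0),
  array('nid' => 3, 'type' => 'item',  'promote' => 1),
  array('nid' => 4, 'type' => 'forum', 'promote' => 0),
);

// Track stories and forum topics only; promoted items still get through.
$visible = example_filter_tracked_nodes($nodes, array('story', 'forum'));
foreach ($visible as $node) {
  echo $node['nid'], ' ', $node['type'], "\n";
}
```

With the sample data, the unpromoted RSS item (nid 2) is dropped while the promoted one (nid 3) stays listed. The same rule could presumably be shared by the notification code.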

Any thoughts?

Alternative approach

Dries - September 11, 2003 - 07:43

Thanks for sharing your experience.

For the sake of curiosity: would life be better if news items were not nodes, but there was a "nodify"-link (so to speak)? The "nodify"-link would transform an 'RSS news item' into a 'news item node'.

One could either transform all news items to nodes or just a subset like the news that you want your users to comment on or that you feel is worth promoting to the front page. For one, the tracker page and the node administration page would be less polluted, and the notify module would not need to be altered.

Furthermore, Drupal could add a 'synchronize news item'-link to the node when it detects that the news item has been altered in the RSS feed.

I haven't really given this much thought, I'm just thinking out loud, but I think this approach could counter quite a few of the shortcomings you mentioned. Give it some thought.
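
A rough sketch of the idea in plain PHP, purely as illustration: "nodifying" copies the news item into a node-like array and stores a fingerprint of the source, and the 'synchronize' link would be offered only when the fingerprint of the current feed item differs. The function names, array keys and the use of md5 are all assumptions for this sketch. Nothing like this exists in the import module.

```php
<?php
// Hypothetical sketch: function names and array keys are illustrative only.

/** Hash the fields of a news item that we care about for change detection. */
function example_item_fingerprint(array $item) {
  return md5($item['title'] . "\n" . $item['description'] . "\n" . $item['link']);
}

/** Turn a news item into a node-like array of the given (assumed) type. */
function example_nodify(array $item, $type) {
  return array(
    'type' => $type,
    'title' => $item['title'],
    'body' => $item['description'],
    'source_link' => $item['link'],
    // Remember what the item looked like when it was converted.
    'source_fingerprint' => example_item_fingerprint($item),
  );
}

/** TRUE when the feed item no longer matches what the node was built from. */
function example_needs_sync(array $node, array $current_item) {
  return $node['source_fingerprint'] !== example_item_fingerprint($current_item);
}

$item = array(
  'title' => 'Release notes',
  'description' => 'First draft.',
  'link' => 'http://example.com/post/1',
);
$node = example_nodify($item, 'story');

$before = example_needs_sync($node, $item);

// The originating site edits the post; the next feed refresh sees new text.
$item['description'] = 'Corrected text.';
$after = example_needs_sync($node, $item);

echo $before ? "sync link shown\n" : "no sync link\n";
echo $after ? "sync link shown\n" : "no sync link\n";
```

Comparing a stored fingerprint keeps the check cheap, but it only works while the item is still present in the feed.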

re: Alternative approach

cel4145 - September 11, 2003 - 14:17

I think you are right. I think for many sites, keeping the functionality of the existing import module while creating a method for promoting certain items to nodes would be the best way to go. Specific feeds could be tagged for conversion automatically and there could be a radio button or checkbox on the Administration » content syndication » news aggregation » tag news items menu.

And instead of thinking along the lines that converted nodes should be promoted to the front page automatically, maybe they should be converted to nodes and then be subject to the default content publishing settings. This would better parallel the existing node posting/publishing system and process that sites are already following. Some sites might want those nodes to enter moderation, or might not want comments, to be consistent with posts by members on site. Then, once they are converted, an admin could always change those settings on a per node basis, in the same way that they can be changed now.

But then which type of nodes? It almost seems like there should be a choice between stories or forum topics since these are the two community posting node types available with Drupal. Choosing one or the other would exclude different community sites. Incorporating that thinking also leaves room for eventually specifying the node type in incoming feeds from other Drupal sites.

Then, what is to be done on the import page? Should promoted nodes still be listed as news items? Or should there be a placeholder with a link to their node url? I'm in favor of a link with the news item to the node.

As for synchronizing imported nodes, I almost think this should be a global default setting. Admins/Drupal communities are probably going to want to control this depending on their own views about depublishing. And, in many cases, an incoming RSS feed item is itself a teaser. So in most cases, the converted node becomes a permanent item in the site, and provides opportunity for commenting, moderation, taxonomy, etc., but users are typically still going to need to follow the feed item link to the original post on the originating site (does this make sense?). Now it might be nice to have an update notice which appears with the node when the original post has been updated, but of course this creates overhead for Drupal to watch the nodes and feed items, and can this be watched indefinitely?

One other thought. Probably good to have a global switch in the aggregator admin to be able to ignore teasers. This plays into the idea that incoming RSS items are often teasers--why should they be reteasered again? Also would help usability since it eliminates that "read more" "read more" duplication if site admins wish.
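
Here is a small sketch in plain PHP of how the two settings above might fit together: a converted node inherits the site's default publishing settings, and a global "ignore teasers" switch decides whether the body gets shortened again. The function name, the array keys and the 600-character teaser length are all assumptions for illustration, not existing Drupal settings.

```php
<?php
// Hypothetical sketch: names, keys and the teaser length are assumptions.

/**
 * Build a node-like array from a feed item, applying the site's default
 * publishing settings and the global "ignore teasers" switch.
 */
function example_prepare_converted_node(array $item, array $defaults, $ignore_teasers, $teaser_length = 600) {
  $node = array(
    'title' => $item['title'],
    'body' => $item['description'],
    'source_link' => $item['link'],
    // Inherit the same defaults a member's post would get.
    'status' => $defaults['status'],
    'promote' => $defaults['promote'],
    'moderate' => $defaults['moderate'],
    'comment' => $defaults['comment'],
  );

  if ($ignore_teasers || strlen($node['body']) <= $teaser_length) {
    // The feed item is treated as a teaser already; do not shorten it again.
    $node['teaser'] = $node['body'];
  }
  else {
    $node['teaser'] = substr($node['body'], 0, $teaser_length);
  }

  // Only a shortened teaser needs the local "read more" link.
  $node['local_read_more'] = ($node['teaser'] !== $node['body']);
  return $node;
}

$defaults = array('status' => 1, 'promote' => 0, 'moderate' => 1, 'comment' => 2);
$item = array(
  'title' => 'Long post',
  'description' => str_repeat('x', 1000),
  'link' => 'http://example.com/post/2',
);

$with_switch = example_prepare_converted_node($item, $defaults, true);
$without_switch = example_prepare_converted_node($item, $defaults, false);

echo $with_switch['local_read_more'] ? "two links\n" : "one link\n";
echo $without_switch['local_read_more'] ? "two links\n" : "one link\n";
```

With the switch on, the long item keeps its full text as the teaser, so only the link back to the original post is left.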

unpublished nodes

moshe weitzman - September 11, 2003 - 19:26

I think you get the same effect (no bad interaction with Tracker and Notify) by importing new items as unpublished nodes.

thank

majianlinwz - May 13, 2008 - 17:45

thank you

__________________
http://wehosting.blogspot.com

 
 
