I have been playing with Drupal for a long time. I was primarily drawn to it by its ability to pull in and display RSS feeds from other sites. When I was running version 4.2 with the contributed import module, it had a way to expire entries on its own and remove them from the database. I see no such function in the 4.4 release candidate. How does Drupal manage the database for these entries?

I want to scale my install to pull in nearly 400 feeds. The 4.2 version could not handle more than 100 feeds, so I have only been playing around with a handful until I feel I can count on the package to clean up after itself.

Drupal is currently the only application available that may allow small sites like mine to pull a select number of targeted feeds that fit my website's theme, and to bridge my Movable Type weblog and Drupal seamlessly. Sites like www.newsisfree.com and others do not offer commercial software applications.

I look forward to your feedback.

Comments

Dries’s picture

I haven't tried syndicating that many feeds, but scalability should be better as of Drupal 4.4.0, as we added support for conditional GETs (If-None-Match/ETag + If-Modified-Since/Last-Modified).
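For readers unfamiliar with conditional GETs, here is a minimal sketch of the idea in Python. It is not Drupal's actual code; the function names are made up for illustration, and it assumes the validators (ETag and Last-Modified) from the previous fetch were saved somewhere alongside the feed.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Build request headers from the validators saved on the previous fetch.

    Hypothetical helper: the server's ETag is echoed back in If-None-Match
    and its Last-Modified value in If-Modified-Since.
    """
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def needs_parsing(status: int) -> bool:
    """A 304 (Not Modified) reply has no body, so the feed can be skipped."""
    return status != 304
```

With 400 feeds, every feed that answers 304 costs only a small request and response instead of a full download and parse, which is where the scalability gain would come from.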

Secondly, day-to-day maintenance should be less of a burden as Drupal respects various HTTP headers now. For example, when a 301 (permanent redirect), 302 (temporary redirect) or a 307 (temporary redirect) is detected, the feed's URL is updated in the database instead of being broken.
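The redirect handling can be pictured roughly as follows. This is only a sketch in Python, not the aggregator's real implementation; `url_after_redirect` is a hypothetical name, and it simply follows the behaviour described above, where all three status codes cause the stored URL to be updated.

```python
from typing import Optional
from urllib.parse import urljoin

# Status codes named in the comment above.
REDIRECT_CODES = {301, 302, 307}


def url_after_redirect(stored_url: str, status: int,
                       location: Optional[str]) -> str:
    """Return the URL to store for a feed after a fetch attempt.

    If the server answered with a redirect and a Location header, the new
    address is resolved against the old one (Location may be relative).
    Otherwise the stored URL is kept unchanged.
    """
    if status in REDIRECT_CODES and location:
        return urljoin(stored_url, location)
    return stored_url
```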

Last, Drupal 4.4.0 can parse more flavors of RSS and can pick up some minimal Dublin Core (DC) metadata.

Please report back your experiences/measurements so we can further improve Drupal's built-in aggregator.

adamg’s picture

Is there a way in 4.4 to pull items in as nodes? I gather this is not something most people would want to do, but I'm using 4.3.2 to build a regional aggregator, in part to let people search through reams of blog postings. Alternatively, is there a way to extend the search module to the aggregator database in 4.4? Apologies if the answers are obvious!

Thanks!

Dries’s picture

The import module you are referring to is a fork of the import module in core; I'm not the one maintaining it. I don't think it has been updated.

It is not possible to search news items with the aggregator that is part of core but I could look into adding search support ... I also want to make the (composite) feed pages pageable so people may choose to keep more items in the database.

geeknews’s picture

In my initial test I pulled nearly 100 feeds, and the database quickly filled up with nearly 20,000 news entries in less than a week. There has to be a better way to delete old news.

There should be a variable that lets us keep the news for x number of days on a feed-by-feed basis, after which it is automatically purged from the database. With RSS stories the majority are permalinked, and with most web hosts limiting the size of MySQL databases you quickly run out of storage space.
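To make the request concrete, here is a rough sketch of such a purge, written in Python against SQLite. The `feed` and `item` tables, the `keep_days` and `fetched_at` columns, and the `purge_expired` function are all invented for this illustration and are not Drupal's schema or API.

```python
import sqlite3

# Hypothetical schema, for illustration only:
#   feed(fid INTEGER PRIMARY KEY, keep_days INTEGER)
#   item(iid INTEGER PRIMARY KEY, fid INTEGER, fetched_at INTEGER)
SECONDS_PER_DAY = 86400


def purge_expired(conn: sqlite3.Connection, now: int) -> int:
    """Delete items older than their feed's keep_days; return rows removed."""
    removed = 0
    feeds = conn.execute("SELECT fid, keep_days FROM feed").fetchall()
    for fid, keep_days in feeds:
        # Anything fetched before this Unix timestamp has expired.
        cutoff = now - keep_days * SECONDS_PER_DAY
        cur = conn.execute(
            "DELETE FROM item WHERE fid = ? AND fetched_at < ?",
            (fid, cutoff),
        )
        removed += cur.rowcount
    conn.commit()
    return removed
```

Run from a scheduled job, something along these lines would let each feed carry its own retention period without anyone deleting rows by hand.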

Since I am not at all a MySQL expert, I would hate to go into the database and delete data by hand, as I am unfamiliar with how each database entry is cross-referenced. This could lead to database corruption.

Dries’s picture

Drupal will keep at most 50 new items per feed so if you pull 100 feeds, you will have no more than 5,000 news items instead of 20,000 (given everything behaves as it is supposed to behave).
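The cap and the resulting upper bound can be sketched like this in Python. It only illustrates the arithmetic and the trimming idea; the names are hypothetical and this is not the aggregator's code.

```python
from typing import List, Tuple

MAX_ITEMS_PER_FEED = 50  # the per-feed limit mentioned above

# An item is modelled as (timestamp, title) for this sketch.
Item = Tuple[int, str]


def trim_feed(items: List[Item], limit: int = MAX_ITEMS_PER_FEED) -> List[Item]:
    """Keep only the newest `limit` items, newest first."""
    return sorted(items, key=lambda item: item[0], reverse=True)[:limit]


def max_stored_items(feed_count: int, limit: int = MAX_ITEMS_PER_FEED) -> int:
    """Upper bound on stored items when every feed is at its limit."""
    return feed_count * limit
```

For 100 feeds the bound is 100 x 50 = 5,000 items; for the 400 feeds mentioned at the top of the thread it would be 20,000.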

geeknews’s picture

I was referring to the 4.2 version; I have not reached that high a number yet with the current version. Thanks for the clarification. I had not seen that written anywhere.

When displaying the data, would it not make more sense to display newest to oldest? The news is now being displayed oldest to newest. This is using the pushbutton xtemplate.

Dries’s picture

The incorrect sort order was fixed one or two days ago. Overwrite your aggregator module with the one from a recent Drupal 4.4.0 RC tarball. It should be fixed.