I am building a site with a lot of feeds that will be updated every few hours. With a single cron job, all feeds would be updated in one batch whenever cron runs, possibly leading to a spike in server load.
Is it possible to have cron run every 15 minutes and update only part of the feeds? For example, I have 100 feeds that update every 2 hours. Could I make some kind of adjustment to have cron update 1/8 of the 100 feeds every 15 minutes?
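To make the 1/8-every-15-minutes idea concrete, here is a minimal sketch of the arithmetic. It is not a FeedAPI setting or API: the function name `feeds_due`, its parameters, and the assumption that feeds have numeric ids are all hypothetical and only illustrate how a 2-hour period splits into eight 15-minute slots.

```python
def feeds_due(feed_ids, now_seconds, period_seconds=2 * 60 * 60, slot_seconds=15 * 60):
    """Return the feed ids that belong to the current 15-minute slot.

    Hypothetical helper: feeds are assigned to a slot by id modulo the
    number of slots, so each feed comes up once per period.
    """
    slots = period_seconds // slot_seconds            # 8 slots in a 2-hour period
    current_slot = (int(now_seconds) // slot_seconds) % slots
    return [fid for fid in feed_ids if fid % slots == current_slot]


if __name__ == "__main__":
    all_feeds = range(1, 101)                         # the 100 feeds from the question
    for slot in range(8):
        batch = feeds_due(all_feeds, slot * 15 * 60)
        print(f"slot {slot}: {len(batch)} feeds")
```

With 100 feeds the slots cannot be exactly equal, so some runs handle 12 feeds and others 13, but every feed is still visited once per 2 hours.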
Whether this is necessary also depends on the process that feedapi and cron currently use. Does it split the load already by working with batches to decrease server load or does it update all feeds in one batch leaving the server in distress when some kind of server limit is hit?
How do you guys handle server load if your Drupal installation handles many feeds in a cron job?
Comments
Comment #1
zarkinfrood commented
It seems the only option in the settings is to adjust how much of the cron job is dedicated to FeedAPI updates.
We have 40 feeds and backed down the FeedAPI cron to 25% and now run the cron more frequently, about every 15 minutes. This seems to optimize the overall performance without affecting other usability.
However, there does seem to be an issue with some of our feeds; I am not sure whether it is related to these settings. Empty nodes are created, which tends to cause these feeds to hang until they are completely reset and can accept new information again.
Comment #2
alex_b commented
FeedAPI updates as many feeds as possible in the given time (% setting on admin/settings/feedapi). It does not update a feed more often than once in 30 minutes.
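As a rough illustration of that behaviour (a time budget per run plus a 30-minute minimum interval per feed), here is a minimal sketch. It is not FeedAPI's actual code: `run_budgeted_update`, the `refresh` callback, and the oldest-first ordering are assumptions made for the example.

```python
import time

MIN_REFRESH_SECONDS = 30 * 60  # "not more often than once in 30 minutes"


def run_budgeted_update(last_checked, refresh, budget_seconds,
                        wall_clock=time.time, timer=time.monotonic):
    """Refresh stale feeds, oldest first, until the time budget is used up.

    last_checked: dict mapping feed id -> Unix timestamp of its last refresh.
    refresh: callable taking a feed id (stands in for the real fetch and parse).
    Returns the list of feed ids refreshed in this run.
    """
    started = timer()
    now = wall_clock()
    # Only feeds that were last refreshed at least 30 minutes ago are eligible.
    stale = [fid for fid, ts in last_checked.items()
             if now - ts >= MIN_REFRESH_SECONDS]
    stale.sort(key=lambda fid: last_checked[fid])     # oldest first (assumption)
    refreshed = []
    for fid in stale:
        if timer() - started >= budget_seconds:
            break                                     # budget spent, leave the rest
        refresh(fid)
        last_checked[fid] = now
        refreshed.append(fid)
    return refreshed
```

Feeds that do not fit into one run stay stale and move to the front of the queue on the next run, which is why running cron more often helps in this model.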
You can run cron even more often than once in 15 minutes. We built systems with more than 2000 news feeds for managingnews.com. We moved search to an external indexer (Apache Solr) and wrote a simple shell script that hits cron.php as often as possible. When there is nothing new to aggregate, cron.php returns fairly quickly.
At this scale, aggregation becomes computationally expensive. We noticed bottlenecks when writing to the database. Up to a certain point you can throw hardware at the problem; beyond that, you should probably think of an aggregation architecture outside of Drupal.
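The script mentioned above was not posted in this thread. As a stand-in, here is a minimal sketch of the same idea in Python rather than shell: request cron.php, wait for it to return, pause briefly, repeat. The function names, the one-second pause, and the example URL are assumptions.

```python
import time
import urllib.request


def hit_cron(url, timeout=300):
    """Request cron.php once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def poll_cron(url, rounds, fetch=hit_cron, pause_seconds=1.0, sleep=time.sleep):
    """Hit cron.php `rounds` times back to back, one request at a time.

    Each request finishes before the next starts, so runs never overlap.
    Failed requests are recorded as None instead of stopping the loop.
    """
    statuses = []
    for _ in range(rounds):
        try:
            statuses.append(fetch(url))
        except OSError:                   # URLError and socket timeouts are OSErrors
            statuses.append(None)
        sleep(pause_seconds)              # short breather between runs (assumption)
    return statuses


if __name__ == "__main__":
    # Hypothetical URL; replace it with your own site's cron.php.
    print(poll_cron("http://example.com/cron.php", rounds=3))
```

Because the loop is sequential, a slow cron run simply delays the next request instead of piling a second run on top of it.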
It might be worth checking out FeedAPI Flat&Fast https://svn3.cvsdude.com/vvc/devseed/sandbox/alex/feedapi_fast/ - a single-record-per-feed-item implementation of a FeedAPI processor. We're not using it in production yet, but it should work fine.
Comment #3
malukalu commented
"It might be worth checking out FeedAPI Flat&Fast https://svn3.cvsdude.com/vvc/devseed/sandbox/alex/feedapi_fast/ - a single record per feed item implementation of a FeedAPI processor. We're not using it yet in production, but it should work fine."
Would LOVE to find out more about this. The link seems to be dead? I'm trying to build a site aggregating 1000s of feeds and can't even get my site to complete a cron run with about 40 feeds. I'm running cron every 30 mins on one of my sites and every hour on the other. I constantly get messages in my logs about attempting to run cron while it's already running, cron having been running for more than an hour, etc.
Would appreciate any help you could provide...
Comment #4
alex_b commented
Sorry, we restructured our repository. Here you go: http://svn3.cvsdude.com/vvc/devseed/sandbox/drupal-6/feedapi_fast
If you're trying to aggregate many items, I really recommend using feedapi_fast instead of feedapi_node. We've done massive aggregation with http://www.managingnews.com. You'll quickly find yourself in a situation where you need serious server hardware. The bottleneck is going to be your DB server. Make sure you're working on a well-tuned machine with lots of memory.
Comment #5
alex_b commented
Closing due to extended period of inactivity and no actionable items left. Also, this thread may be a duplicate of #370318: Keep exceeding cron time with FeedAPI.
Comment #6
Plazmus commented
If I want to aggregate a large number of items from 2000+ feeds, you suggest using feedapi_fast, which is no doubt faster, but what about all the possibilities that nodes give us, like flagging, voting, commenting, and using taxonomy?
One solution, I think, might be promoting fast items to nodes once they meet certain criteria, or promoting manually selected items, like the "Aggregator item promotion" module does.
Also, how can I display fast items? Should I use a custom module, as they are not currently exposed in Views?
With 2000+ feeds, I think there should also be a page for managing all of them. At the moment FeedAPI doesn't show which feeds are broken (e.g. returning status 404, 500, etc.) or which feeds have not been updated for a long time (probably because the feed was abandoned).
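To show the kind of overview I mean, here is a minimal sketch that sorts feeds into "broken" and "stale" lists. It is not part of FeedAPI: the `FeedStatus` record, the `triage` function, and the 30-day staleness threshold are all hypothetical, and the data would have to be collected by whatever fetches the feeds.

```python
from dataclasses import dataclass


@dataclass
class FeedStatus:
    url: str
    http_status: int       # status code of the most recent fetch
    last_new_item: float   # Unix timestamp of the newest item seen


def triage(feeds, now, stale_after_seconds=30 * 24 * 3600):
    """Split feeds into broken (HTTP 4xx/5xx) and stale (no new items lately)."""
    broken = [f.url for f in feeds if f.http_status >= 400]
    stale = [f.url for f in feeds
             if f.http_status < 400 and now - f.last_new_item > stale_after_seconds]
    return {"broken": broken, "stale": stale}
```

A management page could then list the two groups separately, so abandoned feeds can be removed without digging through 2000+ entries by hand.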
How do you cope with managing feeds on managingnews.com, Alex? The website looks absolutely brilliant, and if it does what it says, it's an amazing job.
Currently I have a test machine with 2800 feeds (I'm sure some of them are broken, but as there's no facility to check, I have no numbers). All items are saved as nodes. I don't have a problem with memory, as it consumes at most 100MB per cron run, but it looks like it only updates 1500 feeds, so it doesn't go through all of them. I have "Cron time for FeedAPI" set to 75% and I run cron every 4 minutes.
Comment #7
Plazmus commented
I understand that this is a closed thread, but maybe someone can give me some feedback, please.