Hello gang.
I have a website running Drupal 5.1. It's a portal site that aggregates RSS feeds from ca. 100 different websites.
It had been running successfully on 4.7 since last fall without a hassle, but as soon as I upgraded the website it slowed down immensely. Worse than that, every day duplicate entries are recorded in the database.
I keep on deleting them, but new ones appear (not from the same entries as last time).
I'm thinking it might be because the website is so slow after the upgrade, leading to more than one cron job running simultaneously.
I'm using a shell script to run the cron.php file outside of wget, as I was informed that *that* might ease the task of updating that many different feeds.
The cron task runs every 10 minutes, and the update frequency of the different RSS feeds ranges from 30 minutes up to 12 hours. But probably ca. 75% of the feeds are updated every 3 hours or more often.
Oh, and the website: http://tidarandinn.is (note, it's in Icelandic, so you won't understand much, but you will see the duplicates and feel the lack of speed).
BTW, the slow speed was occurring before I changed the theme to display the feed title of any given feed (that required a small hack of the aggregator.module).
Any suggestions?
Comments
I'd say there is a very good chance that your cron runs are "overlapping". And when this happens, processes have the potential to "pile up". I would suggest dropping that to running once per hour, and that should help.
If you're running cron.php via a shell script (which I'm assuming is called via UNIX's cron facility), you need to implement some sort of locking in that script to keep multiple instances from running at the same time.
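As a rough illustration of the kind of locking meant here, the sketch below uses a lock directory, since `mkdir` either creates the directory or fails, so only one run can hold the lock at a time. The function name `run_locked`, the lock location, and the example paths are all assumptions made for illustration, not taken from the poster's actual script.

```shell
#!/bin/sh
# Minimal locking sketch. LOCK_DIR and run_locked are hypothetical names;
# adjust the lock location to something writable by the cron user.
LOCK_DIR="${LOCK_DIR:-/tmp/drupal-cron.lock}"

run_locked() {
    # mkdir is atomic: it succeeds for exactly one caller at a time.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous cron run still active, skipping this run" >&2
        return 1
    fi
    status=0
    # Run the command passed as arguments and remember its exit status.
    "$@" || status=$?
    # Release the lock so the next scheduled run can proceed.
    rmdir "$LOCK_DIR"
    return "$status"
}

# Example call from the wrapper script (placeholder path, an assumption):
#   run_locked php /path/to/drupal/cron.php
```

One caveat with this approach: if the script is killed mid-run, the lock directory stays behind and has to be removed by hand (or by a separate staleness check) before cron will run again.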
-- Doug
--
Douglas Muth, Philadelphia, PA
http://www.dmuth.org/
Unacceptable time delay
Thank you Doug,
but running cron only once per hour would lead to two things:
a) there would be an unacceptable delay in publishing the most recent entries from the feeds
b) a much higher percentage of the feeds would be updated in each cron run, which again would put too much strain on the server.
All of this was running fine before the upgrade to 5.0. After the upgrade, the site became slower than even the most patient person could tolerate. Is this just something to accept, that 5.x is *that* much slower than 4.7?
I mean, running cron every ten minutes, which updates ca. 15 different websites most of the time (some runs are of course busier than others) ... at normal speed there really shouldn't be any overlapping.
I used to run cron fewer times per hour, but that resulted in too many feeds per cron run, so my sysadmin recommended that I run it more frequently and spread out the update frequencies.
The shell script is calling cron.php directly, rather than via wget.
If you want, I can post the actual lines in the cron script if that matters at all (only three lines).
/elfur.is
Seeing this myself - sometimes not repeated, usually twice and sometimes in triplicate! Does anyone have an answer for this (it seems to have been posted about quite often, but I cannot see a solution posted).
-- http://www.inventionmail.com --
Hey there, I'm developing a site similar to the one in your link and I'm experiencing the same problem. Were you able to fix the issue? Also, if you don't mind, I would like to hear how you were able to include so many different feeds for different subjects. Did you use Views? I'm currently using the Panels module, and I just included one view with "show latest feed items" selected. What I would like to do is include 3 different feed sources for each subject and have 4 subjects.
Thanks for any help you may provide.
Try setting different update intervals
To solve most of this issue, you can set different update intervals: 2 hours for the most frequently updated sites, and 3 hours or more for the other feeds.
If you set all feeds to update at the same interval, you will get these duplicate content issues.
Setting each feed to a different interval reduces the likelihood of duplicates, especially when using the poormanscron module.
I had this issue a lot when I first started aggregating back in 2005, and now I seldom see duplicates. They do still occur from time to time, but a lot less often.
For quite some time I have had 40 different feeds about Danish politics in Drupal's built-in aggregator. The most frequently updated ones get updated every 2 hours and the least frequently updated every 24 hours. The rest are spread out at intervals between these two settings.
Hope this helps :)
Even a turtle reaches its goal...
I know it's not perfect, and I know I shouldn't hack core modules, but I was facing the same problem and I finally solved it by editing aggregator.module on line 290.
I just added "GROUP BY guid" to the SQL SELECT statement:
$result = db_query_range('SELECT * FROM {aggregator_item} WHERE fid = %d GROUP BY guid ORDER BY timestamp DESC, iid DESC', $feed->fid, 0, $feed->block);
duplicate feed titles for one of my feeds and not the other
Had duplicate feed titles for one of my feeds and not the other. Duplicates appeared today, after 5 days of looking fine.
Removed all items. Reset update interval to 3 hours (the other feed is for 2 hours). Cleared cache. Updated feed manually.
So far, duplicates are gone - feeds look fine. Will be able to tell if this worked in a few days.