I've been using Simplefeed (with a few hacks) to aggregate blog feeds to Blogotariat but was plagued by an unacceptably high level of duplicate items. There will always be some, for example when the original author updates the post, but most were simply duplicates.
I found this reference today, http://drupal.org/node/150972, about wget calling cron.php up to 20 times in a row (by default wget retries a failed request up to 20 times). It seems that a call may not have finished before wget gives up on it and calls cron.php again, and that these overlapping runs generate lots of duplicate items.
I modified my server host cron job to:
wget -O - -q -t 1 http://www.example.com/cron.php
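For anyone adapting this, here is a rough sketch of how that line might sit in a crontab. The hourly schedule is an assumption for illustration, not something taken from the linked issue, and the URL is the same placeholder as above. The flags are standard wget options: -t 1 limits wget to a single try, -q suppresses its progress output, and -O - writes the response to standard output instead of saving a file.

```crontab
# Hypothetical schedule: run Drupal's cron once an hour, on the hour.
# -t 1  : try once only, so a slow cron.php is not requested again
# -q    : quiet, so cron does not mail wget's progress output
# -O -  : write the response to stdout instead of creating a file
0 * * * * wget -O - -q -t 1 http://www.example.com/cron.php
```

The point of the change is -t 1: with only one try per scheduled run, a cron.php call that takes a long time is not requested a second time while the first is still working.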
So far everything seems to have settled down and duplicate items have stopped flooding in.
Just thought people might want to know this trick if they are experiencing abnormal numbers of duplicates. Worth a try at least.
Comments
Comment #1
shane birley commented
I have been experiencing duplicate items as well. It only started with the latest commit from what I can tell.
Comment #2
m3avrck commented
Thanks! I have updated the readme to have this tip too.
Also, as of Oct 3, all feed checking has been significantly rewritten so that duplicates are no longer imported. Thanks!
Comment #3
(not verified) commented