I've been using Simplefeed (with a few hacks) to aggregate blog feeds into Blogotariat, but I was plagued by an unacceptably high number of duplicate items. There will always be some legitimate ones, for example when the original author updates a post, but most were plain duplicates with nothing changed.

Today I found this issue, http://drupal.org/node/150972, about wget calling cron.php up to 20 times in a row (by default wget retries a request up to 20 times). It seems that wget can request cron.php again before the previous run has finished, and these overlapping runs generate lots of duplicate items.

I changed the cron job on my host to the following. The -t 1 option tells wget to make only one attempt instead of retrying:

wget -O - -q -t 1 http://www.example.com/cron.php

So far everything seems to have settled down and duplicate items have stopped flooding in.

Just thought people might want to know this trick if they are experiencing abnormal numbers of duplicates. Worth a try at least.
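If you also want to be sure that two cron runs can never overlap, one option is to wrap the command in a small script that takes a lock first. This is only a sketch under my own assumptions, not part of the tip above: the script name (drupal-cron.sh), the lock path (/tmp/drupal-cron.lock) and the "run" argument are all hypothetical. It uses mkdir as the lock because mkdir fails when the directory already exists, so only one run at a time can get past it.

```shell
#!/bin/sh
# drupal-cron.sh: hypothetical wrapper around the cron job above.
# CRON_URL and LOCK_DIR are placeholders; override them via the environment.
CRON_URL="${CRON_URL:-http://www.example.com/cron.php}"
LOCK_DIR="${LOCK_DIR:-/tmp/drupal-cron.lock}"

run_cron() {
    # mkdir fails if the directory already exists, so a second run
    # started while the first is still going gives up here.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous cron run still active, skipping" >&2
        return 1
    fi
    # -t 1: make a single attempt and never re-request cron.php
    # -O -: write the page to stdout; -q: suppress wget's own messages
    wget -O - -q -t 1 "$CRON_URL"
    status=$?
    # Release the lock so the next scheduled run can proceed.
    rmdir "$LOCK_DIR"
    return "$status"
}

# Only call cron.php when invoked as "drupal-cron.sh run".
if [ "${1:-}" = "run" ]; then
    run_cron
    exit $?
fi
```

To use it, make the script executable and point the crontab at it, for example (the 15-minute schedule and the path are placeholders):

*/15 * * * * /path/to/drupal-cron.sh run > /dev/null

One caveat: if the script is killed mid-run, the lock directory stays behind and has to be removed by hand before cron.php will be called again.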

Comments

shane birley

I have been seeing duplicate items as well. From what I can tell, it only started with the latest commit.

m3avrck

Status: Active » Fixed

Thanks! I have updated the readme to have this tip too.

Also, as of Oct 3, feed checking has been significantly rewritten so that it no longer imports duplicates. Thanks!

Anonymous

Status: Fixed » Closed (fixed)