Each time cron runs, leech parses 'leech_cron_count' feeds.

This means that the rate at which leech visits a feed depends on the rate of cron runs and, worse, on the number of feeds that are in the DB.
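To make the dependency concrete, here is a rough sketch of the arithmetic. It assumes that leech works through the feeds in round-robin order, which is an assumption for illustration and not something stated above; the function name and the sample numbers are made up as well.

```python
import math


def visit_interval_minutes(feed_count, leech_cron_count, cron_interval_minutes):
    """Minutes between two visits to the same feed.

    Assumes (hypothetically) that feeds are processed round-robin, so a feed
    is visited again only after all other feeds have had their turn.
    """
    # Number of cron runs needed to get through all feeds once.
    runs_per_cycle = math.ceil(feed_count / leech_cron_count)
    return runs_per_cycle * cron_interval_minutes


# Same cron rate and same 'leech_cron_count', only the number of feeds differs.
few_feeds = visit_interval_minutes(100, 10, 5)
many_feeds = visit_interval_minutes(1000, 10, 5)
print(few_feeds, many_feeds)
```

Under these assumptions, adding feeds stretches the visit interval of every feed, regardless of how often an individual feed actually changes.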

If a feed contains only articles that are already stored in the DB, this is detected only after the feed has been parsed.

Overall, feeds that rarely change get "overvisited" and feeds that change frequently get "undervisited". Depending on the configuration of leech, this poor allocation of resources results either in empty parsing cycles or in possible data loss.

Solution:

  • Throttle the rate of visits per feed based on the average number of new items per feed.
  • Allow definition of a minimum and a maximum rate of visits on the leech settings page (e.g. min: once a day, max: every 10 minutes).
  • Recommend a cron run rate on the leech settings page (something like once every 5 minutes).
  • Show a warning in watchdog and on the leech settings page if timeouts or too few calls of the cron hook could result in loss of data.
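
The throttling idea could be sketched roughly as follows. This is only an illustration in Python, not leech code: the function name, the target of 5 new items per visit, and the doubling back-off for feeds without new items are all assumptions. The two limits correspond to the example values given above (at most every 10 minutes, at least once a day).

```python
# Limits as they might be configured on the settings page (example values).
MIN_INTERVAL = 10 * 60        # seconds; maximum rate: every 10 minutes
MAX_INTERVAL = 24 * 60 * 60   # seconds; minimum rate: once a day


def next_interval(current_interval, avg_new_items, target_new_items=5.0):
    """Suggest the number of seconds to wait before visiting a feed again.

    avg_new_items is the average number of new items found per visit.
    target_new_items is a hypothetical tuning value: the number of new
    items we would like to find on a typical visit.
    """
    if avg_new_items <= 0:
        # Nothing new lately: back off by doubling the interval (assumption).
        proposed = current_interval * 2
    else:
        # Scale the interval so the expected yield approaches the target.
        proposed = current_interval * target_new_items / avg_new_items
    # Clamp to the configured minimum and maximum rate of visits.
    return int(min(MAX_INTERVAL, max(MIN_INTERVAL, proposed)))


print(next_interval(3600, 10.0))  # busy feed: visit more often
print(next_interval(3600, 0))     # quiet feed: visit less often
```

A busy feed is pulled towards the 10 minute limit, a quiet one drifts towards one visit per day, and the clamp keeps a misbehaving average from pushing a feed outside the configured range.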