(Note: all FeedAPI 6.x versions are affected by this.)
FeedAPI processing is not fault-tolerant. A problem in any one feed can cause the entire engine to stall for days. This is because FeedAPI builds a queue and won't move ahead until it has processed every item in that queue, one by one.
Let's say FeedAPI ran out of memory while trying to fetch or parse a feed (this is actually not uncommon, especially when using the common syndication parser with drupal_http_request(), which seems to have some memory issues). When such an event happens, two problems immediately arise:
- Drupal cron will notice that the cron run did not complete and will prevent cron from running for the next hour (waaaay too long, by the way).
- As soon as cron gets permission to run again, FeedAPI will check which feed it last tried to fetch. That feed will be the one it ran out of memory on, since the fetch never completed. FeedAPI will try to fetch the feed again, fail again, and cron will stall for another hour. The same thing will happen the hour after that, and so on.
You get the picture.
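To make the failure loop concrete, here is a toy Python model of it. It is not FeedAPI code (FeedAPI is PHP): the `FeedQueue` class, its `cursor`, and the `simulate()` helper are hypothetical, and a Python `MemoryError` stands in for PHP's fatal out-of-memory error. It assumes, as described above, that the queue position survives the crashed run, so the next run resumes at the feed that failed.

```python
# Toy model of the cron stall (a sketch, not FeedAPI code).
# Assumption: the queue position persists across cron runs, and a fatal
# error during a fetch ends the whole run without advancing the position.
from dataclasses import dataclass


@dataclass
class FeedQueue:
    feeds: list
    cursor: int = 0  # index of the next feed to fetch; after a crash, the one that failed


def cron_run(queue, fetch):
    """Resume at the saved cursor and process the remaining feeds in order."""
    while queue.cursor < len(queue.feeds):
        fetch(queue.feeds[queue.cursor])  # a crash here leaves the cursor unchanged
        queue.cursor += 1
    queue.cursor = 0  # full pass finished; start from the top next time


def simulate(feeds, bad_feeds, runs):
    """Run cron `runs` times; return the feeds that were fetched successfully."""
    queue = FeedQueue(list(feeds))
    fetched = []

    def fetch(feed):
        if feed in bad_feeds:
            raise MemoryError(feed)  # stands in for PHP running out of memory
        fetched.append(feed)

    for _ in range(runs):
        try:
            cron_run(queue, fetch)
        except MemoryError:
            pass  # the run dies; the next run resumes at the same bad feed
    return fetched


print(simulate(["a", "b", "c"], {"b"}, runs=5))  # ['a']
```

No matter how many runs are simulated, feed "c" is never reached: every run starts at "b" and dies there.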
In our experiments, something like the above can halt FeedAPI for days, or even weeks, without a single feed item making it through on any cron run. Usually it's one item in the feed causing the trouble, and until that item drops out of the top 15 (if 15 is the length of the feed), it will keep causing FeedAPI to explode. If you are really unlucky, your system may be frozen for months.
In any case, we need a way to skip a feed if it causes the entire system to freeze. I personally need it ASAP, so I have two choices:
1) Contribute a patch, if you guys have the bandwidth to get it in really soon (preferably within several days, but definitely before the next release, so I don't have to re-patch).
2) Write an external module that runs before FeedAPI and cleans up its mess.
In practice, (2) is easiest for me, since I could just do it and move on, but since this is a pretty fundamental FeedAPI problem, I don't want to be selfish. So I guess I am mostly asking: if I work on the patch myself, do you guys have time to help me get it in within a reasonable time?
Thank you
| Comment | File | Size | Author |
|---|---|---|---|
| #6 | feedapi.module.patch | 1.74 KB | irakli |
Comments
Comment #1
alex_b commented:
Irakli, there is a third option: use Feeds (http://github.com/lxbarth/Feeds) instead. Feeds is a full-fledged successor of FeedAPI that should take care of many of FeedAPI's limitations and architectural problems.
I am working full steam on Feeds, but it will still take a couple of days to get to the finish line. At that point, not all of FeedAPI's functionality will be available, but much of it will. Depending on your requirements, I would encourage you to use Feeds instead of FeedAPI. Poke me on IRC if you have more questions.
Comment #2
irakli commented:
Hey Alex,
Thanks for the quick response. I saw Feeds; it looks very promising. Are you planning to use the same nodes data-model as in FeedAPI or provide an update script?
Either way, it's not really an option for me right now. Considering the usual release cycle of modules, it will probably take a couple of months until even an RC, so for now I need to patch FeedAPI one way or another.
So I assume you're not really interested in a FeedAPI patch submission?
Thanks,
Irakli
Comment #3
alex_b commented:
"So I assume you're not really interested in a FeedAPI patch submission?"
As much as I'd like you to get on board with Feeds right away, I'd love to keep refactoring jobs for FeedAPI as simple as possible. If this fix turns out to be a simple patch, it's more than welcome. If it turns out to be a larger task, I'd encourage a custom module.
Comment #4
alex_b commented:
Sorry, I didn't answer some of your questions:
"Are you planning to use the same nodes data-model as in FeedAPI"
Slightly different: a configuration may or may not be attached to a content type, so you can import feeds either by creating feed nodes or just by using a simple import form.
"provide an update script?"
Planned.
Comment #5
irakli commented:
Thank you, Alex.
Comment #6
irakli commented:
Attached is the simplest (most minimal) patch that addresses the issue. The code detects a problem with a feed, tries to process that feed once more, and, if it fails again, moves on to the next feed instead of stalling.
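The retry-then-skip idea can be sketched as a toy Python model. This is a hypothetical illustration of the behaviour described, not the contents of feedapi.module.patch: the `FeedQueue` class, the per-feed `attempts` counter, `MAX_ATTEMPTS` and `simulate()` are illustrative names, and a Python `MemoryError` stands in for PHP's fatal out-of-memory error. The key assumption is that the attempt count is stored *before* each fetch; a PHP fatal error cannot be caught, so the failure has to be inferred on the next run from an attempt that never finished.

```python
# Toy model of "retry once, then skip" (a sketch, not the attached patch).
# Assumptions: a per-feed attempt counter is stored *before* each fetch, so
# it survives a crash; MAX_ATTEMPTS = 2 means the first try plus one retry.
from dataclasses import dataclass, field

MAX_ATTEMPTS = 2  # hypothetical setting


@dataclass
class FeedQueue:
    feeds: list
    cursor: int = 0  # index of the next feed to fetch
    attempts: dict = field(default_factory=dict)  # feed -> unfinished attempts


def cron_run(queue, fetch):
    while queue.cursor < len(queue.feeds):
        feed = queue.feeds[queue.cursor]
        tries = queue.attempts.get(feed, 0)
        if tries >= MAX_ATTEMPTS:
            # The feed crashed on every allowed attempt: skip it for this pass
            # and reset its counter so it gets a fresh chance on the next pass.
            del queue.attempts[feed]
            queue.cursor += 1
            continue
        queue.attempts[feed] = tries + 1  # record the attempt in case we crash
        fetch(feed)
        queue.attempts.pop(feed)  # success: forget earlier failures
        queue.cursor += 1
    queue.cursor = 0  # full pass finished; start from the top next time


def simulate(feeds, bad_feeds, runs):
    """Run cron `runs` times; return the feeds that were fetched successfully."""
    queue = FeedQueue(list(feeds))
    fetched = []

    def fetch(feed):
        if feed in bad_feeds:
            raise MemoryError(feed)  # stands in for PHP running out of memory
        fetched.append(feed)

    for _ in range(runs):
        try:
            cron_run(queue, fetch)
        except MemoryError:
            pass  # the run dies, but the attempt counter was already saved
    return fetched


print(simulate(["a", "b", "c"], {"b"}, runs=3))  # ['a', 'c']
```

Here the bad feed "b" crashes twice, is skipped on the third run, and "c" finally gets fetched. Resetting the counter after a skip is a design choice: since the offending item usually drops out of the feed eventually, the feed gets another chance on the next pass instead of being blacklisted for good.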
Thx