In working with Feeds Atom, we have stumbled across a number of significant problems in the FeedsBatch class and hierarchy. They only seem to manifest themselves when processing a batch that is the same size as the feeds_max_batch_size or larger, and only when using the default FeedsNodeProcessor or a child class of it, but the bugs are actually in FeedsBatch and FeedsImportBatch.
There are two bugs in particular. First, is FeedsImportBatch::__construct(). It initializes $this->process to have 3 steps, all of which default to complete. However, it does not initialize the total items for any of those steps. But in FeedsBatch::getProgress(), then, which is called from FeedsSource::import() (line 149 in beta10), does not specify a phase. It wants to know about the entire process. getProgress() then treats the progress value for each step as a value, not as a completion flag. So if, say, you have 50 incoming items, you will get $batch->totals adding up to 50 and $batch->progress adding up to 52. 1.0 / 50 * 52 (line 81) is not == FEEDS_BATCH_COMPLETE (1), so getProgress() effectively returns "false, we're not done", because it is not calculating properly.
The workaround here is to, in your processor, set the total for FEEDS_FETCHING and FEEDS_PARSING to be the same as the progress for those values. That results in the calculation computing 52/52, as if the fetch and parse steps both had one step. This works because the batch object is being created long after fetching and parsing are being used anyway.
The real solution is to get rid of that declaration in the constructor, as it is simply flat out wrong. getProgress() could also be simplified as the logic in it is all kinds of weird. It is confusing 1 as a count of items with 1 as the value of FEEDS_BATCH_COMPLETE, which indicates the internal structure is just badly written.
The second bug is in getProgress() as well. Note this last line:
return $progress == FEEDS_BATCH_COMPLETE ? 0.999 : $progress;
That is, if we're done ($progress is 1 from the previous calculation) return... 0.99. Else return the percent done that we are. That is, calling getProgress() without a stage will never, ever return FEEDS_BATCH_COMPLETE (1), and therefore never be "done". FeedsSource::import() will compare the return value against FEEDS_BATCH_COMPLETE, get false, and never mark the processing as completed.
I have no idea what the real solution is, since that line makes no sense to me whatsoever. The workaround is to, in your processor, check to see if you have hit the magic condition where your incoming data has FEEDS_MAX_BATCH_SIZE elements in it and force the progress to be 1.
We've done both of the above workarounds in Feeds Atom in #1139360: Workaround bugs in Feeds batch handling. (We'll, we're working on them as I type this. It should be committed soon.) Both really should be fixed directly in Feeds, though.
Hopefully this bug report makes it to someone with commit access. :-)
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | feeds-batch-processing-fix-1139376-4.patch | 3.56 KB | twistor |
Comments
Comment #1
derhasi commentedPosted a comment in #712304: Batch import does not continue where it left off, instead starts from the beginning, to direct to this issue. There I allready posted a patch some time ago, but had no time to fix it or compare it with this issue ... hm ...
Comment #2
eelke commentedAny progress on this issue? This renders the module pretty useless in my opinion, because batch operations are completely broken at the moment.
Comment #3
gregglesDowngrading to normal, "makes it pretty useless" seems like a bold claim when there are nearly 15,000 people using the 6.x version alone.
If your goal was to get it fixed faster then a path toward what Larry suggests might be more valuable.
Comment #4
twistor commentedThis should fix the problems listed here while still being backwards compatible. It also greatly simplifies the logic in getProgress().
Comment #5
Crell commented