Download & Extend

Batch API never terminates if you set $context['finished'] > 1

Project:Drupal core
Version:6.x-dev
Component:base system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:closed (fixed)
Issue tags:dc2010 code sprint, Needs tests

Issue Summary

While working on #597484: Use the Queue API to fetch available update data, I discovered that Batch API freaks out if $context['finished'] > 1. Hat tip to chx for asking in IRC "wonder what our little batch does if you set finished to 102%" (paraphrase). Anyway, the answer is "it never terminates". ;) Our Batch API unit tests are woefully insufficient to catch this. All they do is ensure that _batch_api_percentage() returns valid output on valid input. Great. Hurray for unit testing! Testing individual functions is so much more important than seeing if the end-to-end system works -- and it's easier to debug this way!... yeah, right. ;)

Anyway, this isn't just a case of "core shouldn't baby sit broken code". If you're using batches for draining queues, and queues can be populated in other threads, this is a potentially common case. It'd be nice if unexpected input didn't trigger an infinite loop, especially since there's basically 0 performance cost in getting this right.

Comments

#1

Status:active» needs review

Here's the trivial patch to fix the bug. It's 4:25am here, so I don't have time to try to roll a test. That can wait until post 10/15, anyway...

AttachmentSizeStatusTest resultOperations
600836-1.batch-never-finishes.patch933 bytesIdlePassed: 13643 passes, 0 fails, 0 exceptionsView details

#2

Status:needs review» reviewed & tested by the community

Well, that's just good. Testing indeed can wait, and often while writing upgrades I was wondering what happens here.

#3

Status:reviewed & tested by the community» needs work

Um. Why can testing wait? This is exactly the kind of edge case that screams for a test.

#4

Issue tags:+dc2010 code sprint

At 4/22 code sprint. Writing test.

#5

Status:needs work» needs review

Re-rolled patch against HEAD; added test for $context['finished'] > 1 to batch.test.

AttachmentSizeStatusTest resultOperations
batch_large_percentage.patch7.14 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,944 pass(es).View details

#6

Re-submitting cleaner patch

AttachmentSizeStatusTest resultOperations
batch_large_percentage.patch6.92 KBIdlePASSED: [[SimpleTest]]: [MySQL] 19,939 pass(es).View details

#7

Status:needs review» reviewed & tested by the community

The batch test finishes nice and quick. Tests look good.

#8

Status:reviewed & tested by the community» needs work

+++ includes/batch.inc 22 Apr 2010 21:06:14 -0000
@@ -203,7 +203,7 @@ function _batch_progress_page_nojs() {
-    if ($percentage == 100) {
+    if ($percentage >= 100) {

I don't think this is actually related. This is the no-js equivalent of the == 100 check in progress.js

Powered by Dreditor.

#9

re-rolled based on yched's comment; also modified all the paths in hook_menu() for style.

It was noticed that the tests use usleep(). This is necessary because the batch needs time to finish for the test to work correctly. The total amount of time this adds is something on the order of 100ms; both necessary and negligible.

AttachmentSizeStatusTest resultOperations
600836-timcosgrove-batch-never-finishes.patch17.77 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 600836-timcosgrove-batch-never-finishes.patch.View details

#10

+++ modules/simpletest/tests/batch_test.callbacks.inc 22 Apr 2010 23:53:06 -0000
@@ -12,6 +12,7 @@
   // No-op, but ensure the batch take a couple iterations.
+  // batch needs time to run for the test, so sleep a bit.

@@ -52,6 +54,21 @@ function _batch_test_callback_2($start,
+  // batch needs time to run for the test, so sleep a bit.

Capitalization (2 occurrences of this comment)

+++ modules/simpletest/tests/batch_test.callbacks.inc 22 Apr 2010 23:53:06 -0000
@@ -32,7 +33,8 @@ function _batch_test_callback_2($start,
   for ($i = 0; $i < $limit && $context['sandbox']['count'] < $total; $i++) {
-    // No-op, but ensure the batch take a couple iterations.
+  // No-op, but ensure the batch take a couple iterations.
+  // batch needs time to run for the test, so sleep a bit.

wrong indent :-)

+++ modules/simpletest/tests/batch_test.callbacks.inc 22 Apr 2010 23:53:06 -0000
@@ -52,6 +54,21 @@ function _batch_test_callback_2($start,
+  $context['finished'] = 3.14; ¶

no space at the end of lines

+++ modules/simpletest/tests/batch_test.module 22 Apr 2010 23:53:06 -0000
@@ -52,41 +52,49 @@ function batch_test_menu() {
-}
+} ¶

same

Powered by Dreditor.

#11

Status:needs work» needs review

Reroll with a few style changes. There was some code in color.module in the patch from #9 - I removed that assuming that it got in there by mistake :)

AttachmentSizeStatusTest resultOperations
600836_11.patch16.05 KBIdlePASSED: [[SimpleTest]]: [MySQL] 25,365 pass(es).View details

#12

Status:needs review» reviewed & tested by the community

Thanks, looks good to me.

#13

#11: 600836_11.patch queued for re-testing.

#14

#11: 600836_11.patch queued for re-testing.

#15

Status:reviewed & tested by the community» fixed

Committed to CVS HEAD. Thanks.

#16

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

#19

Version:7.x-dev» 6.x-dev
Status:closed (fixed)» patch (to be ported)

I know this is a big no-no to re-open a closed issue but I'm frankly surprised it was never discussed that this needed to be backported to Drupal 6, especially considering that I've now encountered this bug with XML sitemap. I double checked and we should apply this to D6 since the node access rebuild op could potentially run in an infinite loop.

#20

Status:patch (to be ported)» needs review

Re-rolled for D6.

AttachmentSizeStatusTest resultOperations
600836-batch-prevent-infinite.patch480 bytesIdlePASSED: [[SimpleTest]]: [MySQL] 190 pass(es).View details

#21

Status:needs review» reviewed & tested by the community

sure.

#22

Status:reviewed & tested by the community» fixed

Looks simple and good, committed, thanks.

#23

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.