Tests 173099, 173104, 173129 and 173134 appear to be "stuck". They keep getting resubmitted and canceled over and over.

Comments

pillarsdotnet’s picture

Title: Tests 173129 and 173134 appear to be "stuck" in a submit/cancel/submit/cancel loop. » Tests 173099, 173104, 173129, and 173134 appear to be "stuck" in a submit/reset/submit/reset loop.

Here's the test log for one of them. The rest are very similar.

2,212,024 Requested by test client #659. 09/17/2011 - 02:10:28
2,212,019 Test reset by client request. 09/17/2011 - 02:10:28
2,211,989 Requested by test client #659. 09/17/2011 - 01:33:43
2,211,984 Test reset by client request. 09/17/2011 - 01:33:43
2,211,954 Requested by test client #659. 09/17/2011 - 00:57:19
2,211,949 Test reset by client request. 09/17/2011 - 00:57:19
2,211,914 Requested by test client #659. 09/17/2011 - 00:20:44
2,211,609 Requested by test client #654. 09/16/2011 - 23:28:44
2,211,579 Test reset by client request. 09/16/2011 - 23:27:28
2,211,514 Requested by test client #654. 09/16/2011 - 22:50:18
2,211,509 Test reset by client request. 09/16/2011 - 22:50:18
2,211,359 Requested by test client #654. 09/16/2011 - 22:13:42
2,211,319 Test reset by client request. 09/16/2011 - 22:00:01
2,211,244 Requested by test client #664. 09/16/2011 - 21:21:03
2,211,234 Test request received. 09/16/2011 - 21:20:42
pillarsdotnet’s picture

jthorson is working on this -- he said on IRC:

[22:37] <jthorson> Looks like it was only running each test suite once ... but with 100k assertions, I think one of  the tests must have been looping.
[22:38] <jthorson> NodeTypePersistenceTestCase and PathTaxonomyTermTestCase were the bad suites.
[22:38] <pillarsdotnet> neither of those were the tests I was working on.
[22:39] <jthorson> I'll try running them through a local testbot and see what it does.
[22:39] <pillarsdotnet> so -- It didn't have anything to do with my patch -- I just was unlucky?
[22:40] <jthorson> Not sure ... It tried testing multiple times, but never completed before your re-test requests ... could be related to the patch.  I'll run it and should know within an hour if it does the same thing on my testbot.
jthorson’s picture

Status: Closed (fixed) » Active

The 'exclude' tests were up around 103k assertions when I reset the testbot ... test logs had some bizarre errors in them ...

Node title 20 passes, 0 fails, and 0 exceptions
PHP Fatal error:  Maximum execution time of 500 seconds exceeded in /var/lib/drupaltestbot/sites/default/files/checkout/modules/simpletest/drupal_web_test_case.php on line 0
FATAL NodeTokenReplaceTestCase: test runner returned a non-zero error code (255).
Node title XSS filtering 20 passes, 0 fails, and 0 exceptions

...

Enable/disable modules 1335 passes, 0 fails, and 0 exceptions
PHP Fatal error:  Maximum execution time of 500 seconds exceeded in /var/lib/drupaltestbot/sites/default/files/checkout/includes/errors.inc on line 273
FATAL PathLanguageUITestCase: test runner returned a non-zero error code (255).
Path aliases with translated nodes 64 passes, 0 fails, and 0 exceptions

Because of the high number of assertions (probably related to the two errors listed above), I believe that the test was simply taking forever to run and was not able to complete before the re-test requests came in.

Interestingly enough, the errors don't appear to be in the same test suite each time ... but they do occur within the same vicinity each time.

173099 and 173129 jammed up testbots 664 and 659, and 173104 jammed up testbot 654 ... which may explain why the queue started backing up. Killed all three tests to free up the testbots, and then had to kill 173134 as well when 664 picked it up. All three are now reviewing other patches, so the queue should clean itself up within a couple hours.

jthorson’s picture

Unfortunately, I can't retest this on my local testbot ... yet. But you did help uncover an issue in the testbot 'run tests locally' form! ;)

pillarsdotnet’s picture

Status: Active » Closed (fixed)

Don't re-run the patches -- I figured out what I did wrong.

rfay’s picture

Status: Active » Closed (fixed)

@pillarsdotnet, when you "figure out what is wrong" please always post it in the issue! I think you know that :-)

The classic reason for this kind of thing is tests referring to environment 0.

pillarsdotnet’s picture

Sorry. The patches allowed file_scan_directory() to recurse to the parent directory if the nomatch pattern didn't exclude it. What was happening, I am now certain, is that other code was calling file_scan_directory() with a nomatch option, and it kept generating an ever-growing list of files until PHP ran out of time or memory.
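
For anyone who hits the same thing, here is a rough sketch of the failure mode. It is a Python analogue, not Drupal's code and not the patch itself: `TREE`, `scan()` and `max_visits` are made-up names for illustration, and the in-memory tree just assumes each directory listing includes a ".." entry. If the exclusion pattern stops covering "..", the scan climbs to the parent, comes back down, and never finishes.

```python
import posixpath
import re

# Hypothetical in-memory directory tree: path -> entry names.
# Assumption: each listing includes the ".." parent link.
TREE = {
    "/": ["..", "site"],
    "/site": ["..", "modules", "index.php"],
    "/site/modules": ["..", "node.module"],
}


def scan(path, nomatch, max_visits=50, _state=None):
    """Recursively list files under `path`, skipping names matching `nomatch`.

    `max_visits` is a safety budget added for this sketch only: it raises
    RuntimeError instead of letting a runaway scan recurse forever.
    """
    state = _state if _state is not None else {"visits": 0}
    state["visits"] += 1
    if state["visits"] > max_visits:
        raise RuntimeError("scan did not terminate")

    found = []
    for name in TREE.get(path, []):
        if re.search(nomatch, name):
            continue  # excluded entry, e.g. "." or ".."
        child = posixpath.normpath(posixpath.join(path, name))
        if child in TREE:
            # Directory: recurse. If name is "..", this walks back up.
            found += scan(child, nomatch, max_visits, state)
        else:
            found.append(child)
    return found


# A pattern that excludes "." and ".." terminates normally.
safe = scan("/site", r"^\.\.?$")

# A caller-supplied pattern that forgets ".." never terminates on its own.
try:
    scan("/site", r"^CVS$")
    runaway = False
except RuntimeError:
    runaway = True
```

With the first pattern the scan visits two directories and stops. With the second it exhausts the visit budget, which is the small-scale version of a test run hitting the execution time limit.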

pillarsdotnet’s picture

Correction -- the test queue isn't overly large; just the log for each test.