Problem/Motivation

I had a random test fail in ManageGitIgnoreTest yesterday, during standard patch review with PHP 7.1 & MySQL 5.7 -- and when I hit retest, the tests all passed.

Here's the failing job:
https://www.drupal.org/pift-ci-job/1347693

In case it goes away, here are the results:

fail: [Other] Line 0 of sites/default/files/simpletest/phpunit-555.xml:
PHPunit Test failed to complete; Error: PHPUnit 6.5.14 by Sebastian Bergmann and contributors.

Testing Drupal\Tests\Component\Scaffold\Functional\ManageGitIgnoreTest
..F                                                                 3 / 3 (100%)Command 'install --no-ansi --no-scripts' - Expected exit code: 0, actual exit code: 1


Time: 11.23 seconds, Memory: 287.00MB

There was 1 failure:

1) Drupal\Tests\Component\Scaffold\Functional\ManageGitIgnoreTest::testUnmanagedGitIgnoreWhenGitNotAvailable
Failed asserting that file "/tmp/composer-scaffold-test-bdbb6af8cfb6c321e9f0053d2e8e63f6/drupal-composer-drupal-project/docroot/index.php" exists.

/var/www/html/core/tests/Drupal/Tests/Component/Scaffold/Functional/ManageGitIgnoreTest.php:186

Proposed resolution

TBD

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

CommentFileSizeAuthor
#50 3067768-50.patch7.35 KBKrzysztof Domański
#50 interdiff-45-50.txt2.18 KBKrzysztof Domański
#45 3067768-45.patch7.32 KBgreg.1.anderson
#43 3067768-43.patch9.19 KBgreg.1.anderson
#43 3067768-43.patch9.19 KBgreg.1.anderson
#43 3067768-43.patch9.19 KBgreg.1.anderson
#43 3067768-43.patch9.19 KBgreg.1.anderson
#43 3067768-40-to-43-interdiff.txt598 bytesgreg.1.anderson
#40 3067768-38.patch9.25 KBMixologic
#40 3067768-38.patch9.25 KBMixologic
#40 3067768-38.patch9.25 KBMixologic
#40 3067768-38.patch9.25 KBMixologic
#38 3067768-33-to-38-interdiff.patch2.09 KBgreg.1.anderson
#38 3067768-38.patch9.25 KBgreg.1.anderson
#37 3067768-33-to-37-interdiff.txt5.49 KBgreg.1.anderson
#37 3067768-37.patch13.7 KBgreg.1.anderson
#33 3067768-25-to-33-interdiff.txt1.97 KBgreg.1.anderson
#33 3067768-33.patch9.01 KBgreg.1.anderson
#27 3067768-2-25.patch8.41 KBMixologic
#27 3067768-2-25.patch8.41 KBMixologic
#27 3067768-2-25.patch8.41 KBMixologic
#27 3067768-2-25.patch8.41 KBMixologic
#23 3067768-2-23.patch8.41 KBalexpott
#23 19-23-interdiff.txt772 bytesalexpott
#19 3067768-18.patch8.17 KBMixologic
#19 3067768-18.patch8.17 KBMixologic
#17 3067768-17.patch1.86 KBMixologic
#16 3067768-16.patch1.79 KBMixologic
#7 3067768-6.patch6.31 KBgreg.1.anderson
#7 5-to-6-interdiff.txt4.46 KBgreg.1.anderson
#5 3067768-5.patch2.13 KBalexpott
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jhodgdon created an issue. See original summary.

jhodgdon’s picture

Issue summary: View changes

Noting PHP version.

greg.1.anderson’s picture

I'm not sure why the scaffold operation would sometimes fail to run. We could make this easier to diagnose by adding another assert against the output of the scaffold operation. If we asserted against the status result code + stdout + stderr all in a single assert, then we could get more data when the random fail happened again.

alexpott’s picture

@greg.1.anderson also we probably at some point want to add -vvv to the composer commands as that really helps with debugging what's going on too.

alexpott’s picture

Status: Active » Needs review
FileSize
2.13 KB

One thing that does look at bit interesting is the use of md5 + 1. The whole point of that is generate something unique BUT I have the feeling the way we're adding in time we're actually more likely to generate a clash!!! So let's use uniqid() instead. This still has problems of not always being unique but at least it is the right tool for the job.

Mixologic’s picture

11:40:17
11:40:17
11:40:17 [Symfony\Component\Console\Exception\CommandNotFoundException]
11:40:17 There are no commands defined in the "composer" namespace.
11:40:17
11:40:17
11:40:18 Drupal\Tests\Component\Scaffold\Functional\ManageGitIgnoreTe 0 passes 1 fails

^this is what was showing on the console log. seems odd that it'd throw this kind of exception.

greg.1.anderson’s picture

Additional improvements.

Mixologic’s picture

Additionally I checked all the logs since this commit, and so far this is the only instance of this test failing.

greg.1.anderson’s picture

@mixologic: That's the failure that I would expect. 'No commands defined in the "composer" namespace' can be explained by the composer install failing. I'm not sure why it failed, but such a failure could certainly be explained by conflicting temporary directories. My patch adds output to diagnose that situation, although I suspect that @alexpott's patch fixes the problem.

jhodgdon’s picture

Lucky me, I won the random test fail lottery! :)

alexpott’s picture

+++ b/core/tests/Drupal/Tests/Component/Scaffold/Functional/ManageGitIgnoreTest.php
@@ -179,10 +180,38 @@ public function testUnmanagedGitIgnoreWhenGitNotAvailable() {
+    $expected = <<<EOT
+0
+
+Scaffolding files for fixtures/drupal-assets-fixture:
+  - Copy [web-root]/.csslintrc from assets/.csslintrc
+  - Copy [web-root]/.editorconfig from assets/.editorconfig
+  - Copy [web-root]/.eslintignore from assets/.eslintignore
+  - Copy [web-root]/.eslintrc.json from assets/.eslintrc.json
+  - Copy [web-root]/.gitattributes from assets/.gitattributes
+  - Copy [web-root]/.ht.router.php from assets/.ht.router.php
+  - Skip [web-root]/.htaccess: overridden in fixtures/drupal-composer-drupal-project
+  - Copy [web-root]/sites/default/default.services.yml from assets/default.services.yml
+  - Skip [web-root]/sites/default/default.settings.php: overridden in fixtures/scaffold-override-fixture
+  - Copy [web-root]/sites/example.settings.local.php from assets/example.settings.local.php
+  - Copy [web-root]/sites/example.sites.php from assets/example.sites.php
+  - Copy [web-root]/index.php from assets/index.php
+  - Skip [web-root]/robots.txt: overridden in fixtures/drupal-composer-drupal-project
+  - Copy [web-root]/update.php from assets/update.php
+  - Copy [web-root]/web.config from assets/web.config
+Scaffolding files for fixtures/scaffold-override-fixture:
+  - Copy [web-root]/sites/default/default.settings.php from assets/override-settings.php
+Scaffolding files for fixtures/drupal-composer-drupal-project:
+  - Skip [web-root]/.htaccess: disabled
+  - Copy [web-root]/robots.txt from assets/robots-default.txt
+EOT;
+    $this->assertEquals($expected, $status . "\n\n" . implode("\n", $output));

Funnily enough I was thinking of adding a test like this for the iterable collection change I did at the very end of the scaffold patch so +1 for adding this. As I said before we really should consider adding -vvv to the composer commands because this gives way more helpful output if something does go wrong in one of our commands.

greg.1.anderson’s picture

The problem with `-vvv` in Composer commands is that I think it would be hard to do exact matches against the output, because there are a lot of local filesystem paths and variable output in very verbose mode.

For tests that do not assert on the output, or that only assert 'contains' on the output, then `-vvv` becomes viable to use.

greg.1.anderson’s picture

Tests are green; anyone care to RTBC this, or should we do additional work here?

Mixologic’s picture

Turns out this is affecting Drupal\Tests\Component\Scaffold\Integration\ScaffoldFileCollectionTest::testCreate as well.

https://www.drupal.org/pift-ci-job/1356921
https://www.drupal.org/pift-ci-job/1354328
https://www.drupal.org/pift-ci-job/1354324
https://www.drupal.org/pift-ci-job/1353539
https://www.drupal.org/pift-ci-job/1353256

We dont really have a great way of proving that a random failure is no longer randomly failing, short of running the old test a bunch of times hoping to get it to rando fail, then running the new patch a bunch of times hoping that it doesnt.

Should we take those steps here?

Mixologic’s picture

Assigned: Unassigned » Mixologic

Yeah, Im going to take those steps of reproducing the randofail, and hoping it goes away with the fix.. I'd much rather have some level of confidence that the fix we're attempting actually helps.

Mixologic’s picture

Heres just the scaffold tests, running 20 times each on 32 processors.

This should hopefully fail.

Mixologic’s picture

This one is a little bit more focused. Its *just* the Drupal\Tests\Component\Scaffold\Integration\ScaffoldFileCollectionTest

Status: Needs review » Needs work

The last submitted patch, 17: 3067768-17.patch, failed testing. View results

Mixologic’s picture

Status: Needs work » Needs review
FileSize
8.17 KB
8.17 KB

Okay, great. We've proven that we can reproduce. Lets try it with the patch from #7 and up the number of times we run this to 10x and see what happens.

The last submitted patch, 19: 3067768-18.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 19: 3067768-18.patch, failed testing. View results

Mixologic’s picture

okay, so thats a bummer. We've still got an issue even with #7.

alexpott’s picture

Status: Needs work » Needs review
FileSize
772 bytes
8.41 KB

@mixologic++ thanks for adding the testing. Let's see if this does it.

greg.1.anderson’s picture

I don't think that the extra 'ensureDirectoryExists' is necessarily helpful. If this is a race condition, it is only a race condition because two unrelated tests chose the same temporary directory name.

The temporary directory is currently named as follows:

$tmpDir = sys_get_temp_dir() . '/composer-scaffold-test-' . md5($extraSalt . microtime());

It is impossible that the same process should ever produce the same name on subsequent calls of this function. If there are parallel processes creating temporary directories, and they make a hundred in a second, then there is a one-in-ten-thousand chance that two processes will run at exactly the same microtime(). While I am not sure exactly how many temporary directories are created by these tests, or how long they take to run, it's probably something along those lines.

Maybe we should shorten the invariant part ('composer-scaffold-test-') and append some data from mt_rand() or some other source here. The chance that two pseudo-random sequences will both line up exactly should be about zero.

Whoops I wasn't looking at the patched version of the file. The current random name is generated via:

$tmpDir = sys_get_temp_dir() . '/composer-scaffold/' . uniqid($prefix, TRUE);
greg.1.anderson’s picture

Looking at the docs for uniqid, it seems that uniqid($prefix) is way less random than md5($prefix . microtime()). However, uniqid($prefix, true) appends another random value on the end of the output. The thing is, though, that the main randomness of uniqid('', false) is very poor, with very similar values being produced from one call to the next. Maybe what we should try instead is something like:

uniqueid(md5($prefix . microtime()), true);

Thoughts?

greg.1.anderson’s picture

Hm, but #23 passed. I am skeptical that it is only fixing the symptom - the temporary directory being deleted by another process, caused by the other process choosing the same tmp directory name. There could still be other issues if the same tmp directory are used by different processes.

Maybe there is an environment variable set by Drupal CI with a unique job number that could also be used here. If this environment variable were unset in other environments, it should not matter, as other environments are not likely to be as concurrent. Not sure what is available, but I imagine there is something.

Mixologic’s picture

Yeah, I think it must come down to two different scaffold tests both creating tmpdirs at the same time.

I sure wish composer didnt swallow the mkdir error so we can see whats actually happening.

if (!@mkdir($directory, 0777, true)) {
                throw new \RuntimeException(
                    $directory.' does not exist and could not be created.'
                );
            }

But #23 seems to fix it, though even at 12800 times #19 only failed 3 times, so Im gonna resubmit a few more just to be sure.

Drupalci doesnt really know what iteration of the test is running.. that'd be something run-tests.sh should pass in to its child tests.

Mixologic’s picture

Assigned: Mixologic » Unassigned
greg.1.anderson’s picture

The other thought would be for DrupalCI to set up sys_get_temp_dir() such that every test gets a separate tmp directory.

greg.1.anderson’s picture

Failing #29, the other thing we could do is have the scaffold test make itself a single unique / random prefix to use when generating all tmp directories. It would be very unlikely for two concurrent scaffold tests to get the exact same random prefix, and even if they did, it would then be only two processes in contention for directory collisions, rather than all of them.

The last submitted patch, 27: 3067768-2-25.patch, failed testing. View results

greg.1.anderson’s picture

Yep, #23 made failures less damaging, but did not prevent collisions. Working on another version of the patch.

greg.1.anderson’s picture

This patch implements the ideas from #30 and #25. I also backed out the remediation in #23 to make collisions easier to detect, as ideally we will avoid collisions rather than attempt to repair collisions.

Mixologic’s picture

So each test that is executed by run-tests.sh gets assigned a unique test-id that it gets from the database.

That test-id gets passed to the test class constructor: $test = new $class_name($test_id);

Which means, in a run-tests.sh environment we could potentially avoid intra test collisions by using that unique test_id in paths.

Either we've thought of this before, and it didnt work, or we wonder why we never thought of it before.

Mixologic’s picture

oh, the constructor assigns that $test_id to the name attribute of the TestCase base class.

So $this->name might just give us what we need.

greg.1.anderson’s picture

Updating tests to pass $this->name through to the fixtures.

greg.1.anderson’s picture

Updated per #36.

greg.1.anderson’s picture

It turns out that $this->name is not passed all the way through from drupal.org CI to phpunit, so #37 is not helpful. Here's something that should be more stable.

greg.1.anderson’s picture

Please pay no attention to "patch failed to apply", I named my interdiff wrong. :p

Mixologic’s picture

Heres another thrashing. If this is 4x green, then it 差不多 for me.

Mile23’s picture

Status: Needs review » Needs work

Just the one thing from #38:

+++ b/core/tests/Drupal/Tests/Component/Scaffold/Fixtures.php
@@ -237,31 +242,48 @@ public function destinationPath($destination, Interpolator $interpolator = NULL,
-    $filesystem->ensureDirectoryExists($tmpDir);
+    @$filesystem->ensureDirectoryExists($tmpDir);

Let's fail early on this instead of silencing the error.

In fact, mkTmpDir() might be a protected function.

greg.1.anderson’s picture

Ah I think that was a change that was treating symptoms that was left in by mistake. I agree it should come out.

greg.1.anderson’s picture

#40 with #41 fixed, tested four times.

Mile23’s picture

Status: Needs review » Needs work

LGTM. Now we need to revert the drupalci.yml parts and take care of this:

+++ b/core/tests/Drupal/Tests/Component/Scaffold/Fixtures.php
@@ -237,29 +242,46 @@ public function destinationPath($destination, Interpolator $interpolator = NULL,
+  /**
+   * Generates a persistent prefix to use with all of our temporary directories.
+   *
+   * The presumption is that this should reduce collisions in highly-parallel tests.
+   * We prepend the process id to play nicely with phpunit process isolation.
+   *
+   * @returns string
+   *   A random string that will remain the same throughout the execution of the tests.

Wrap docblock to 80. Probably other CS things, too.

greg.1.anderson’s picture

Status: Needs work » Needs review
FileSize
7.32 KB

Back to ordinary testing. Fixed wordwrap CS in #44.

Mixologic’s picture

Status: Needs review » Reviewed & tested by the community

This is solid, and should be pretty impervious to accidental collisions on the testbot in parallel environments. There was a head breaking test today on commit, so, getting this in before more disruptions happen would be good.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/tests/Drupal/Tests/Component/Scaffold/Fixtures.php
@@ -237,29 +242,47 @@ public function destinationPath($destination, Interpolator $interpolator = NULL,
+      static::$randomPrefix = getmypid() . md5(microtime());

We need to silence any errors in getmypid() - https://www.php.net/manual/en/function.getmypid.php - doens't work on windows apparently. I think this is okay but we just need to be make sure this doesn't break things. Highly concurrent testing on windows is not necessary to support.

Mixologic’s picture

I cant find anywhere that indicates that getmypid() doesnt work on windows?

greg.1.anderson’s picture

Status: Needs work » Reviewed & tested by the community

I tried it on appveyor:

https://ci.appveyor.com/project/greg-1-anderson/site-alias/builds/262389...

Test line:

$this->assertEquals('my pid', getmypid());

Result:

Failed asserting that 2028 matches expected 'my pid'.

I don't see any reason to protect getmypid(), but please LMK if you'd like changes here anyway.

Krzysztof Domański’s picture

FileSize
2.18 KB
7.35 KB

1. The $output is an array.

-    $output = '';
+    $output = [];
     exec('git --help', $output, $status);
$this->assertEquals($expected, $status . "\n\n" . implode("\n", $output));

function exec ($command, array &$output = null, &$return_var = null) {}

2. Fixed coding standard and typo.

greg.1.anderson’s picture

+1 on #50 thanks for the improvements.

larowlan’s picture

Status: Reviewed & tested by the community » Fixed

thanks folks, this was a curly one indeed

Committed 3a6e13d and pushed to 8.8.x. Thanks!

  • larowlan committed 3a6e13d on 8.8.x
    Issue #3067768 by Mixologic, greg.1.anderson, alexpott, Krzysztof...
Mixologic’s picture

Issue tags: +Composer initiative

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.