Problem/Motivation
private function prepareDatabasePrefix() {
$suffix = mt_rand(1000, 1000000);
$this->siteDirectory = 'sites/simpletest/' . $suffix;
$this->databasePrefix = 'simpletest' . $suffix;
This code in no way guarantees that the test suffix is unique. When running with a concurrency of 8 and a mixed of DrupalUnitTestBase (fast) and WebTestBase (slow) classes we have a high chance of collision.
See https://drupal.org/comment/8469693#comment-8469693 for examples of where this might have occurred... specifically https://qa.drupal.org/pifr/test/724593
rmdir(sites/simpletest/488385): Directory not empty<pre class="backtrace">rmdir('sites/simpletest/488385') drupal_rmdir('sites/simpletest/488385') file_unmanaged_delete_recursive('sites/simpletest/488385', Array) Drupal\simpletest\TestBase->restoreEnvironment() Drupal\simpletest\TestBase->run() simpletest_script_run_one_test('268', 'Drupal\field\Tests\reEnableModuleFieldTest')
Proposed resolution
Using the locking system in the parent side to guarantee uniqueness.
Remaining tasks
Review patch.
User interface changes
None
API changes
None
Comments
Comment #1
bzrudi71 commented@alexpot Do you think it's possible to shorten the 'simpletest' prefix to something like 'st' while working on this? E.g.
$this->databasePrefix = 'st' . $suffix;It should have no impact on the uniqueness of databasePrefix but could/might help because of #998898: Make sure that the identifiers are not more the 63 characters on PostgreSQL which leads to broken PostgreSQl tests. This will not fix all and everything in PostgreSQL test, but may help a bit ;-)
Comment #2
bzrudi71 commentedOn the other hand, patch(es) from #998898: Make sure that the identifiers are not more the 63 characters on PostgreSQL doesn't look that bad. I think I will do a D8 port and see if it works to prevent the 63 chars limit all over...
Comment #3
berdirWrote a quick script to simulate a test run with a concurrency of 8 and then checking how long it takes until it conflicts:
https://gist.github.com/Berdir/8934380
While the mean value is usually between 100-150k loops for 100 runs, There are cases with <1000 loops and I had one that conflicted already after 58 loops.
So yes, this a problem and apparently more likely than you'd think.
Code looks fine, let's try if this helps with those weird, random missing table test cases and similar cases.
Note: This runs in a single process, not separate ones but I doubt that would change much?
Comment #4
sunWhy all this lock stuff?
We can simply check whether the generated test site directory exists already in a while loop.
Comment #5
sunHere. :)
Comment #6
sunIn addition, I always wondered why the range of possible random values varies so much in the first place?
1,000-1,000,000 → 100,000-999,999
That certainly contributes to the issue here, since often times I'm getting test prefixes of just 5 digits.
Comment #8
berdirWe just discussed that the dynamic prefixes might actually be the biggest problem here, not direct conflicts.
If one test gets the prefix "5000", then once that is completed, it will do the wildcard table delete things, which will match any suffix that starts with 5000, which means 5000* and 5000**, so it can conflict with up to 110 other suffixes. My guess is that those are the missing table cases, and "already installed" that sometimes also happen would be the exact match. But that's just a guess :)
Comment #9
sunIndeed, this "edge-case" apparently seems to happen way more often than we imagined — just in https://qa.drupal.org/pifr/test/725078 only, it happened 4x times.
The latest/final patch resolves two discrete problems:
The generated random database table suffix is not necessarily unique due to concurrency.
→ Check whether a test site directory exists already, and if so, generate a new one.
The possible range of random numbers can result in a shorter database table prefix, but since
TestBase::restoreEnvironment()uses the value as a wildcard prefix for finding database tables, the database tables of other tests (using the same prefix * 10 or * 100) are possibly deleted, too.→ Ensure to use a fixed length of digits for the database prefix.
Would be great to commit this patch ASAP.
Comment #10
berdirAgreed, this is an easier fix and the mt_rand() change is actually the most important part here. Sometimes I wonder why it takes us so long to figure out stuff like that ;)
Yes, let's get this in ASAP.
Comment #11
alexpottI disagree with checking the file system as a means of guaranteeing uniqueness. Whether this actually works in a concurrent testing situation will depend on what file system you use. There is a reason will libraries such as this exist https://github.com/unlox775/File-NFSLock-php.
Patch incoming.
Comment #12
alexpottPatch which makes the same change to
mt_rand().Comment #14
alexpottAfter a long discussion in IRC @sun and specifically @Berdir convinced me that my concerns about clustered file systems are going to be a lot bigger than just this. The patch in #6 is RTBC.
Comment #15
webchickLooks good here. Here's to fewer random-ass test failures. :)
Committed and pushed to 8.x. Thanks!