Problem/Motivation

A client project is using Drupal\FunctionalJavascriptTests\WebDriverTestBase to test a React Application which has a Javascript backend. The test frequently fails on the CI service and it appears to be random.

Debugging this has shown that this is because ChromeDriver is emit emitting a stale element reference: element is not attached to the page document error.

It turns out that the retry process in \Drupal\FunctionalJavascriptTests\WebDriverCurlService gets in the way of wait functions in \Drupal\FunctionalJavascriptTests\JSWebAssert(). Here's why:

  • If the waitFor* functions we call a function like $page->getText()
  • The getText() method end up making 2 curl calls - the first to find the element and the second to get the text from the element.
  • The first call is successful and returns an element locator to use for the second request
  • In the meantime JS has replaced this element with different one that still matches the old selector
  • Then the second call fires and retries 10 times - there's no way the element can come back and then triggers a CurlExec exception
  • Since there's an exception the wait won't try again because we're process the exception and bailing from the test.

Proposed resolution

  • Disable retries in WebDriverCurlService when waiting
  • Catch CurlExec exceptions when waiting

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alexpott created an issue. See original summary.

alexpott’s picture

Status: Active » Needs review
FileSize
2.93 KB
alexpott’s picture

I need this for 8.9.x so posting a patch here to share...

alexpott’s picture

Adding the logging from D9 to the D8 patch.

alexpott’s picture

Backporting more of the improvements for D9.

longwave’s picture

Status: Needs review » Reviewed & tested by the community

Looks like a sensible fix, hopefully it will solve the random stale element errors we have seen in core as well.

  • catch committed 0f2313a on 9.2.x
    Issue #3211164 by alexpott: Random errors in Javascript Testing
    

  • catch committed 2cfb999 on 9.1.x
    Issue #3211164 by alexpott: Random errors in Javascript Testing
    
    (cherry...

  • catch committed 15d5418 on 8.9.x
    Issue #3211164 by alexpott: Random errors in Javascript Testing
    
catch’s picture

Version: 9.2.x-dev » 8.9.x-dev
Status: Reviewed & tested by the community » Fixed

Committed 0f2313a and pushed to 9.2.x, cherry-picked to 9.1.x. Also committed the 8.9.x patch since makes sense to improve test stability there too. Thanks!

alexpott’s picture

Status: Fixed » Needs work

I think we should revert this for two reasons. One, it has broken PHP 7.0 on Drupal 8 and two...

After much sleuthing finally worked out what is going on. It turns out that the retry process in \Drupal\FunctionalJavascriptTests\WebDriverCurlService gets in the way of wait functions in \Drupal\FunctionalJavascriptTests\JSWebAssert(). Here's why:

  • If the waitFor* functions we call a function like $page->getText()
  • The getText() method end up making 2 curl calls - the first to find the element and the second to get the text from the element.
  • The first call is successful and returns an element locator to use for the second request
  • In the meantime JS has replaced this element with different one that still matches the old selector
  • Then the second call fires and retries 10 times - there's no way the element can come back and then triggers a CurlExec exception
  • Since there's an exception the wait won't try again because we're process the exception and bailing from the test.

The solution is to catch the exception in the wait functions and retry - without doing our retries in WebDriverCurlService as these are pointless because we're ready to wait already.

This change results in a test suite for a site I'm working on going from an almost certain random fail to no random fails on repeated runs of the JS tests.

Bumping this to critical because I'm pretty sure that this affects core too - as per the linked issues.

alexpott’s picture

It broke PHP 7.0 because

> Drupal\Core\Composer\Composer::upgradePHPUnit
PHP Parse error:  syntax error, unexpected 'const' (T_CONST), expecting variable (T_VARIABLE) in /var/www/html/core/tests/Drupal/FunctionalJavascriptTests/WebDriverTestBase.php on line 45

Thanks @mglaman for pointing this out.

  • catch committed e5c0783 on 9.2.x
    Revert "Issue #3211164 by alexpott: Random errors in Javascript Testing...

  • catch committed 3f05465 on 9.1.x
    Revert "Issue #3211164 by alexpott: Random errors in Javascript Testing...

  • catch committed 5b07dc1 on 8.9.x
    Revert "Issue #3211164 by alexpott: Random errors in Javascript Testing...
tim.plunkett’s picture

Priority: Major » Critical

Per the last line of #11

alexpott’s picture

Version: 8.9.x-dev » 9.1.x-dev
Status: Needs work » Needs review
FileSize
5.46 KB

This patch attached fixes random fails on my client project and makes a suite of JS tests that previously were very unreliable pass even on slow infrastructure.

alexpott’s picture

alexpott’s picture

Issue summary: View changes

Updating issue summary with the actual bug and solution.

The last submitted patch, 18: 3211164-2-18.test-only.patch, failed testing. View results

alexpott’s picture

Hmmm.... so this will not fix #3203712: [random test failure] EntityDisplayTest::testExtraFields() or #3099427: [random test failure] FieldLayoutTest::testEntityView() unfortunately. Those fails are not part of the wait system. They are similar though... they are both occurring as part of \Behat\Mink\Element\NodeElement::dragTo() ... this runs a series of webdriver commands and eventually the element is no longer there for some reason. We need to check the destination element we're using on those issues and try to work out why it might be replaced.

The reason the client project was failing so much is because we're testing a React application that makes a lot of updates to the DOM including replacing the entire tree. This change will make testing such applications with Drupal's test framework and as the test shows this change makes our waits much more robust.

alexpott’s picture

longwave’s picture

Priority: Critical » Major
Status: Needs review » Reviewed & tested by the community

The test proves that this does solve the problem even if it's not one that seems to affect core, it might still make our tests a bit more reliable.

Downgrading from critical as this doesn't affect the core tests that we thought it did.

The last submitted patch, 18: 3211164-2-18.test-only.patch, failed testing. View results

catch’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/tests/Drupal/FunctionalJavascriptTests/WebDriverCurlService.php
    @@ -14,6 +14,31 @@
     
    +  /**
    +   * Flag that indicates if retries are are enabled.
    +   *
    

    Nit: are are

  2. +++ b/core/tests/Drupal/FunctionalJavascriptTests/WebDriverCurlService.php
    @@ -14,6 +14,31 @@
    +   * This is useful if the caller is implementing it's own waiting process.
    +   */
    +  public static function disableRetry() {
    +    static::$retry = TRUE;
    +  }
    +
    

    Shouldn't this be FALSE?

  3. +++ b/core/tests/Drupal/FunctionalJavascriptTests/WebDriverCurlService.php
    @@ -22,7 +47,7 @@ public function execute($requestMethod, $url, $parameters = NULL, $extraOptions
         $retries = 0;
    -    while ($retries < 10) {
    +    while ($retries < (static::$retry ? 10 : 1)) {
           try {
    

    Think it'd be a bit clearer to assign this to a $max_retries variable outside the while.

alexpott’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
1.17 KB
8.51 KB

Thanks for the review @catch. The disable retries is to allow us to return to the wait in the \Behat\Mink\Element\Element::waitFor() earlier and not wait twice. So it is an optimisation that is hard to test.

Fixed everything else. If I'd done #25.3 I would have saved a few hours of my life wondering wtf was going on :)

As #25 is nits and not really changing logic setting back to RTBC.

longwave’s picture

RTBC again for #26.

  • catch committed 525c3d9 on 9.3.x
    Issue #3211164 by alexpott, catch: Random errors in Javascript Testing
    

  • catch committed 21733d2 on 9.2.x
    Issue #3211164 by alexpott, catch: Random errors in Javascript Testing...

  • catch committed bfda344 on 9.1.x
    Issue #3211164 by alexpott, catch: Random errors in Javascript Testing...
catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed/pushed to 9.3.x and cherry-picked to 9.2.x, thanks!

Also cherry-picked to 9.1.x on the basis that if there turns out to be an unexpected core tests it's easy to revert on 9.1.x for DrupalCI, but if it fixes contrib tests then it's not very useful in 9.1.x except for a patch release. Hopefully this is the right way 'round in reality.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.