Symptom: Tests on testbots complete, but then start over, cycling endlessly.
Root Cause: Testbot is not able to send results to qa.d.o.
- Testbot watchdog log contains: “Failed to send result: Request Entity Too Large”
- This is tripped on line 95 of pifr_client_xmlrpc_send_result() in pifr_client.xmlrpc.inc
- called by pifr_client_review_run() in pifr_client.review.inc
- Passed $review->get_result(), defined at line 355 of pifr_simpletest.client.inc
- contains a large array of results for a given test run.
Additionally, after 3-4 test cycles on the same patch following this pattern, apache segfaults.
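To see whether a given result set is likely to trip a body-size limit, the serialized XML-RPC request can be measured before it is sent. The following is a minimal Python sketch rather than PIFR code: the method name `pifr.result`, the client key, the shape of the result entries, and the 1 MiB limit are all assumptions made for illustration.

```python
import xmlrpc.client


def xmlrpc_payload_size(method_name, params):
    """Return the size in bytes of the XML-RPC request body for a call."""
    # dumps() builds the same XML document that would be sent as the POST body.
    body = xmlrpc.client.dumps(tuple(params), methodname=method_name)
    return len(body.encode("utf-8"))


def exceeds_limit(method_name, params, limit_bytes):
    """True if the serialized request is larger than limit_bytes."""
    return xmlrpc_payload_size(method_name, params) > limit_bytes


# Hypothetical result set: 10,000 assertions, similar in spirit to the
# extra assertions described in this issue.
results = [
    {"test": "ExampleTest", "status": "pass", "message": "x" * 50}
    for _ in range(10000)
]

# "pifr.result" and "client-key" are placeholder names, not the real API.
size = xmlrpc_payload_size("pifr.result", ["client-key", results])
print(size, exceeds_limit("pifr.result", ["client-key", results], 1024 * 1024))
```

Logging this figure on the testbot just before the send would show how close each test run comes to whatever limit is being enforced along the path.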
Short-term resolution:
A views debug() statement contributes 10k assertions to every test. Removing it should open up the communications channel again, at least for the average test. The patch is at https://drupal.org/node/1822048#comment-7479778
- [May 31st, 1am CST] This patch has been applied, and D8 tests will now process successfully once D8 HEAD is unbroken.
Medium-term resolution:
Identify the network/server element which is causing the HTTP 413 response, and reconfigure it to allow the communication of large payloads.
Long-term resolution:
Refactor testbot communications to support parallel test processing, intermediate batch results, and greatly reduce the amount of information which needs to be transferred between the testbots and qa.d.o in each communications exchange.
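One way to read "intermediate batch results" is to split a large result set into several smaller requests, each kept under a size budget. The sketch below is in Python and is only an illustration under assumptions: the function name `batch_results`, the JSON encoding used to estimate size, and the byte budget are hypothetical and do not reflect the actual PIFR design.

```python
import json


def batch_results(results, max_bytes):
    """Yield lists of results whose summed encoded size stays within max_bytes.

    Size is estimated as the sum of each item's JSON encoding, ignoring the
    small overhead of the enclosing list. A single item larger than max_bytes
    is still yielded, in a batch of its own.
    """
    batch, batch_size = [], 0
    for item in results:
        item_size = len(json.dumps(item).encode("utf-8"))
        # Close the current batch if adding this item would exceed the budget.
        if batch and batch_size + item_size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(item)
        batch_size += item_size
    if batch:
        yield batch


# Hypothetical usage: each batch would be sent as its own request.
for batch in batch_results([{"id": i} for i in range(10)], max_bytes=20):
    print(len(batch), batch)
```

With batches like these, a failure affects only one small request instead of the whole test run, which is the property the long-term resolution is after.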
| Comment | File | Size | Author |
|---|---|---|---|
| #2 | 2008626-1.patch | 1.03 KB | jthorson |
Comments
Comment #1
jthorson commented
Comment #2
jthorson commented
Temporary workaround.
Comment #3
jthorson commented
https://drupal.org/node/1822048#comment-7479778 committed. Trying that first, as it should confirm the diagnosis.
Comment #4
jthorson commented
HTTP 413 responses were still seen this morning on a 2MB request, but they are no longer being sent by the proxy (where they were seen earlier; updating the client_max_body_size parameter appears to have unplugged things there).
We're now looking at 1 in 100 tests rather than every single test, which makes this harder to debug (but easier on my blood pressure). Next time we run into it, I'd suggest upping the limits on the testbot side of things, to see if that might be where the limit is being applied.
Comment #5
jthorson commented
Infra ticket opened at #2009884: Nginx blocking large requests, including testbot communications and file uploads.
Leaving this open to see if we might be able to enhance PIFR to better handle the failure scenario.
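As an illustration of what better failure handling might look like, the Python sketch below treats an HTTP 413 as a permanent failure, since resending the same payload cannot succeed, and caps retries for other transport errors so that a test is not re-run endlessly. The names `send_result` and `ResultTooLarge` are hypothetical and are not part of PIFR. The `send` callable stands in for whatever actually performs the XML-RPC call.

```python
import xmlrpc.client


class ResultTooLarge(Exception):
    """Raised when the server rejects the result payload with HTTP 413."""


def send_result(send, payload, max_attempts=3):
    """Call send(payload), retrying transient transport errors a bounded
    number of times and failing immediately on HTTP 413."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload)
        except xmlrpc.client.ProtocolError as err:
            if err.errcode == 413:
                # The same payload will be rejected again, so do not retry.
                raise ResultTooLarge(f"{err.errcode} {err.errmsg}") from err
            last_error = err
    # All attempts failed with a non-413 transport error.
    raise last_error
```

A caller that catches `ResultTooLarge` could then record the test as failed to report instead of putting it back in the queue.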
Comment #6
jthorson commented
#2009884: Nginx blocking large requests, including testbot communications and file uploads closed, and #2010482: Compress test results before sending over to qa.d.o opened for the long-term resolution.
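For a rough sense of what compressing results before sending could save, the Python sketch below gzips a JSON-encoded result set and restores it on the other side. It is an assumption-laden illustration only: the function names, the JSON encoding, and the sample data are hypothetical, and #2010482 may take a different approach.

```python
import gzip
import json


def compress_results(results):
    """Encode results as JSON and gzip the bytes."""
    return gzip.compress(json.dumps(results).encode("utf-8"))


def decompress_results(blob):
    """Reverse compress_results(): gunzip, then decode the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical, highly repetitive result set, as assertion logs tend to be.
sample = [{"test": "ExampleTest", "status": "pass"} for _ in range(10000)]
raw_size = len(json.dumps(sample).encode("utf-8"))
packed = compress_results(sample)
print(raw_size, len(packed))
```

Because assertion results repeat the same keys and values many times, they tend to compress well, although the actual ratio depends on the real data.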
Comment #7.0
(not verified) commented
Updated issue summary