It seems my latest status report on Python CIs was already one year ago!
Since last year, Zachary Ware (with the help of others, whose names I forgot, sorry!) migrated our buildbot server from buildbot 0.8 (Python 2.7) to buildbot 0.9 (Python 3.4). The new buildbot version has a very different web UI:
It took me time to get used to it, but now I prefer the new UI, especially to see the result of a single build. The page loads faster and it's easier to access data. I also like the readable list of all builders:
The buildbot "warnings" step now contains test failures and test errors for a quick overview of bugs. Example:
---
FAIL: test_threads (test.test_gdb.PyBtTests)
Re-running failed tests in verbose mode
Re-running test 'test_gdb' in verbose mode
FAIL: test_threads (test.test_gdb.PyBtTests)
---
I also modified libregrtest (our test runner: python3 -m test) to display a better tests summary at the end, especially when there is at least one failure. Truncated example:

---
== Tests result: FAILURE then SUCCESS ==
378 tests OK.
10 slowest tests:
- test_multiprocessing_spawn: 1 min 57 sec
- test_concurrent_futures: 1 min 36 sec
- test_nntplib: 30 sec 275 ms
- (...)
28 tests skipped: test_crypt (...)
1 re-run test: test_threading
Total duration: 4 min 59 sec

Tests result: FAILURE then SUCCESS
---
"FAILURE then SUCCESS" means that at least one test failed, but then all re-run tests succeeded. "1 re-run test: test_threading" lists the tests that failed on the first run and were re-run. That's also a new feature.
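The re-run logic can be sketched roughly like this (a hypothetical simplification, not the actual libregrtest code): re-run every failed test once, and downgrade the overall result when all re-runs pass.

```python
# Hypothetical sketch of the "FAILURE then SUCCESS" logic (not the
# actual libregrtest implementation): run all tests, re-run the
# failures, and compute the final result string from both passes.

def run_suite(tests, run_test):
    """Run tests, re-run failures, return (result string, re-run tests)."""
    failed = [name for name in tests if not run_test(name, rerun=False)]
    if not failed:
        return "SUCCESS", []

    # Re-run failed tests (libregrtest re-runs them in verbose mode).
    still_failing = [name for name in failed if not run_test(name, rerun=True)]
    if still_failing:
        return "FAILURE", still_failing
    # At least one test failed, but all re-run tests succeeded.
    return "FAILURE then SUCCESS", failed


# Example: test_threading fails on the first run, passes when re-run.
def fake_run(name, rerun):
    if name == "test_threading":
        return rerun  # fails first, succeeds on the re-run
    return True

result, rerun_tests = run_suite(["test_os", "test_threading"], fake_run)
print(result)        # FAILURE then SUCCESS
print(rerun_tests)   # ['test_threading']
```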
Last May, we worked hard to fix many random test failures on all CIs before the Python 3.7 final release. Today, the number of tests which fail randomly is *very* low. Since the beginning of the year, I fixed bugs in more than 35 test files. The most complex issues were in multiprocessing tests: the most common random failures should now be fixed.
Many memory and reference leaks have been fixed. I also started to fix leaks of Windows handles.
I added new keys to test.pythoninfo: Py_DEBUG, C compiler version, gdbm version, memory allocator, etc.
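The kind of data test.pythoninfo collects can be sketched with plain stdlib calls (a rough illustration of the idea, not the actual implementation):

```python
# Rough sketch of the idea behind test.pythoninfo (not the actual
# implementation): collect "key: value" pairs describing the Python
# build and platform, to make buildbot failures easier to diagnose.
import platform
import sys
import sysconfig

def collect_info():
    info = {}
    info["python.version"] = platform.python_version()
    info["python.implementation"] = platform.python_implementation()
    # Py_DEBUG: debug builds expose sys.gettotalrefcount()
    info["python.Py_DEBUG"] = hasattr(sys, "gettotalrefcount")
    # C compiler used to build Python
    info["platform.python_compiler"] = platform.python_compiler()
    info["sysconfig.CC"] = sysconfig.get_config_var("CC")
    return info

info = collect_info()
for key in sorted(info):
    print(f"{key}: {info[key]}")
```

The real tool is run as `python3 -m test.pythoninfo` and dumps many more keys.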
The test.bisect tool has been optimized to be usable on test_asyncio, one of the tests with the most test cases and methods.
I spent a lot of time fixing each test failure, even when a test only failed once on one specific CI on a specific pull request. I increased many timeouts to make fragile tests more "reliable" (to reduce the risk of failures on slow buildbots). Some timeouts are just too strict for no good reason.
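The typical fix can be sketched like this (hypothetical test code, not from CPython): rather than a strict timeout that fails randomly on a loaded buildbot, wait with a generous deadline and poll until the condition is met.

```python
# Hypothetical sketch of making a fragile timeout-based test reliable:
# poll a condition with a generous deadline instead of a strict timeout.
import threading
import time

def wait_until(predicate, timeout=60.0, interval=0.05):
    """Poll predicate() until it returns True or the deadline expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()

# Example: a background thread sets an event "soon", but maybe not
# within a too-strict 10 ms timeout on a slow or loaded machine.
event = threading.Event()
threading.Timer(0.2, event.set).start()

# event.wait(0.01)  # too strict: fails randomly on slow buildbots
assert wait_until(event.is_set, timeout=60.0)  # generous: reliable
print("condition met")
```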
Python CIs are not perfect, but random failures should now be much rarer.
Mailing list for email notifications when a buildbot fails. That's my main source for detecting regressions and tests which fail randomly: https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/
Buildbot builders: http://buildbot.python.org/all/#/builders
Travis CI build history: https://travis-ci.org/python/cpython/builds
AppVeyor build history: https://ci.appveyor.com/project/python/cpython/history
My notes on Python CIs: http://pythondev.readthedocs.io/ci.html
Thanks to Zachary Ware for maintaining our buildbot servers, thanks to Pablo Galindo Salgado who helped me to triage buildbot failures (on the buildbot-status mailing list), and thanks to all other developers who helped me fix random test failures and make our test suite more stable!
On Wed, Jul 18, 2018 at 3:16 PM Barry Warsaw firstname.lastname@example.org wrote:
On Jul 18, 2018, at 13:54, Victor Stinner email@example.com wrote:
Last May, we worked hard to fix many random test failures on all CIs before Python 3.7 final release.
Thank you thank you thank you Victor for work on keeping the buildbots happy!
Yes, thank you Victor (and friends). Your work on this makes a concrete difference.