[Python-Dev] Status of Python CIs (buildbots, Travis CI, AppVeyor): july 2018

Victor Stinner vstinner at redhat.com
Wed Jul 18 16:54:46 EDT 2018


It seems that my last status report on Python CIs was already one year ago!


Since last year, Zachary Ware (with the help of others, but I forgot
names, sorry!) migrated our buildbot server from buildbot 0.8 (Python
2.7) to buildbot 0.9 (Python 3.4). The new buildbot version has a very
different web UI:


It took me some time to get used to it, but now I prefer the new UI,
especially for viewing the result of a single build. The page loads
faster and the data is easier to access. I also like the readable list
of all builds.

The buildbot "warnings" step now contains test failures and test
errors for a quick overview of bugs. Example:

    FAIL: test_threads (test.test_gdb.PyBtTests)
    Re-running failed tests in verbose mode
    Re-running test 'test_gdb' in verbose mode
    FAIL: test_threads (test.test_gdb.PyBtTests)

I also modified libregrtest (our test runner: python3 -m test) to
display a better test summary at the end, especially when there is at
least one failure. Truncated example:
== Tests result: FAILURE then SUCCESS ==

378 tests OK.

10 slowest tests:
- test_multiprocessing_spawn: 1 min 57 sec
- test_concurrent_futures: 1 min 36 sec
- test_nntplib: 30 sec 275 ms
- (...)

28 tests skipped:
    test_crypt (...)

1 re-run test:
    test_threading
Total duration: 4 min 59 sec
Tests result: FAILURE then SUCCESS

"FAILURE then SUCCESS" means that at least one test failed, but then
all re-run tests succeeded. "1 re-run test: test_threading" is the
list of tests that failed previously. That's also a new feature.
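The "FAILURE then SUCCESS" behavior can be approximated with plain
unittest: run the suite once, collect the failed tests, and re-run only
those in verbose mode. This is a minimal sketch, not libregrtest's
actual implementation (which lives in Lib/test/libregrtest/); the Demo
class and run_with_rerun() helper are hypothetical names for
illustration:

```python
import io
import unittest

class Demo(unittest.TestCase):
    first_run = True

    def test_flaky(self):
        # Simulates a random failure: fails once, then passes on re-run.
        if Demo.first_run:
            Demo.first_run = False
            self.fail("flaky failure")

    def test_stable(self):
        self.assertEqual(1 + 1, 2)

def run_with_rerun(suite):
    """Run tests once; re-run only the failed tests in verbose mode."""
    runner = unittest.TextTestRunner(stream=io.StringIO())
    result = runner.run(suite)
    failed = [test for test, _ in result.failures + result.errors]
    if not failed:
        return "SUCCESS"
    # Re-run only the tests that failed, with more verbose output.
    rerun = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    rerun_result = rerun.run(unittest.TestSuite(failed))
    if rerun_result.wasSuccessful():
        return "FAILURE then SUCCESS"
    return "FAILURE"

suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
print(run_with_rerun(suite))  # FAILURE then SUCCESS
```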

Last May, we worked hard to fix many random test failures on all CIs
before the Python 3.7 final release. Today, the number of tests which
fail randomly is *very* low. Since the beginning of the year, I fixed
bugs in more than 35 test files. The most complex issues were in the
multiprocessing tests: the most common random failures should now be
fixed.
Many memory and reference leaks have been fixed. I also started to fix
leaks of Windows handles:


I added new keys to test.pythoninfo: Py_DEBUG, C compiler version,
gdbm version, memory allocator, etc.
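You can see the real output by running "python3 -m test.pythoninfo".
A rough sketch of collecting similar information with the stdlib (the
key names below are illustrative, not the exact keys pythoninfo uses):

```python
import platform
import sys
import sysconfig

def collect_info():
    # Illustrative keys, similar in spirit to test.pythoninfo output.
    return {
        "python.version": sys.version.split()[0],
        "python.compiler": platform.python_compiler(),  # C compiler version
        "sysconfig.Py_DEBUG": sysconfig.get_config_var("Py_DEBUG"),
        "platform.machine": platform.machine(),
    }

info = collect_info()
for key in sorted(info):
    print(f"{key}: {info[key]}")
```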

The test.bisect tool has been optimized to be usable on test_asyncio,
one of the test files with the most test cases and methods.
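The idea behind such bisection is to shrink a list of tests to a
smaller subset that still reproduces a failure. A generic sketch of
that idea (not test.bisect's actual code; bisect_failing() and the toy
predicate are hypothetical):

```python
def bisect_failing(tests, fails):
    """Shrink `tests` to a smaller subset for which `fails` is still true.

    `fails(subset)` returns True when running that subset reproduces
    the failure.
    """
    while len(tests) > 1:
        half = len(tests) // 2
        left, right = tests[:half], tests[half:]
        if fails(left):
            tests = left
        elif fails(right):
            tests = right
        else:
            # The failure needs tests from both halves; stop shrinking.
            break
    return tests

# Toy example: the failure is triggered by 'test_c' alone.
tests = ["test_a", "test_b", "test_c", "test_d"]
culprit = bisect_failing(tests, lambda subset: "test_c" in subset)
print(culprit)  # ['test_c']
```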

I spent a lot of time fixing each test failure, even when a test
failed only once on one specific CI for a specific pull request. I
increased many timeouts to make fragile tests more "reliable"
(reducing the risk of failures on slow buildbots). Some timeouts were
simply too strict for no good reason.
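A common pattern for making such tests robust is to replace a single
strict sleep-and-check with a deadline loop: a generous timeout only
slows the test down when something is actually broken, while on a fast
machine the loop exits as soon as the condition holds. A generic
sketch (wait_until() is a hypothetical helper, not code from CPython's
test suite):

```python
import threading
import time

def wait_until(predicate, timeout=30.0, interval=0.05):
    """Poll `predicate` until it returns true or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check after the deadline.
    return predicate()

# Example: wait for a flag set by a background timer thread.
flag = threading.Event()
threading.Timer(0.1, flag.set).start()
print(wait_until(flag.is_set))  # True
```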

Python CIs are not perfect, but random failures should now be much rarer.

The buildbot-status mailing list receives email notifications when a
buildbot fails. That's my main source for detecting regressions and
tests which fail randomly:

Buildbot builders:

Travis CI build history:

AppVeyor build history:

My notes on Python CIs:

Thanks Zachary Ware for maintaining our buildbot servers, thanks Pablo
Galindo Salgado who helped me to triage buildbot failures (on the
buildbot-status mailing list), thanks all other developers who helped
me to fix random test failures and make our test suite more stable!

