[Python-Dev] How to watch buildbots?

Wed May 30 06:01:32 EDT 2018

Hi,

I would like to delegate the maintenance task "watch buildbots", since
I'm already very busy with many other maintenance tasks. I'm looking
for volunteers to handle incoming emails on buildbot-status. I already
started to explain to Pablo Galindo Salgado how to do that, but it
would be great to have at least two people doing this task. Otherwise,
Pablo wouldn't be able to take a holiday or just take a break for any
reason. Buildbots are evil beasts which require care every day.
Otherwise, they quickly turn red and become less useful :-(

It seems like the first blocker is that we have no explicit
documentation on "how to deal with buildbots?" (the devguide
documentation is incomplete: it doesn't explain what I'm explaining
below). Let me start with a few notes on how I watch buildbots.

I'm getting buildbot notifications on IRC (#python-dev on Freenode)
and on the buildbot-status mailing list:
https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/

When a buildbot fails, I look at the test logs and try to check if an
issue has already been reported. For example, search for the test
method in the title (ex: "test_complex" for the test_complex() method).
If there is no result, search using the test filename (ex: "test_os"
for Lib/test/test_os.py). If there is still no result, repeat with
full-text searches ("All Text"). If you cannot find any open bug,
create a new one:

* The title should contain the test name, test method and the buildbot
name. Example: "test_posix: TestPosixSpawn fails on PPC64 Fedora
3.x".
* The description should contain the link to the buildbot failure. Try
to identify the useful parts of the test log and copy them into the
description.
* Fill the Python version field (ex: "3.8" for 3.x buildbots)
* Select at least the "Tests" Component. You may select additional
Components depending on the bug.
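As a rough illustration of the checklist above, here is a small Python
sketch that assembles the fields of a new issue. The function names,
the dictionary keys and the placeholder URL are hypothetical: this is
not an API of the bug tracker, only a way to show which pieces of
information go where before you fill in the form by hand.

```python
def issue_title(test_name, test_method, buildbot_name):
    """Format a title as: '<test name>: <test method> fails on <buildbot>'."""
    return f"{test_name}: {test_method} fails on {buildbot_name}"


def draft_issue(test_name, test_method, buildbot_name,
                failure_url, log_excerpt, version):
    """Collect the fields of a new issue (hypothetical helper, not a tracker API)."""
    return {
        "title": issue_title(test_name, test_method, buildbot_name),
        # Link to the buildbot failure first, then the useful log lines.
        "description": f"{failure_url}\n\n{log_excerpt}",
        # Example: "3.8" for 3.x buildbots.
        "versions": [version],
        # "Tests" is the minimum; add more components depending on the bug.
        "components": ["Tests"],
    }


draft = draft_issue(
    "test_posix", "TestPosixSpawn", "PPC64 Fedora 3.x",
    failure_url="https://buildbot.example/placeholder",  # placeholder URL
    log_excerpt="<paste the useful lines of the test log here>",
    version="3.8",
)
print(draft["title"])
```

The title printed here matches the example given in the list above.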

If a bug is already open, you may add a comment to mention that there
is a new failure: include at least the buildbot name and a link to
the failure.
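To make the search order described earlier easier to remember, here is
a minimal Python sketch that lists the queries to try, from the most
specific to the broadest. The function name and the "Title" label are
my own naming for illustration; the searches themselves are still done
by hand in the bug tracker.

```python
from pathlib import PurePosixPath


def search_plan(test_method, test_file):
    """Return (search field, query) pairs in the order to try them."""
    # "Lib/test/test_os.py" -> "test_os"
    test_name = PurePosixPath(test_file).stem
    return [
        ("Title", test_method),     # 1. test method in the issue title
        ("Title", test_name),       # 2. test filename in the issue title
        ("All Text", test_method),  # 3. full-text search, test method
        ("All Text", test_name),    # 4. full-text search, test filename
    ]


for field, query in search_plan("test_complex", "Lib/test/test_os.py"):
    print(f"{field}: {query}")
```

Stop at the first query which finds an open issue; create a new issue
only if all of them come back empty.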

And that's all! Simple, isn't it? At this stage, there is no need to
investigate the test failure.

To finish, reply to the failure notification on the mailing list with
a very short email: add a link to the existing or the freshly created
issue, maybe copy one line of the failure and/or the issue title.

Recent bug example: https://bugs.python.org/issue33630

--

Later, you may want to analyze these failures, but I consider that
it's a different job (different "maintenance task"). If you don't feel
able to analyze the bug, you may try to find someone who knows more
than you about the failure.

For better bug reports, you can look at the [Changes] tab of a build
failure, and try to identify which recent change introduced the
regression. This task requires following recent commits, since
sometimes the failure is old and the test just fails randomly
depending on network issues, system load, or anything else. Sometimes,
previous tests have side effects. Or the buildbot owner made a change
on the system. There are many different explanations, and it's hard to
write a complete list. It's really on a case-by-case basis.

Fortunately, it's now more common that a buildbot failure is obvious
and caused by a very specific recent change which can be found in the
[Changes] tab.

--

If you are interested in helping me watch our CIs, please join the
python-buildbot at python.org mailing list! Introduce yourself and
explain how you plan to help. I may offer to mentor you and assist
you during the first weeks.

As I wrote, maybe a first step would be to write documentation on
how to deal with buildbots and/or to update and complete the existing
documentation.

https://devguide.python.org/buildbots/

Victor