[Python-Dev] How to watch buildbots?

Wed May 30 08:30:39 EDT 2018

On 30.05.2018 13:01, Victor Stinner wrote:
> Hi,
>
> I would like to delegate the maintenance task "watch buildbots", since
> I'm already very busy with many other maintenance tasks. I'm looking
> for volunteers to handle incoming emails on buildbot-status. I already
> started to explain to Pablo Galindo Salgado how to do that, but it
> would be great to have at least two people doing this task. Otherwise,
> Pablo wouldn't be able to take a holiday or just take a break for any
> reason. Buildbots are evil beasts which require care every day.
> Otherwise, they quickly turn red and become less useful :-(
>
> It seems like the first blocker issue is that we have no explicit
> documentation "how to deal with buildbots?" (the devguide
> documentation is incomplete, it doesn't explain what I'm explaining
> below). Let me start with a few notes of how I watch buildbots.
>
> I'm getting buildbot notifications on IRC (#python-dev on Freenode)
> and on the buildbot-status mailing list:
> https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/
>
> When a buildbot fails, I look at the test logs and I try to check if an
> issue has already been reported. For example, search for the test
> method in title (ex: "test_complex" for test_complex() method). If no
> result, search using the test filename (ex: "test_os" for
> Lib/test/test_os.py). If there is no result, repeat with full text
> searches ("All Text"). If you cannot find any open bug, create a new
> one:
>
> * The title should contain the test name, test method and the buildbot
> name. Example: "test_posix: TestPosixSpawn fails on PPC64 Fedora
> 3.x".
> * The description should contain the link to the buildbot failure. Try
> to identify the useful parts of the test log and copy them into the
> description.
> * Fill the Python version field (ex: "3.8" for 3.x buildbots)
> * Select at least the "Tests" Component. You may select additional
> Components depending on the bug.
>
> If a bug was already open, you may add a comment to mention that there
> is a new failure: add at least the buildbot name and a link to
> the failure.
>
> And that's all! Simple, isn't it? At this stage, there is no need to
> investigate the test failure.
>
> To finish, reply to the failure notification on the mailing list with
> a very short email: add a link to the existing or the freshly created
> issue, maybe copy one line of the failure and/or the issue title.
>
> Recent bug example: https://bugs.python.org/issue33630
>
> --
>
> Later, you may want to analyze these failures, but I consider that
> it's a different job (different "maintenance task"). If you don't feel
> able to analyze the bug, you may try to find someone who knows more
> than you about the failure.
>
> For better bug reports, you can look at the [Changes] tab of a build
> failure, and try to identify which recent change introduced the
> regression. This task requires following recent commits, since
> sometimes the failure is old, it's just that the test fails randomly
> depending on network issues, system load, or anything else. Sometimes,
> previous tests have side effects. Or the buildbot owner made a change
> on the system. There are many different explanations, it's hard to
> write a complete list. It's really on a case by case basis.
>
> Hopefully, it's now more common that a buildbot failure is obvious and
> caused by a very specific recent change which can be found in the
> [Changes] tab.
>
> --
>
> If you are interested in helping me watch our CIs, please join
> the python-buildbot at python.org mailing list! Introduce yourself and
> explain how you plan to help. I can offer to mentor you during
> the first weeks.
>
> As I wrote, maybe a first step would be to write documentation on
> how to deal with buildbots and/or update and complete the existing
> documentation.
>
> https://devguide.python.org/buildbots/
>
> Victor

What's the big idea of separate buildbots anyway? I thought the purpose
of CI is to test everything _before_ it breaks the main codebase. Then
it's the job of the contributor rather than the maintainer to fix any
breakages.

So, maybe making them be driven by GitHub checks would be a better time
investment. Especially since we got VSTS checks just recently, so
whoever was doing that still knows how to interface with this GitHub
machinery.

If the bots cancel a previous build when a new one for the same PR
arrives, this will not lead to a significant load difference, because
the number of actively developed PRs is stable and roughly equal to the
number of merges, according to the open/closed ticket dynamics.
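
To illustrate the cancel-on-new-push idea, here is a minimal sketch. It
assumes a hypothetical in-memory scheduler: the names Build, Scheduler,
on_push and on_finish are invented for this example and are not Buildbot
or GitHub APIs. It only shows why the number of live builds stays bounded
by the number of PRs being actively worked on.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    pr: int                  # pull request number
    commit: str              # commit that triggered the build
    cancelled: bool = False


@dataclass
class Scheduler:
    # Hypothetical scheduler: keeps at most one live build per PR.
    active: dict = field(default_factory=dict)   # PR number -> Build

    def on_push(self, pr, commit):
        """Start a build for a new commit, cancelling the stale one."""
        previous = self.active.get(pr)
        if previous is not None:
            previous.cancelled = True    # superseded by the new push
        build = Build(pr, commit)
        self.active[pr] = build
        return build

    def on_finish(self, build):
        """Forget a finished build unless it was already superseded."""
        if self.active.get(build.pr) is build:
            del self.active[build.pr]


sched = Scheduler()
first = sched.on_push(101, "aaa111")
second = sched.on_push(101, "bbb222")    # supersedes the first build
other = sched.on_push(202, "ccc333")
print(len(sched.active))                 # one live build per open PR
```

However many times a PR is pushed to, it occupies one slot at most, so
the load follows the number of open PRs rather than the number of pushes.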

-- 
Regards,
Ivan