
Hi,
I would like to delegate the maintenance task "watch buildbots", since I'm already very busy with many other maintenance tasks. I'm looking for volunteers to handle incoming emails on buildbot-status. I already started to explain to Pablo Galindo Salgado how to do that, but it would be great to have at least two people doing this task. Otherwise, Pablo wouldn't be able to take a holiday or simply take a break for any reason. Buildbots are evil beasts which require care every day. Otherwise, they quickly turn red and become less useful :-(
It seems like the first blocker issue is that we have no explicit documentation on "how to deal with buildbots" (the devguide documentation is incomplete; it doesn't explain what I'm explaining below). Let me start with a few notes on how I watch buildbots.
I'm getting buildbot notifications on IRC (#python-dev on Freenode) and on the buildbot-status mailing list: https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/
When a buildbot fails, I look at the test logs and check whether an issue has already been reported. For example, search for the test method in the title (ex: "test_complex" for the test_complex() method). If there is no result, search using the test filename (ex: "test_os" for Lib/test/test_os.py). If there is still no result, repeat with a full text search ("All Text"). If you cannot find any open bug, create a new one:
* The title should contain the test name, test method and the buildbot name. Example: "test_posix: TestPosixSpawn fails on PPC64 Fedora 3.x".
* The description should contain the link to the buildbot failure. Try to identify the useful parts of the test log and copy them into the description.
* Fill in the Python version field (ex: "3.8" for 3.x buildbots).
* Select at least the "Tests" Component. You may select additional Components depending on the bug.
If a bug is already open, you may add a comment to mention that there is a new failure: add at least the buildbot name and a link to the failure.
And that's all! Simple, isn't it? At this stage, there is no need to investigate the test failure.
To finish, reply to the failure notification on the mailing list with a very short email: add a link to the existing or freshly created issue, and maybe copy one line of the failure and/or the issue title.
Recent bug example: https://bugs.python.org/issue33630
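To speed up the "look at the test logs" step above, here is a minimal helper sketch (not part of any existing buildbot tooling; the log path and the exact output format are assumptions). It pulls failing test names out of a saved stdio log, assuming the usual unittest result lines such as "FAIL: test_spawn (test.test_posix.TestPosixSpawn)", so you can paste them straight into the bug tracker search:

# Minimal sketch: list failing tests from a saved buildbot stdio log.
# Assumes unittest-style result lines, e.g.:
#   FAIL: test_spawn (test.test_posix.TestPosixSpawn)
#   ERROR: test_utime (test.test_os.UtimeTests)
import re
import sys

FAILURE_LINE = re.compile(r"^(?:FAIL|ERROR): (\w+) \(([\w.]+)\)")

def failing_tests(log_path):
    seen = set()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = FAILURE_LINE.match(line)
            if match:
                # (dotted test class path, test method name)
                seen.add((match.group(2), match.group(1)))
    return sorted(seen)

if __name__ == "__main__":
    for test_class, method in failing_tests(sys.argv[1]):
        print(f"{test_class}.{method}")

Run it as "python list_failures.py stdio.log" (the script name is just an example); the printed names are exactly what to search for on bugs.python.org.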
--
Later, you may want to analyze these failures, but I consider that a different job (a different "maintenance task"). If you don't feel able to analyze the bug, you may try to find someone who knows more than you about the failure.
For better bug reports, you can look at the [Changes] tab of a build failure and try to identify which recent change introduced the regression. This task requires following recent commits, since sometimes the failure is old: the test just fails randomly depending on network issues, system load, or anything else. Sometimes previous tests have side effects. Or the buildbot owner made a change on the system. There are many different explanations; it's hard to write a complete list. It's really a case-by-case basis.
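If you prefer a terminal to the [Changes] tab, one way is to compare the revision tested by the last green build with the revision tested by the first red one. A small sketch (the two commit hashes below are hypothetical placeholders you would read from the build pages):

# List commits that could have introduced the regression: everything reachable
# from the first failing build's revision but not from the last green one.
import subprocess

last_green = "0123abc"   # revision tested by the last green build (placeholder)
first_red = "4567def"    # revision tested by the first red build (placeholder)

log = subprocess.run(
    ["git", "log", "--oneline", f"{last_green}..{first_red}"],
    capture_output=True, text=True, check=True,
)
print("Candidate commits for the regression:")
print(log.stdout)

This is only a convenience; the [Changes] tab of the failing build shows the same information.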
Hopefully, it's now more common that a buildbot failure is obvious and caused by a very specific recent change which can be found in the [Changes] tab.
--
If you are interested in helping me watch our CIs: please join the python-buildbot@python.org mailing list! Introduce yourself and explain how you plan to help. I may offer to mentor you during the first weeks.
As I wrote, maybe a first step would be to write down documentation on how to deal with buildbots and/or to update and complete the existing documentation.
https://devguide.python.org/buildbots/
Victor

On 30.05.2018 13:01, Victor Stinner wrote:
[...]
What's the big idea of separate buildbots anyway? I thought the purpose of CI is to test everything _before_ it breaks the main codebase. Then it's the job of the contributor rather than the maintainer to fix any breakages.
So maybe having them driven by GitHub checks would be a better time investment, especially since we got VSTS checks just recently, so whoever did that still knows how to interface with this GitHub machinery.
If the bots cancel a previous build when a new one for the same PR arrives, this will not lead to a significant load difference, because the number of actively developed PRs is stable and roughly equal to the number of merges, according to the open/closed tickets dynamics.

On 30 May 2018 at 22:30, Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
[...]
There are a few key details here:
1. We currently need to run post-merge CI anyway, as we're not doing linearised commits (where core devs just approve a change without merging it, and then a gating system like Zuul ensures that the tests are run against the latest combination of the target branch and the PR before merging the change)
2. Since the buildbots are running on donated dedicated machines (rather than throwaway instances from a dynamic CI provider), we need to review the code before we let it run on the contributed systems
3. The buildbot instances run *1* build at a time, which would lead to major PR merging bottlenecks during sprints if we made them a gating requirement
4. For the vast majority of PRs, the post-merge cross-platform testing is a formality, since the code being modified is using lower level cross-platform APIs elsewhere in the standard library, so if it works on Windows, Linux, and Mac OS X, it will work everywhere Python runs
5. We generally don't *want* to burden new contributors with the task of dealing with the less common (or harder to target) platforms outside the big 3 - when they do break, it often takes a non-trivial amount of platform knowledge to understand what's different about the platform in question
Cheers, Nick.
P.S. That said, if VSTS or Travis were to offer FreeBSD as an option for pre-merge CI, I'd suggest we enable it, at least in an advisory capacity - it's a better check against Linux-specific assumptions creeping into the code base than Mac OS X, since the latter is regularly different enough from other *nix systems that we need to give it dedicated code paths.

On 30.05.2018 16:36, Nick Coghlan wrote:
There are a few key details here:
1. We currently need to run post-merge CI anyway, as we're not doing linearised commits (where core devs just approve a change without merging it, and then a gating system like Zuul ensures that the tests are run against the latest combination of the target branch and the PR before merging the change)
This is the only point here that looks valid (the others can be refuted). This technique limits the achievable commit rate to 1/testing_time. Our average rate probably fits within this, though it still means delays.
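As a rough back-of-the-envelope illustration of that limit (the cycle time below is an assumed number, not a measurement):

# Illustrative only: with gated (linearised) merges, at most one change can
# land per full build-and-test cycle, so throughput is capped at 1/testing_time.
testing_time_minutes = 90   # assumed full build + test cycle on a gating builder
max_merges_per_day = 24 * 60 / testing_time_minutes
print(f"upper bound on gated merges per day: {max_merges_per_day:.0f}")  # -> 16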

On Wed, May 30, 2018 at 7:36 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
There are a few key details here:
1. We currently need to run post-merge CI anyway, as we're not doing linearised commits (where core devs just approve a change without merging it, and then a gating system like Zuul ensures that the tests are run against the latest combination of the target branch and the PR before merging the change)
This is more of a concern when non-conflicting PRs against the same (or related) code are active at the same time. For the CPython code base this isn't as much of a problem, right? Under normal circumstances [fresh] active PRs typically do not run afoul of each other. Furthermore, during peak-activity events (like sprints) folks tend to keep a closer eye on the buildbots. I suppose old-but-still-active PRs that previously passed CI could cause a problem. However, it would be unlikely for such a PR to sit for a long time without needing changes before merging, whether to address reviewer concerns or to resolve merge conflicts.
So post-merge CI (or merge gating) doesn't seem like much of a factor for us. In that regard I'd consider the buildbots more than sufficient.
2. Since the buildbots are running on donated dedicated machines (rather than throwaway instances from a dynamic CI provider), we need to review the code before we let it run on the contributed systems
3. The buildbot instances run *1* build at a time,
...where each build incorporates potentially several merged PRs...
which would lead to major PR merging bottlenecks during sprints if we made them a gating requirement
Agreed. There's enough of a delay already when watching the buildbots post-merge (especially some of them). :)
4. For the vast majority of PRs, the post-merge cross-platform testing is a formality, since the code being modified is using lower level cross-platform APIs elsewhere in the standard library, so if it works on Windows, Linux, and Mac OS X, it will work everywhere Python runs
This is especially true of changes proposed by non-core contributors. It is also very true for buildbots with the OS/hardware combos that match CI. That said, when working with the C-API you can end up breaking things on the less common OSes and hardware platforms. So *those* buildbots are invaluable. I'm dealing with that right now.
5. We generally don't *want* to burden new contributors with the task of dealing with the less common (or harder to target) platforms outside the big 3 - when they do break, it often takes a non-trivial amount of platform knowledge to understand what's different about the platform in question
As hinted above, I would not expect new contributors to provide patches very often (if ever) that would have the potential to cause buildbot failures but not fail under CI. So this point seems somewhat moot. :)
P.S. That said, if VSTS or Travis were to offer FreeBSD as an option for pre-merge CI, I'd suggest we enable it, at least in an advisory capacity - it's a better check against Linux-specific assumptions creeping into the code base than Mac OS X, since the latter is regularly different enough from other *nix systems that we need to give it dedicated code paths.
+1
-eric

2018-05-30 14:30 GMT+02:00 Ivan Pozdeev via Python-Dev <python-dev@python.org>:
What's the big idea of separate buildbots anyway? I thought the purpose of CI is to test everything _before_ it breaks the main codebase. Then it's the job of the contributor rather than maintainer to fix any breakages.
I will answer more generally.
Technically, buildbots support sending emails to the authors of the changes which introduced a regression.
But a build may test a single change or dozens of new changes.
Moreover, our test suite is not perfect: there are at least 5 known tests which fail randomly. Even if we fix these unstable tests, it's also "common" that buildbots fail for "external" reasons:
* network failure: fail to clone the GitHub repository
* functional test using an external service and the service is down. I started to list external services used by "unit" tests: http://vstinner.readthedocs.io/cpython.html#services-used-by-unit-tests
* vacuum cleaner: https://mail.python.org/pipermail/python-buildbots/2017-June/000122.html
* many other random reasons...
For two years now, I've been trying to fix all the tests that fail randomly, but as I just explained, it's really hard to reach a failure rate of 0%.
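To give a feel for why 0% is so hard, here is a tiny illustration with made-up numbers (the flake probability and the number of builds per change are assumptions, not measurements):

# Illustrative only: assumed numbers, not measurements.
known_flaky_tests = 5     # "at least 5 known tests which fail randomly" (see above)
p_flake = 0.01            # assumed chance that one flaky test fails in a given run
builds_per_change = 20    # assumed number of buildbot builds triggered per merge

p_one_build_green = (1 - p_flake) ** known_flaky_tests
p_all_builds_green = p_one_build_green ** builds_per_change
print(f"one build fully green: {p_one_build_green:.1%}")   # ~95%
print(f"all builds green:      {p_all_builds_green:.1%}")  # ~37%

Even with optimistic numbers like these, a spurious red shows up for a sizeable fraction of merges, which is why the failures still need to be filtered by a human.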
I'm not sure that we can "require" authors of pull requests to understand buildbot failures...
So I prefer to keep the status quo: filter buildbot failures manually.
Victor
participants (4): Eric Snow, Ivan Pozdeev, Nick Coghlan, Victor Stinner