Revert changes which break too many buildbots
Hi,
The CPython workflow was enhanced to get pre-commit CI checks. That's a huge win, thank you for that... But, sometimes, a change can still break many buildbots with bugs that weren't caught by the pre-commit checks (Travis CI/Linux and AppVeyor/Windows). The buildbots cover many more architectures and platforms.
I spend a significant amount of time maintaining the sanity of our buildbots. Sometimes it can take me up to 3 days out of a 5-day working week. It's annoying to see new regressions while I'm trying hard to fix old ones :-(
So I would like to set a new rule: if I'm unable to fix buildbot failures caused by a recent change quickly (say, in less than 2 hours), I propose to revert the change.
It doesn't mean that the commit is bad and must not be merged ever. No. It would just mean that we need time to work on fixing the issue, and it shouldn't impact other pending changes, to keep a sane master branch.
What do you think? Would you be ok with such a rule?
A recent example is Nick Coghlan's implementation of PEP 538: basically, it broke all buildbots... except on Linux and Windows :-) And it will take a few more days to fix all failures. Well, we are working on fixing these issues, so I don't want to revert this change. It's just an example of a single change which broke many buildbots. PEP 538 depends a lot on the platform, so I'm not surprised to see different failures on different platforms ;-)
By "buildbot failure", I mean a red buildbot failing because of a compilation error or a failed test suite. But I would prefer to ignore failures of the Refleak buildbots since they are not stable (even if there are fewer and fewer ref leaks in master, and these buildbots have already caught recent regressions!).
If the rule is approved, I plan to announce it on python-dev to be transparent.
Victor
2017-06-14 17:40 GMT+03:00 Victor Stinner <victor.stinner@gmail.com>:
So I would like to set a new rule: if I'm unable to fix buildbot failures caused by a recent change quickly (say, in less than 2 hours), I propose to revert the change.
It doesn't mean that the commit is bad and must not be merged ever. No. It would just mean that we need time to work on fixing the issue, and it shouldn't impact other pending changes, to keep a sane master branch.
What do you think? Would you be ok with such a rule?
I think we first should make the buildbots notify the author of a commit that broke tests or the build, so they can either quickly fix the failure or revert their commit.
2017-06-14 18:38 GMT+02:00 Serhiy Storchaka <storchaka@gmail.com>:
What do you think? Would you be ok with such a rule?
I think we first should make the buildbots notify the author of a commit that broke tests or the build, so they can either quickly fix the failure or revert their commit.
One or two months ago, I created a new buildbot-status mailing list which gets notifications of buildbot failures: https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/
I'm happy because we get fewer and fewer random failures, but I don't feel comfortable yet spamming the author of a change when a buildbot fails for an unrelated reason like a network issue.
Analyzing buildbot failures is still a manual step, so I propose to make a compromise here ;-)
Reading buildbot logs is a boring task, and I don't expect everyone to do it. I cannot require that all contributors take care of the CI. But I consider that it's part of my job, and I'm fine with that ;-)
Victor
2017-06-14 18:38 GMT+02:00 Serhiy Storchaka <storchaka@gmail.com>:
I think we first should make the buildbots notify the author of a commit that broke tests or the build, so they can either quickly fix the failure or revert their commit.
Hum, I think that I should elaborate my previous email.
It's usually easy to identify the commit which introduced a regression if you compare 2 or 3 failing buildbots and their lists of changes. So when I identify the commit, if I cannot fix the issue immediately, I usually leave a message on the issue (bugs.python.org). That way the author, as well as other people who worked on the issue, is well aware of the regression.
In my experience, it's rare that I get any feedback in less than 24h, while slowly more and more buildbots become red. It depends on the availability of the committer.
Since I'm trying to always keep the highest number of green buildbots, I prefer to try to fix the bug myself.
My question is what to do if I'm unable to fix the issue and the author is not available. Keep a broken CI for hours, sometimes for days? Or revert the change to reduce the pressure and get more time to write a *proper* fix?
I propose to revert to get more people looking at the issue and find the best option, rather than rushing to push a quick & dirty fix.
Hopefully, in most cases, the bug is trivial and I consider that the fix doesn't require a review, so I push it directly.
Victor
I’m +1 on reverting the change; I’d even go so far as to say I’d be +1 on doing it as a first response. It’s always possible to revert the revert once the person who committed the patch has time to investigate the failure and recommit the patch with a fix.
On Jun 14, 2017, at 5:00 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
2017-06-14 18:38 GMT+02:00 Serhiy Storchaka <storchaka@gmail.com>:
I think we first should make the buildbots notify the author of a commit that broke tests or the build, so they can either quickly fix the failure or revert their commit.
Hum, I think that I should elaborate my previous email.
It's usually easy to identify the commit which introduced a regression if you compare 2 or 3 failing buildbots and their lists of changes. So when I identify the commit, if I cannot fix the issue immediately, I usually leave a message on the issue (bugs.python.org). That way the author, as well as other people who worked on the issue, is well aware of the regression.
In my experience, it's rare that I get any feedback in less than 24h, while slowly more and more buildbots become red. It depends on the availability of the committer.
Since I'm trying to always keep the highest number of green buildbots, I prefer to try to fix the bug myself.
My question is what to do if I'm unable to fix the issue and the author is not available. Keep a broken CI for hours, sometimes for days? Or revert the change to reduce the pressure and get more time to write a *proper* fix?
I propose to revert to get more people looking at the issue and find the best option, rather than rushing to push a quick & dirty fix.
Hopefully, in most cases, the bug is trivial and I consider that the fix doesn't require a review, so I push it directly.
Victor
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
— Donald Stufft
On 06/14/2017 02:07 PM, Donald Stufft wrote:
I’m +1 on reverting the change; I’d even go so far as to say I’d be +1 on doing it as a first response. It’s always possible to revert the revert once the person who committed the patch has time to investigate the failure and recommit the patch with a fix.
I would rather have the revert be the first response. Without a thorough understanding of the bug/feature being addressed it is entirely possible to introduce a new bug or change the semantics of the original patch.
-- ~Ethan~
On Jun 14, 2017, at 11:00 PM, Victor Stinner wrote:
Since I'm trying to always keep the highest number of green buildbots, I prefer to try to fix the bug myself.
My question is what to do if I'm unable to fix the issue and the author is not available. Keep a broken CI for hours, sometimes for days? Or revert the change to reduce the pressure and get more time to write a *proper* fix?
I propose to revert to get more people looking at the issue and find the best option, rather than rushing to push a quick & dirty fix.
Hopefully, in most cases, the bug is trivial and I consider that the fix doesn't require a review, so I push it directly.
I agree with you that if it's easy to fix, JFDI. If not, revert it to keep the buildbots green and inform the author about the problem. For now that can be to update the issue or PR so the author gets a mention, but later it can be an autonag email.
-Barry
On Wed, 14 Jun 2017 at 07:46 Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
The CPython workflow was enhanced to get pre-commit CI checks. That's a huge win, thank you for that... But, sometimes, a change can still break many buildbots with bugs that weren't caught by the pre-commit checks (Travis CI/Linux and AppVeyor/Windows). The buildbots cover many more architectures and platforms.
There's also macOS if you look at the Travis results manually.
I spend a significant amount of time maintaining the sanity of our buildbots. Sometimes it can take me up to 3 days out of a 5-day working week. It's annoying to see new regressions while I'm trying hard to fix old ones :-(
So I would like to set a new rule: if I'm unable to fix buildbot failures caused by a recent change quickly (say, in less than 2 hours), I propose to revert the change.
It doesn't mean that the commit is bad and must not be merged ever. No. It would just mean that we need time to work on fixing the issue, and it shouldn't impact other pending changes, to keep a sane master branch.
What do you think? Would you be ok with such a rule?
I'm +1 on it. We have always expected people to watch the buildbots after a commit to make sure nothing horrible happened. And leaving a branch broken just makes fixing it harder due to subsequent code changes, and it masks potential failures from later commits that happen to manifest themselves as a similar failure.
-Brett
A recent example is Nick Coghlan's implementation of PEP 538: basically, it broke all buildbots... except on Linux and Windows :-) And it will take a few more days to fix all failures. Well, we are working on fixing these issues, so I don't want to revert this change. It's just an example of a single change which broke many buildbots. PEP 538 depends a lot on the platform, so I'm not surprised to see different failures on different platforms ;-)
By "buildbot failure", I mean a red buildbot failing because of a compilation error or a failed test suite. But I would prefer to ignore failures of the Refleak buildbots since they are not stable (even if there are fewer and fewer ref leaks in master, and these buildbots have already caught recent regressions!).
If the rule is approved, I plan to announce it on python-dev to be transparent.
Victor
On 15 June 2017 at 00:40, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
The CPython workflow was enhanced to get pre-commit CI checks. That's a huge win, thank you for that... But, sometimes, a change can still break many buildbots with bugs that weren't caught by the pre-commit checks (Travis CI/Linux and AppVeyor/Windows). The buildbots cover many more architectures and platforms.
I spend a significant amount of time maintaining the sanity of our buildbots. Sometimes it can take me up to 3 days out of a 5-day working week. It's annoying to see new regressions while I'm trying hard to fix old ones :-(
So I would like to set a new rule: if I'm unable to fix buildbot failures caused by a recent change quickly (say, in less than 2 hours), I propose to revert the change.
I'm not necessarily opposed to such a policy change, but if folks really want guaranteed green post-merge buildbots for all platforms (rather than just guaranteed green for Linux & Windows, sometimes red for everything else), then I think a better place to focus effort would be in making it easier for us to test things across the full BuildBot fleet in pre-merge CI.
For example, something that would be genuinely helpful would be a bot monitoring PR comments that could automate the "custom-build" dance for core developers (i.e. we'd be able to write something like "BuildBot: test custom build", and it would go away, kick off a custom BuildBot run by pushing the PR to the "custom-build" branch, and then report back the links for failed builds like http://buildbot.python.org/all/builders/x86%20Tiger%20custom/builds/5 )
That way, the reversion process would be:
- Revert the change
- Post a "BuildBot: test custom build" comment on the offending PR
- Original PR author, committer, and anyone else interested continues the issue resolution process based on the specific links posted back by the helper bot
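[Editor's note: the trigger-comment bot Nick describes could start from something as small as matching the trigger phrase in PR comments. A hypothetical sketch, assuming the "BuildBot: test custom build" phrase and "custom-build" branch from the thread; the remote name "upstream" and all function names are made up:]

```python
import re

# Trigger phrase from Nick's proposal; matched case-insensitively,
# one line of the comment must consist of exactly this phrase.
TRIGGER = re.compile(r"^\s*buildbot:\s*test\s+custom\s+build\s*$", re.IGNORECASE)

def wants_custom_build(comment_body):
    """Return True if any line of a PR comment requests a custom BuildBot run."""
    return any(TRIGGER.match(line) for line in comment_body.splitlines())

def sync_commands(pr_number):
    """Git commands the bot would run to push a PR's head to 'custom-build'."""
    return [
        ["git", "fetch", "upstream", "pull/%d/head" % pr_number],
        ["git", "push", "--force", "upstream", "FETCH_HEAD:refs/heads/custom-build"],
    ]
```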
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, 15 Jun 2017 13:31:39 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 15 June 2017 at 00:40, Victor Stinner <victor.stinner@gmail.com> wrote:
So I would like to set a new rule: if I'm unable to fix buildbot failures caused by a recent change quickly (say, in less than 2 hours), I propose to revert the change.
I'm not necessarily opposed to such a policy change, but if folks really want guaranteed green post-merge buildbots for all platforms (rather than just guaranteed green for Linux & Windows, sometimes red for everything else), then I think a better place to focus effort would be in making it easier for us to test things across the full BuildBot fleet in pre-merge CI.
I've long thought that reversion should be the policy.
I'm all in favor of making it easier to run the custom builders on a PR, of course, but I think the policy change is orthogonal to that. It's not as though that change represents effort that would otherwise be devoted to making the custom builds easier to use... Victor is already putting in the effort to monitor the buildbots and clearly intends to continue to do so regardless. Being able to revert will make the job he has taken on (thank you Victor!) easier.
--David
2017-06-15 5:31 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I'm not necessarily opposed to such a policy change, but if folks really want guaranteed green post-merge buildbots for all platforms (rather than just guaranteed green for Linux & Windows, sometimes red for everything else), then I think a better place to focus effort would be in making it easier for us to test things across the full BuildBot fleet in pre-merge CI.
For example, something that would be genuinely helpful would be a bot monitoring PR comments that could automate the "custom-build" dance for core developers (i.e. we'd be able to write something like "BuildBot: test custom build", and it would go away, kick off a custom BuildBot run by pushing the PR to the "custom-build" branch, and then report back the links for failed builds like http://buildbot.python.org/all/builders/x86%20Tiger%20custom/builds/5 )
I don't think that the buildbots have enough resources to run tests on each PR. We get too many PRs, and each PR is updated regularly. On some buildbots, a single build can take 1 hour (or longer)... Moreover, that's just an idea: we need "someone" to do the job, and I don't know anyone available to do it.
I'm proposing a pragmatic solution to a concrete issue. I'm just trying to do my best with our limited resources (human and CPU time). Again, failures on random platforms not caught by our pre-commit CI occur, but are -hopefully- rare!
I dislike having a long "nice to have" list for our infra. I prefer to focus on the existing bricks and not put too much pressure on the people who give their time to take care of our tooling and infra ;-)
(Yes, I would also prefer to have a fleet of buildbots at the pre-commit stage if I can get it for free :-))
Victor
On Thu, 15 Jun 2017 at 06:39 Victor Stinner <victor.stinner@gmail.com> wrote:
2017-06-15 5:31 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I'm not necessarily opposed to such a policy change, but if folks really want guaranteed green post-merge buildbots for all platforms (rather than just guaranteed green for Linux & Windows, sometimes red for everything else), then I think a better place to focus effort would be in making it easier for us to test things across the full BuildBot fleet in pre-merge CI.
For example, something that would be genuinely helpful would be a bot monitoring PR comments that could automate the "custom-build" dance for core developers (i.e. we'd be able to write something like "BuildBot: test custom build", and it would go away, kick off a custom BuildBot run by pushing the PR to the "custom-build" branch, and then report back the links for failed builds like http://buildbot.python.org/all/builders/x86%20Tiger%20custom/builds/5 )
I don't think that the buildbots have enough resources to run tests on each PR. We get too many PRs, and each PR is updated regularly. On some buildbots, a single build can take 1 hour (or longer)... Moreover, that's just an idea: we need "someone" to do the job, and I don't know anyone available to do it.
I'm proposing a pragmatic solution to a concrete issue. I'm just trying to do my best with our limited resources (human and CPU time). Again, failures on random platforms not caught by our pre-commit CI occur, but are -hopefully- rare!
I dislike having a long "nice to have" list for our infra. I prefer to focus on the existing bricks and not put too much pressure on the people who give their time to take care of our tooling and infra ;-)
I at least appreciate that. :)
(Yes, I would also prefer to have a fleet of buildbots at the pre-commit stage if I can get it for free :-))
Yes, that would definitely be nice, especially if we could get it working in the cloud to handle load and a wider range of OSs. But this is all orthogonal to the question of "revert immediately or not?". It's also not worth discussing here as it's totally wishful thinking for the foreseeable future (that's what the core-workflow issue tracker is for ;) .
-Brett
Victor
On 15 June 2017 at 23:38, Victor Stinner <victor.stinner@gmail.com> wrote:
2017-06-15 5:31 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
For example, something that would be genuinely helpful would be a bot monitoring PR comments that could automate the "custom-build" dance for core developers (i.e. we'd be able to write something like "BuildBot: test custom build", and it would go away, kick off a custom BuildBot run by pushing the PR to the "custom-build" branch, and then report back the links for failed builds like http://buildbot.python.org/all/builders/x86%20Tiger%20custom/builds/5 )
I don't think that the buildbots have enough resources to run tests on each PR. We get too many PRs, and each PR is updated regularly. On some buildbots, a single build can take 1 hour (or longer)...
I wasn't proposing running every PR through BuildBot, I was suggesting having a more straightforward way of triggering custom builds directly from the PR we want to test after post-merge CI has indicated it's necessary.
However, I think there's a way of handling that which doesn't require any new automation code to be written. Instead, we can have the policy be to post a link to https://docs.python.org/devguide/buildbots.html#custom-builders on the PR that is being reverted.
That way, even if folks didn't notice the problem with the buildbots themselves:
- They get a ping on a PR that they're necessarily following
- They get a reminder of how to test the revised PR across the buildbot fleet prior to merging it again
Hopefully reversions will continue to be rare (since relatively few changes are likely to be as platform dependent as PEP 538, and Windows/*nix differences are already covered in pre-merge CI), but when they do come up, the reminder of how to manually trigger pre-merge cross-platform CI is likely to be useful, and doesn't require much additional effort on the part of the folks doing the reversion.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
2017-06-16 10:37 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
Hopefully reversions will continue to be rare (since relatively few changes are likely to be as platform dependent as PEP 538, and Windows/*nix differences are already covered in pre-merge CI), but when they do come up, the reminder of how to manually trigger pre-merge cross-platform CI is likely to be useful, and doesn't require much additional effort on the part of the folks doing the reversion.
I'm also tolerant. A regression on macOS Tiger matters less to me than a regression on most Linux buildbots. I try to focus on recent versions of Windows, Linux, macOS and FreeBSD.
AIX, OpenIndiana/Solaris, OpenBSD, etc. have been broken for months. It's ok, I can live with that :-) But regularly, I propose to drop support for these platforms ;-) For AIX, I tried to skip a few tests known to fail on AIX. For OpenIndiana, we should simply remove this buildbot and replace it with a new IllumOS buildbot.
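[Editor's note: skipping tests known to fail on a given platform, as Victor mentions doing for AIX, is typically done with a `unittest` skip decorator along these lines. The test class, test method, and reason string here are illustrative, not taken from CPython's actual test suite:]

```python
import sys
import unittest

class PlatformDependentTests(unittest.TestCase):
    # Hypothetical example: skip a test known to fail on AIX,
    # rather than leaving the AIX buildbot permanently red.
    @unittest.skipIf(sys.platform.startswith("aix"),
                     "known to fail on AIX (tracked on bugs.python.org)")
    def test_locale_dependent_behaviour(self):
        self.assertEqual("abc".upper(), "ABC")
```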
Victor
On Fri, 16 Jun 2017 at 03:40 Victor Stinner <victor.stinner@gmail.com> wrote:
2017-06-16 10:37 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
Hopefully reversions will continue to be rare (since relatively few changes are likely to be as platform dependent as PEP 538, and Windows/*nix differences are already covered in pre-merge CI), but when they do come up, the reminder of how to manually trigger pre-merge cross-platform CI is likely to be useful, and doesn't require much additional effort on the part of the folks doing the reversion.
I'm also tolerant. A regression on macOS Tiger matters less to me than a regression on most Linux buildbots. I try to focus on recent versions of Windows, Linux, macOS and FreeBSD.
AIX, OpenIndiana/Solaris, OpenBSD, etc. have been broken for months. It's ok, I can live with that :-) But regularly, I propose to drop support for these platforms ;-) For AIX, I tried to skip a few tests known to fail on AIX. For OpenIndiana, we should simply remove this buildbot and replace it with a new IllumOS buildbot.
https://www.python.org/dev/peps/pep-0011/ backs you up in dropping support if we can't get a buildbot *and* someone to maintain the support (and it sounds like we don't have the latter ATM for those platforms). Maybe we should amend PEP 11 to say that whoever volunteers to maintain a platform must make sure that platform's buildbot is not red for longer than a month (to give volunteers some time to notice the failure and fix it in case it happens while they are e.g. on vacation)? Otherwise we will consider the platform unsupported.
On 06/16/2017 09:48 AM, Brett Cannon wrote:
Maybe we should amend PEP 11 to say that whoever volunteers to maintain a platform must make sure that platform's buildbot is not red for longer than a month [...]
How about three? Some life changes need more than a month to recover from... (death in the family, life in the family, job loss, etc.)
-- ~Ethan~
On Fri, 16 Jun 2017 at 13:24 Ethan Furman <ethan@stoneleaf.us> wrote:
On 06/16/2017 09:48 AM, Brett Cannon wrote:
Maybe we should amend PEP 11 to say that whoever volunteers to maintain a platform must make sure that platform's buildbot is not red for longer than a month [...]
How about three? Some life changes need more than a month to recover from... (death in the family, life in the family, job loss, etc.)
True. I guess as long as we are upfront that the platform's buildbot is known to be failing and the clock has started (so we all know to ignore the failure), then I'm fine with 3 months.
On 17 June 2017 at 06:26, Brett Cannon <brett@python.org> wrote:
On Fri, 16 Jun 2017 at 13:24 Ethan Furman <ethan@stoneleaf.us> wrote:
On 06/16/2017 09:48 AM, Brett Cannon wrote:
Maybe we should amend PEP 11 to say that whoever volunteers to maintain a platform must make sure that platform's buildbot is not red for longer than a month [...]
How about three? Some life changes need more than a month to recover from... (death in the family, life in the family, job loss, etc.)
True. I guess as long as we are upfront that the platform's buildbot is known to be failing and the clock has started (so we all know to ignore the failure), then I'm fine with 3 months.
That could also be the time period for pinging python-committers with a notification to say "Hey, my availability is likely to be intermittent for a while, so the buildbot for <platform> may be unreliable until <future date>".
That way, other folks that care about the platform may be more alert to resolving platform specific issues during that time (and may even be able to take over the lead platform support role).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 15 June 2017 at 00:40, Victor Stinner <victor.stinner@gmail.com> wrote:
A recent example is Nick Coghlan's implementation of PEP 538: basically, it broke all buildbots... except on Linux and Windows :-) And it will take a few more days to fix all failures. Well, we are working on fixing these issues, so I don't want to revert this change. It's just an example of a single change which broke many buildbots. PEP 538 depends a lot on the platform, so I'm not surprised to see different failures on different platforms ;-)
Status update on this specific change: Mac OS X should be back to green now [1] (with some anomalous cases being skipped [2])
The tests are currently still failing on C-locale-only platforms (see [3]), and on FreeBSD specifically (see [4])
Cheers, Nick.
[1] http://buildbot.python.org/all/builders/x86%20Tiger%20custom/builds/7 [2] https://bugs.python.org/issue30672 [3] https://bugs.python.org/issue30565#msg296006 [4] https://bugs.python.org/issue30647
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 15 June 2017 at 19:35, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 15 June 2017 at 00:40, Victor Stinner <victor.stinner@gmail.com> wrote:
A recent example is Nick Coghlan's implementation of PEP 538: basically, it broke all buildbots... except on Linux and Windows :-) And it will take a few more days to fix all failures. Well, we are working on fixing these issues, so I don't want to revert this change. It's just an example of a single change which broke many buildbots. PEP 538 depends a lot on the platform, so I'm not surprised to see different failures on different platforms ;-)
Status update on this specific change: Mac OS X should be back to green now [1] (with some anomalous cases being skipped [2])
I've just merged the change to turn off the locale coercion and compatibility warnings by default, which also included changes to:
- Temporarily disable UTF-8 as a locale coercion target (as it can break nl_langinfo on FreeBSD)
- Temporarily skip testing behaviour in the POSIX locale (as it isn't a simple alias for the C locale on *BSD systems)
So from a buildbot fleet perspective, the PEP 538 fallout should now be contained, and the above two checks will only be restored after they're passing on the custom buildbot fleet (I'm pretty sure I know how to get them both working, but it will take some experimenting with the custom build branch to confirm that).
From a local Linux development perspective, "LANG=C python3.7 ..." is now functionally equivalent to doing "LANG=C LC_CTYPE=C.UTF-8 python3.7 ..." - you have to add "PYTHONCOERCECLOCALE=warn" to see the warnings that were previously emitted by default.
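[Editor's note: the equivalence Nick describes can be checked from a shell. A small demonstration, assuming a Python with the PEP 538/540 behaviour (3.7+); `python3` is used here in place of the thread's `python3.7`:]

```shell
# In the C locale, a coercing Python reports a UTF-8 stdout encoding
# instead of ASCII (LC_ALL must be unset for coercion to apply).
LC_ALL= LANG=C python3 -c 'import sys; print(sys.stdout.encoding)'

# Asking for C.UTF-8 explicitly gives the same result, where that
# locale is available:
#   LANG=C LC_CTYPE=C.UTF-8 python3 -c 'import sys; print(sys.stdout.encoding)'

# Opt back in to the warning that was previously emitted by default:
LC_ALL= LANG=C PYTHONCOERCECLOCALE=warn python3 -c 'pass'
```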
Cheers, Nick.
P.S. For the benefit of future readers of PEP 538 itself, I also added an "Implementation Notes" section pointing out that we ended up *not* implementing the PEP as written, since it proved unworkable in practice.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Ok, since I spent weeks fixing buildbots, I'm now more confident that our buildbots are super stable. Since a test_datetime change introduced a *regression* (ARMv7 started to fail), I reverted the first commit: https://github.com/python/cpython/pull/2588
Again, it's not to reject the change, just to repair the buildbots and get more time to design and test the proper fix.
Because of a bug, test_datetime only tested the C implementation of datetime. The change enabled the tests on Lib/datetime.py, the Python implementation. Suddenly, test_datetime started to take up to 20 minutes to run! That makes it the slowest test on some buildbots.
I don't know much more at this point.
Please join http://bugs.python.org/issue30822 if you want to help fix this issue ;-)
Victor
2017-06-14 16:40 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
Hi,
The CPython workflow was enhanced to get pre-commit CI checks. That's a huge win, thank you for that... But, sometimes, a change can still break many buildbots with bugs that weren't caught by the pre-commit checks (Travis CI/Linux and AppVeyor/Windows). The buildbots cover many more architectures and platforms.
I spend a significant amount of time maintaining the sanity of our buildbots. Sometimes it can take me up to 3 days out of a 5-day working week. It's annoying to see new regressions while I'm trying hard to fix old ones :-(
So I would like to set a new rule: if I'm unable to fix buildbot failures caused by a recent change quickly (say, in less than 2 hours), I propose to revert the change.
It doesn't mean that the commit is bad and must not be merged ever. No. It would just mean that we need time to work on fixing the issue, and it shouldn't impact other pending changes, to keep a sane master branch.
What do you think? Would you be ok with such a rule?
A recent example is Nick Coghlan's implementation of PEP 538: basically, it broke all buildbots... except on Linux and Windows :-) And it will take a few more days to fix all failures. Well, we are working on fixing these issues, so I don't want to revert this change. It's just an example of a single change which broke many buildbots. PEP 538 depends a lot on the platform, so I'm not surprised to see different failures on different platforms ;-)
By "buildbot failure", I mean a red buildbot failing because of a compilation error or a failed test suite. But I would prefer to ignore failures of the Refleak buildbots since they are not stable (even if there are fewer and fewer ref leaks in master, and these buildbots have already caught recent regressions!).
If the rule is approved, I plan to announce it on python-dev to be transparent.
Victor
2017-07-05 15:51 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
Ok, since I spent weeks fixing buildbots, I'm now more confident that our buildbots are super stable. Since a test_datetime change introduced a *regression* (ARMv7 started to fail), I reverted the first commit: https://github.com/python/cpython/pull/2588
Crap. I created this PR using the [Revert] button. While the changes are fine, I didn't notice the giant commit message, which is wrong and spammed me with notifications on unrelated issues :-/
Sorry for the spam...
commit 8207c17486baece8ed0ac42d9f8d69ecec4ba7e4 Author: Victor Stinner <victor.stinner@gmail.com> Date: Wed Jul 5 15:44:52 2017 +0200
Revert "bpo-30822: Fix testing of datetime module." (#2588)
* Revert "bpo-30854: Fix compile error when --without-threads (#2581)"
This reverts commit 0c3116309307ad2c7f8e2d2096612f4ab33cbb62.
* Revert "NEWS for 30777 (#2576)"
This reverts commit aaa917ff38f9869eeebe3bc9469bfee64089d826.
* Revert "bpo-21624: IDLE -- minor htest fixes (#2575)"
This reverts commit 2000150c569941584994ec4ec59171961209bec3.
* Revert "bpo-30777: IDLE: configdialog - add docstrings and improve comments (#2440)"
This reverts commit 7eb5883ac59833bf63f0e1f7fb95671a1ac1ee08.
* Revert "bpo-30319: socket.close() now ignores ECONNRESET (#2565)"
This reverts commit 67e1478dba6efe60b8e1890192014b8b06dd6bd9.
* Revert "bpo-30789: Use a single memory block for co_extra. (#2555)"
This reverts commit 378ebb6578b9d709f38b888d23874c0b18125249.
* Revert "bpo-30845: Enhance test_concurrent_futures cleanup (#2564)"
This reverts commit 3df9dec425b0254df1cdf41922fd8d6b08bf47e4.
* Revert "bpo-29293: multiprocessing.Condition.notify() lacks parameter n (#2480)"
This reverts commit 48350412b70c76fa51f488cfc736c80d59b5e8eb.
* Revert "Remove outdated FOX from GUI FAQ (GH-2538)"
This reverts commit d3ed2877a798d07df75422afe136b4727e500c99.
* Revert "bpo-6691: Pyclbr now reports nested classes and functions. (#2503)"
This reverts commit 246ff3bd00f97658e567a7087645a6b76e056491.
* Revert "bpo-29464: Rename METH_FASTCALL to METH_FASTCALL|METH_KEYWORDS and make (#1955)"
This reverts commit 6969eaf4682beb01bc95eeb14f5ce6c01312e297.
* Revert "bpo-30832: Remove own implementation for thread-local storage (#2537)"
This reverts commit aa0aa0492c5fffe750a26d2ab13737a1a6d7d63c.
* Revert "bpo-30764: Fix regrtest --fail-env-changed --forever (#2536)"
This reverts commit 5e87592fd12e0b7c41edc11d4885ed7298d5063b.
* Revert "bpo-30822: Deduplicate ZoneInfoTest classes in test_datetime. (#2534)"
This reverts commit 34b54873b51a1ebee2a3c57b7205537b4f33128d.
* Revert "bpo-30822: Fix testing of datetime module. (#2530)"
This reverts commit 98b6bc3bf72532b784a1c1fa76eaa6026a663e44.
Victor
On 7/5/2017 10:03 AM, Victor Stinner wrote:
2017-07-05 15:51 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
Ok, since I spent weeks fixing buildbots, I'm now more confident that our buildbots are super stable. Since a test_datetime change introduced a *regression* (ARMv7 started to fail), I reverted the first commit: https://github.com/python/cpython/pull/2588
Crap. I created this PR using the [Revert] button. While the changes are fine, I didn't notice the giant commit message, which is wrong and spammed me with notifications on unrelated issues :-/
Committers should always review checkin messages. Unless a PR is made with a single commit, the message usually needs editing.
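[Editor's note: for locally created revert commits, Terry's advice can be practised before anything reaches the shared repository by replacing the auto-generated message outright. A toy sketch in a throwaway repository; all names and messages below are made up:]

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo body > file.txt
git add file.txt
# Simulate a commit whose auto-generated message references unrelated issues:
git commit -qm 'Revert "bpo-XXXXX: ..." plus a dozen unrelated Revert lines'
# Replace the message before it reaches the shared repo and the trackers:
git commit --amend -qm 'bpo-30822: Revert datetime test change'
git log -1 --format=%s
```

Note this only works before pushing; a merge performed through GitHub's web [Revert] button lands directly on the shared branch, which is why reviewing the message in the merge dialog matters.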
Sorry for the spam...
Each tracker issue mentioned below got a bogus reversion message. But better the spam than an actual reversion ;-). I unlinked the ones for IDLE. You should go through the rest.
Terry
participants (9)
- Barry Warsaw
- Brett Cannon
- Donald Stufft
- Ethan Furman
- Nick Coghlan
- R. David Murray
- Serhiy Storchaka
- Terry Reedy
- Victor Stinner