Longer term idea: consider Rust's homu for CPython merge gating

(Note: near term, this probably isn't a useful idea, as the current GitHub & GitLab proposals aren't proposing the introduction of merge gating, merely advisory testing of PRs, so folks will remain free to merge patches locally and push them up to the master repository. However, I think pre-merge testing is a good idea long term, with approving our own PRs taking the place of pushing directly to the development and maintenance branches)

I recently came across an old article [1] by Rust's Graydon Hoare regarding their approach to pre-merge testing of commits using GitHub PRs and Buildbot. Following up on that brought me to a more up to date explanation of their current bot setup [2], and the http://homu.io/ service for running it against a project without hosting your own instance (in our case, if we did run a service like Homu, we'd likely still want to host our own instance, as Mozilla do for Rust and Servo).

Similar to OpenStack's Zuul, Homu serialises commits, ensuring that the test suite is passing *before* code lands on the target branch. The relevant differences are that Homu is designed to use GitHub PRs rather than Gerrit reviews as the event source, and Travis CI and Buildbot rather than Jenkins as the CI backends for actually running the integration tests. (Homu is also currently missing Zuul's "speculative test pipeline" feature, which allows it to test multiple commits in parallel on the assumption that most of them will pass due to prior testing in isolation, but has a separate feature that allows PRs to be tagged for inclusion in "rollup" commits that land several changes at once after testing them as a group)

The current merge queue for Rust itself can be seen at http://buildbot.rust-lang.org/homu/queue/rust

Regards, Nick.

[1] http://graydon2.dreamwidth.org/1597.html
[2] http://huonw.github.io/blog/2015/03/rust-infrastructure-can-be-your-infrastr...

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
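[Editor's note: for readers unfamiliar with merge gating, the core idea Homu and Zuul share can be captured in a few lines. The sketch below is purely illustrative and is not Homu's actual code; run_ci, merge, merge_preview and report_failure are hypothetical callables standing in for whatever CI and repository tooling is in use. Approved PRs are tested one at a time against the current tip of the target branch, and only land if the run is green.]

# Illustrative sketch of serialised merge gating (not Homu's actual code).
# run_ci, merge, and the merge_preview/report_failure helpers are
# hypothetical stand-ins for real CI and repository tooling.
def gate(approved_prs, target_branch, run_ci, merge):
    for pr in approved_prs:                          # strict approval order
        candidate = target_branch.merge_preview(pr)  # target tip + this PR
        if run_ci(candidate):                        # test *before* landing
            merge(pr, target_branch)                 # target branch stays green
        else:
            pr.report_failure()                      # PR goes back to its author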

Thanks for the link! For those that didn't look at the link, Homu is actually written in Python and not Rust (the website for the project is what's written in Rust).

Berker has actually said he is beginning to look into writing a bot for our needs so he might be interested in this.

A quick perusal suggests it probably needs to be modified to understand supporting multiple branches. I also don't know if it supports the fast-forwarding/rebasing commits we want, but I doubt that would be difficult to add. It probably also needs to grow an extension system somehow to support custom NEWS entries for us.

But it probably would be really nice for the Python and Rust communities to share some of the infrastructure burden by working on the same tooling (especially if we even consider using their auto-assignment to help get patches reviewed).

On Wed, 30 Dec 2015 at 20:21 Nick Coghlan <ncoghlan@gmail.com> wrote:
(Note: near term, this probably isn't a useful idea, as the current GitHub & GitLab proposals aren't proposing the introduction of merge gating, merely advisory testing of PRs, so folks will remain free to merge patches locally and push them up to the master repository. However, I think pre-merge testing is a good idea long term, with approving our own PRs taking the place of pushing directly to the development and maintenance branches)
I recently came across an old article [1] by Rust's Graydon Hoare regarding their approach to pre-merge testing of commits using GitHub PRs and Buildbot. Following up on that brought me to a more up to date explanation of their current bot setup [2], and the http://homu.io/ service for running it against a project without hosting your own instance (in our case, if we did run a service like Homu, we'd likely still want to host our own instance, as Mozilla do for Rust and Servo).
Similar to OpenStack's Zuul, Homu serialises commits, ensuring that the test suite is passing *before* code lands on the target branch. The relevant differences are that Homu is designed to use GitHub PRs rather than Gerrit reviews as the event source, and Travis CI and Buildbot rather than Jenkins as the CI backends for actually running the integration tests. (Homu is also currently missing Zuul's "speculative test pipeline" feature, which allows it to test multiple commits in parallel on the assumption that most of them will pass due to prior testing in isolation, but has a separate feature that allows PRs to be tagged for inclusion in "rollup" commits that land several changes at once after testing them as a group)
The current merge queue for Rust itself can be seen at http://buildbot.rust-lang.org/homu/queue/rust
Regards, Nick.
[1] http://graydon2.dreamwidth.org/1597.html [2] http://huonw.github.io/blog/2015/03/rust-infrastructure-can-be-your-infrastr...
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 1 January 2016 at 06:21, Brett Cannon <brett@python.org> wrote:
Thanks for the link! For those that didn't look at the link, Homu is actually written in Python and not Rust (the website for the project is what's written in Rust).
Berker has actually said he is beginning to look into writing a bot for our needs so he might be interested in this.
A quick perusal suggests it probably needs to be modified to understand supporting multiple branches. I also don't know if it supports the fast-forwarding/rebasing commits we want, but I doubt that would be difficult to add. It probably also needs to grow an extension system somehow to support custom NEWS entries for us.
Which reminds me, Amber Brown recently wrote a NEWS file generator based on the file-per-issue model, later assembled into a NEWS file section as part of the project's release process: https://github.com/hawkowl/towncrier

It's aimed more at projects with a simple linear release history rather than CPython's branching sprawl, though.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
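[Editor's note: the gist of the file-per-issue model is that each change adds its own small news fragment file, and the release process concatenates them into the NEWS file, sidestepping merge conflicts in a single shared file. The sketch below is a generic illustration of that model, not towncrier's actual implementation; the "news.d" directory and "<issue>.rst" naming are assumptions made up for the example.]

# Generic illustration of the file-per-issue NEWS model (not towncrier itself).
# The "news.d" directory and "<issue>.rst" naming are assumptions for the example.
from pathlib import Path

def assemble_news(fragment_dir="news.d", version="3.x.y"):
    header = f"What's New in Python {version}"
    lines = [header, "=" * len(header), ""]
    # Each fragment file holds one entry; its filename identifies the issue.
    for fragment in sorted(Path(fragment_dir).glob("*.rst")):
        entry = fragment.read_text().strip()
        lines.append(f"- {entry} (issue {fragment.stem})")
    return "\n".join(lines) + "\n"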

Larry Hastings also wrote one that's attached to the issue regarding fixing the NEWS merge problem.

On Thu, 31 Dec 2015, 20:47 Nick Coghlan <ncoghlan@gmail.com> wrote:
On 1 January 2016 at 06:21, Brett Cannon <brett@python.org> wrote:
Thanks for the link! For those that didn't look at the link, Homu is actually written in Python and not Rust (the website for the project is what's written in Rust).
Berker has actually said he is beginning to look into writing a bot for our needs so he might be interested in this.
A quick perusal suggests it probably needs to be modified to understand supporting multiple branches. I also don't know if it supports the fast-forwarding/rebasing commits we want, but I doubt that would be difficult to add. It probably also needs to grow an extension system somehow to support custom NEWS entries for us.
Which reminds me, Amber Brown recently wrote a NEWS file generator based on the file-per-issue model, later assembled into a NEWS file section as part of the project's release process: https://github.com/hawkowl/towncrier
It's aimed more at projects with a simple linear release history rather than CPython's branching sprawl, though.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Dec 31, 2015 at 1:21 PM, Brett Cannon <brett@python.org> wrote:
Thanks for the link! For those that didn't look at the link, Homu is actually written in Python and not Rust (the website for the project is what's written in Rust).
Berker has actually said he is beginning to look into writing a bot for our needs so he might be interested in this.
A quick perusal suggests it probably needs to be modified to understand supporting multiple branches. I also don't know if it supports the fast-forwarding/rebasing commits we want, but I doubt that would be difficult to add. It probably also needs to grow an extension system somehow to support custom NEWS entries for us.
But it probably would be really nice for the Python and Rust communities to share some of the infrastructure burden by working on the same tooling (especially if we even consider using their auto-assignment to help get patches reviewed).
While that's true, the ages of the projects are so different that I wonder if our workflows have that much in common. And it seems there are quite a few tools that work like this.

In general I like the idea of a commit queue; we built one at Dropbox which has been very useful in keeping our master green. Based on this experience, I note that testing each commit before it's merged limits the rate at which commits can be merged, unless you play some tricks that occasionally backfire. Taking homu's brief description literally, it would seem that if it takes e.g. 10 minutes to run the tests you can't accept more than 6 commits per hour. I don't recall how long Python's tests take, but I wouldn't be surprised if it was a lot longer than 10 minutes.

Now, you can reduce the running time by parallelizing test runs, and Python doesn't see that many commits typically, so perhaps this isn't a problem. But for larger projects it may be -- it definitely was a big concern at Dropbox, and I recall hearing about some astronomical stats from the Chromium project.

At Dropbox we solved this by not literally testing each commit after the last one has been merged. Instead, for each commit to be tested, we create a throwaway branch where we merge the commit with the current HEAD on the master branch, test the result, and then (if the tests pass) re-merge the commit onto master. This way we can test many commits in parallel (and yes, we pay a lot of money to Amazon for the needed capacity :-). It is *possible* that this strategy introduces a failure in master, but pretty unlikely if the final merge succeeds without conflicts (on a git merge conflict we start over). We re-test after the commit has landed on master so we will still detect such failures -- but in practice there are some other reasons why the post-merge test can fail (typically due to flaky tests) and those are much more common than "commit races" (as we've dubbed the phenomenon of two commits that each individually pass the tests, and merge without conflicts, yet introduce a test failure when combined).

FWIW the tools we use at Dropbox also linearize commit history. We never keep the micro-commits used by the developer during (or before) the code review phase -- that history is saved forever in our separate code review tool. The CI tool runs tests at each code review iteration, but without the "merge with latest HEAD" step described above -- that's only done after the developer and reviewer have reached agreement on landing the commit (or when the developer explicitly rebases their short-term development branch).

--Guido
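[Editor's note: for concreteness, here is a rough sketch of the throwaway-branch approach described above, written against plain git. This is not Dropbox's actual tooling; the branch names and the run_tests.sh entry point are assumptions for illustration, and the final push can still fail if another commit races onto master first.]

# Rough sketch of the throwaway-branch approach (not Dropbox's actual tooling),
# assuming a local clone and a hypothetical run_tests.sh entry point.
import subprocess

def sh(*args):
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(args, check=True)

def test_and_land(candidate_branch):
    sh("git", "fetch", "origin", "master")
    # Build a throwaway merge of the candidate with the current master HEAD.
    sh("git", "checkout", "-B", "throwaway", "origin/master")
    try:
        sh("git", "merge", "--no-ff", candidate_branch)
    except subprocess.CalledProcessError:
        sh("git", "merge", "--abort")
        return False   # merge conflict: retry later against a newer master
    try:
        sh("./run_tests.sh")   # hypothetical test entry point
    except subprocess.CalledProcessError:
        return False           # tests failed on the merged result
    # Tests passed: land the merged result on master. A concurrent landing
    # ("commit race") can still make the push fail, in which case we start over.
    sh("git", "checkout", "master")
    sh("git", "merge", "--ff-only", "throwaway")
    sh("git", "push", "origin", "master")
    return True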
On Wed, 30 Dec 2015 at 20:21 Nick Coghlan <ncoghlan@gmail.com> wrote:
(Note: near term, this probably isn't a useful idea, as the current GitHub & GitLab proposals aren't proposing the introduction of merge gating, merely advisory testing of PRs, so folks will remain free to merge patches locally and push them up to the master repository. However, I think pre-merge testing is a good idea long term, with approving our own PRs taking the place of pushing directly to the development and maintenance branches)
I recently came across an old article [1] by Rust's Graydon Hoare regarding their approach to pre-merge testing of commits using GitHub PRs and Buildbot. Following up on that brought me to a more up to date explanation of their current bot setup [2], and the http://homu.io/ service for running it against a project without hosting your own instance (in our case, if we did run a service like Homu, we'd likely still want to host our own instance, as Mozilla do for Rust and Servo).
Similar to OpenStack's Zuul, Homu serialises commits, ensuring that the test suite is passing *before* code lands on the target branch. The relevant differences are that Homu is designed to use GitHub PRs rather than Gerrit reviews as the event source, and Travis CI and Buildbot rather than Jenkins as the CI backends for actually running the integration tests. (Homu is also currently missing Zuul's "speculative test pipeline" feature, which allows it to test multiple commits in parallel on the assumption that most of them will pass due to prior testing in isolation, but has a separate feature that allows PRs to be tagged for inclusion in "rollup" commits that land several changes at once after testing them as a group)
The current merge queue for Rust itself can be seen at http://buildbot.rust-lang.org/homu/queue/rust
Regards, Nick.
[1] http://graydon2.dreamwidth.org/1597.html [2] http://huonw.github.io/blog/2015/03/rust-infrastructure-can-be-your-infrastr...
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- --Guido van Rossum (python.org/~guido)

On 3 January 2016 at 15:08, Guido van Rossum <guido@python.org> wrote:
In general I like the idea of a commit queue; we built one at Dropbox which has been very useful in keeping our master green. Based on this experience, I note that testing each commit before it's merged limits the rate at which commits can be merged, unless you play some tricks that occasionally backfire. Taking homu's brief description literally, it would seem that if it takes e.g. 10 minutes to run the tests you can't accept more than 6 commits per hour. I don't recall how long Python's tests take, but I wouldn't be surprised if it was a lot longer than 10 minutes.
It's less than 10 minutes on a modern laptop (although I admit I haven't done a -uall run in a while), but longer than that on the Buildbot fleet (since some of the buildbots aren't particularly fast).
Now, you can reduce the running time by parallelizing test runs, and Python doesn't see that many commits typically, so perhaps this isn't a problem. But for larger projects it may be -- it definitely was a big concern at Dropbox, and I recall hearing about some astronomical stats from the Chromium project.
At Dropbox we solved this by not literally testing each commit after the last one has been merged. Instead, for each commit to be tested, we create a throwaway branch where we merge the commit with the current HEAD on the master branch, test the result, and then (if the tests pass) re-merge the commit onto master. This way we can test many commits in parallel (and yes, we pay a lot of money to Amazon for the needed capacity :-).
As far as I'm aware, OpenStack's Zuul mergebot is currently best in class at solving this problem:

* while a patch is in review, they run a "check" test against master to see if the change works at all
* to be approved for merge, patches need both a clean check run and approval from human reviewers
* once patches are approved for merge, they go into a merge queue, where each patch is applied atop the previous one
* Zuul runs up to N test runs in parallel, merging them in queue order as they finish successfully
* if any test run fails, any subsequent test runs (finished or not) are discarded, the offending patch is removed from the merge queue, and Zuul rebuilds the queue with the failing patch removed

It's a lot like a CPU pipeline in that regard - when everything goes well, patches are merged as fast as they're approved, just with a fixed time delay corresponding to the length of time it takes to run the test suite. If one of the merges fails, then it's like branch prediction in a CPU failing - you have to flush the pipeline and start over again from the point of the failure.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
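[Editor's note: a toy model of that pipeline behaviour, purely for illustration (this is not Zuul's code): each queued change is tested speculatively on top of the changes queued ahead of it, and a failure causes everything behind the failing change to be flushed and retested without it. The passes_tests callable stands in for a real CI run against the speculatively merged state.]

# Toy model of a speculative merge pipeline (not Zuul's actual implementation).
# queue: change ids in approval order; passes_tests(changes) stands in for a
# CI run against the target branch plus the given ordered list of changes.
def run_speculative_queue(queue, passes_tests):
    merged = []
    while queue:
        # Conceptually these runs happen in parallel: change i is tested on
        # top of the (as yet unmerged) changes queued ahead of it.
        results = [passes_tests(merged + queue[:i + 1]) for i in range(len(queue))]
        if all(results):
            merged.extend(queue)             # the whole pipeline was green
            queue = []
        else:
            failed = results.index(False)
            merged.extend(queue[:failed])    # changes ahead of the failure land
            queue = queue[failed + 1:]       # drop the failing change, retest the rest
    return merged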

Ah, the last part there is a nice algorithm. It determines the intended merge order when patches are submitted to the queue, then runs tests for multiple patches in parallel. So, assuming most patches pass their tests, the rate at which patches can land is limited only by how fast it can run parallelized tests, and as long as the queue isn't congested, each patch lands roughly as soon as its tests succeed.

On Sat, Jan 2, 2016 at 10:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 January 2016 at 15:08, Guido van Rossum <guido@python.org> wrote:
In general I like the idea of a commit queue; we built one at Dropbox which has been very useful in keeping our master green. Based on this experience, I note that testing each commit before it's merged limits the rate at which commits can be merged, unless you play some tricks that occasionally backfire. Taking homu's brief description literally, it would seem that if it takes e.g. 10 minutes to run the tests you can't accept more than 6 commits per hour. I don't recall how long Python's tests take, but I wouldn't be surprised if it was a lot longer than 10 minutes.
It's less than 10 minutes on a modern laptop (although I admit I haven't done a -uall run in a while), but longer than that on the Buildbot fleet (since some of the buildbots aren't particularly fast).
Now, you can reduce the running time by parallelizing test runs, and Python doesn't see that many commits typically, so perhaps this isn't a problem. But for larger projects it may be -- it definitely was a big concern at Dropbox, and I recall hearing about some astronomical stats from the Chromium project.
At Dropbox we solved this by not literally testing each commit after the last one has been merged. Instead, for each commit to be tested, we create a throwaway branch where we merge the commit with the current HEAD on the master branch, test the result, and then (if the tests pass) re-merge the commit onto master. This way we can test many commits in parallel (and yes, we pay a lot of money to Amazon for the needed capacity :-).
As far as I'm aware, OpenStack's Zuul mergebot is currently best in class at solving this problem:
* while a patch is in review, they run a "check" test against master to see if the change works at all
* to be approved for merge, patches need both a clean check run and approval from human reviewers
* once patches are approved for merge, they go into a merge queue, where each patch is applied atop the previous one
* Zuul runs up to N test runs in parallel, merging them in queue order as they finish successfully
* if any test run fails, any subsequent test runs (finished or not) are discarded, the offending patch is removed from the merge queue, and Zuul rebuilds the queue with the failing patch removed
It's a lot like a CPU pipeline in that regard - when everything goes well, patches are merged as fast as they're approved, just with a fixed time delay corresponding to the length of time it takes to run the test suite. If one of the merges fails, then it's like branch prediction in a CPU failing - you have to flush the pipeline and start over again from the point of the failure.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- --Guido van Rossum (python.org/~guido)

On 4 January 2016 at 07:23, Guido van Rossum <guido@python.org> wrote:
Ah, the last part there is a nice algorithm. It determines the intended merge order when patches are submitted to the queue, then runs tests for multiple patches in parallel. So, assuming most patches pass their tests, the rate at which patches can land is limited only by how fast it can run parallelized tests, and as long as the queue isn't congested, each patch lands roughly as soon as its tests succeed.
Yeah, I first saw a talk on it at linux.conf.au a couple of years ago, and was really impressed. They have a good write-up on the system here: http://docs.openstack.org/infra/zuul/gating.html

For CPython though, OpenHub indicates our commit rate is on the order of 15 per day (5754 commits in the last 12 months), so even if GitHub gets us a nice uptick in landing submitted patches, we still have a lot of headroom before the main bottleneck becomes anything other than availability of reviewers.

Cheers, Nick.

P.S. While I think it would be overkill for CPython, folks working with distributed systems may also be interested in a neat system called ElasticRecheck that OpenStack use to identify intermittent failures based on patterns detected in log files: http://docs.openstack.org/infra/elastic-recheck/readme.html#idea Relevant dashboard for the OpenStack integrated release gate: http://status.openstack.org/elastic-recheck/

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
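[Editor's note: a quick sanity check of the quoted rate, assuming the OpenHub figure is accurate — 5754 commits spread over a year is indeed roughly 15-16 per day.]

# Back-of-the-envelope check of the commit rate quoted above.
>>> round(5754 / 365, 1)
15.8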