PEP 462: Workflow automation for CPython
Hey all,
Rather than leaving my ideas undocumented until the language summit in April, I wrote up what I see as the critical issues in our current workflow and how I believe Zuul could help us resolve them as a PEP: http://www.python.org/dev/peps/pep-0462/
I don't think we should *do* anything about this until after PyCon US, but wanted to publish something that clearly explained my thinking rather than surprising people with it at the summit.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Interesting. Chromium has something kind-of similar, named "commit queue", for developers without actual commit access. Once they get an LGTM, the thing rolls automatically. In fact, core developers often find it useful too because the Chromium tree is sometimes closed ("red"). We don't really do the latter in Python, which carries a problem we'll probably need to resolve first - how to know that the bots are green enough. That really needs human attention.
Eli
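To make that rule concrete, here is a deliberately simplified sketch (hypothetical names, not Chromium's actual tooling) of the decision a commit queue makes before landing a reviewed change:

    from collections import namedtuple

    Change = namedtuple("Change", ["has_lgtm", "tests_pass"])

    def commit_queue_decision(change, tree_state):
        if not change.has_lgtm:
            return "waiting for LGTM"
        if tree_state != "open":            # tree closed ("red") or throttled
            return "queued until the tree reopens"
        if not change.tests_pass:
            return "rejected: tests failed with the patch applied"
        return "land automatically"

    print(commit_queue_decision(Change(has_lgtm=True, tests_pass=True), "open"))
    # -> land automatically

The open/closed tree state is the piece CPython doesn't currently track, which is the "how green is green enough" question above.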
On Sat, 25 Jan 2014 05:49:56 -0800, Eli Bendersky <eliben@gmail.com> wrote:
do the latter in Python, which carries a problem we'll probably need to resolve first - how to know that the bots are green enough. That really needs human attention.
By "that needs human attention", do you mean: dealing with the remaining flaky tests, so that "stable buildbots are green" is a binary decision? We strive for that now, but Nick's proposal would mean we'd have to finally buckle down and complete the work. I'm sure we'd make some new flaky tests at some point, but in this future they'd become show-stoppers until they were fixed. I think this would be a good thing, overall :)
--David
On Sat, Jan 25, 2014 at 6:14 AM, R. David Murray <rdmurray@bitdance.com>wrote:
By "that needs human attention", do you mean: dealing with the remaining flaky tests, so that "stable buildbots are green" is a binary decision? We strive for that now, but Nick's proposal would mean we'd have to finally buckle down and complete the work. I'm sure we'd make some new flaky tests at some point, but in this future they'd become show-stoppers until they were fixed. I think this would be a good thing, overall :)
Non-flakiness of bots is a holy grail few projects attain. If your bots are consistently green with no flakes, it just means you're not testing enough :-)
Eli
On Sat, 25 Jan 2014 06:35:59 -0800, Eli Bendersky <eliben@gmail.com> wrote:
Non-flakiness of bots is a holy grail few projects attain. If your bots are consistently green with no flakes, it just means you're not testing enough :-)
How does OpenStack do it, then? I haven't actually looked at Zuul yet, though it is on my shortlist.
--David
On Jan 25, 2014, at 10:09 AM, R. David Murray <rdmurray@bitdance.com> wrote:
How does OpenStack do it, then? I haven't actually looked at Zuul yet, though it is on my shortlist.
--David
Flaky tests have bugs assigned; if a test fails due to one of those bugs, you make a comment on the review saying to reverify, with the bug number. It lets them track which bugs are causing the most issues with the gate and such, too.
Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
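As a rough illustration of that tracking (assuming a "recheck/reverify bug NNNNN" comment convention; the bug numbers below are made up, and this is not OpenStack's actual tooling):

    import re
    from collections import Counter

    RECHECK = re.compile(r"\b(?:recheck|reverify) bug (\d+)", re.IGNORECASE)

    def rank_flaky_bugs(review_comments):
        """Return (bug, count) pairs, most gate-disrupting bugs first."""
        counts = Counter()
        for comment in review_comments:
            counts.update(RECHECK.findall(comment))
        return counts.most_common()

    comments = ["reverify bug 1254890", "recheck bug 1254890", "reverify bug 1183884"]
    print(rank_flaky_bugs(comments))
    # -> [('1254890', 2), ('1183884', 1)]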
On Sat, 2014-01-25 at 06:35 -0800, Eli Bendersky wrote:
Non-flakiness of bots is a holy grail few projects attain. If your bots are consistently green with no flakes, it just means you're not testing enough :-)
There are certainly statistical ways to work around the "necessary flakiness", but that would require someone to sit with a pen and paper a bit and figure out what the right metrics should be :-)
Regards
Antoine.
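One possible shape for such a metric, purely as a back-of-the-envelope sketch (the window and threshold are arbitrary assumptions, not anything proposed in this thread): flag a test that both passes and fails within its recent runs.

    from collections import defaultdict

    def flaky_tests(history, window=50, threshold=0.02):
        """history: iterable of (test_name, passed) pairs, oldest first."""
        by_test = defaultdict(list)
        for name, passed in history:
            by_test[name].append(passed)
        flagged = {}
        for name, results in by_test.items():
            recent = results[-window:]
            failure_rate = recent.count(False) / len(recent)
            if 0 < failure_rate < 1 and failure_rate >= threshold:
                flagged[name] = failure_rate   # fails sometimes but not always: flaky
        return flagged

    runs = [("test_ssl", True)] * 48 + [("test_ssl", False), ("test_ssl", True)]
    print(flaky_tests(runs))   # {'test_ssl': 0.02}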
On 1/25/2014 2:55 PM, Antoine Pitrou wrote:
There are certainly statistical ways to work around the "necessary flakiness", but that would require someone to sit with a pen and paper a bit and figure out what the right metrics should be :-)
If I run the test suite twice and a particular test gives different results, then it is not purely a test of CPython, and not-passing is not necessarily a CPython failure. I take that to mean that the buildbots should not be red. Perhaps purple ;-). More seriously, an intermittent timeout failure might be recorded as an unexpected, or perhaps 'undesired', skip rather than as a test failure. A test failure should indicate that CPython needs to be patched, not that the test system, including the internet, flaked out.
Terry
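A sketch of what that could look like in a test file (the decorator here is hypothetical, not an existing test.support helper): record an intermittent network timeout as a skip instead of a failure.

    import functools
    import socket
    import unittest

    def skip_on_timeout(func):
        """Treat a timeout as 'the network flaked out', not a CPython failure."""
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except socket.timeout:
                self.skipTest("intermittent timeout - not a CPython regression")
        return wrapper

    class ExampleNetworkTest(unittest.TestCase):
        @skip_on_timeout
        def test_fetch(self):
            raise socket.timeout("simulated flaky network")

    if __name__ == "__main__":
        unittest.main()   # reports the test as skipped, so the bot stays green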
On Sat, Jan 25, 2014 at 2:49 PM, Eli Bendersky <eliben@gmail.com> wrote:
Interesting. Chromium has something kind-of similar, named "commit queue", for developers without actual commit access. Once they get an LGTM, the thing rolls automatically. In fact, core developers often find it useful too because the Chromium tree is sometimes closed ("red"). We don't really do the latter in Python, which carries a problem we'll probably need to resolve first - how to know that the bots are green enough. That really needs human attention.
Another interesting (and relevant, I think) concept from the Mozilla community is the Try Server, where you can push a work-in-progress patch to see how it does on all the platforms. I.e. it runs all the same tests that build slaves run, but the repository it works against isn't accessible publicly, so you can try your work without breaking the main tree.
Cheers,
Dirkjan
On Sat, Jan 25, 2014 at 6:54 AM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
Another interesting (and relevant, I think) concept from the Mozilla community is the Try Server, where you can push a work-in-progress patch to see how it does on all the platforms. I.e. it runs all the same tests that build slaves run, but the repository it works against isn't accessible publicly, so you can try your work without breaking the main tree.
Yep, Chromium has try-jobs too, thanks for reminding me. And in a previous workplace we had a similar process screwed on top of Jenkins - private test runs wherein you provide a branch to CI and the CI tests that branch. In fact, when your change may affect many different architectures, such "try jobs" are the only way to go unless you really want to build & test a branch on a few different OSes yourself.
Once again, this almost always requires some dedicated developers watching the tree (Chromium has sheriffs, gardeners, etc.); I'm not sure we have that for the CPython source.
Eli
On Sat, 25 Jan 2014 06:59:19 -0800, Eli Bendersky <eliben@gmail.com> wrote:
Yep, Chromium has try-jobs too, thanks for reminding me. And in a previous
So do we. We don't use them much, but that's probably because they are a relatively new feature of the buildbot farm (the 'custom' builders).
workplace we had a similar process screwed on top of Jenkins - private test runs wherein you provide a branch to CI and the CI tests that branch. In fact, when your change may affect many different architectures, such "try jobs" are the only way to go unless you really want to build & test a branch on a few different OSes yourself.
Once again, this almost always requires some dedicated developers watching the tree (Chromium has sheriffs, gardeners, etc.); I'm not sure we have that for the CPython source.
What do sheriffs and gardeners do?
--David
On Sat, Jan 25, 2014 at 7:13 AM, R. David Murray <rdmurray@bitdance.com> wrote:
What do sheriffs and gardeners do?
I started replying but then remembered that it's actually all described here - http://www.chromium.org/developers/tree-sheriffs
If you're interested in such things (build farms, CI, "process"), that page and the links from it should provide you with a lot of interesting information.
On Sat, 25 Jan 2014 09:35:46 -0800, Eli Bendersky <eliben@gmail.com> wrote:
I started replying but then remembered that it's actually all described here - http://www.chromium.org/developers/tree-sheriffs
If you're interested in such things (build farms, CI, "process"), that page and the links from it should provide you with a lot of interesting information.
I didn't read past the first part of that, where it said "closes, throttles and opens the tree" and "tracks down people responsible for breakage". This is emphatically *not* the Zuul model, from what Nick has said. In Zuul, patches don't get *in* to the tree unless the buildbots are all green with the patch applied (so, no unit-test-discovered "breakage" can occur in the tree).
Donald answered my question about flaky tests: if a flaky test causes the failure, whoever is trying to get it integrated can trigger a new run (referencing the bug that documents the flaky test), and if that run passes, the patch gets committed.
This makes much more sense to me than the 'sheriff' approach, which is essentially what we have now, albeit with no formally appointed sheriffs, and no tree closures.
--David
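A toy sketch of that model (stubbed-out pieces, not Zuul's actual implementation); the essential property is that the full test run happens before the merge, so unit-test-visible breakage never lands:

    def gate(changes, run_tests, merge, known_flaky, max_rechecks=1):
        for change in changes:
            for attempt in range(1 + max_rechecks):
                failed = run_tests(change)        # full suite, patch applied to the tip
                if not failed:
                    merge(change)                 # only green results reach the tree
                    break
                if failed <= known_flaky:         # every failed test is a known flaky
                    continue                      # one: recheck/reverify and retry
                print(change, "rejected:", sorted(failed))
                break
            else:
                print(change, "still failing after recheck")

    # Toy usage with stand-ins:
    tree = []
    gate(changes=["patch-A", "patch-B"],
         run_tests=lambda change: set() if change == "patch-A" else {"test_ftplib"},
         merge=tree.append,
         known_flaky=set())
    print(tree)   # ['patch-A'] -- patch-B never reached the tree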
For the record, I find the way OpenStack does it very awesome, even if I’m not a huge fan of Gerrit itself. It’s also nice that repeated runs make you file a bug for flaky tests and reference it, because you can see which bugs are causing the most heartache in the test suite as far as flakiness goes. This gives you a sort of impact-ordered list of things to fix up.
Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 27 January 2014 02:01, Donald Stufft <donald@stufft.io> wrote:
For the record, I find the way OpenStack does it very awesome, even if I’m not a huge fan of Gerrit itself. It’s also nice that repeated runs make you file a bug for flaky tests and reference it, because you can see which bugs are causing the most heartache in the test suite as far as flakiness goes. This gives you a sort of impact-ordered list of things to fix up.
Elastic recheck (http://status.openstack.org/elastic-recheck/) is also spectacularly impressive, and something I would love to have for CPython. However, baby steps :)
Cheers, Nick.
P.S. https://www.youtube.com/watch?v=xGLMe2dyx6k goes into more detail on how Elastic recheck works
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
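The underlying idea, as a toy illustration (not the real elastic-recheck, which matches failures against Elasticsearch queries over the CI logs; the bug numbers and log patterns below are invented):

    import re

    SIGNATURES = {   # invented bug numbers and patterns, for illustration only
        "1254890": re.compile(r"Timeout while waiting for .* to become ACTIVE"),
        "1183884": re.compile(r"ConnectionError: .* Connection refused"),
    }

    def suggest_recheck(log_text):
        """If a failed job's log matches a known signature, suggest the recheck comment."""
        for bug, pattern in SIGNATURES.items():
            if pattern.search(log_text):
                return "recheck bug " + bug
        return None   # unrecognised failure: probably a real regression, go look

    print(suggest_recheck("ERROR: Timeout while waiting for server to become ACTIVE"))
    # -> recheck bug 1254890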
I suggest that we start with automating doc-only patches.
The need is great. The backlog of Documentation patches is nearly
I am sure that the high overhead per patch has something to do with this. (I suspect I would do more if allowed to fix multiple issues in one patch, to spread the overhead.) It has been written somewhere that there is a set of doc maintainers that will turn suggestions into patches, if necessary, and apply them. On a day-to-day basis, that set seems to be empty now.
A doc patch queue seems slightly easier. Rietveld integration would not have to be part of the first cut. It should also be less impactful. It would not need exclusive access to the repository except for a short period each day (pick the least used hour of the day). On startup, the system could post a message to this list "Auto doc patching started. Repository closed." A 'repository open' message would follow when done. At other times, patches could be pre-tested on a clone and only passing patches queued for a final test and push.
Terry
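One possible shape for that queue, sketched with stand-in functions (nothing here exists today; the patch names are placeholders):

    def daily_doc_push(queued_patches, pretest, apply_and_push, announce):
        announce("Auto doc patching started. Repository closed.")
        for patch in queued_patches:
            if pretest(patch):                  # re-test against the real tip
                apply_and_push(patch)
            else:
                announce(patch + " no longer applies/builds cleanly; back to the tracker")
        announce("Auto doc patching finished. Repository open.")

    # Toy run with stubbed pieces:
    daily_doc_push(
        queued_patches=["issue-AAAA.diff", "issue-BBBB.diff"],
        pretest=lambda p: p != "issue-BBBB.diff",
        apply_and_push=lambda p: print("pushed", p),
        announce=print,
    )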
On 1/25/2014 9:54 AM, Dirkjan Ochtman wrote:
Another interesting (and relevant, I think) concept from the Mozilla community is the Try Server, where you can push a work-in-progress patch to see how it does on all the platforms. I.e. it runs all the same tests that build slaves run, but the repository it works against isn't accessible publicly, so you can try your work without breaking the main tree.
What would be very useful would be the ability to run a single test case function for a bug report on 2.7, 3.3, and 3.4 on Mac, Linux, and Windows, in order to delineate the scope of an issue. Right now, the OP or first developer must hope that people with different systems will show up and run the test for the bug on the three branches.
Terry
On Jan 25, 2014, at 03:51 PM, Nick Coghlan wrote:
Rather than leaving my ideas undocumented until the language summit in April, I wrote up what I see as the critical issues in our current workflow and how I believe Zuul could help us resolve them as a PEP: http://www.python.org/dev/peps/pep-0462/
Without consideration of the amount of work involved and all the devilish details, I am totally in favor of this plan. JFDI. :)
-Barry
participants (8)
- Antoine Pitrou
- Barry Warsaw
- Dirkjan Ochtman
- Donald Stufft
- Eli Bendersky
- Nick Coghlan
- R. David Murray
- Terry Reedy