
On 1/25/2014 2:55 PM, Antoine Pitrou wrote:
On Sat., 2014-01-25 at 06:35 -0800, Eli Bendersky wrote:
On Sat, Jan 25, 2014 at 6:14 AM, R. David Murray <rdmurray@bitdance.com> wrote:
On Sat, 25 Jan 2014 05:49:56 -0800, Eli Bendersky <eliben@gmail.com> wrote:
> do the latter in Python, which carries a problem we'll probably need to
> resolve first - how to know that the bots are green enough. That really
> needs human attention.
By "that needs human attention", do you mean: dealing with the remaining flaky tests, so that "stable buildbots are green" is a binary decision? We strive for that now, but Nick's proposal would mean we'd have to finally buckle down and complete the work. I'm sure we'd make some new flaky tests at some point, but in this future they'd become show-stoppers until they were fixed. I think this would be a good thing, overall :)
Non-flakiness of bots is a holy grail few projects attain. If your bots are consistently green with no flakes, it just means you're not testing enough :-)
There are certainly statistical ways to work around the "necessary flakiness", but that would require someone to sit down with pen and paper for a bit and figure out what the right metrics should be :-)
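To make that concrete, here is a minimal sketch of one such metric, assuming a per-test history of pass/fail results is available to the buildbot master. The class and threshold names (FlakinessTracker, flaky_low, flaky_high) are invented for illustration, not anything in the actual buildbot code:

    from collections import deque

    class FlakinessTracker:
        """Sketch: keep a sliding window of recent pass/fail results
        per test, and classify tests whose pass rate sits strictly
        between the thresholds as flaky rather than broken."""

        def __init__(self, window=50, flaky_low=0.05, flaky_high=0.95):
            self.window = window
            self.flaky_low = flaky_low
            self.flaky_high = flaky_high
            self.history = {}  # test name -> deque of booleans

        def record(self, test_name, passed):
            runs = self.history.setdefault(
                test_name, deque(maxlen=self.window))
            runs.append(passed)

        def classify(self, test_name):
            runs = self.history.get(test_name)
            if not runs:
                return "unknown"
            rate = sum(runs) / len(runs)
            if rate >= self.flaky_high:
                return "stable"   # effectively always green
            if rate <= self.flaky_low:
                return "broken"   # consistently failing: needs a patch
            return "flaky"        # intermittent: needs statistical handling

With something like this, "green enough" could become a concrete rule, e.g. block a release on any "broken" test but only warn on "flaky" ones.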
If I run the test suite twice and a particular test gives different results, then it is not purely a test of CPython, and not-passing is not necessarily a CPython failure. That, to me, means that the buildbots should not be red. Perhaps purple ;-). More seriously, an intermittent timeout failure might be recorded as an unexpected, or perhaps 'undesired', skip rather than as a test failure. A test failure should indicate that CPython needs to be patched, not that the test system, including the internet, flaked out.
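For the 'undesired skip' idea, a minimal sketch of what that could look like with unittest, assuming the transient failure surfaces as socket.timeout. The decorator name is hypothetical; CPython's existing test.support.transient_internet helper works in a similar spirit:

    import functools
    import socket
    import unittest

    def skip_on_transient_network_error(test_func):
        """Hypothetical decorator: report transient network failures
        as skips instead of test failures, so a red buildbot always
        means CPython itself needs a patch."""
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            except socket.timeout as exc:
                # The network flaked out, not CPython: record a skip.
                raise unittest.SkipTest(
                    "transient network timeout: %s" % exc)
        return wrapper

    class ExampleNetworkTest(unittest.TestCase):
        @skip_on_transient_network_error
        def test_fetch(self):
            sock = socket.create_connection(
                ("www.example.com", 80), timeout=5)
            sock.close()

    if __name__ == "__main__":
        unittest.main()

The test runner then reports such runs as skipped, leaving failures to mean exactly what Terry wants them to mean.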
Terry