<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style></head><body lang=EN-US link=blue vlink="#954F72"><div class=WordSection1><p class=MsoNormal><span lang=EN-AU>We probably have enough data on the VSTS builds by now to see whether they are comparable to or faster than AppVeyor. Obviously the idea of doing that work was to be able to migrate builds if it made sense, and if we decide not to, then they get ripped out (non-binding PR checks are confusing IMHO, particularly when they duplicate required checks).<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-AU><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-AU>I have no idea whether that discussion is still ongoing on core-workflow, but if VSTS seems better then maybe it’s time? Anyone can view the VSTS build history by starting from <a href="https://python.visualstudio.com/cpython/_build">https://python.visualstudio.com/cpython/_build</a> and browsing into the build definition of interest.</span></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Top-posted from my Windows 10 phone</p><p class=MsoNormal><o:p> </o:p></p><div style='mso-element:para-border-div;border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal style='border:none;padding:0in'><b>From: </b><a href="mailto:vstinner@redhat.com">Victor Stinner</a><br><b>Sent: </b>Sunday, June 3, 2018 14:47<br><b>To: </b><a href="mailto:tjreedy@udel.edu">Terry Reedy</a><br><b>Cc: </b><a href="mailto:python-committers@python.org">python-committers</a><br><b>Subject: </b>Re: [python-committers] Wrongly stopping merges discourages merging.</p></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>2018-06-03 22:23 GMT+02:00 Terry Reedy &lt;tjreedy@udel.edu&gt;:</p><p class=MsoNormal>> Exhibit 1. For at least a couple of weeks in May, faults in the asyncio test</p><p class=MsoNormal>> (and another) caused the asyncio test to randomly fail about half the time.</p><p class=MsoNormal>> With one retest, each CI bot failed about 1/4 the time. 
At least one of</p><p class=MsoNormal>> the two bots failed about 1/2 the time. The AppVeyor queue ballooned.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I only told a few core developers: in February, I burned out and I</p><p class=MsoNormal>stopped everything related to Python for 3 months. I only slowly</p><p class=MsoNormal>resumed contributing to Python in May. I moved to a new team inside</p><p class=MsoNormal>Red Hat, and my job is now to maintain Python for Red Hat.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>When I saw that Ned Deily had trouble getting a release out two weeks ago,</p><p class=MsoNormal>I looked at the status of the CI. As I expected, the CIs had become</p><p class=MsoNormal>very unstable again. A CI is like a puppy: if nobody takes care of it, it dies slowly</p><p class=MsoNormal>:-( I'm sure that many core devs fixed dozens of bugs, but the</p><p class=MsoNormal>annoying part is the tests which only fail randomly. It's hard to spot</p><p class=MsoNormal>them and hard to debug them. If you have a single flaky test, it's</p><p class=MsoNormal>fine. When you have two or three of them, the failure rate slowly</p><p class=MsoNormal>grows beyond 25% and then 50%...</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> One could decrease the frustration and time to success (but only partly) by</p><p class=MsoNormal>> only re-starting the bot that failed. Doing so for Travis is fairly easy.</p><p class=MsoNormal>> Doing so with AppVeyor is obscure and error-prone.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Ah? 
From a GitHub PR, I click on the failed AppVeyor job (click on</p><p class=MsoNormal>"details"): at the top, there is a "[Re-build PR]" button.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I logged into AppVeyor one or two weeks ago; thanks to a cookie,</p><p class=MsoNormal>AppVeyor now remembers me :-)</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>It's just two clicks once you are logged in, no?</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Are you confused by the [New build] button? I have never used that one :-)</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> I twice requested that the randomly failing tests be disabled. Victor said</p><p class=MsoNormal>> he wanted to keep monitoring what they did. I think he overly discounted</p><p class=MsoNormal>> the pain and frustration of having good merges blocked. I think either 1)</p><p class=MsoNormal>> bad tests should be disabled, or 2) the CI code should be able to ignore</p><p class=MsoNormal>> failures by bad tests, or 3) responsible core devs should be able to.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>My rationale is that once a test is disabled, everybody quickly</p><p class=MsoNormal>forgets about it and the test stays disabled for the next 5 years.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Just one example: I skipped test_ftplib.test_check_hostname() 5 months</p><p class=MsoNormal>ago... Ok, who looked at this *failing* test?</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>There is an open issue, right:</p><p class=MsoNormal>https://bugs.python.org/issue32706</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>But who cares about this issue?</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Ok, let's come back to asyncio: one asyncio test started to fail,</p><p class=MsoNormal>right. Yury Selivanov, Andrew Svetlov and I spent a lot of time</p><p class=MsoNormal>looking at these tests. 
The sendfile tests were unstable and have been</p><p class=MsoNormal>fixed. But the SSL test was weird. In fact, it wasn't a bug in the</p><p class=MsoNormal>test, but in asyncio itself!</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>https://bugs.python.org/issue33674</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>So the test helped us to spot a very tricky race condition. I would rather</p><p class=MsoNormal>we suffer for a few days than have a user hit such a bug in production...</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> Exhibit 2. AppVeyor is badly broken.</p><p class=MsoNormal>><o:p> </o:p></p><p class=MsoNormal>> This morning Cheryl Sabella submitted a nice patch fixing an annoying</p><p class=MsoNormal>> behavior of IDLE's editor/shell/output windows. The CI tests passed, the</p><p class=MsoNormal>> patch worked great, it only needed expansion of the placeholder blurb. I</p><p class=MsoNormal>> was really excited.</p><p class=MsoNormal>><o:p> </o:p></p><p class=MsoNormal>> With some trepidation, I made the edit. Unfortunately, both CI bots rerun</p><p class=MsoNormal>> the code tests even when the code is unchanged. Blurb edits should be</p><p class=MsoNormal>> treated as doc-only changes, with only the blurb rechecked.</p><p class=MsoNormal>><o:p> </o:p></p><p class=MsoNormal>> My trepidation turned out to be well-founded. My excitement is gone. 
After</p><p class=MsoNormal>> an error, AppVeyor just quit without reporting any failure cause.</p><p class=MsoNormal>> https://ci.appveyor.com/project/python/cpython/build/3.8build16869</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>This is the first time I have seen such weird behaviour on AppVeyor.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Would you mind opening a new issue to track it, please?</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> Guido once asked what is off-putting about being a core developer. This is one thing.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>There are different options:</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>* Make AppVeyor faster: can we pay to get parallel builds? Can we just</p><p class=MsoNormal>ask to get more parallel builds? Run fewer tests? (e.g. as Zach wrote,</p><p class=MsoNormal>disable the "largefile" resource)</p><p class=MsoNormal>* Make AppVeyor non-blocking on pull requests</p><p class=MsoNormal>* Remove AppVeyor</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>If possible, I would prefer to keep a blocking CI on pull requests.</p><p class=MsoNormal>Each time we disabled AppVeyor, Windows quickly became broken.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>VSTS might be a solution here, but I have simply ignored this CI so far, since I</p><p class=MsoNormal>was busy enough fixing bugs on the other CIs.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Fixing CI failures is not a fun job and it's not rewarding. But it's</p><p class=MsoNormal>the price to pay for excellent quality.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>If you ask my opinion, I prefer that everybody stop working until the</p><p class=MsoNormal>CI is repaired. 
Skipped tests only slowly reduce the quality.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Victor</p><p class=MsoNormal><o:p> </o:p></p></div></body></html>