[Python-Dev] Community buildbots and Python release quality metrics
glyph at divmod.com
Thu Jun 26 21:32:10 CEST 2008
I do tend to ramble on, so here's an executive summary of my response:
I want python developers to pay attention to the community buildbots and
to treat breakages of existing projects as a serious issue. However, I
don't think that maintaining those projects is the core team's job, so
all I'm asking for is for core developers to:
* treat breakages of 3rd party packages as a potentially serious issue,
* if possible (i.e. if they find out about the breakage soon enough,
which should be the case in any pybots failure) revert the change that
caused the problem until the problem can be fixed, and
* notify 3rd party maintainers when it's decided that the breakage will
not be fixed.
This only applies to breakages that the core developers find out about,
which for all practical purposes means the ones on the community
builders page.
Those of you looking for point-by-point responses and some repetition of
the above points, enjoy :).
On 05:03 pm, guido at python.org wrote:
>On Thu, Jun 26, 2008 at 9:21 AM, <glyph at divmod.com> wrote:
>>On 03:33 pm, guido at python.org wrote:
>>>It needs to be decided case-by-case.
>>(misunderstanding)
>No, I just meant that we need to figure out for each 3rd party test
>that fails whether the failure is our fault (too incompatible) or
>theirs (relied on undefined behavior) and what the best fix is (change
>our code or theirs -- note that even if it's their fault there are
>cases where the best fix is to change our code).
This is basically fine, as far as I'm concerned.
I would like to suggest, however, that these issues be dealt with as
soon as possible, rather than waiting for the release process to begin.
A lot of decisions are made on this mailing list about the supposed
properties of "average" python code, without any actual survey of said
code. Sometimes the results of that survey can be really surprising.
The end goal of any particular compatibility policy, of a distinction
between "public" and "private" APIs, and so on, is to keep code working.
>I'm sorry if your interpretation of the terminology is different, but
>this is mine and this is what we've always used, and it's not likely
>to change. (At least not for the 2.6/3.0 release.)
I have no problem with your definitions of these terms. I think that
they should probably be in PEP 101 though. Would you accept a patch
that added an edited / expanded version of this paragraph?
>>Still, I'm bringing this up now because it _is_ a beta,
>Absolutely correct. The RCs are hoped to be as good as the final
>release. *Now* is the time to bring up issues.
Well, that's good, at least :)
>But please bring up specific issues -- I don't want to have an
>extended discussion about process or quality or expectations. I just
>want the code to be fixed.
Well, one specific issue has been bumped in priority as a result of this
thread, and others are under discussion. The code is getting fixed.
>>(I just care that I stop having problems with incompatibility.)
>
>And here we seem to be parting our ways. We have a large amount of
>process already. I don't want more.
Looking at it from my perspective, I'm proposing a reduction in process.
Under the current process, if a buildbot goes red, the developer makes a
judgment call, the release manager makes a judgment call, there's
discussion on a ticket, a ticket gets filed, it gets promoted, it gets
demoted, the RM forgets to re-promote it...
My suggestion is that the process be, simply: if a buildbot (community
or otherwise) goes red, the change that broke it gets reverted. No
questions asked! It's still there in the revision history, ready to be
re-applied once the issues get worked out. Discussion can then take
place and case-by-case judgments can be applied.
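
As a rough sketch of that rule: the names below (BuildResult,
revisions_to_revert, the builder names) are hypothetical and are not part
of buildbot or any existing tool. It only illustrates picking out the
revisions that turned a green builder red, which are the ones the proposed
policy would revert by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    builder: str   # hypothetical builder name, e.g. "twisted"
    revision: int  # revision that triggered this build
    passed: bool   # True if the build was green


def revisions_to_revert(results):
    """Return revisions that turned a previously green builder red.

    A builder with no earlier result is assumed to have been green, so a
    failure on its first recorded build counts as a breakage.
    """
    last_passed = {}  # builder name -> result of its previous build
    to_revert = []
    for result in sorted(results, key=lambda r: r.revision):
        was_green = last_passed.get(result.builder, True)
        if was_green and not result.passed:
            if result.revision not in to_revert:
                to_revert.append(result.revision)
        last_passed[result.builder] = result.passed
    return to_revert


history = [
    BuildResult("twisted", 100, True),
    BuildResult("django", 100, True),
    BuildResult("twisted", 101, False),  # 101 breaks twisted
    BuildResult("django", 101, True),
    BuildResult("twisted", 102, False),  # still red, but 102 did not break it
]
candidates = revisions_to_revert(history)
```

Only the revision that caused the green-to-red transition is flagged;
later revisions built on an already-red builder are left for the
case-by-case discussion.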
>If you're talking about community buildbots (which I presume are
>testing 3rd party packages against the core release) being red, that's
>out of scope for the core developers.
I don't necessarily think that keeping the community buildbots green is
the core developers' responsibility, but I don't think it should be
entirely out of scope, either. The python test suite is, frankly, poor
- and I hope I'm not surprising you by saying that. It's full of race
conditions, tends to fail intermittently, and is frequently ignored.
Not only that, but it is quite often changed, so tests for issues that
affect real code are quite often removed. So, the community buildbots
are not just making sure that 3rd-party code still works, they are an
expanded, independently developed test suite to make sure that *python
itself* still works. Sometimes they will not fill that role
particularly well, but they are worth paying attention to.
If python had a good, exhaustive regression test suite that was
immutable between major versions, I'd probably feel differently. But
that's not the world we live in.
Right now, apparently, the *default* policy is that if the community
buildbots go red, later, before a release, someone will maybe take a
look at it. I'd suggest that the *default* policy ought to be that if a
particular changeset breaks a community buildbot, it needs further
examination before being merged to trunk.
However, this is just the way I prefer to do development; if you think
that would slow things down too much, the only thing I'm _really_ asking
for is a clear statement that says "there should be no test failures on
community buildbots that have not been explicitly accepted before a
final release". I'm not even sure what "explicitly accepted" means -
you have to sign off? the release manager, maybe? A discussion on this
list? I don't really care, as long as _somebody_ does.
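
A minimal sketch of what such a statement could mean in practice,
assuming a hypothetical record of which failures have been signed off
(the function name and the sign-off mapping are illustrative only; who
does the accepting is left open, as above):

```python
def release_blockers(red_builders, accepted):
    """Return the red community builders nobody has explicitly accepted.

    red_builders: iterable of builder names currently failing.
    accepted: mapping of builder name -> who signed off on the failure
              (hypothetical bookkeeping; could be the RM or a list thread).
    """
    return sorted(set(red_builders) - set(accepted))


blockers = release_blockers(
    ["twisted", "django"],
    {"django": "release manager"},
)
```

A final release would go ahead only when the returned list is empty.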
Right now, my impression of the process is this:
* The community buildbot goes red; no core developer looks at it.
* If the project is Twisted, JP fixes the bug on Twisted's end.
* If the project is Django, nobody notices.
* Months later, a beta goes out. A few people try it out and report
some bugs, but don't really understand the output. A good number go
untriaged.
* A little while later, a final release comes out. Many projects are
broken as a result.
This is not a hypothetical concern. This is what happened with 2.5;
Twisted was broken for months, and Zope *to this day* does not support
Python 2.5. 2.6 looks like it's headed for the same trajectory. To be
clear: this is with all of Python's _own_ tests passing, so the problem
is specifically one of not paying attention to the community buildbots.
(And the community
buildbots only build django and twisted right now. I'm not talking
about a massive pan-galactic survey of all possible python projects.
I'm only talking about those popular enough to make this select list.
Which should still be a slightly longer list, but I digress...)
>Some of the
>core buildbots are red because, well, frankly, they run on a cranky
>old version of an OS that few people care about.
On Twisted, we have a distinction between "supported" and "unsupported"
platforms, so that we can still run builders on platforms which aren't
really supported and don't really run the whole suite, but which we are
nevertheless interested in. I don't believe the setup is too hard and
we'll definitely help out with that if you want to do it. (I believe
Thomas Herve volunteered to do this at PyCon...)
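
To illustrate the distinction only: this is a hypothetical sketch, not
Twisted's actual buildbot configuration, and the builder names are made
up. The idea is that failures on "supported" builders decide the overall
status, while failures on "unsupported" ones are reported without turning
everything red.

```python
def summarize(results, supported):
    """Split failures into blocking and informational ones.

    results: mapping of builder name -> True if its last build passed.
    supported: set of builder names whose failures count as real breakage.
    Returns (status, informational), where informational lists the failing
    builders that are not in the supported set.
    """
    failing = sorted(name for name, passed in results.items() if not passed)
    blocking = [name for name in failing if name in supported]
    informational = [name for name in failing if name not in supported]
    status = "red" if blocking else "green"
    return status, informational


status, informational = summarize(
    {"linux": True, "cranky-old-os": False},
    supported={"linux"},
)
```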
>I hope the community buildbots can be used the same way: a red bot
>means someone needs to look into an issue. The issue could be with the
>core or with the 3rd party package being tested. I don't think a
>policy like "no community buildbots should be red" makes any sense.
These bots have been red for months. The issues exist, but have not
been looked into. As a result, Barry made a specific commitment on a
ticket (i.e. "this should block beta 1") which was not met. I think
_something_ has to be changed to encourage people to do this more
immediately or more seriously.
>Whoever made what change? You can't seriously expect core developers
>to investigate issues with 3rd party packages, no matter what the
>cause. The correct process is that someone who cares about the 3rd
>party package (could be an end user, a developer of that package, or a
>core developer who happens to care) looks into the issue enough to
>diagnose it, and then either proposes a fix or files a bug with the
>likely culprit, which could be the core or the 3rd party package. If
>nobody cares, well, that's open source too.
If the breakage is calculated and expected, and the benefits clearly
outweigh the costs... oh well, too bad for the 3rd party people. It
would be nice if the core developers would notify the third party if
they find out about it so that it can be verified that the change in
question wasn't obscuring some *other* problem, but from what I've seen,
the breakages that I have been concerned about have not been
intentional, calculated changes, but side-effects of other things.
I'm talking about the case where the breakage reveals either a bug in
Python, or an unintentional / side-effect change in behavior, which is
surprisingly frequent.