[core-workflow] Tracker workflow proposal

Tue Apr 22 06:06:53 CEST 2014

Hi,

On Mon, Apr 21, 2014 at 7:04 PM, R. David Murray <rdmurray at bitdance.com> wrote:
>
> I'm volunteering to be the coordinator for the work, and I'm also
> volunteering to do as much of it as necessary.  That is, I'm planning
> to make this the focus of most of the "python time" I have available.
>

My Python time lately is very limited (it might get better
during/after the summer), but as one of the main maintainers of the
tracker I'll try to follow and give my input wherever possible.  If
someone wants to work on the tracker I can also provide guidance/help
(you can ping me on IRC).

> I think that there is a way to implement this workflow or whatever
> workflow we decide on as a "new interface" while keeping the existing
> interface available, allowing us to test and refine (like, maybe do some
> usability testing? :).  Roundup has a way to dynamically specify which
> page template is used to render a particular page type, and I think we
> can leverage that to have two parallel UIs.  I could be wrong, though,
> in which case we'd need to set up a test tracker instead, which I can do.
> (Assuming it does work, there might need to be a bit of glue code to
> keep things in sync.  And there would need to be some changes in the
> data values available in the current UI, but I don't think that would
> be a bad idea anyway.)
>

This should be doable.  We can have a separate page with a completely
different template and layout (and things like js
folding/autocomplete/etc), and as long as the values (for e.g. status,
resolution, etc.) are the same everything should work fine.  If you
want to change the values then it gets a bit more tricky.
Roundup also supports Jinja templates now, so we could use them if it
makes things easier (the alternative is TAL).
Matching the www.python.org style (and updating to HTML5) can also be
done, but that's a different issue.

> The scope of this discussion is the workflow for an "issue", which
> currently means an entry created by someone in the bug tracker at
> bugs.python.org, and goes from the creation of the issue to the resolution
> of the issue, which can but does not necessarily include committing
> something.  I won't be discussing the tooling for Zuul-like patch gating
> at this stage, I'll just assume we can figure out how to implement that.
>
> It would also be possible to build a patch gating system without
> initially integrating it with our other tools, and if we decide that's
> more important than the tracker workflow at this stage, we should switch
> our conversation to what that would look like.  Or perhaps we can work
> on the two in parallel.
>
> My own feeling is that in order to get maximum benefit out of patch
> gating, we need more clarity and utility in our *tracker* workflow first,

Having better integration between roundup and rietveld would be a good
step in that direction.
Having Roundup be more aware of patches (if they are patches or other
attachments, what files they touch, on what branches they apply
cleanly) would be another good step. I did some work on this
(especially on analyzing patches) that would be good to be integrated
with the tracker.

> but I do realize that how long it currently takes to commit a patch is
> a significant pain point, and it might be better to address that first.
>
>
>
> Goals
> -----
>
> In suggesting improvements to the existing workflow support, I'm
> starting from the fundamental idea that anything we do in the issue
> tracker and/or patch gating system should have either *operational*
> implications or *important* informational implications.  The former
> should be more heavily weighted than the latter, in my opinion.
>
> What I mean by this is that there should be no "busy work" setting
> of attributes in our workflow: when you change the state of an issue,
> it should mean that something new is *going* to happen to that issue.
> Any purely informational attribute setting should be in support of that
> new action, and wherever possible that information should *mean* something
> to some part of our tooling, not *just* to the humans looking at the issue.
>
> In general I want there to be as few clicks as practical involved in
> updating an issue.  However, I'm not addressing the actual UI design here
> (although I have ideas); I assume we are going to iterate on it for a
> bit until we have something we really like.
>
> NB: This proposal includes a number of ideas from the "desired tracker
> features" (http://wiki.python.org/moin/DesiredTrackerFeatures), but
> by no means all of them.  Some of the others are worth implementing,
> but in this document I'm focusing on primary workflow, which I think
> should happen first.
>
>
>
> Roles
> -----
>
>
> Conceptual Roles
> ~~~~~~~~~~~~~~~~
>
> In thinking about our workflow, I identify the following roles:
>
>     original reporter
>     commenter
>     triager
>     patch-producer
>     reviewer
>     core reviewer
>     committer
>
> These are in the order that the role is involved in the issue (roughly;
> variations are possible depending on the issue), and obviously a single
> person can take on multiple roles during the lifetime of a patch.
>
> I originally thought this list would have operational implications,
> but it turned out to be only an aid in thinking about the problem.
> I'm leaving it in here for exactly that purpose...it helps when thinking
> about the states and state transitions.
>
> I'd also like to add something that we currently only have informally,
> but which has been requested as a feature more than once: at every stage,
> I'd like there to be the possibility of there being a 'responsible party'.
> This is sort of like 'assigned to', except that it could be anyone with
> a tracker account, and the assignment would have a limited lifetime:
> either until the issue's state changes, or until the issue has been
> without update for some number of days (off the cuff I'd suggest 14,
> but it might also vary by state depending on what states it actually
> got used in).
>

I believe Twisted uses a similar workflow: after a review, the reviewer
assigns the issue to whoever made the patch, and when the patch is
updated the issue is re-assigned to the reviewer.  This also applies
if e.g. more information is needed.
The idea is that the "assigned to" field should always point to the
person responsible for the next step that will move the issue forward.
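
To make the "limited lifetime" part of the quoted proposal concrete,
here is a minimal sketch in plain Python (not Roundup code).  The
Issue class, its field names and the claim()/current_responsible()
helpers are all hypothetical; the 14 days are the off-the-cuff figure
from the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the off-the-cuff 14 days suggested in the quoted proposal.
CLAIM_LIFETIME = timedelta(days=14)


@dataclass
class Issue:
    # Hypothetical, simplified stand-in for a tracker issue.
    state: str
    last_activity: datetime
    responsible: Optional[str] = None
    claimed_in_state: Optional[str] = None


def claim(issue, user, now):
    """Record that `user` takes responsibility for the current state."""
    issue.responsible = user
    issue.claimed_in_state = issue.state
    issue.last_activity = now


def current_responsible(issue, now):
    """Return the responsible party, or None if the claim has lapsed."""
    if issue.responsible is None:
        return None
    if issue.state != issue.claimed_in_state:
        return None  # the state changed, so the claim no longer applies
    if now - issue.last_activity > CLAIM_LIFETIME:
        return None  # no update on the issue for too long
    return issue.responsible
```

Computing the lapse when the field is read (instead of running a
periodic job that clears it) is just one possible design choice.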

> The idea behind this is that we have eager contributors who either
> wind up stepping on each other's toes, or rush to create and submit a
> patch before someone else does, and as a consequence of rushing, do not
> produce as good a patch as they are capable of, which actually slows
> down the issue resolution.  The ability to "take" a task and know it is
> "yours" is an important part of the new contributor process, and having a
> "responsible party" field would support that.
>

Is stepping on each other's toes a common issue?
Note that "taking" a task might have the opposite effect (it's not
difficult to find weeks/months-old messages where someone says they
will work on a patch soon).  Having two patches might be better than
having zero.

>
> Tracker Roles
> ~~~~~~~~~~~~~
>
> The tracker roles important to the workflow are:
>
>     User
>     Triager ('Developer')
>     Committer
>
> Although the tracker calls the Triager role 'Developer', I'm going
> to refer to it as Triager throughout this document, for clarity as to
> its intent.
>
> The tracker does not currently have a role equivalent to 'committer', but
> we may not need to add one explicitly, since the account of a committer
> is marked as such.
>
> The important thing to understand about these roles in the context of
> this document is that anything a user can do, a triager or committer
> can do, and anything a triager can do, a committer can do.
>
>
>
> Information Fields
> ------------------
>
> An issue has an issue number and a title, and those wouldn't change.
> It also has a state, which is the subject of the second half of this
> document, and would likely be a new field in order to be able
> to have both UIs available (this field is where the glue code
> would be needed).
>
> Beyond those fields, I suggest several changes to the issue metadata;
> I will cover each one separately.
>
>
> Versions
> ~~~~~~~~
>
> Currently our versions field does double duty: we set it to the versions
> we want to fix the bug in, and then deselect versions as we fix them.

Most of the issues are fixed on all branches at the same time, so
deselecting is IME a rare occurrence.

> However, I noticed when doing the "What's New" document that this
> makes things really confusing if you *do* look at the ticket later.
> You can't tell which versions got the fix without reading through the
> entire issue.  So I'd like to split this into two fields:
>
>     should be fixed in: [list of versions]
>     has been fixed in: [list of versions]
>

FWIW while triaging I sometimes select versions that are /likely/ to
be affected by a bug, without actually checking, so "should be fixed
in" might not always be accurate (unless we decide that it should be
and leave versions unselected until we have verified that they are
affected).

> I'd also like patch set links to be displayed next to the versions for
> 'has been fixed'.  When a commit references an issue, it should appear
> next to the appropriate version, but it should *not* automatically
> change the version state to fixed.

It could also work the other way around: after (or before) the list of
patches, we can have a list of related changesets that is updated
automatically by the hg hook (or detected from the python-dev
messages).  Next to the changeset we could put the version, and
possibly a checkbox to indicate that the changeset fixed the issue --
however I don't think this is necessary.
If the issue has some related changesets and it's open, then more work
is required (either because not all the versions are fixed, or because
it was a preliminary set of changes that will be followed by another
set).  If it's closed, the listed changesets fixed the issue.
This is to say, I'm not sure that separating versions into two
different lists is a good idea, since it adds more fields and more
work for the triagers/committers.

>  That should require separate action,
> since we sometimes apply patches that are relevant to the issue but
> do not fix it.  When an issue is transitioned to closed, there should
> be an easy way to say "mark all versions with changesets as fixed".
> It would probably be appropriate for that to be the default.
>

This could be a solution to the "more work for triagers/committers".
We could also have a single list of versions and transition from
affected to fixed.
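
A rough sketch of the single-list variant, in plain Python: each
version carries a status, changesets are attached without touching
that status, and closing the issue marks every version that has
changesets as fixed (the default suggested in the quoted text).  The
function names and the two dictionaries are hypothetical, not existing
tracker code.

```python
def record_changeset(versions, changesets, version, changeset_id):
    """Attach a changeset to a version WITHOUT marking it as fixed."""
    changesets.setdefault(version, []).append(changeset_id)
    versions.setdefault(version, "affected")


def close_issue(versions, changesets, mark_fixed=True):
    """On close, optionally flip versions with changesets to 'fixed'."""
    if mark_fixed:
        for version, ids in changesets.items():
            if ids:
                versions[version] = "fixed"
    return versions
```

Versions without any changeset stay "affected" after closing, which
makes it easy to spot a branch that was forgotten.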

> These fields have direct operational meaning: they indicate a task
> to be performed or signal the completion of a task.
>
>
> Type
> ~~~~
>     documentation
>     python bug
>     interpreter crash
>     security
>     enhancement request
>
> I add 'documentation' here based on PyCon feedback from Jessica McKellar
> and Selena Deckelmann.  None of the existing types makes sense to a
> beginner for a documentation bug, and the resulting confusion can lead
> someone to abandon the effort to contribute.  It also has operational
> implications (see below).
>

This seems reasonable to me.

> I rename 'behavior' to 'python bug' for a similar reason, but I don't
> have any usability data to back that feeling, so I'm not strongly
> advocating that change.
>

Why not just bug?

> I rename 'crash' to 'interpreter crash' in an attempt to reduce the
> chances that someone will report a python traceback as a crash, something
> that otherwise happens very regularly.  I'm sure we'll never completely
> eliminate those.
>
> I drop 'compiler error', 'resource usage' and 'performance'.  All of
> these are bugs in the minds of the reporters, and classifying them
> separately in the 'type' field does not, as far as I can see, lead to
> any operational difference in the way the issue is treated.
>
> What I mean by that: most bugs are either 'documentation' or 'python bug'
> or 'enhancement request'.

Just floating an idea: what if we only keep these 3 (or maybe just
bug/enhancements) and use tags for things like performance, security,
crash, documentation, etc?
We could have a crash that can turn into a security issue, or resource
usage that causes bad performance, and with tags this could be
described easily (and we could use them instead of components too).
This is quite a big change, but other bug tracking systems use them
quite successfully (to the point of having tags (possibly divided in
categories and using different color codes) tracking versions, types,
priority, etc.).

> Differentiating these is important, as doc
> bugs are handled very differently from python bugs (documentation fixes
> do not get NEWS entries, for example), and bug vs enhancement determines
> which versions we fix things in.  'compiler error', 'resource usage', and
> 'performance', on the other hand, are all handled with the same workflow
> that applies to bugs.  You might think that there is a difference for
> resource usage and performance, in that we don't in general backport
> those fixes.  The key there, though, is "in general".  The decision as to
> which versions to apply the fix is made based on the magnitude of the bug,
> and there really are only two cases: we mostly don't backport, as is the
> case for enhancements, but we sometimes do backport, just like we would
> a bug fix.  The same applies in reverse to compile bugs: we mostly fix
> those in all versions, but we don't always.  So the only *operational*
> effect of having these as distinct types would be to confuse things,
> since we couldn't tell from the type whether or not this was something
> that should be applied to all versions or only default.  Instead we
> should set (or, more likely, triage) them as either 'python bug' or
> 'enhancement', which have the correct operational implications.
>
> There *is* an operational reason for having security and interpreter
> crash as separate types.  In both of these cases the versions we fix
> it in is always the same (all active for crashers, all active + all
> security-only for security bugs), and the issue priority should default
> to either high or release blocker.  In addition, security bugs should
> be automatically reported to the PSRT, and arguably the report should
> be hidden from all but the PSRT and the original reporter.
>

See http://psf.upfronthosting.co.za/roundup/meta/issue393 and
http://psf.upfronthosting.co.za/roundup/meta/file275/issue393.png
for a meta-tracker patch that warns about the PSRT while reporting
security issues.

>
> Priority
> ~~~~~~~~
>
>     low
>     normal
>     high
>     deferred blocker
>     release blocker
>
> The only change here is to eliminate 'critical'.  I'm not wedded to
> that, but if we have both 'critical' and 'high' priorities, 'high'
> ends up getting treated as pretty much equivalent to either 'normal' or
> 'critical', depending on the person.  I would argue that anything that
> is severe enough to be marked 'critical' should in fact be a release
> blocker, and anything that is not a release blocker is effectively only
> 'high' priority.
>

SGTM.

> The priority is currently operational only in that one can sort issues by
> priority.  I propose that we make them much more operational, by posting
> a count and/or list of the bugs with more than normal priority somewhere
> public on a regular basis, such as python-dev, python-committers,
> and/or #python-dev.  (Well, definitely #python-dev.)
>
> I'd like to see us strive to keep those queues clear, so that promoting a
> bug to high or release blocker means it *will* get acted on reasonably
> promptly.  (This raises the issue of what to do about bugs we currently
> mark as "release blocker" as a *reminder* to the release manager.  I don't
> have a proposal for that at the current time, as release management is
> out of scope for this document, but we'll need an answer if we are going
> to implement this.)
>

The release managers of the affected versions are already
automatically added to the nosy list of the issue if the priority is
set to release blocker (I think the script should be updated for 3.5).

>
> Component
> ~~~~~~~~~
>
> I propose that we completely change the nature of this item.  I think we
> should make it a text field that is filled in by autocomplete like the
> nosy field can be, and that the value be any of the components listed
> in the experts file.  This would further respect how the experts have
> listed themselves in that file by autonosying those that are willing
> for that to happen.  The field should still be a multilink (that is,
> can contain more than one value).
>

This sounds similar to the tag system I mentioned above.

> This change would mean that it would be possible to search for issues
> based on module, which is an often-requested feature.
>

I'm still not sure having a "tag" for each module would be a good
idea.  Taking them from the experts index would at least solve the
problem of modules that are not available on specific versions or that
have been renamed.
One thing I would like to do is get the affected module from the
patch (see http://wolfprojects.altervista.org/issues.html ), but that
only works when a patch is there.  If we add per-module tags, we could
add them automatically when a patch is submitted.

>
> Patch Status
> ~~~~~~~~~~~~
>
> This is a new set of fields that records information about the patch:
>
>     unit test
>     fix
>     documentation changes
>     news entry
>     what's new entry
>     commit message
>
> For each of these, the value could be 'needs work', 'complete', or
> 'not applicable'.  For issues of type 'documentation', all lines
> except 'documentation changes' and 'commit message' would be set to
> NA by default, otherwise they will be set to 'needs work' initially,
> except for "what's new", which will be set to NA for anything except
> type enhancement.
>

This sounds like a lot of new fields / UI clutter / work for the
triagers, even though this could be automated somehow by looking at
the files touched by the patch.
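
For reference, the default values described in the quoted paragraph
boil down to a small rule.  A sketch in plain Python (the function
name is hypothetical; the field names, values and issue types are the
ones listed in the proposal):

```python
FIELDS = (
    "unit test",
    "fix",
    "documentation changes",
    "news entry",
    "what's new entry",
    "commit message",
)


def default_patch_status(issue_type):
    """Initial value of each patch status field for a new issue."""
    if issue_type == "documentation":
        # Only these two lines apply to documentation issues.
        needed = {"documentation changes", "commit message"}
        return {
            field: "needs work" if field in needed else "not applicable"
            for field in FIELDS
        }
    status = {field: "needs work" for field in FIELDS}
    if issue_type != "enhancement request":
        # "what's new" only applies to enhancements.
        status["what's new entry"] = "not applicable"
    return status
```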

> Note that the inclusion of 'news entry' and 'commit message' assume two
> things: that we retool the NEWS file so that NEWS entries can be added
> automatically without conflicts, and that we change our patch workflow
> to use mercurial's capability to accept 'hg export' patches and/or some
> other sort of pull request.  This part is one place we get into
> non-roundup territory, so that may be a phase two addition.
>
> Mercurial's 'evolve' feature, which we saw demoed at PyCon and which I
> understand is currently available as a plugin, makes this capability much
> more useful.  When the non-core committer syncs the master repository
> after their patch has been committed, the right thing happens to the
> history in their repository to make their local commit disappear in
> favor of the commit made to master.  The contributor will also be able
> to essentially *edit* the patch in their local repository based on the
> review feedback, without ending up with a chain of commits in their
> local repo that they have to deal with.  Evolve should also facilitate
> DVCS-based collaboration between core and non-core developers, as well as
> between non-core developers.  (I'm assuming all this works with exported
> changesets, since in the demo they talked about starting with a patch
> received via email, which is their normal workflow.  But we should double
> check this.)
>
> These fields allow us to represent all the parts that a patch must have
> to be complete, and they act as a checklist for making sure the parts
> are all there.  This is similar in intent to patchcheck, but since which
> pieces are needed is ultimately determined by a human, it is more accurate
> than that part of patchcheck and therefore more useful.
>

Maybe this could be implemented in the patch list, as a series of ✓
added automatically:
File name   |fix|tst|doc|nws|
------------+---+---+---+---+
patch1.diff | ✓ |   |   |   |
patch2.diff | ✓ | ✓ |   |   |
patch3.diff | ✓ | ✓ | ✓ | ✓ |
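
A sketch of how the rows of such a table could be computed from the
files touched by each patch.  The classification rules are
assumptions based on the CPython layout (Misc/NEWS, Doc/, Lib/test/),
and the function names are made up for the example.

```python
COLUMNS = ("fix", "tst", "doc", "nws")


def classify(path):
    """Map a touched file to one of the table columns."""
    if path == "Misc/NEWS":
        return "nws"
    if path.startswith("Doc/"):
        return "doc"
    if path.startswith("Lib/test/") or "/test_" in path:
        return "tst"
    return "fix"  # everything else is assumed to be part of the fix


def checklist(files):
    """Return {column: True/False} for the files touched by a patch."""
    found = {classify(path) for path in files}
    return {column: column in found for column in COLUMNS}


def render_row(name, files):
    """Format one table row, using the same layout as the table above."""
    marks = checklist(files)
    cells = "|".join(" ✓ " if marks[column] else "   " for column in COLUMNS)
    return "%-12s|%s|" % (name, cells)
```

For example, render_row("patch2.diff", ["Lib/os.py",
"Lib/test/test_os.py"]) gives the second row of the table.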

>
> Stage
> ~~~~~
>
> This goes away, subsumed into state.
>
>
> Status
> ~~~~~~
>
> This either holds the state information, or goes away in favor of a new
> 'state' field.
>
>
> Keywords
> ~~~~~~~~
>
>     buildbot
>     easy
>     stuck
>
> buildbot and easy are the only two existing, non-redundant tags that I can
> see a way to make operational.
>
> 'easy' of course has to be set manually.  The system for listing easy
> issues should list the issues only when there is an action that someone
> could take on the issue (clarify issue, write patch, review patch).
> (NB: it might be possible to come up with a better name for this tag.)
>

Having an icon for issues with patches in the issue list (similar to
the attachment icon in the mail list) would easily show what issues
still don't have a patch.

> For buildbot, those issues should be displayed in a report or dashboard,
> and we should try to keep this queue empty.  This becomes even more
> important when we have a patch gating system online.
>

First we should turn all the buildbots green and have email
notifications sent to python-checkins whenever a stable buildbot
turns red.

> All of the other values are informational-only, and in some cases
> redundant.  If we want informational keywords, IMO we should do a full
> user-accessible tagging system.  That would be a separate proposal,
> though.
>

Do you mean that users can create their own tags on the fly?  I would
rather define a fixed set of available tags if we go down this route.

> Note: I claim the regression tags are redundant because I think
> regression issues should be release blockers.  (They are also ambiguous:
> is '3.3regression' a bug in 3.3 that is a regression relative to 3.2,
> or a 3.3.x bug relative to 3.3.x-1, or a 3.4 bug that is a regression
> relative to 3.3?  Better to just explain exactly what is going on in
> the issue comments, IMO.)
>
> The purpose of the 'stuck' tag is to label issues that we agree are
> real issues that we don't want to close as "won't fix", but that we
> can't for one reason or another currently move forward.  Operationally,
> stuck issues are either not displayed in work queues (see below) or are
> displayed at the end of the queue.
>

Is this a replacement for languishing?

> I think the keywords should be settable by anyone, if they aren't already.
>

They should be; I think only the stage is not settable by everyone
(and it should be IMHO).

>
>
> Issue States and State Transitions
> ----------------------------------
>
> Here are the states that I think an issue can be in:
>
>     new
>     needs information
>     needs decision
>     needs decision from committer
>     needs patch
>     needs review
>     needs patch update
>     needs commit review
>     closed/fixed
>     closed/wont fix
>     closed/duplicate
>     closed/insufficient information
>     closed/postponed
>     closed/not a bug
>     closed/third party
>
> I discuss each of these, and the possible state transitions, below.
>

These are a lot of values to add, and the workflow you suggest seems
quite complex.
OTOH it seems to have operational advantages, so it might be worth
giving it a chance, especially if it has a good UI.

> All legal state transitions should be displayed in a particular
> area in the UI, and each transition should be a single radio button.
>

A radio button is not a good UI :)
Maybe something like "This issue | needs information |▼|.", with the
dropdown showing all the possible states that can follow "needs
information".

> [... list of all the states ...]
>
> Dashboard
> ---------
>
> The dashboard would be a new feature, a new landing page and linked from
> the left hand menu bar.  It would list the first N (adjustable) items in
> each queue relevant to the user for which they are the responsible party,
> followed by those on which the user is nosy, followed by those on which
> they are not nosy.
>
> A general user would see the 'new', 'needs patch', 'needs patch update',
> and 'needs review' queues.
>
> A triager would see, above the preceding, the 'needs decision' queue.
>
> A committer would see, above the preceding, the 'needs decision from
> committer' and 'needs commit review' queues.
>
> In all queues, issues would be sorted in priority order followed by most
> recent activity, with color coding for the priority.  'stuck' issues
> would appear last, and there should be something to visually differentiate
> issues on which the user is responsible, just nosy, or not nosy.
>
> As with everything else, this is of course subject to discussion.
> It would also be lovely if the user could configure their dashboard,
> but that would probably fall into the category of advanced features we
> come back to later.
>

What happens to the other pages?
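
As for the dashboard itself, the queue selection and ordering
described in the quoted section could be sketched like this in plain
Python.  The issue dictionaries, their keys and the function names are
hypothetical.  The relative order of the sort criteria (stuck last,
then responsible/nosy/not nosy, then priority, then most recent
activity) is my reading of the proposal, not something it spells out.

```python
from datetime import datetime

USER_QUEUES = ["new", "needs patch", "needs patch update", "needs review"]
TRIAGER_QUEUES = ["needs decision"] + USER_QUEUES
COMMITTER_QUEUES = (["needs decision from committer", "needs commit review"]
                    + TRIAGER_QUEUES)
QUEUES_BY_ROLE = {
    "User": USER_QUEUES,
    "Developer": TRIAGER_QUEUES,  # the tracker's name for the triager role
    "Committer": COMMITTER_QUEUES,
}
# Highest priority first.
PRIORITIES = ["release blocker", "deferred blocker", "high", "normal", "low"]


def involvement(issue, user):
    """0 if the user is the responsible party, 1 if nosy, 2 otherwise."""
    if issue.get("responsible") == user:
        return 0
    if user in issue.get("nosy", ()):
        return 1
    return 2


def sort_key(issue, user):
    return (
        "stuck" in issue.get("keywords", ()),  # False sorts before True
        involvement(issue, user),
        PRIORITIES.index(issue["priority"]),
        -issue["activity"].timestamp(),        # most recent first
    )


def dashboard(issues, user, role, limit=10):
    """Return {queue name: [issue ids]} for the queues the role sees."""
    result = {}
    for queue in QUEUES_BY_ROLE[role]:
        members = [issue for issue in issues if issue["state"] == queue]
        members.sort(key=lambda issue: sort_key(issue, user))
        result[queue] = [issue["id"] for issue in members[:limit]]
    return result
```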

>
>
> Big Picture
> -----------
>
> My goal here is to facilitate the involvement of the wider community
> in our workflow as much as possible.  The structure above is designed
> to allow the community to do as much work as possible, and the "trusted
> individuals" to act as gatekeepers to ensure quality.  It is theoretically
> possible for a patch to get all the way to 'needs commit review' without
> any more higher level involvement than a triage person moving the issue
> to 'needs patch'.  Of course in reality much more involvement from core
> will be needed, since we need to transfer knowledge to the community,
> and especially the newcomers, about our standards and procedures,
> not to mention the code.  And committers are going to *want* to do
> general commentary and triage level activities on issues anyway.  But,
> for those committers who have less time, I'd like to think that this
> system would allow them to focus their time on just the stuff only they
> can do, and thus perhaps draw more contributions from some of our less
> active or currently-inactive committers.
>
> The key to this is that instead of issues getting lost, core can watch
> certain key queues: 'needs decision', 'needs decision from committer', and
> 'needs commit review'.  When issues get to those stages, the people with
> the experience to do something know that it is *their* responsibility to
> do it: the issues are to a point where the general community can't help
> without core input.  If we can keep those queues to a manageable size,
> I think we can increase the amount of energy coming our way.
>

What if we can't keep them manageable?
There's a fine line between managing to handle all the incoming issues
and letting issues start to pile up and giving up on the goal of
keeping the queues empty.  When your queue has 3 issues, well, you
could just take a look now and make it empty, but you don't quite have
time right now for looking at 5 issues... Maybe later (or tomorrow, or
during the weekend) you will have more time.  I think many of you have
had something similar happen with your mail inbox (or New Year's
resolutions, or similar things).

I'll also suggest another related (and "controversial") idea.  People
like to reach goals: if they address the 3 issues in their queue they
have reached the "empty queue" goal.  Addressing 3 of the 5 issues
isn't quite the same thing.
I've seen this concept being exploited in three main ways:
  1) badges/trophies/achievements;
  2) competitions;
  3) streaks.
The first means that the user can get a badge because they closed
their 10th issue, or triaged their 50th, or submitted 5 patches, or
were the first to reply on an issue for the 10th time, or whatever.
Even if fixing 3 out of 5 issues won't make you reach the "empty
queue" goal, maybe you can reach the "10 closed issues" one.  An
example of this is the StackOverflow badges (e.g.
http://stackoverflow.com/users/95810/alex-martelli?tab=badges ).
The second includes "leaderboards" or "awards" for being above
average.  Examples of this are the Twisted high scores
(http://twistedmatrix.com/highscores/) or charts like
http://www.ohloh.net/p/python/contributors (at some point I was the
most active contributor and was trying to keep my contributions going,
but then Serhiy became a contributor... :).  Something similar could
be done by mentioning "exceptional" results in the weekly summary
report (e.g. people who fixed/contributed to the most issues).
The third is about perseverance.  Every day/week you have a goal to
meet: if you reach it your streak counter increases, if you miss it
the counter starts again from zero.  Once you start building up a high
count, you really don't want to miss your goal and start everything
from scratch.  Here the goal might be to close 3 issues per week, or
something similar, and could have associated badges ("contribute every
day for a month", "close 3 issues per week for 3 weeks in a row", etc.).
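
The streak rule is trivial to compute.  A minimal sketch, assuming we
already have the number of issues a user closed in each week (the
function names and the default goal of 3 are just for the example):

```python
def update_streak(streak, closed_this_week, goal=3):
    """Increase the streak if the weekly goal was met, else reset it."""
    return streak + 1 if closed_this_week >= goal else 0


def streak_after(weekly_counts, goal=3):
    """Streak after a chronological series of weekly closed-issue counts."""
    streak = 0
    for count in weekly_counts:
        streak = update_streak(streak, count, goal)
    return streak
```

A "close 3 issues per week for 3 weeks in a row" badge would then be
awarded as soon as the streak reaches 3.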

While I understand that probably most of the core devs would be
against similar things, this might motivate new users and make them
"addicted" to the tracker, while making their experience more
enjoyable, and the examples I linked show that similar things exist
even in these environments (and not only on the micro-transaction
based smartphone games :).  People who don't care about this
(different people are more or less competitive) could just ignore it.
OTOH this might have a negative side-effect if users start closing
issues randomly just to get the "100 closed issues" badge, but this is
not difficult to avoid.

> The other important goal is that via these explicit state transitions,
> especially the 'needs decision' and 'needs commit review' transitions,
> we get a much clearer picture of who in the community *has* absorbed the
> core standards and procedures and is ready to be "promoted" to the next
> level of responsibility.  And the more people we get moved up, the more we
> can get done.  (Note: making that information *easily* accessible requires
> additional tooling, but it should be possible and should be done.)
>
> Note also that I tried to engineer this so that the structure does not
> handicap us or add bureaucracy: even though the basic structure is that
> you get triage signoff first and then committer signoff, if the person
> doing the triage is a committer, they can just move the issue to wherever
> it needs to be.  So the only time two levels of signoff will be *required*
> is when the committers are too busy to drain the queues.  And work on
> an issue should never be *blocked* by the system, the purpose of the
> queues is to draw attention to issues that are "ready" for decision.
> Thus the two levels help us manage the load when we need the help,
> and don't get in our way otherwise.
>
> That's my intent, anyway.  You tell me if you think I'm headed in the
> right direction.

I would like to see a summary/mock-up of what the end result would be
after all these fields are added/removed.
This will give a better idea about the complexity of the new workflow,
but there are definitely good suggestions in there (even though it's a
lot of work).

Best Regards,
Ezio Melotti