[scikit-learn] Github project management tools

Dale T Smith Dale.T.Smith at macys.com
Fri Sep 16 08:25:01 EDT 2016


A form – with required, pre-defined fields – can help when people submit bugs, issues, or requests for new features. Perhaps creating an issue template for scikit-learn is a good first step.

https://help.github.com/articles/creating-an-issue-template-for-your-repository/

Pull requests can also have a template:

https://help.github.com/articles/creating-a-pull-request-template-for-your-repository/

I am not sure how these fit into the team’s review and release workflow.

If this doesn’t quite fit your needs, perhaps engaging Github Support will yield something interesting.

__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science
770-658-5176 | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com

From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Joel Nothman
Sent: Friday, September 16, 2016 1:15 AM
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] Github project management tools

I think we're quite close to the intended users of Github. They just started simple and, as more feature-complete competitors appear, are adding those features but haven't quite got them right yet. I'm not convinced that it's the perfect tool (although I haven't seen this threading problem; gmail seems to still be keeping one thread per PR?), but its simplicity and familiarity/popularity are a great advantage for handling new contributors. In terms of contributor familiarity, most of the projects that we integrate with use the same: numpy, scipy, cython (recently), pandas, matplotlib, ipython. While I appreciate that we are somewhat arbitrarily supporting a near-monopoly, the case for moving away from, or even wrapping, github seems poor to me.

Apart from distinguishing between possible bug, actual bug and other (which are fairly static categories), classifying issues by status is too hard to manage. What I'd like to suggest is that we choose a way to highlight high-priority issues for the next release, either through the milestone feature or the project feature. Other issues will still get attention by way of random traffic, but we care less about the timing of their resolution.

(I'm sure there must be a way using the API to find issues linked to by PRs or not, but I don't think that's available in the UI.)
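As a rough illustration of that idea, here is a minimal sketch that splits issues into "referenced by a PR" and "not referenced", by scanning PR descriptions for Github's closing keywords (e.g. "Fixes #123"). It assumes the issue numbers and PR description texts have already been fetched by some other means; the function names and the sample data are hypothetical, and references made without a closing keyword are not detected.

```python
import re

# Closing keywords that Github recognises in PR descriptions, e.g. "Fixes #123".
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def linked_issue_numbers(pr_bodies):
    """Return the set of issue numbers referenced with a closing keyword."""
    linked = set()
    for body in pr_bodies:
        # A PR may have no description at all, so treat None as empty text.
        linked.update(int(n) for n in CLOSING_REF.findall(body or ""))
    return linked


def split_by_link(issue_numbers, pr_bodies):
    """Split issue numbers into (referenced by a PR, not referenced)."""
    linked = linked_issue_numbers(pr_bodies)
    with_pr = sorted(n for n in issue_numbers if n in linked)
    without_pr = sorted(n for n in issue_numbers if n not in linked)
    return with_pr, without_pr


# Hypothetical sample data standing in for fetched issues and PR descriptions.
sample_bodies = ["Fixes #10 and adds tests", "closes #12", "No reference here", None]
with_pr, without_pr = split_by_link([10, 11, 12], sample_bodies)
```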

On 16 September 2016 at 09:35, Andreas Mueller <t3kcit at gmail.com> wrote:
Hey Joel.
Thanks for bringing this up. I have a really hard time keeping up with what's happening
on the issue tracker and I have no idea how you manage.

The current tags are certainly not always helpful. Also, they are rarely updated.

I have been very frustrated by github. I used email to track all issues, but their new "upgrade"
made that impossible as issues are no longer email threads - each review is its own thread.

It might make sense to switch to something like reviewable or gerrit.
These sit on top of github, and people can interact with them without using them.
I haven't really worked with either, but heard only good things about them.

Any way to prioritize issues and put them into the buckets that you listed would be a great step forward.
That would require someone manually going through 470 PRs and 762 issues, though.
I would be happy to do that if we actually stick to the system. A single person is not enough to keep the tags (or whatever we end up using)
up to date, though.

Your statuses only apply to PRs, too, and we need to have something similar for issues, which have maybe these statuses:

* random idea / feature request
* feature request with consensus to implement
* possible bug
* confirmed bug
* feature request or bug with active PR
* feature request or bug with stale PR

One problem with these is that many feature requests never get any comments, and the same goes for PRs.
Is a PR without comment waiting for review? Or in dispute?
A PR could be reviewed but dispute could happen later, as we don't always agree on what to do.

I agree that we should try to organize ourselves better. I'm doubtful the new github features will help.
They certainly already have tremendously hindered me in keeping up in the couple of hours they've been online.

There is still no way to mark a comment as addressed, and comments are still more or less randomly hidden
(and links to them become dead). Both of these issues are fixed in the other review platforms.

I don't think we are the intended users of github, though I'm not sure who is.


On 09/15/2016 07:14 PM, Joel Nothman wrote:
One of the biggest issues with scikit-learn as a project is managing its backlog of issues; another is release scheduling. Some of this cannot be fixed as long as our model of voluntary contribution (with a couple of important exceptions) does not change. However, it may be worth considering the new project management features in Github.

At the moment we have the following management:
* labels corresponding to type (bug, enhancement, new feat, question), scope (API, Build/CI, ?Large Scale, Documentation), difficulty (easy, moderate), status/scheduling (needs contributor, needs review, sprint).
* PR status management with title prefixes [WIP], [MRG], [MRG+1], [MRG+2]
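To make the title-prefix convention concrete, here is a minimal sketch of how the prefixes could be read mechanically. The function name and the tolerance for spaces and lower case are assumptions for illustration, not something the project defines.

```python
import re

# Matches a leading [WIP], [MRG], [MRG+1], [MRG+2], ... in a PR title.
TITLE_PREFIX = re.compile(r"^\s*\[(WIP|MRG)(?:\s*\+\s*(\d+))?\]", re.IGNORECASE)


def parse_pr_title(title):
    """Return (state, approvals) from a title prefix, or (None, 0) if absent."""
    match = TITLE_PREFIX.match(title)
    if match is None:
        return None, 0
    state = match.group(1).upper()
    # No "+N" suffix means no recorded approvals yet.
    approvals = int(match.group(2) or 0)
    return state, approvals
```

For example, a title starting with "[MRG+1]" yields the state "MRG" with one approval, while a title with no recognised prefix yields no state at all.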

Firstly, we might benefit from prefixing labels by category, e.g. difficulty:easy, so that complementary labels appear together.
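A small sketch of why the prefix helps: with a "category:name" convention, labels can be grouped by splitting on the colon. The label names other than difficulty:easy and the "uncategorized" bucket are hypothetical.

```python
from collections import defaultdict


def group_labels(labels):
    """Group 'category:name' labels by category; others go to 'uncategorized'."""
    grouped = defaultdict(list)
    for label in labels:
        category, sep, name = label.partition(":")
        if sep:
            grouped[category.strip()].append(name.strip())
        else:
            # Labels without a prefix are kept together in one bucket.
            grouped["uncategorized"].append(label.strip())
    return {category: sorted(names) for category, names in sorted(grouped.items())}


# Hypothetical label set using the proposed prefix convention.
example = group_labels(["difficulty:easy", "type:bug", "difficulty:moderate", "sprint"])
```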

In truth, PRs have roughly these statuses:
* WIP (not ready for review)
* waiting for review
* waiting for changes (with or without one of the following)
* in dispute (i.e. fundamental doubts about the PR)
* the above together with 1 or 2 "official" approvals
* ready for merge (pending minor changes such as what's new documentation)

New github features:

* reviews with "approved" or "request changes". A list of approvers can be found in the merge/CI panel. We could replace the MRG+1 annotation with this and use it to track disputation too. I'm not sure how it works with changes that are added after approval. I think it would have avoided one improper merge by me... One downside is that there does not yet seem to be a way to search for PRs with a specified level of approval (while searching for "MRG+1" sort-of works).
* Milestone prioritising: issues in a milestone, such as https://github.com/scikit-learn/scikit-learn/milestone/21, can be ranked with drag-and-drop. I think this could help with release scheduling as it would allow us to identify the top priorities for a release and see when enough of them are completed.
* The Kanban-style workflow management of the new Projects tool https://github.com/scikit-learn/scikit-learn/projects is another way of managing status and, I think, priority, for a small set of related issues. This might be an alternative way of managing milestone scope, or of working towards big changes like the one just completed for model selection; like the proposed expansions to get_feature_names; like estimator tags; making utilities public/private...
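As a sketch of how a ranked milestone could inform release timing: given the milestone's issues in priority order, count how many of the top few are closed. The data layout, the function names and the completion threshold are all assumptions for illustration; nothing here talks to Github.

```python
def top_priority_progress(ranked_issues, top_n):
    """Return (closed, total) for the top_n issues.

    ranked_issues is a list of (issue_number, is_closed) in priority order.
    """
    top = ranked_issues[:top_n]
    done = sum(1 for _, is_closed in top if is_closed)
    return done, len(top)


def ready_for_release(ranked_issues, top_n, required_fraction=1.0):
    """True when at least required_fraction of the top_n issues are closed."""
    done, total = top_priority_progress(ranked_issues, top_n)
    return total > 0 and done / total >= required_fraction


# Hypothetical milestone: issue numbers and closed flags, highest priority first.
ranked = [(101, True), (102, False), (103, True), (104, False)]
progress = top_priority_progress(ranked, 3)
```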

So with the goal of making it easier to track where attention is most needed, and when to move to release: What's worth trying?


_______________________________________________

scikit-learn mailing list

scikit-learn at python.org

https://mail.python.org/mailman/listinfo/scikit-learn


