[scikit-learn] Github project management tools

Sebastian Raschka se.raschka at gmail.com
Fri Sep 16 10:31:43 EDT 2016

> While I appreciate that we are somewhat arbitrarily supporting a near-monopoly, the case for moving away from, or even wrapping, github seems poor to me.

Yeah, I would say that moving away from GitHub probably involves too much hassle given the size of the project. Also, I don’t think there are any good alternatives besides BitBucket, which would also not be a good choice for such a big project due to its pricing structure (they do have a simple yet useful “priority” attribute for issues, though). Not sure, but it feels like GitHub is currently in a somewhat experimental stage regarding their web UI; it feels like they are changing a bit too much, too often. However, maybe (or hopefully) they'll address a few of the recent annoyances in the future in response to user feedback. Using a wrapper seems like a good idea right now, but who knows whether or not these wrappers will introduce changes as well in the near future.

>> either through the milestone feature, the project feature

I think the milestone feature is pretty useful. Have seen this in several other projects (e.g., matplotlib). As a user/sometimes contributor, it would help with focussing on more important issues; I am sometimes a bit hesitant to submit/tackle pull requests or issues since I feel like they are somewhat distracting the core contributors from the more important stuff.


> On Sep 16, 2016, at 9:11 AM, Sebastian Raschka <se.raschka at gmail.com> wrote:
> Scikit-learn’s GitHub repo already makes use of these templates. I think the issue is more a technical one arising from their latest “style” changes. 
>> On Sep 16, 2016, at 8:25 AM, Dale T Smith <Dale.T.Smith at macys.com> wrote:
>> A form – with required, pre-defined fields – can help when people submit bugs, issues, or requests for new features. Perhaps creating an issue template for scikit-learn is a good first step.
>> https://help.github.com/articles/creating-an-issue-template-for-your-repository/
>> Pull requests also have a template
>> https://help.github.com/articles/creating-a-pull-request-template-for-your-repository/
>> I am not sure how these fit into the team’s review and release workflow.
>> If this doesn’t quite fit your needs, perhaps engaging Github Support will yield something interesting.
>> From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Joel Nothman
>> Sent: Friday, September 16, 2016 1:15 AM
>> To: Scikit-learn user and developer mailing list
>> Subject: Re: [scikit-learn] Github project management tools
>> I think we're quite close to the intended users of Github; they just started simple and, as all these more feature-complete competitors appear, are adding those features but haven't quite got it right yet. I'm not convinced that it's the perfect tool (although I haven't seen this threading problem; gmail seems to still be keeping one thread per PR?), but its simplicity and familiarity/popularity are a great advantage for handling new contributors. In terms of contributor familiarity, most of the projects that we integrate with use the same: numpy, scipy, cython (recently), pandas, matplotlib, ipython. While I appreciate that we are somewhat arbitrarily supporting a near-monopoly, the case for moving away from, or even wrapping, github seems poor to me.
>> Apart from distinguishing between possible bug, actual bug and other (which are fairly static categories), classifying issues by status is too hard to manage. What I'd like to suggest is that we choose a way to highlight high-priority issues for the next release, either through the milestone feature or the project feature. Other issues will still get attention by way of random traffic, but we care less about the timing of their resolution.
>> (I'm sure there must be a way using the API to find issues linked to by PRs or not, but I don't think that's available in the UI.)
>> On 16 September 2016 at 09:35, Andreas Mueller <t3kcit at gmail.com> wrote:
>> Hey Joel.
>> Thanks for bringing this up. I have a really hard time keeping up with what's happening
>> on the issue tracker and I have no idea how you manage.
>> The current tags are certainly not always helpful. Also, they are rarely updated.
>> I have been very frustrated by github. I used email to track all issues, but their new "upgrade"
>> made that impossible as issues are no longer email threads - each review is its own thread.
>> It might make sense to switch to something like reviewable or gerrit.
>> These sit on top of github, and people can interact with them without using them.
>> I haven't really worked with either, but heard only good things about them.
>> Any way to prioritize issues and put them into the buckets that you listed would be a great step forward.
>> That would require someone manually going through 470 PRs and 762 issues, though.
>> I would be happy to do that if we actually stick to the system. A single person is not enough to keep the tags (or whatever we end up using)
>> up to date, though.
>> Your statuses only apply to PRs, too, and we need to have something similar for issues, which maybe have these statuses:
>> * random idea / feature request
>> * feature request with consensus to implement
>> * possible bug
>> * confirmed bug
>> * feature request or bug with active PR
>> * feature request or bug with stale PR
>> One problem with these is that many feature requests never get any comments, and similarly for PRs.
>> Is a PR without comment waiting for review? Or in dispute?
>> A PR could be reviewed but dispute could happen later, as we don't always agree on what to do.
>> I agree that we should try to organize ourselves better. I'm doubtful the new github features will help.
>> They certainly already have tremendously hindered me in keeping up in the couple of hours they've been online.
>> There is still no way to mark a comment as addressed, and comments are still more or less randomly hidden
>> (and links to them become dead). Both of these issues are fixed in the other review platforms.
>> I don't think we are the intended users of github, though I'm not sure who is.
>> On 09/15/2016 07:14 PM, Joel Nothman wrote:
>> One of the biggest issues with scikit-learn as a project is managing its backlog of issues; another is release scheduling. Some of this cannot be fixed as long as our model of voluntary contribution (with a couple of important exceptions) does not change. However, it may be worth considering the new project management features in Github.
>> At the moment we have the following management:
>> * labels corresponding to type (bug, enhancement, new feat, question), scope (API, Build/CI, ?Large Scale, Documentation), difficulty (easy, moderate), status/scheduling (needs contributor, needs review, sprint).
>> * PR status management with title prefixes [WIP], [MRG], [MRG+1], [MRG+2]
>> Firstly, we might benefit from prefixing labels by category, i.e. difficulty:easy so that complementary labels appear together.
>> In truth, PRs have roughly these statuses:
>> * WIP (not ready for review)
>> * waiting for review
>> * waiting for changes (with or without one of the following)
>> * in dispute (i.e. fundamental doubts about the PR)
>> * the above together with 1 or 2 "official" approvals
>> * ready for merge (pending minor changes such as what's new documentation)
>> New github features:
>> * reviews with "approved" or "request changes". A list of approvers can be found in the merge/CI panel. We could replace the MRG+1 annotation with this and use it to track disputation too. I'm not sure how it works with changes that are added after approval. I think it would have avoided one improper merge by me... One downside is that there does not yet seem to be a way to search for PRs with a specified level of approval (while searching for "MRG+1" sort-of works).
>> * Milestone prioritising: issues in a milestone, such as https://github.com/scikit-learn/scikit-learn/milestone/21, can be ranked with drag-and-drop. I think this could help with release scheduling as it would allow us to identify the top priorities for a release and see when enough of them are completed.
>> * The Kanban-style workflow management of the new Projects tool https://github.com/scikit-learn/scikit-learn/projects is another way of managing status and, I think, priority, for a small set of related issues. This might be an alternative way of managing milestone scope, or of working towards big changes like the one just completed for model selection; like proposed expansions to get_feature_names; like estimator tags; making utilities public/private...
>> So with the goal of making it easier to track where attention is most needed, and when to move to release: What's worth trying?
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
