Getting more momentum for pip
This post is meant to start a discussion on how to make sure that important PyPA projects like pip get enough eyes so that they can continually move forward. It is absolutely NOT meant to be critical of folks who volunteer their time to work on these projects, as I have the utmost respect for folks who do this often thankless work. And since the word "thankless" just popped into my head, let me say "thank you" right now to the folks who contribute to PyPA.

I've noticed that when PRs are submitted to pip (by myself and by others), they often languish for a while and then sometimes become unmergeable because of conflicts. This makes me think that the folks who review the PRs are overburdened and/or the process needs a bit more structure (e.g., each person thinks someone else is going to review the PR and so no one does).

So, some ideas off the top of my head - please comment and/or add your own suggestions to the list:

- Add more committers. This would ostensibly increase the number of folks reviewing, decrease turnaround time, and take some pressure off.
- Obtain some kind of funding so that committers can be compensated for their work and don't feel as bad about spending time on it. These people have day jobs and families, so maybe this would help; maybe not.
- Introduce a bot or some automation that periodically sends reminders about open PRs that are getting old, PRs that have become unmergeable, etc. Or maybe, for each PR, it picks one or more people responsible for reviewing it, so there is no ambiguity about who is responsible. The OpenStack folks have quite a bit of structure in their workflow (probably too much for PyPA projects, which have fewer people working on them?), but perhaps some things can be borrowed. In particular, changes need a certain number of upvotes to be merged, and reviewers usually request feedback from certain individuals.

So I guess my suggestions boil down to:

1. Add more humans
2. Add more money to make humans more efficient
3. Add more computer automation

#3 seems most appealing to me. Of course it requires humans to develop it in the first place, but at least it's an investment that could pay dividends.

Thoughts?

Marc
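The reminder-bot idea above doesn't need much machinery to prototype. Here is a minimal sketch of the decision logic such a bot might use, assuming the PR data (number, last-update time, mergeability, assigned reviewer) has already been fetched from the GitHub API; the field names and the 14-day threshold are illustrative assumptions, not anything pip actually uses:

```python
from datetime import datetime, timedelta

# Illustrative threshold: how long a PR may sit untouched before a reminder.
STALE_AFTER = timedelta(days=14)

def prs_needing_attention(prs, now):
    """Return (pr_number, reason) pairs that a reminder bot should flag.

    Each PR is a dict with hypothetical keys: 'number', 'updated_at'
    (a datetime), 'mergeable' (bool), and 'reviewer' (str or None).
    """
    flagged = []
    for pr in prs:
        if not pr["mergeable"]:
            flagged.append((pr["number"], "has merge conflicts"))
        elif now - pr["updated_at"] > STALE_AFTER:
            flagged.append((pr["number"], "no activity for 14+ days"))
        elif pr["reviewer"] is None:
            flagged.append((pr["number"], "no reviewer assigned"))
    return flagged
```

A real bot would feed this from the GitHub API and post the reminders as PR comments or a mailing-list digest; the point is only that "no ambiguity about who is responsible" reduces to a handful of mechanical checks.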
On Thu, Mar 5, 2015 at 10:38 AM, Marc Abramowitz
Thoughts?
Personally I value stability over a tendency to merge every PR. I know you're not advocating that every PR be merged (or at least I hope you're not), but the only way to add more core developers is to have people who are already volunteering to review other people's pull requests. "Core developer" is a title that is not so much a privilege as it is a responsibility, and it should only be given to those who are already volunteering time to contribute to pip and other PyPA projects through multiple efforts (e.g., reviewing other people's code - not just sending their own - and supporting the project on StackOverflow, IRC, or somewhere else, etc.).

For the short time that I subscribed to pip notifications, I noticed a lot of PRs and a lot of pings but very little serious critical review. Most PRs in that period seemed to be feature PRs rather than bug fixes. Personally, I think bug fixes are more important than features, and not every new feature deserves to be merged; in fact, I would say more projects would probably do better by rejecting most new features.

OpenStack's workflow is incredibly different from most other projects that aren't people's day jobs, for a few reasons:

1. OpenStack is a globally distributed effort. Each project has hundreds of contributors, and most only have 5-10 core reviewers, who tend to be nominated only after a lot of sweat has been poured into a project.
2. No one can commit directly to a repository.
3. There's a lot of automation around tests, but there's also a lot of ceremony in review.
4. New features are strongly vetted through a blueprint and specification process that core reviewers contribute to but do not drive (hence why people who are "core" on that process are called drivers).
5. There's a much clearer review system in place, where votes actually matter and are aggregated.
6. People (like myself) are literally paid to work on OpenStack. Few people work on it in their free time except to keep their job.
pip is (in my opinion) far more stable than OpenStack, and I would like to keep it that way. I know there are OpenStack Infrastructure folks who monitor this list, but OpenStack continuously reaches into private areas of projects, and when those change, OpenStack tends to get angry at upstream for not having backwards compatibility in non-public APIs. pip, on the other hand, doesn't do that.

I'm sure Donald needs help. I don't think the right kind of help right now is adding more people who can merge things arbitrarily. I think the right kind of help needs to be focused on stability, and we need a better way of defining what new features belong in pip and what don't, to reduce the volume of pull requests that are deprioritized purely because they're features that may or may not have been discussed with PyPA cores beforehand.
On 5 March 2015 at 16:38, Marc Abramowitz
Thoughts?
It seems to me that there is another point that delays progress on a certain proportion of PRs, specifically feature requests: namely, that no one really has that strong an opinion on whether they are a good idea or not. I know I've noticed a few requests recently where my reaction was essentially "I don't have a problem with this, but I don't care enough to actually add it" (it's not so much the mechanical aspect of merging the PR, but also the fact that you take on some level of "ownership" of the PR if you merge it). That's a much harder issue to resolve, as it revolves around having a clear roadmap for pip and a good understanding of scope - something we don't really have. But if we spend too long debating project direction and abstracts like that, we get even less done.

Ian's point is also a good one. We get a lot of PRs for new features (which is where my point above applies) but far fewer for bug fixes. And there's also the problem with bug fixes that often the number of core devs with access to the affected platform is limited. (I try to cover Windows for Python 3, but I only have Linux access via setting up a VM, I have no access to OS X[1], and my interest in Python 2 is limited at best. I'm sure the other devs have similar constraints.)

But unlike Ian, I think we *do* need to look at features. Things like separating the build and install phases, defining a supportable API for pip (whether the CLI or importable), supporting the in-development PEPs like Metadata 2.0, external hosting, wheel improvements, etc., all need doing. The PEP process is the best way to handle this - if people on this list agree on a PEP, it's much easier to develop an implementation than to decide on individual PRs in isolation.

Paul.

[1] Anyone want to buy me a Mac? ;-)
On 03/05/2015 12:07 PM, Paul Moore wrote:
It seems to me that there is another point that delays progress on a certain proportion of PRs, specifically feature requests, namely that no-one really has that strong an opinion on whether they are a good idea or not. I know I've noticed a few requests recently where my reaction was essentially "I don't have a problem with this, but I don't care enough to actually add it"
An "issue before feature request PR" policy might help here. Before someone submits a feature request PR, have them open an issue for discussion of the feature. Recommend/require that at least one core dev be strongly in favor before giving approval to open the PR. Automatically close these issues after some amount of inactivity if they haven't received that approval, and automatically close any feature request PRs that don't have an approved issue associated with them.

Put up an explanation of this policy, with sympathy that it may cause frustration, but noting that with limited time, devs must focus their efforts, etc. The closing of issues and PRs can be automated, core devs can focus on the items they think bring real value, and bug fix PRs now stand out more. My $0.02.

*Randy Syring* Husband | Father | Redeemed Sinner /"For what does it profit a man to gain the whole world and forfeit his soul?" (Mark 8:36 ESV)/
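The auto-close half of this policy is mechanical enough to sketch. Here is a minimal version of the rule, assuming the issue data has already been fetched; the "feature"/"approved" label names and the 60-day window are made-up placeholders, not pip's actual labels:

```python
from datetime import datetime, timedelta

APPROVAL_LABEL = "approved"             # hypothetical label a core dev applies
INACTIVITY_WINDOW = timedelta(days=60)  # hypothetical cutoff

def issues_to_autoclose(issues, now):
    """Feature-request issues with no core-dev approval and no recent activity.

    Each issue is a dict with hypothetical keys: 'number', 'labels'
    (a set of str), and 'last_activity' (a datetime).
    """
    return [
        issue["number"]
        for issue in issues
        if "feature" in issue["labels"]
        and APPROVAL_LABEL not in issue["labels"]
        and now - issue["last_activity"] > INACTIVITY_WINDOW
    ]
```

Bug reports and approved features are untouched; only unendorsed feature requests age out, which is exactly the filtering effect the policy is after.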
On Thu, Mar 5, 2015 at 10:21 AM, Randy Syring
On 03/05/2015 12:07 PM, Paul Moore wrote:
It seems to me that there is another point that delays progress on a certain proportion of PRs, specifically feature requests, namely that no-one really has that strong an opinion on whether they are a good idea or not. I know I've noticed a few requests recently where my reaction was essentially "I don't have a problem with this, but I don't care enough to actually add it"
An "issue before feature request PR" policy might help here. Before someone submits a feature request PR, have them open an issue for discussion of the feature. Recommend/require that at least one core dev be strongly in favor before giving approval to open the PR. Automatically close these issues after some amount of inactivity if they haven't received that approval, and automatically close any feature request PRs that don't have an approved issue associated with them.
Put up an explanation of this policy, with sympathy that it may cause frustration, but noting that with limited time, devs must focus their efforts, etc.
+1. good idea.
On 5 March 2015 at 18:33, Marcus Smith
An "issue before feature request PR" policy might help here. Before someone submits a feature request PR, have them open an issue for discussion of the feature. Recommend/require that at least one core dev be strongly in favor before giving approval to open the PR. Automatically close these issues after some amount of inactivity if they haven't received that approval, and automatically close any feature request PRs that don't have an approved issue associated with them.
Put up an explanation of this policy, with sympathy that it may cause frustration, but noting that with limited time, devs must focus their efforts, etc.
+1. good idea.
Agreed, good idea. We could probably also introduce some similar guidelines for issues/bug reports: when reporting a bug, please provide clear instructions on how to reproduce it, details of the platforms known to be affected, etc. If that information isn't available, and the OP doesn't respond after being prompted for it, we close the bug after a given period.

On a related note, maybe we need a better definition of what platforms/configurations we support. For example, do we support using pip with older versions of setuptools? ("Please upgrade setuptools and confirm the problem still exists; if it doesn't, then that's the fix and we'll close the issue.")

Platform-specific issues are the hardest to deal with - if something comes up that only affects Gentoo Linux, none of the core devs have any means of reproducing it without creating a VM, working out how to install Gentoo, etc. "Not supported" doesn't have to mean we won't help, but it might mean that we label the issue somehow and need third-party help (either from the OP or someone else) to progress it. And if nobody steps forward after a certain amount of time, we close the issue (or just leave it open - if it's labelled appropriately, we can exclude it from any measures we want to make of open issues we *expect* to work on).

Paul.
So I guess my suggestions boil down to:
- Add more humans - Add more money to make humans more efficient - Add more computer automation
Maybe agree to always maintain < X open issues and < Y open PRs before adding features, where X and Y can vary as needed - but for starters, X=250 and Y=25 sound reasonable. This would:
- force more work on the issue and PR backlog
- force making the tough decisions on whether something is realistically going to be worked on, and closing it if not
- force closing issues that are not getting the responses needed to actually work the problem
- more likely cycle in new folks to become committers
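A cap like this is mechanical enough that a CI job or bot could enforce it. A minimal sketch, assuming the open-issue and open-PR counts have already been fetched (the 250/25 caps are the starting values suggested above; the function name is hypothetical):

```python
MAX_OPEN_ISSUES = 250  # suggested starting value for X
MAX_OPEN_PRS = 25      # suggested starting value for Y

def feature_work_allowed(open_issues, open_prs):
    """True only when the backlog is under both caps,
    i.e. new non-critical feature work may proceed."""
    return open_issues < MAX_OPEN_ISSUES and open_prs < MAX_OPEN_PRS
```

Since both thresholds must hold simultaneously, a backlog that is fine on issues but over on PRs (or vice versa) still blocks feature work - which is the forcing function the proposal is after.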
Triaging the issues would help. If, say, I'd like to help, it's very
discouraging to look in a bug tracker and not be able to filter down to
what's interesting or important.
Some ideas for triaging goals:
* clear labels for issues that the maintainers want to be fixed (eg: if you
fix it you'd get quick feedback as a maintainer is interested in that
issue).
* clear labels for issues that need discussion, have design issues, etc.
(blocked issues, in other words).
* clear labels for what's a bug or feature.
* clear labels for features that would be accepted but no maintainer has
time or is interested in (the "nice to have" issues).
Currently there are no labels at all for any issue or PR.
Thanks,
-- Ionel Cristian Mărieș, http://blog.ionelmc.ro
On Thu, Mar 5, 2015 at 7:16 PM, Marcus Smith
So I guess my suggestions boil down to:
- Add more humans - Add more money to make humans more efficient - Add more computer automation
Maybe agree to always maintain < X open issues and < Y open PRs before adding features, where X and Y can vary as needed - but for starters, X=250 and Y=25 sound reasonable. This would:
- force more work on the issue and PR backlog
- force making the tough decisions on whether something is realistically going to be worked on, and closing it if not
- force closing issues that are not getting the responses needed to actually work the problem
- more likely cycle in new folks to become committers
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Currently there are no labels at all for any issue or PR.
There are labels - https://github.com/pypa/pip/labels - I put most of these in place last year.
I've only looked at the first couple of pages. The existing labels don't
clearly indicate the first/second/fourth goals.
Thanks,
-- Ionel Cristian Mărieș, http://blog.ionelmc.ro
On Thu, Mar 5, 2015 at 8:11 PM, Marcus Smith
Currently there are no labels at all for any issue or PR.
There are labels - https://github.com/pypa/pip/labels - I put most of these in place last year.
On 5 March 2015 at 17:16, Marcus Smith
So I guess my suggestions boil down to:
- Add more humans - Add more money to make humans more efficient - Add more computer automation
Maybe agree to always maintain < X open issues and < Y open PRs before adding features, where X and Y can vary as needed - but for starters, X=250 and Y=25 sound reasonable. This would:
- force more work on the issue and PR backlog
- force making the tough decisions on whether something is realistically going to be worked on, and closing it if not
- force closing issues that are not getting the responses needed to actually work the problem
- more likely cycle in new folks to become committers
That implies closing 183 issues and 65 PRs from where we are now. And when you say "adding features", presumably that means somehow forbidding people (core devs? we can't forbid anyone else...) from creating new PRs until we're below the limit. In general, I don't think it's practical, in a volunteer-based project, to "force" people to hit specific targets.

Having said that, though, more people doing tracker gardening would be a good thing - if anyone (core dev or not) wants to go through doing triage on PRs and issues, closing ones that can be closed, getting self-contained test cases for bugs, doing code reviews on PRs, etc., that would be great.

Paul
That implies closing 183 issues and 65 PRs from where we are now. And when you say "adding features" presumably that means somehow forbidding people (core devs? we can't forbid anyone else...) from creating new PRs until we're below the limit.
In general, I don't think it's practical in a volunteer-based project, to "force" people to hit specific targets.
You certainly can't be hard-line about it, but I think it's a practical rule, and achievable: you just don't work on new non-critical things until you get the issue and PR counts down. As much as I would like to see PyPA have a "system" of issue and PR review, I think a simple rule like this is more achievable right now.
On 6 March 2015 at 03:16, Marcus Smith
So I guess my suggestions boil down to:
- Add more humans - Add more money to make humans more efficient - Add more computer automation
Maybe agree to always maintain < X open issues and < Y open PRs before adding features, where X and Y can vary as needed - but for starters, X=250 and Y=25 sound reasonable.
This is one of the key differences between open source projects and corporate projects. In a corporate context, it's important to keep your backlog under control by saying "we're never going to invest time in fixing this" and declaring such issues "won't fix".

In a community-driven open source project, the situation is different. Here, because everyone is free to spend their time however they like (or however they can persuade someone to pay them), the only real reason to "won't fix" an issue is because it's a genuinely bad idea (or based on a flawed understanding of the situation) and there's no way to redeem the suggestion. For anything else, it's often better to leave it open as an opportunity for someone to persuade the core developers that it's worth accepting the liability of maintaining that code into the future - those that do the work, make the rules.

It does require a lot of education work, though. Most folks (especially when just starting to learn to code) are inclined to think that contributing more code is purely beneficial. It's not, as code is a liability that costs you long-term effort to maintain. As Jack Diederich puts it: "code is the enemy and you want as little of it in your product as possible". What you actually want is to *solve people's problems in the general case*, such that the net gain in time saved across the user community vastly outweighs the maintenance cost of allowing that code to exist in the project.

For mature projects with a fairly well-defined scope, this means the default answer is going to be "no", and the most positive likely initial answer is "maybe". Hence the split of the core CPython mailing lists into python-dev (where the default answer is just a straight-up "no") and python-ideas (where the default answer is more along the lines of "that's a potentially interesting idea, let's discuss it further").
This is arguably a flaw in more recent approaches to getting folks engaged in open source projects, with their focus on quick wins and immediate contributions. While those do exist in most projects, they're generally about implementing *existing* ideas, or improving docs, or fixing bugs. Folks who want to get a *specific* issue fixed generally need to learn what the core contributors care about, and how to get them interested in addressing that problem. There isn't the simple "I'm a paying customer, this is a problem I'm reporting, we have an SLA, please resolve my issue in accordance with that" dynamic that exists in a traditional vendor relationship.

While it's only been intermittently effective, one useful tactic CPython core developers have occasionally applied is a "5-for-1" review trade: if someone else reviews 5 open patches, then they'll review one that the new reviewer cares about. That's a pretty straightforward quid pro quo, and it gets people into the habit of contributing time to things others are interested in so as to get time contributed back to their own issues in return. It relies on core contributors having the spare capacity to offer that deal, though.

pythonmentors.com describes another approach, which allows folks to self-identify as wanting to invest time in becoming a core committer/reviewer themselves, rather than just wanting to get a drive-by patch merged. Those folks then gain the benefit of getting the attention of the more active mentors in the core developer group, as well as a community of like-minded peers all attempting to learn the tricks of navigating the vagaries of CPython core development.

Another way is when an existing contributor deliberately recruits someone they trust to take over a task from them. In those cases, the mentorship relationship was already formed outside the particular community of contributors, and is transferred into the new context.
Finally, there's straight-up trading of favours: if you contribute something an existing core contributor wants themselves, a) you'll likely get their attention on that original review pretty easily; and b) if you have something *else* you want reviewed, they're far more likely to try to find the time if you've already helped them cross a lingering item off their generally voluminous todo lists.

Regards, Nick.

P.S. The other useful thing to do is to better educate folks on how to make the case for spending work time on upstream projects and key dependencies *without* a specific near-term business need. I'm pretty happy with this piece I recently wrote for Red Hat as an example of that kind of thing: http://community.redhat.com/blog/2015/02/the-quid-pro-quo-of-open-infrastruc...

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mar 5, 2015, at 11:38 AM, Marc Abramowitz
wrote:
Thoughts?
So I agree with Paul that part of the reason some PRs languish is that they add new features none of us feel strongly about, so nobody wants to push the button to say "yes, we're going to support this". Being conservative with features is generally a good thing for a (mostly) volunteer project, because we have a limited amount of manpower and more features increase the surface area we have to take care of. We also tend to have a problem saying no to features, and tend to just let them languish instead of making a decision one way or the other when we don't feel strongly.

Another issue is that we often get requests (bug fixes or otherwise) which break tests or which don't have tests written for them. We can't merge these as is, but it's not unusual for the original author to not update their PR and leave it languishing with broken/no tests. It's hard to want to close these as "abandoned", because there's code that may or may not be working if someone takes the time to go through and finish it, but until someone feels like doing that, the PR can't be merged.

Yet another issue is that pip's test suite is not particularly good. We're missing a lot of coverage, and we don't have *any* CI running on platforms other than Ubuntu. This means that merging things is somewhat "dangerous", because it's easy to break things without noticing unless you pull down the change and manually test whatever you can. Even then, that's not good enough unless you can test it on other platforms as well. I'm sure Paul can fill in the blanks on how often the test suite simply doesn't run on Windows because some POSIX assumption snuck in somewhere.

Another issue (that sort of ties in with the test one) is that pip's code base is simply not very good. Things are not well encapsulated, and it's not obvious what's going to change any time you change something.
There was recently an issue where changing the order in which we checked for things to be uninstalled broke an assumption that a lot of people were relying on.

Obviously the biggest limiting factor is simple manpower, and most of the above issues make our use of what manpower we have more inefficient. I think that the best way to increase momentum is to explore ways to make our use of the manpower we have more effective and reduce the waste.

I don't think that an artificial limit on the number of issues or pull requests is a good path forward. Since we're more or less entirely volunteer besides myself, I feel like an artificial limit will just mean that things don't happen. I feel like if someone doesn't want to close issues for whatever reason, trying to force them to do so will just mean they are more likely to spend their time elsewhere. I also worry that anything which autocloses PRs (after a period of time, or if there wasn't an associated issue, or whatever) is somewhat hostile to potential contributors, and I feel like having an open PR isn't that big of a deal in general.

I think the most effective way forward is to fix the things that needlessly suck up the pip core team's time. Things that would be huge improvements in this area:

* Refactoring pip to better encapsulate and separate concerns, creating boundaries between different parts[1]
* Improving the test suite by covering cases that aren't being covered, and moving functional tests to unit tests
* Creating a high-level test suite that runs setuptools, pip, virtualenv, etc. all together against real projects
* Help with getting CI for other systems set up, particularly for Windows. This may take the shape of helping Travis, or it may be setting up another service or our own CI system.

Other things that would help:

* People doing in-depth reviews of the current PRs that are there, suggesting changes or pointing out issues, etc.
* People triaging issues (unfortunately this one isn't super easy with GitHub Issues, since you have to be a committer to change these things).
* People going through old issues and PRs to figure out whether the situations that caused them to be opened still apply, whether the problem has since been fixed, or whether the code has changed significantly enough that the problem likely no longer exists.

These sorts of things would make it *much* easier to merge new things, because there would be less risk and fewer things involved in actually figuring out whether any particular merge is a good idea. I also think that people willing to put in the work to do things like this would be good candidates for becoming core developers themselves, which would also help by increasing the number of people we have able to review and commit.

[1] http://pyvideo.org/video/1670/boundaries

--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
I don’t think that an artificial limit on the number of issues or pull requests is a good path forward.
I would say "reasonable" limit, not "artificial". : ) It's a simple way to balance the efforts towards project maintenance.
I feel like if someone doesn’t want to close issues for whatever reasons, trying to force them to do that will just mean they are more likely to choose to spend their time elsewhere.
The counter-argument is that by not responding to issues and PRs, contributors end up going elsewhere.
On Mar 5, 2015, at 2:27 PM, Marcus Smith
wrote:
I don’t think that an artificial limit on the number of issues or pull requests is a good path forward.
I would say "reasonable" limit, not "artificial". : ) It's a simple way to balance the efforts towards project maintenance.
I feel like if someone doesn’t want to close issues for whatever reasons, trying to force them to do that will just mean they are more likely to choose to spend their time elsewhere.
The counter-argument is that by not responding to issues and PRs, contributors end up going elsewhere.
Sure, but if we spend our time elsewhere they aren’t going to get responses either ;) --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
* Refactoring pip to better encapsulate and separate concerns, creating boundaries between different parts
These of course are a drop in the bucket of what could be done: - https://github.com/pypa/pip/pull/2404 - https://github.com/pypa/pip/pull/2410 - https://github.com/pypa/pip/pull/2411 Now probably `install` is the one that would add the most value and I briefly thought of doing that but then I thought to myself that there are so many open PRs already and one for `install` would probably break a whole bunch of them. Also I don't want to have too many open ones because I just don't like having too many open loops.
On Mar 5, 2015, at 2:32 PM, Marc Abramowitz
wrote:
* Refactoring pip to better encapsulate and separate concerns, creating boundaries between different parts
These of course are a drop in the bucket of what could be done:
- https://github.com/pypa/pip/pull/2404
- https://github.com/pypa/pip/pull/2410
- https://github.com/pypa/pip/pull/2411
Now probably `install` is the one that would add the most value and I briefly thought of doing that but then I thought to myself that there are so many open PRs already and one for `install` would probably break a whole bunch of them. Also I don't want to have too many open ones because I just don't like having too many open loops.
To be honest, I didn’t so much mean the commands themselves. That’s a minor improvement, but it’s largely shuffling deck chairs on the Titanic, in my opinion; it doesn’t meaningfully make things cleaner. The things I’m talking about are more about the internals of pip - pip.index, pip.download, pip.req.*, etc. These are the “core” parts of pip, and that code is horrible and messy, and actually figuring out how to clean it up would be a major big deal.

--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 5 March 2015 at 19:32, Marc Abramowitz
* Refactoring pip to better encapsulate and separate concerns, creating boundaries between different parts
These of course are a drop in the bucket of what could be done:
- https://github.com/pypa/pip/pull/2404 - https://github.com/pypa/pip/pull/2410 - https://github.com/pypa/pip/pull/2411
Now probably `install` is the one that would add the most value and I briefly thought of doing that but then I thought to myself that there are so many open PRs already and one for `install` would probably break a whole bunch of them. Also I don't want to have too many open ones because I just don't like having too many open loops.
Much more important (and I think what Donald was probably referring to) are the internal refactorings: properly encapsulating the finder, the various requirement objects, making "build from source" and "install from wheel" into clearly defined components. Personally, while I don't think there's anything particularly *wrong* with your 3 PRs mentioned above, my feeling is that they don't really add much. Given that nothing other than the pip code itself is supposed to use pip internal functions at this point in time, there's nobody who'll really gain from them. Paul
Yeah my changes are quite trivial. And there is much more complex stuff that could be improved. IIRC, `pip/req/req_install.py` is the real behemoth. I remember feeling afraid to touch that thing. :) I do think this illustrates some of the problem though in that if those 3 very simple PRs are not merged or closed, then I don't have a lot of faith that any more complex PR that I might submit would be worth the time investment. I feel like I'm somewhat conditioned to only submit small PRs to pip because they have the lowest risk (although also the lowest reward of course). But I don't want to get too bogged down with focusing on me and my PRs. I think this involves everybody.
On Thu, Mar 5, 2015 at 1:55 PM, Marc Abramowitz
Yeah my changes are quite trivial. And there is much more complex stuff that could be improved. IIRC, `pip/req/req_install.py` is the real behemoth. I remember feeling afraid to touch that thing. :)
I do think this illustrates some of the problem though in that if those 3 very simple PRs are not merged or closed, then I don't have a lot of faith that any more complex PR that I might submit would be worth the time investment.
I feel like I'm somewhat conditioned to only submit small PRs to pip because they have the lowest risk (although also the lowest reward of course).
The lowest reward to whom? If it's a good change that fixes something, then regardless of the size, that's a big reward to the project and its users.

I'm also not going to go back in the thread to reply to all of the messages, but I want to be clear that I didn't mean every feature should be rejected. Just that for a project as critical to an ecosystem like Python's, rejecting features should be fairly easy and straightforward to do. If no current core developer wants to maintain it or feels strongly about it, it should be closed. A nice, already-written form explanation would probably antagonize contributors less than a short reasoning that might come off curt. That said, you'll always lose some contributors by closing their pull requests or by trying to help them get them into a good state.

In my opinion, gigantic PRs that refactor things should be rejected immediately. Those can almost always be done in smaller, easier-to-review, and easier-to-understand pull requests. Mega-refactors that will take hours (or even days) to review and that modify all of the tests should be absolutely out of the question if pip is going to come up with better guidelines. requests has received a handful of them, and we've tried to coach those people (who are often first-time contributors) into breaking them up, but they always leave, and the project has to realize that sometimes that's an acceptable loss.

Having clear guidelines is great; they should be enforced and they should be linked somewhere, like the top of the README.

Empowering people to close, label, and triage issues is going to be much harder. There are only two systems I can think of that allow for this: Launchpad and Trac. Trac is Django's tracker (in case people aren't aware), and Django has found a way to integrate it with GitHub. We /could/ take that approach, but that leaves the following questions:

- Who is going to maintain the server(s)?
- Who is going to provision them initially?
- What happens if those people (ideally it's more than just one person) disappear? Will we have enough people to reduce the bus factor?
- Will this ever become yet another thing that the core developers have to spend time on instead of reviewing pull requests and fixing bugs?

Regarding auto-closing, please don't do it. Especially for the ones where someone didn't file a bug first, the PR may be the only record of what's going on.

And for CI, we clearly need people who will help with the Windows CI solution on more than one front. I think pyca/cryptography has OS X builders on Travis, so we could probably add tests for that, but RHEL is going to be much harder. I think this is where Marc's reference to OpenStack fits in perfectly though. In OpenStack there's a concept of third-party CI. (There's a similar notion with the buildbots for CPython.) The people who register those CI systems maintain them, but the project determines whether or not that system's failing build should count against a change. GitHub allows multiple statuses to be set on a commit/PR from multiple services. Thoughts?
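The multiple-statuses point above is what makes the third-party CI model technically feasible on GitHub. As a hedged sketch of the mechanics (the repository, commit SHA, and context names below are hypothetical, and only the payload/endpoint construction is shown, not the authenticated POST itself):

```python
def build_status(state, context, description, target_url=None):
    """Build the JSON payload for GitHub's commit-status API
    (POST /repos/{owner}/{repo}/statuses/{sha}).

    Because each service reports under its own ``context``, a single
    commit can carry e.g. 'travis-ci', 'third-party/windows', and
    'third-party/rhel' statuses side by side.
    """
    if state not in ("pending", "success", "error", "failure"):
        raise ValueError("invalid state: %r" % state)
    payload = {"state": state, "context": context, "description": description}
    if target_url is not None:
        # Link back to the third-party CI system's build log.
        payload["target_url"] = target_url
    return payload


def status_endpoint(owner, repo, sha):
    # A real client would POST here with a token that has repo:status scope.
    return "https://api.github.com/repos/%s/%s/statuses/%s" % (owner, repo, sha)


# A hypothetical external Windows CI reporting a failing build:
payload = build_status("failure", "third-party/windows",
                       "2 tests failed on Windows Server 2012")
url = status_endpoint("pypa", "pip", "0" * 40)
```

Whether a given context's failure actually blocks a merge then remains a project-side decision, which matches the OpenStack third-party CI model described above.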
On 03/05/2015 03:11 PM, Ian Cordasco wrote:
There are only two systems I can think of that allow for this: Launchpad and Trac.
There is also Atlassian Jira; they give free accounts to open source projects. You could also stick with GitHub and just give people commit rights and tell them not to commit, only manage issues. It's easy enough to back out changes by bad actors and then ban them. *Randy Syring* Husband | Father | Redeemed Sinner /"For what does it profit a man to gain the whole world and forfeit his soul?" (Mark 8:36 ESV)/
On Mar 5, 2015, at 3:11 PM, Ian Cordasco
wrote: On Thu, Mar 5, 2015 at 1:55 PM, Marc Abramowitz
wrote: Yeah my changes are quite trivial. And there is much more complex stuff that could be improved. IIRC, `pip/req/req_install.py` is the real behemoth. I remember feeling afraid to touch that thing. :)
I do think this illustrates some of the problem though in that if those 3 very simple PRs are not merged or closed, then I don't have a lot of faith that any more complex PR that I might submit would be worth the time investment.
I feel like I'm somewhat conditioned to only submit small PRs to pip because they have the lowest risk (although also the lowest reward of course).
The lowest reward to whom? If it's a good change that fixes something, then regardless of the size, that's a big reward to the project and its users.
I'm also not going to go back in the thread to reply all of the messages, but I want to be clear that I didn't mean every feature should be rejected. Just that for a project as critical to an ecosystem like Python's, rejecting features should be fairly easy to do and very simple. If no current core developer wants to maintain it or feels strongly about it, that should be closed. A nice, already written, form explanation would probably antagonize contributors less than a short reasoning that might come off curt. That said, you'll always lose some contributors by closing their pull requests or by trying to help them get it into a good state.
In my opinion, gigantic PRs that refactor things should be rejected immediately. Those can almost certainly always be done in smaller, easier to review, and easier to understand pull requests. Mega-refactors that will take hours (or even days) to review that modify all of the tests should be absolutely out of the question if pip is going to come up with better guidelines. requests has received a handful of them, and we've tried to coach those people (who are often first-time contributors) into breaking it up but they always leave, and the project has to realize that sometimes it's an acceptable loss.
Having clear guidelines is great, they should be enforced and they should be linked somewhere, like the top of the README.
GitHub has CONTRIBUTING.rst, which is an obvious place to put something like this. Giant mega-PRs are certainly harder to merge than small PRs, but the benefit of the PR itself also matters: a small PR that doesn’t seem to improve things much, or that just shuffles code around, is less useful than a PR that reorganizes some of the code to be cleaner. Sadly, with how the code in pip is written, sometimes it’s just not reasonable to make small PRs, because things are not well factored and changing anything requires touching a lot of different areas. Those kinds of PRs are OK as a last resort, but ideally such a PR will be opened early as a WIP and still kept as small as possible.
Regarding empowering people to close, label, and triage issues is going to be much harder. There are only two systems I can think of that allow for this: Launchpad and Trac. Trac is Django's tracker (in case people aren't aware) and Django has found a way to integrate it with GitHub. We /could/ take that approach, but that leaves the following questions:
- Who is going to maintain the server(s)? - Who is going to provision them initially? - What happens if those people (ideally it's more than just one person) disappear? Will we have enough people to reduce the bus factor? - Will this ever become yet another thing that the core developers have to spend time on instead of reviewing pull requests and fixing bugs?
Hosted services are ideal for this, though that limits the options a bit; hosted instances of OSS software are even better, of course. There are a number of bug trackers that can be configured to allow open registration, so that anyone who registers can modify ticket states. Maintaining things is easier than setting them up (I already maintain stuff for the PSF, so adding more isn’t that big of a deal); the major work is going through all the different solutions, figuring out which one best fits our needs, making sure it can be configured via Salt instead of via a UI, and making a proposal to actually switch to it, including what the fallout and work required to do that would be.
Regarding auto-closing, please don't do it. Especially for the ones where someone didn't file a bug first, it may be the only way to figure out what's going on?
And for CI, we need people who will help with the windows CI solution on more than one front clearly. I think pyca/cryptography has OS X builders on Travis, so we could probably add tests for that, but for RHEL that's going to be much harder. I think this is where Marc's reference to OpenStack fits in perfectly though. In OpenStack there's a concept of third party CI. (There's a similar notion with the buildbots for CPython.) The people who register those CI systems maintain them but the project determines whether or not that system's failing build should count against a change or not. GitHub allows for multiple statuses to be set on a commit/PR from multiple services. Thoughts?
Assuming we can give someone permission to set a status on GitHub PRs without giving them push access or other access, I would be perfectly fine with a “third-party CI” solution like that. We’d need to hash out exact requirements, but it’s certainly a possibility. I’d like to point out again that if anyone knows Go and can write a little Go program that spins up a Windows Azure server and runs a command using WinRM (any command, even echo), we can give that to Travis to help them get Windows support.
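The ask above is for Go specifically; purely to illustrate the shape of the WinRM half of the task (the Azure provisioning step is omitted), here is a hedged Python sketch. It only assembles the standard WinRM endpoint and the smoke-test command, without opening a connection; the host name is hypothetical, and a real client (e.g. pywinrm in Python, or a Go WinRM library) would speak SOAP to this endpoint:

```python
def winrm_endpoint(host, use_https=True):
    """Build the conventional WinRM SOAP endpoint for a Windows host.

    WinRM listens on port 5985 (HTTP) or 5986 (HTTPS) at the /wsman
    path by default; CI traffic over the public internet would want
    the HTTPS listener.
    """
    scheme, port = ("https", 5986) if use_https else ("http", 5985)
    return "%s://%s:%d/wsman" % (scheme, host, port)


def smoke_test_command():
    # The simplest possible check mentioned above: run `echo` remotely
    # and confirm the output round-trips.
    return ("echo", ["hello-from-ci"])


endpoint = winrm_endpoint("pip-ci.example.com")  # hypothetical host
cmd, args = smoke_test_command()
```

Proving this round trip works (boot server, run echo, read output, tear down) is essentially the whole proof of concept being requested.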
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Donald Stufft
Sadly with how the code in pip is written, sometimes it’s just not reasonable to make small PRs because things are not well factored and changing things requires touching a lot of different areas.
I've seen a number of other projects enforce “small revisions only, otherwise your change gets accepted”. If actually enforced, it is a highly successful way to get meaningful review of changes, and does not appear to limit the scope of the eventual change. What does end up happening in such projects (e.g., Linux) is the community learns how to – and teaches newcomers how to – implement large changes as smaller refactorings, each of which results in a working system. I think the Pip developers should not fear the loss of large changes. Large changes can always be implemented as a series of small, understandable changes, if skill and design effort are brought to bear. The resulting large changes also end up being better examined and better designed. -- \ “Well, my brother says Hello. So, hooray for speech therapy.” | `\ —Emo Philips | _o__) | Ben Finney
Ben Finney
Donald Stufft
writes: Sadly with how the code in pip is written, sometimes it’s just not reasonable to make small PRs because things are not well factored and changing things requires touching a lot of different areas.
I've seen a number of other projects enforce “small revisions only, otherwise your change gets accepted”. If actually enforced, it is a highly successful way to get meaningful review of changes, and does not appear to limit the scope of the eventual change.
That's “small changes only, otherwise your change gets rejected”, of course. The policy for Linux that I alluded to is in §3 of this document:

    3) Separate your changes.
    -------------------------

    Separate each _logical change_ into a separate patch. […] The point
    to remember is that each patch should make an easily understood
    change that can be verified by reviewers. Each patch should be
    justifiable on its own merits.

    If one patch depends on another patch in order for a change to be
    complete, that is OK. Simply note "this patch depends on patch X"
    in your patch description. […]

<URL:https://www.kernel.org/doc/Documentation/SubmittingPatches>

I would be happy to see more projects adopt this, and enforce it, for change contributions. -- \ “Reality must take precedence over public relations, for nature | `\ cannot be fooled.” —Richard P. Feynman, _Rogers' Commission | _o__) Report into the Challenger Crash_, 1986-06 | Ben Finney
On 03/06/2015 11:06 AM, Ben Finney wrote:
That's “small changes only, otherwise your change gets rejected”, of course.
Yes, otherwise submitting a patch that replaces the entire source code of Python with Ruby would be a sure-fire way to get it accepted. :-) -- Greg
Wait, I have an idea. Let's rewrite pip in Rust! ;)
On Thu, Mar 5, 2015 at 7:16 PM, Greg Ewing
On 03/06/2015 11:06 AM, Ben Finney wrote:
That's “small changes only, otherwise your change gets rejected”, of course.
Yes, otherwise submitting a patch that replaces the entire source code of Python with Ruby would be a sure-fire way to get it accepted. :-)
-- Greg
Stealing some packaging code from Go and tweaking it may not be that bad an idea. =) At least they use a code review system that lets new people learn and old people share.
On Fri, Mar 6, 2015 at 5:51 AM, Ian Cordasco
Wait, I have an idea. Let's rewrite pip in Rust! ;)
-- anatoly t.
On Fri, Mar 6, 2015 at 2:17 AM, anatoly techtonik
Stealing some packaging code from Go and tweaking it may not be that bad an idea. =) At least they use a code review system that lets new people learn and old people share.
Yep, and most of the people working on Go are paid to do so. Sharing information through code review is necessary inside a corporation; not doing it can get you fired.
On 6 March 2015 at 08:00, Ben Finney
Donald Stufft
writes: Sadly with how the code in pip is written, sometimes it’s just not reasonable to make small PRs because things are not well factored and changing things requires touching a lot of different areas.
I've seen a number of other projects enforce “small revisions only, otherwise your change gets accepted”. If actually enforced, it is a highly successful way to get meaningful review of changes, and does not appear to limit the scope of the eventual change.
What does end up happening in such projects (e.g., Linux) is the community learns how to – and teaches newcomers how to – implement large changes as smaller refactorings, each of which results in a working system.
This is: a) a really good idea; and b) really painful without good tooling support.

Linux does it via emailed patchbombs, as do a lot of other open source projects which don't have a separate code review tool. That works if your contributors are used to *consuming* patches that way, but it's inapplicable to projects used to web-based reviews. CPython uses the Rietveld instance integrated with bugs.python.org, and has the same problem as pip: incremental changes are a pain to publish, review, and merge, so we review and accept monolithic patches instead (cf the problem statement in https://www.python.org/dev/peps/pep-0462/).

While the main UI is very busy, I've actually quite liked my own experience with Gerrit for http://gerrit.beaker-project.org/ (I was the dev lead for Red Hat's Beaker hardware integration testing system from Oct 2012 until mid 2014, and the product owner until a couple of weeks ago). I've never used Gerrit in the OpenStack context though, so I don't know if Donald dislikes Gerrit in its own right, or just the way OpenStack uses it.

That means one option potentially worth exploring might be http://gerrithub.io/. I haven't used GerritHub yet myself, but I'm pretty sure it lets you mix & match between GitHub PRs for simple changes and GerritHub reviews for more complex ones.

The Beaker workflow is an example of vanilla Gerrit usage, rather than using OpenStack's custom fork: https://beaker-project.org/dev/guide/writing-a-patch.html#submitting-your-pa...

http://gerrit.beaker-project.org/#/c/4025 is an example of a fairly deep patch stack, where each patch can be reviewed independently, but later patches won't be merged until after earlier ones have been submitted. (Rebasing support is also baked directly into the tool.)

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan
CPython uses the Rietveld instance integrated with bugs.python.org, and has the same problem as pip: incremental changes are a pain to publish, review, and merge, so we review and accept monolithic patches instead (cf the problem statement in https://www.python.org/dev/peps/pep-0462/)
Fair enough. I don't know of a good code review tool for Mercurial.
While the main UI is very busy, I've actually quite liked my own experience with Gerrit for http://gerrit.beaker-project.org/
My understanding is that Gerrit makes it tedious to review a sequence of revisions, in proportion to the number of revisions in the sequence. If I understand correctly, such a sequence must have separate reviews for every revision, and an aggregate of all the changes is not available to the reviewer.

I'm impressed by GitLab's code review tool UI; see an example at <URL:https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/344/diffs>. The merge request page has tabs for the discussion, the commit log, and the overall diff, and you can choose between an inline diff and a side-by-side diff.

GitLab is free software, including all its tools; anyone can set up a GitLab instance, and the project data can move from one instance to another without loss. For the purposes of the past thread where some proposed migrating to the proprietary lock-in site GitHub, those objections don't exist with GitLab: a project can migrate to a different host and keep all the valuable data it accumulated.

A move to GitLab would be unobjectionable, in my view. That it has good code review features would help with the issues in this thread too. If anyone knows of equivalent hosting for Mercurial, with equivalent code review tools, under free-software terms with no lock-in, that would be even better I think. -- \ “Don't be misled by the enormous flow of money into bad defacto | `\ standards for unsophisticated buyers using poor adaptations of | _o__) incomplete ideas.” —Alan Kay | Ben Finney
On 6 Mar 2015 22:10, "Ben Finney"
Nick Coghlan
writes: CPython uses the Rietveld instance integrated with bugs.python.org, and has the same problem as pip: incremental changes are a pain to publish, review, and merge, so we review and accept monolithic patches instead (cf the problem statement in https://www.python.org/dev/peps/pep-0462/)
Fair enough. I don't know of a good code review tool for Mercurial.
I'd like to ensure Kallithea fits that bill, but the actual work on that seems to mostly be driven by the folks at Unity3D at the moment. In the meantime, Phabricator is a decent choice if you just want to use an existing GitHub independent tool that works with either git or Mercurial. pip adopting that workflow would also be a good proof of concept for Donald's proposal to also adopt that workflow for CPython (or at least its support repos).
While the main UI is very busy, I've actually quite liked my own experience with Gerrit for http://gerrit.beaker-project.org/
My understanding is that Gerrit makes it tedious to review a sequence of revisions, in proportion to the number of revisions in the sequence.
When the goal is to break a change up into small, independently reviewable changes that's generally a feature rather than a defect :)
If I understand correctly, such a sequence must have separate reviews for every revision, and an aggregate of all the changes is not available to the reviewer.
Correct, but my understanding is that when using it in tandem with GitHub, there's nothing stopping you from also submitting a PR if a reviewer wants an all-inclusive view.
I'm impressed by GitLab's code review tool UI; see an example at URL:https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/344/diffs. The merge request page has tabs for the discussion, the commit log, and the overall diff – and you choose from inline diff or side-by-side diff.
GitLab is free software, including all its tools; anyone can set up a GitLab instance and the project data can move from one instance to another without loss. For the purposes of the past thread where some proposed migrating to the proprietary lock-in site GitHub, those objections don't exist with GitLab: a project can migrate to a different host and keep all the valuable data it accumulated.
A move to GitLab would be unobjectionable, in my view. That it has good code review features would help the issues in this thread too.
It doesn't have the integration with other services and the low barriers to contribution that are the main reasons a lot of projects prefer GitHub. Of course, when your problem is already "we're receiving more contributions than we can process effectively", deciding to require a slightly higher level of engagement in order to submit a change for consideration isn't necessarily a bad thing :)
If anyone knows of equivalent hosting for Mercurial with equivalent code review tools under free-software terms with no lock-in, that would be even better I think.
That's what I'd like forge.python.org to eventually be for the core Python ecosystem, but we don't know yet whether that's going to be an entirely self-hosted Kallithea instance (my preference) or a Phabricator instance backed by GitHub (Donald's preference). Hence my suggestion that a "forge.pypa.io" Phabricator instance might be an interesting thing to set up and start using for pip. Donald's already done the research on that in the context of https://www.python.org/dev/peps/pep-0481/ and for pip that's a matter of "just add Phabricator" without having to migrate anything (except perhaps the issues if folks wanted to do that). Cheers, Nick.
-- \ “Don't be misled by the enormous flow of money into bad defacto | `\ standards for unsophisticated buyers using poor adaptations of | _o__) incomplete ideas.” —Alan Kay | Ben Finney
On Fri, Mar 6, 2015 at 4:04 PM, Nick Coghlan
I'm fairly concerned that what started as a discussion of "how can we increase the feedback received by people submitting pull requests" has turned into a bikeshed moment about using F/LOSS tooling instead of GitHub, when the cores who actually work on the project have already expressed a disinterest in moving and satisfaction with GitHub. GitLab's UI would do nothing to improve review management. Phabricator, while nice, again adds yet another layer for new contributors to involve themselves in. GitHub is one monolith and closed source (and a company with culture problems), but that doesn't change the fact that it's the core developers' choice what software to use, and they've (for the time being) chosen GitHub. Can we please stop this discussion already? It's no longer beneficial or relevant.
On Mar 6, 2015, at 5:43 PM, Ian Cordasco
wrote:
Tooling wise, Github PRs work well for us. I don’t (and I don’t believe that any of the other core devs) have any major issues with them.

Github issues on the other hand, they function “OK” but it would be nice to have something where we can allow anyone to modify the state of tickets to help with triage. However even this isn’t a super pressing concern because our ticket count is small enough that I don’t think there’s likely to be too many to be handled by people commenting on issues and a core team coming in to change things. However if someone has a proposal for a different issue tracker (and plans for how to migrate to it), personally I’d be willing to listen.

F/OSS tooling is nice, but I honestly care a whole lot less about that and a lot more about whatever tooling is the most effective for us to get the job done. This can include hosted services (and possibly even hosted services that cost money). Written in Python is also nice, but again I honestly don’t care about that nearly as much as I care about the tooling being effective. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 6 March 2015 at 23:01, Donald Stufft
Tooling wise, Github PRs work well for us. I don’t (and I don’t believe that any of the other core devs) have any major issues with them.
Github issues on the other hand, they function “OK” but it would be nice to have something that we can allow anyone to modify the state of tickets to help with triage. However even this isn’t a super pressing concern because our ticket count is small enough that I don’t think there’s likely to be too many to be handled by people commenting on issues and a core team coming in to change things. However if someone has a proposal for a different issue tracker (and plans for how to migrate to it), personally I’d be willing to listen.
I'm also fine with github. I don't have an issue with the issue tracker, although as Donald says it would be helpful if it had a concept of "tracker privileges" separate from "core committer". But that's *not* a big enough concern to me that I'd want to go to a different tool. Paul
Has PyPA considered contacting GitHub support? I'm happy to do the same since I've wanted this for a while myself on other projects.
On 7 March 2015 at 09:53, Ian Cordasco
Has PyPA considered contacting GitHub support? I'm happy to do the same since I've wanted this for a while myself on other projects.
I have some indirect contacts as well where I'd be happy to pass this request on - so consider that done (it may not go anywhere, but there's no harm in asking). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I have some indirect contacts as well where I'd be happy to pass this request on - so consider that done (it may not go anywhere, but there's no harm in asking).
Yep, can't hurt. We could also try the social networking thing. https://twitter.com/msabramo/status/574256914478977025
On 7 March 2015 at 09:01, Donald Stufft
F/OSS tooling is nice, but I honestly care a whole lot less about that and a lot more about whatever tooling is the most effective for us to get the job done. This can include hosted services (and possibly even hosted services that cost money). Written in Python is also nice, but again I honestly don’t care about that nearly as much as I care about the tooling being effective.
Right, that's why I suggested GerritHub or Phabricator as possibilities for consideration, based on my interpretation of some of the concerns raised, since they both allow the GitHub repos to remain the "single source of truth", while adding some additional process options around them. However, it sounds like there aren't any current major tooling issues aside from GitHub's lack of support for a "Triager" level of permissions, so even the idea of potentially adopting your own suggested Phabricator+GitHub approach wouldn't rank very high on pip's process improvement list at this point. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 2015-03-06 21:37:31 +1000 (+1000), Nick Coghlan wrote: [...]
I've never used Gerrit in the OpenStack context though, so I don't know if Donald dislikes Gerrit in its own right, or just the way OpenStack uses it. [...]
Having talked with him about it regularly, I gather that he (and others) dislike the Gerrit/LKML "rebase, revise and refine your patch" workflow, instead preferring a Github-like "incrementally build on your pull request with new commits" workflow... though presumably he can explain it in better detail. In my experience it comes down to a trade-off where the Github model is easier on patch submitters because they can just keep piling fixes for their pull request on top of it until the corresponding topic branch is suitable to merge, while the Gerrit model is easier on reviewers because they're reviewing a patch in context rather than a topic branch.
The Beaker workflow is an example of vanilla Gerrit usage, rather than using OpenStack's custom fork: [...]
OpenStack hasn't been running a fork of Gerrit since upgrading to 2.8 back in April 2014 (modulo a few simple backports from 2.9), and has plans to upgrade to 2.9 next month or the month after. That's not to say that there isn't a bunch of additional tooling and automation built up around it (the Zuul CI system in particular) but aside from some minimal theming and including a little Javascript to tie outside data sources into the interface it's just plain Gerrit. -- Jeremy Stanley
On Mar 6, 2015, at 8:55 AM, Jeremy Stanley wrote: [...]
In general I’m fine with Gerrit (or Gerrit-like systems). I think Gerrit has a crappy-looking interface, and I think that the interface is harder to use than GitHub’s, but the process itself doesn’t bother me much. I do think that it would be better if it reviewed whole PRs instead of individual commits (and if you want to squash them, the tool should do a squash merge).

I don’t think that pip’s problems are ones that would be solved by switching to a different code review tool. GitHub functions well for that task; we don’t require multiple core reviewers to agree, only one, so the merge button is functionally equivalent to a +1 button plus having a machine later do the merge. Our velocity isn’t near high enough that we need separate check and merge gating or anything like that. I would be against moving away from GitHub for PRs without a really compelling reason: GitHub PRs are easy to use, and GitHub is popular, so we reduce the barrier to entry to contributing by making our process the same as every other project’s on GitHub.

A better test suite and a more comprehensive CI system is where most of our tooling problems are. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 7 March 2015 at 02:22, Donald Stufft
A better test suite and a more comprehensive CI system is where most of our tooling problems are.
For the cross-platform CI problem, we could likely set up post-merge CI on the CPython buildbot fleet. We trust the pip team to run code there anyway (courtesy of ensurepip), but those are persistent systems, so we wouldn't want to run every PR through them. That would put you in a situation where pre-merge CI at least gives you a check that nothing is fundamentally broken, while post-merge CI would check you haven't broken any *other* environments. Working on enabling that may also be a good opportunity to finally hook the CPython Buildbot master up with the credentials it needs to run ephemeral clients on Rackspace: http://docs.buildbot.net/latest/manual/cfg-buildslaves-openstack.html Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
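[Editor's note: per the Buildbot docs linked above, the ephemeral-client hookup Nick describes is mostly a master.cfg change. A rough, untested sketch follows - the slave name, password, flavor, image ID, and credentials are all placeholders, and the exact class path should be checked against the Buildbot version actually deployed:]

```python
# Hypothetical fragment of a Buildbot master.cfg (Buildbot 0.8.x era).
# Every name, ID, and credential below is a placeholder.
from buildbot.buildslave import openstack

c['slaves'] = [
    openstack.OpenStackLatentBuildSlave(
        'rax-ephemeral-1', 'sekrit',     # slave name / slave password
        flavor=1,                        # provider flavor ID to boot
        image='IMAGE-UUID-PLACEHOLDER',  # image booted fresh per build
        os_username='example-user',
        os_password='example-password',
        os_tenant_name='example-tenant',
        os_auth_url='https://identity.api.rackspacecloud.com/v2.0'),
]
```

The appeal of latent slaves here is that each build gets a fresh instance that is torn down afterwards, which sidesteps the "persistent systems" concern about running untrusted PR code.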
On Thu, Mar 5, 2015 at 11:11 PM, Ian Cordasco
And for CI, we need people who will help with the windows CI solution on more than one front clearly.
https://ci.appveyor.com/ works for open source projects. -- anatoly t.
Anatoly,
We already ruled out AppVeyor
On 5 March 2015 at 18:58, Donald Stufft
Yet another issue is that pip’s test suite is not particularly good. We’re missing a lot of coverage and we don’t have *any* CI running on platforms other than Ubuntu. This means that merging things is somewhat “dangerous” because it’s easy to break things without noticing unless you pull down the change and manually test whatever you can. Even that’s not good enough unless you can test it on other platforms as well. I’m sure Paul can fill in the blank on how often the test suite simply doesn’t run on Windows because of some POSIX assumption that snuck in somewhere.
The test suite is pretty much broken on Windows, from what I recall. I intended at one point to try to get it running cleanly on Windows, but it was soul-destroyingly hard work, and I never got very far with it. Given that there's no good Windows CI service (AppVeyor is great but it's very slow even on simple projects, and I think it has limits on how long test suites can run, so I'm not even sure we could run pip's tests on it) I fear that any work done getting the test suite working on Windows would pretty quickly regress... If there was one thing on the infrastructure and support side of things that would help enormously, it would be someone setting up CI services for more platforms - Windows in particular, but things like the ancient RHEL systems that people keep having issues on would also be good. And resources to get the test suite working on those platforms. (I'd be happy to help someone work on fixing the test suite on Windows, but I really don't have the time to do it all myself.)
Other things that would help are:
* People doing in-depth reviews of the current PRs that are there and suggesting changes or pointing out issues, etc.
Very much so. Anyone can add review comments to PRs. We could (and probably should) document some quality guidelines for PRs (must include a test, must include docs if there's a user-visible impact, must be cross-platform, must work on Python 2 and 3). Having more people who can test PRs on a wider range of platforms (Windows!!!) would be great too - a simple comment "checked and confirmed on Windows" is a great help.
* People triaging issues (unfortunately this one isn’t super easy with GitHub Issues since you have to be a committer to change these things).
Hmm, that's a problem - but yes, even if they can only add comments, saying "Please close as unreproducible", "Duplicate of XXX", "Please add label YYY" would be helpful. The committers could trawl such comments occasionally and action them.
* People going through and reviewing old issues and PRs to try and figure out if the situations that caused them to be opened originally still apply, or if that problem has been fixed, or if the code has changed significantly enough that the problem likely no longer exists.
Oh yes, please. And in particular, the old issues with repeated "+1" or "me, too" comments, ping all the people who said "me too" and ask them if they can provide a patch. And weeding out issues that only apply when using ancient versions of setuptools, that sort of thing.
These sorts of things would make it *much* easier to merge new things because there would be less risk and less things involved in actually going through and figuring out if any particular merge is a good idea or not. I also think that people willing to put in the work to do things like this would be good candidates for becoming core developers themselves, which would also help by increasing the number of people we have able to review and commit.
+1 Paul
On Mar 5, 2015, at 2:34 PM, Paul Moore
wrote: On 5 March 2015 at 18:58, Donald Stufft
wrote: Yet another issue is that pip’s test suite is not particularly good. We’re missing a lot of coverage and we don’t have *any* CI running on platforms other than Ubuntu. This means that merging things is somewhat “dangerous” because it’s easy to break things without noticing unless you pull down the change and manually test whatever you can. Even that’s not good enough unless you can test it on other platforms as well. I’m sure Paul can fill in the blank on how often the test suite simply doesn’t run on Windows because of some POSIX assumption that snuck in somewhere.
The test suite is pretty much broken on Windows, from what I recall. I intended at one point to try to get it running cleanly on Windows, but it was soul-destroyingly hard work, and I never got very far with it. Given that there's no good Windows CI service (Appveyor is great but it's very slow even on simple projects, and I think it has limits on how long test suites can run so I'm not even sure we could run pip's tests on it) I fear that any work done getting the test suite working on Windows would pretty quickly regress...
If there was one thing on the infrastructure and support side of things that would help enormously, it would be someone setting up CI services for more platforms - Windows in particular, but things like the ancient RHEL systems that people keep having issues on would also be good. And resources to get the test suite working on those platforms. (I'd be happy to help someone work on fixing the test suite on Windows, but I really don't have the time to do it all myself.)
Yea, I wouldn’t personally put effort into fixing the test suite on Windows without something in place to ensure it doesn’t break again. I know the folks behind Travis CI; they were looking at adding Windows support and said that if someone could write a Go app that boots up a Windows Azure instance and then uses WinRM to run a command on it, even echo, that would get them a big step of the way towards being able to support it.

OS X and lots of other POSIX-based systems are also not covered, of course; it would be great to set up something that other people can contribute build machines to. We can spin up anything on a Rackspace cloud (and probably other clouds too), but some things people might care about (AIX?) we can’t do. I think it wouldn’t be unreasonable to say that for things we can’t run a builder for ourselves, the people who care about that platform need to provide us with a suitable instance.

None of that matters much, though, without something that allows us to run tests on more platforms than whatever Travis provides us. Ideally this would support PR-based testing (which means it needs some sort of VM or isolation support to do it securely), but if some platforms can’t be easily virtualized like that then a post-merge trigger is acceptable too.
Other things that would help are:
* People doing in-depth reviews of the current PRs that are there and suggesting changes or pointing out issues, etc.
Very much so. Anyone can add review comments to PRs. We could (and probably should) document some quality guidelines for PRs (must include a test, must include docs if there's a user-visible impact, must be cross-platform, must work on Python 2 and 3). Having more people who can test PRs on a wider range of platforms (Windows!!!) would be great too - a simple comment "checked and confirmed on Windows" is a great help.
Yea, I don’t have a Windows machine, so oftentimes I’m just guessing whether something works on Windows, or pinging you about it. For major things I can spin up a Windows VM, but that takes a good 30+ minutes, which again goes back to the fact that our current setup has a lot of time-wasters for pip core.
* People triaging issues (unfortunately this one isn’t super easy with GitHub Issues since you have to be a committer to change these things).
Hmm, that's a problem - but yes, even if they can only add comments, saying "Please close as unreproducible", "Duplicate of XXX", "Please add label YYY" would be helpful. The committers could trawl such comments occasionally and action them.
I personally get emails for every issue; closing duplicates or adding labels and such is something that takes 15 seconds to do if someone leaves a comment like that. Another option is to move our issue tracker off of Github onto something else that lets non-committers manage the issue tracker. Empowering non-core people to do more things is another thing that would be useful; it requires someone to take the time to figure out how we can best do that (switch away from GH issues? To what?) and then actually do the work to make it happen (create Salt states to deploy, create scripts to migrate, etc.).
* People going through and reviewing old issues and PRs to try and figure out if the situations that caused them to be opened originally still apply, or if that problem has been fixed, or if the code has changed significantly enough that the problem likely no longer exists.
Oh yes, please. And in particular, the old issues with repeated "+1" or "me, too" comments, ping all the people who said "me too" and ask them if they can provide a patch. And weeding out issues that only apply when using ancient versions of setuptools, that sort of thing.
I try to do this periodically but I mostly only do the ones that I can tell from a glance that are safe to close.
These sorts of things would make it *much* easier to merge new things because there would be less risk and less things involved in actually going through and figuring out if any particular merge is a good idea or not. I also think that people willing to put in the work to do things like this would be good candidates for becoming core developers themselves, which would also help by increasing the number of people we have able to review and commit.
+1
Paul
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Mar 5, 2015, at 2:53 PM, Donald Stufft wrote: [...]
Oh, and another thing that would empower non-core folks is either figuring out a way to allow non-core contributors to restart Travis CI builds (whether this is something we run that interacts with the Travis CI API, or helping get Travis CI to support authors restarting builds on their own PRs, or whatever), or coming up with a proposal for something we can use instead of Travis that solves some of these issues.

Yet another improvement would be to figure out how to make the test suite not fail randomly as much. Likely this is going to involve finding the places where we’re reaching out to places on the internet (PyPI, Github, etc.) and instead mocking out that interaction, or making a test server that provides whatever we’re using that external service for, spinning up a copy of that test server whenever we run the tests, and running against that instead.
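[Editor's note: the "test server" idea above can be quite small. A minimal sketch - not pip's actual test code; the endpoint path and JSON payload here are made up - of spinning up a throwaway local HTTP server so tests never reach the real PyPI or GitHub:]

```python
import http.server
import json
import threading
import urllib.request

class FakeIndexHandler(http.server.BaseHTTPRequestHandler):
    """Serve a canned response so tests never touch the real network."""
    def do_GET(self):
        body = json.dumps({"info": {"name": "example", "version": "1.0"}}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep test output quiet.
        pass

def start_fake_server():
    # Port 0 lets the OS pick a free port, so parallel test runs don't clash.
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeIndexHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]

server, base_url = start_fake_server()
with urllib.request.urlopen(base_url + "/pypi/example/json") as resp:
    data = json.loads(resp.read().decode())
server.shutdown()
```

A test fixture could start this once per session and hand the base URL to any test that would otherwise hit the network, which is exactly the kind of thing that cuts down random failures.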
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 5 March 2015 at 19:53, Donald Stufft
Hmm, that's a problem - but yes, even if they can only add comments, saying "Please close as unreproducible", "Duplicate of XXX", "Please add label YYY" would be helpful. The committers could trawl such comments occasionally and action them.
I personally get emails for every issue, closing duplicates or adding labels and such is something that takes 15 seconds to do if someone leaves a comment like that.
Ditto. The only real slowdown is considering whether I trust the opinion of whoever added the comment. And as people contribute more, that becomes progressively easier.
Another option is to move our issue tracker off of Github into something else that supports non-committers being able to manage the issue tracker. Empowering non core to do more things is another thing that would be useful and requires someone to take the time to figure out how we can best do that (switch away from GH issues? To What?) and then actually do the work to make it happen (create salt states to deploy, create scripts to migrate etc).
The CPython core has a system of granting tracker privileges, which means they can and do have a group of people who contribute via issue triage, reviews, etc. Many such people graduate to becoming core developers, and those that don't still provide a hugely valuable service. It's a shame GitHub doesn't have a way for us to do that, but of course setting up an alternative tracker would be yet another drain on our limited developer resources. Paul
On Thu, Mar 5, 2015 at 7:38 PM, Marc Abramowitz
- Add more computer automation
#3 seems most appealing to me, but of course it requires humans to develop it in the first place, but at least it's an investment that could pay dividends.
On the page https://bitbucket.org/techtonik/python-stdlib/src there is a working proof of concept that fetches patch files from bugs.python.org, detects filenames, and sees which module the patch belongs to (the module layout is described in a .json file). It is possible to tweak the script to also look up the recent authors of the module files, which would allow reviewers to be picked automatically. -- anatoly t.
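The core of that proof of concept can be sketched in a few lines: pull the touched file paths out of a unified diff, then match them against a module layout. The layout format and the sample paths below are assumptions for illustration, not the actual python-stdlib repository's .json schema.

```python
# Hypothetical sketch of mapping a patch to the module(s) it touches.
# The {module: [path prefixes]} layout format is an assumption, not the
# real python-stdlib .json structure.
import re

def touched_files(patch_text):
    """Extract file paths from the '+++ b/...' lines of a unified diff."""
    paths = []
    for line in patch_text.splitlines():
        m = re.match(r"\+\+\+ (?:b/)?(\S+)", line)
        if m and m.group(1) != "/dev/null":  # skip deleted files
            paths.append(m.group(1))
    return paths

def modules_for(paths, layout):
    """Map file paths to module names using a {module: [paths]} layout."""
    hits = set()
    for path in paths:
        for module, files in layout.items():
            # a layout entry may be an exact file or a directory prefix
            if path in files or any(path.startswith(d.rstrip("/") + "/")
                                    for d in files):
                hits.add(module)
    return sorted(hits)

layout = {"asyncio": ["Lib/asyncio/"], "json": ["Lib/json/"]}
patch = """--- a/Lib/asyncio/tasks.py
+++ b/Lib/asyncio/tasks.py
@@ -1 +1 @@
-old
+new
"""
print(modules_for(touched_files(patch), layout))  # ['asyncio']
```

From there, the "recent authors" step would be a matter of running something like `git log` over the matched paths and suggesting those names as reviewers.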
On 5 March 2015 at 16:38, Marc Abramowitz
This makes me think that the folks who review the PRs are overburdened and/or the process needs a bit more structure (e.g.: each person thinks someone else is going to review the PR and so no one does it).
One thing I, personally, find difficult when reviewing PRs (specifically feature requests) is the fact that I usually don't actually have a *need* for the functionality being proposed. It's very easy for me to say "this doesn't help me personally, so I'll ignore it", but that is ducking a big part of the responsibility of being a core committer. But forming a view on something I've no experience of or direct interest in is *hard*, and takes a lot of time.

Discussions tend to involve a lot of people with strong opinions (e.g. the PR author) who can't move the change forward, and a few people with weaker opinions (e.g. me :-)) who can. It's very easy to think "just accept it because it helps someone". But that's a cop-out, and long-term it isn't a sustainable approach. It's not "thinking someone else will review the PR", it's more making a conscious decision on how much energy and effort I'm willing to put into a PR that doesn't have any benefit for me. (And even just *discussing* a PR can take a lot of energy; it's not easy to politely explain to someone that you don't think their use case, which they went to a lot of trouble writing a PR for, is worth it.)

What would help a *lot* is some sort of agreement on what pip is, and what its core goals are. Something similar to what it means to be "pythonic" for the Python language itself. At the moment, I don't think this is very clearly understood even within the core dev group (so external contributors have no hope...) And for me, it'd help avoid the endless debates that often start with the phrase "pip should..."

For example, is the lack of a programmable API an issue for pip? I think it is, and having people able to write their own tools that use pip's finder, or its wheel installer, is a (long term) goal for pip, rather than, say, continually adding more pip subcommands. But I don't know if that's the consensus.
And to my knowledge, no 3rd party PRs have *ever* been of the form "Encapsulate pip's functionality X in a clean API so I can use it in my script"...

Or is the "pip search" command a wart that should be removed because it isn't pip's job to do PyPI searches? There's some low-hanging fruit if a more focused tool is the goal...

Or should pip give you tools to replicate your current environment (pip freeze, requirements files)? What about "remove anything *not* in this requirements file"? Personally, I only use requirements files to bundle up "install this lot of stuff". I don't write the sort of thing where a "pin every dependency" philosophy is appropriate, so freeze isn't something I use. But lots of people do, so what's the workflow that pip freeze supports?

The problem with discussing this sort of thing is that it's *very* wide-ranging, and tends to produce huge rambling mega-threads[1] when discussed on a public list. I'm not advocating any sort of private cabal deciding the fate of pip, but maybe somewhere where the core devs could agree their *own* opinions before having to face the public wouldn't be such a bad thing. That's more or less what I'd expected the pypa-dev list to be (as a parallel to the python-dev list), but it doesn't feel like it's turned out that way, maybe because it doesn't have a clear enough charter, or maybe because there's no obvious *other* place to direct people to for off-topic posts (like python-list is for python-dev).

Or maybe grand designs are a distraction in themselves, and none of the core devs being interested in a PR means just that - not that they don't have the time, or that the use case isn't valid, or anything else. Just that they aren't interested, sorry.

[1] Please, don't start a rambling mega-thread from *this* post :-)

Paul

PS I just spent way too long composing this email, and now I'm burned out. Maybe my time would have been better spent commenting on a couple of PRs...
On Fri, Mar 6, 2015 at 5:55 AM, Paul Moore
On 5 March 2015 at 16:38, Marc Abramowitz
wrote: This makes me think that the folks who review the PRs are overburdened and/or the process needs a bit more structure (e.g.: each person thinks someone else is going to review the PR and so no one does it).
One thing I, personally, find difficult when reviewing PRs (specifically feature requests) is the fact that I usually don't actually have a *need* for the functionality being proposed. It's very easy for me to say "this doesn't help me personally, so I'll ignore it", but that is ducking a big part of the responsibility of being a core committer. But forming a view on something I've no experience of or direct interest in is *hard*, and takes a lot of time. Discussions tend to involve a lot of people with strong opinions (e.g. the PR author) who can't move the change forward, and a few people with weaker opinions (e.g. me :-)) who can. It's very easy to think "just accept it because it helps someone". But that's a cop-out and long-term isn't a sustainable approach.
+1. This is a problem I have with Flake8. People keep asking for more command-line arguments because "It's just one more option. It won't hurt anyone." But Flake8 is another project without a great set of tests. It would be easy to say "Yeah sure, just this one other option that only one person has ever asked for" but there's only ever one person reviewing pull requests - me. It's also not sustainable to keep adding poorly named command-line flags.
It's not "thinking someone else will review the PR", it's more making a conscious decision on how much energy and effort I'm willing to put into a PR that doesn't have any benefit for me. (And even just *discussing* a PR can take a lot of energy; it's not easy to politely explain to someone that you don't think their use case, which they went to a lot of trouble writing a PR for, is worth it.)
What would help a *lot* is some sort of agreement on what pip is, and what its core goals are. Something similar to what it means to be "pythonic" for the Python language itself. At the moment, I don't think this is very clearly understood even within the core dev group (so external contributors have no hope...) And for me, it'd help avoid the endless debates that often start with the phrase "pip should..."
+10
For example, is the lack of a programmable API an issue for pip? I think it is, and having people able to write their own tools that use pip's finder, or its wheel installer, is a (long term) goal for pip, rather than, say, continually adding more pip subcommands. But I don't know if that's the consensus. And to my knowledge, no 3rd party PRs have *ever* been of the form "Encapsulate pip's functionality X in a clean API so I can use it in my script"...
If pip is ever refactored appropriately (which I acknowledge is not a trivial condition to meet), maybe then pip could consider presenting a public API, but I think there are currently too many people who already reach into pip to ignore the need for such an interface. Perhaps the answer is, as pip is refactored, to create libraries that are then vendored into pip and that people can install independently to do that one thing they need to do.
Or is the "pip search" command a wart that should be removed because it isn't pip's job to do PyPI searches? There's some low-hanging fruit if a more focused tool is the goal...
And how many different replacements for "pip search" are already on PyPI?
Or should pip give you tools to replicate your current environment (pip freeze, requirements files)? What about "remove anything *not* in this requirement file"? Personally, I only use requirements files to bundle up "install this lot of stuff". I don't write the sort of thing where a "pin every dependency" philosophy is appropriate, so freeze isn't something I use. But lots of people do, so what's the workflow that pip freeze supports?
The problem with discussing this sort of thing is that it's *very* wide-ranging, and tends to produce huge rambling mega-threads[1] when discussed in a public list. I'm not advocating any sort of private cabal deciding the fate of pip, but maybe somewhere where the core devs could agree their *own* opinions before having to face the public wouldn't be such a bad thing. That's more or less what I'd expected the pypa-dev list to be (as a parallel to the python-dev list) but it doesn't feel like it's turned out that way, maybe because it doesn't have a clear enough charter, or maybe because there's no obvious *other* place to direct people to for off-topic posts (like python-list is for python-dev).
So sometimes private cabals need to be formed in order to establish a baseline of what is reasonable. The WSGI working group tried to do that, but it failed after about a week as more people tried to join the cabal and were allowed to do so.
Or maybe grand designs are a distraction in themselves, and none of the core devs being interested in a PR means just that - not that they don't have the time, or that the use case isn't valid, or anything else. Just that they aren't interested, sorry.
[1] Please, don't start a rambling mega-thread from *this* post :-)
Paul
PS I just spent way too long composing this email, and now I'm burned out. Maybe my time would have been better spent commenting on a couple of PRs...
Go rest. These discussions can exhaust even the best rested of us.
On 7 Mar 2015 06:44, "Ian Cordasco"
On Fri, Mar 6, 2015 at 5:55 AM, Paul Moore
wrote: The problem with discussing this sort of thing is that it's *very* wide-ranging, and tends to produce huge rambling mega-threads[1] when discussed in a public list. I'm not advocating any sort of private cabal deciding the fate of pip, but maybe somewhere where the core devs could agree their *own* opinions before having to face the public wouldn't be such a bad thing. That's more or less what I'd expected the pypa-dev list to be (as a parallel to the python-dev list) but it doesn't feel like it's turned out that way, maybe because it doesn't have a clear enough charter, or maybe because there's no obvious *other* place to direct people to for off-topic posts (like python-list is for python-dev).
So sometimes private cabals need to be made in order to get a basis of what is reasonable. The WSGI working group tried to do that but that failed after about a week as more people tried to join the cabal and were allowed to do so.
It's worth noting that CPython didn't get public source control until it was already around 9 years old (see the What's New for Python 2.0). My understanding is also that the architecture & philosophy for CPython were very much set by the original Python Labs crew (Guido, Tim Peters, Barry Warsaw, Fred Drake) when they worked for Zope Corporation, just as the direction of beaker-project.org is very much governed by what the core team that works full time on it for Red Hat wants to do.

Confusing "open source" and even "open governance" with "no hierarchy" is a common mistake, when the only essential requirement is that anyone is welcome to observe and even suggest changes, whether to artefacts (open source) or decision making processes (open governance). The one thing that potential (and current!) contributors have to accept is that the existing contributors are the ones that decide between "yes", "no", and "maybe, let's discuss it some more", regardless of whether the proposed change is to code, processes, or who has the authority to accept changes.

All of which can be summarised in the phrase: "Those that do the work, make the rules" :)

Cheers,
Nick.
participants (12)
- anatoly techtonik
- Ben Finney
- Donald Stufft
- Greg Ewing
- Ian Cordasco
- Ionel Cristian Mărieș
- Jeremy Stanley
- Marc Abramowitz
- Marcus Smith
- Nick Coghlan
- Paul Moore
- Randy Syring