On Nov 29, 2014, at 8:12 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:


> On 30 Nov 2014 09:28, "Donald Stufft" <donald@stufft.io> wrote:
> >
> > As promised in the "Move selected documentation repos to PSF BitBucket
> > account?" thread I've written up a PEP for moving selected repositories from
> > hg.python.org to Github.
> >
> > You can see this PEP online at: https://www.python.org/dev/peps/pep-0481/
> >
> > I've also reproduced the PEP below for inline discussion.
>
> Given that hg.python.org isn't going anywhere, you could also use hg-git to maintain read-only mirrors at the existing URLs and minimise any breakage (as well as ensuring a full historical copy remains available on PSF infrastructure). Then the only change needed would be to set up appropriate GitHub web hooks to replace anything previously based on a commit hook rather than periodic polling.

Ah yes, I meant to include that and simply forgot when I went off to test
hg-git to see how well it worked and whether it produced different commit
hashes on different machines. I also thought about adding a git.python.org
that would just act as a read-only mirror of what was on Github, but I don’t
know whether that’s actually generally useful.

> The PEP should also cover providing clear instructions for setting up git-remote-hg with the remaining Mercurial repos (most notably CPython), as well as documenting a supported workflow for generating patches based on the existing CPython GitHub mirror.

I can add this. I’ve never actually tried using git-remote-hg with CPython
itself, because I’ve made it segfault on other Mercurial repositories and never
figured out why, so I generally just fight my way through using Mercurial on
projects that themselves use Mercurial. I will absolutely test whether
git-remote-hg works with CPython, and if it does I can document using it to
contribute to CPython. I’m not sure whether it needs to be part of the PEP,
though; it feels like something that would be better placed in the devguide
itself, but I’m not opposed to putting it in both locations.

> Beyond that, GitHub is indeed the most expedient option. My two main reasons for objecting to taking the expedient path are:

It's not entirely about expedience. I think a lot of the reason why we should
look towards outsourcing some of these items is that volunteer time is not
a fungible resource. Volunteers are generally only willing to work on things
they personally care about. This is entirely unlike a business, where
employees will generally work on whatever you tell them to because that's what
you're paying them for. To that end, I personally don't have much interest in
trying to create a better code hosting platform than Github when, in my
opinion, Github is doing an amazing job and satisfies my needs fine. Given the
*current* state of tooling, it appears that there are not a lot of people who
both care about making that piece of software exist and are capable of
competing with Github in terms of quality.

> 1. I strongly believe that the long term sustainability of the overall open source community requires the availability and use of open source infrastructure. While I admire the ingenuity of the "free-as-in-beer" model for proprietary software companies fending off open source competition, I still know a proprietary platform play when I see one (and so do venture capitalists looking to extract monopoly rents from the industry in the future). (So yes, I regret relenting on this principle in previously suggesting the interim use of another proprietary hosted service)

I somewhat agree. However, I’m less concerned about exactly where projects are
hosted and more about the *ability* to move to a completely OSS
infrastructure. In particular, if at some point we need to move off of Github
we can totally do that; it’s not particularly difficult. Currently you lose the
higher quality polish of Github if you do that, but if at some point in the
future Github either turns evil or an OSS project offers a truly compelling
alternative to Github, then there is really nothing stopping a migration to
another platform. As I said in the PEP, I view this as a “let’s cross that
bridge if/when we get to it” issue. The main things we should look at are the
things that would be difficult to migrate away from. For code hosting in
particular, most of the truly valuable data is stored within the DVCS, so
migrating the bulk of the data is as simple as pushing the repository to a new
location. The other data is within the issues; for these repositories I suggest
moving the issues to Github entirely, because I don’t expect they’ll get many,
if any, issues themselves, so the amount of data stored in issues will be low.

However, I also think that the long-term sustainability of any particular
project depends on attracting and retaining contributors. To this end, going to
where the people already are and paving inroads that reduce the barriers to
contributing is an important thing to do as well. This proposal is aimed
squarely at reducing barriers to contribution, while also giving a nod to the
first concern by selecting a platform that has done a lot to enable OSS, and by
doing it in a way that preserves our ability to leave that platform, so that
we can migrate away in the future if need be.

> 2. I also feel that this proposal is far too cavalier in not even discussing the possibility of helping out the Mercurial team to resolve their documentation and usability issues rather than just yelling at them "your tool isn't popular enough for us, and we find certain aspects of it too hard to use, so we're switching to something else rather than working with you to address our concerns". We consider the Mercurial team a significant enough part of the Python ecosystem that Matt was one of the folks specifically invited to the 2014 language summit to discuss their concerns around the Python 3 transition. Yet we'd prefer to switch to something else entirely rather than organising a sprint with them at PyCon to help ensure that our existing Mercurial based infrastructure is approachable for git & GitHub users? (And yes, I consider some of the core Mercurial devs to be friends, so this isn't an entirely abstract concern for me)

I was on the fence about including the bit about branches in the PEP itself, and
I ended up doing it only because multiple people brought it up when I asked
them for a review. I also tried not to include in the PEP the fact that I
haven’t personally figured out how to use Mercurial effectively, because
honestly I don’t think that is really the core idea behind moving to git and
Github. I think if you look at any semi-objective measurement of git’s and
Mercurial’s effective popularity, git is going to be a clear winner, and if you
do the same thing for Github compared to any other code hosting service or
software, then Github wins by an even larger margin.

The reason the PEP primarily focuses on the popularity of the tool is that,
as you mentioned, issues like poor documentation, bad support for a
particular platform, or a particular workflow not being very good can be solved
by working with the tool authors to fix that particular problem. I wouldn’t
consider those issues, in a vacuum, to be a good reason to migrate away from
that tool. However, there’s very little that CPython can do to get more people
using Mercurial, and presumably the authors of Mercurial are already doing what
they can to get people to use it. By choosing, and continuing to choose, a
tool that an order of magnitude fewer people use, we’re effectively choosing to
make it harder to contribute. We’re increasing the likelihood that a contributor
is going to have to learn our particular DVCS even if they already know one, and
we’re increasing the likelihood that we’re burdening users with learning a
technology that isn’t going to transfer to most other projects that they might
want to contribute to, even within the Python ecosystem itself.

Even with a way to make Mercurial approachable for git and Github users, it
still acts as a barrier to contribution. Now, it’s true that when you have to
select among competing tools, the tool you choose is going to be a barrier to
*someone*. For example, if we do choose to go with git, then that tool will be a
barrier to people who already know Mercurial but don’t already know git.
However, by selecting the most popular tool you ensure that that particular
barrier is a barrier to as few people as possible.

To be clear, I don’t consider the technical differences between Mercurial and
git to be very large hurdles to overcome; it’s primarily about the mindshare,
and the fact that Mercurial doesn’t really enable us to do much that git
doesn’t also allow us to do (other than support friends, which isn’t an
unreasonable desire either) while keeping us from tapping into the collective
power of the much larger git user base. To put it another way, consider Linux
vs. the BSDs: I actually prefer FreeBSD over Linux, but I almost never use it
because Linux is vastly more popular, and by selecting it I’m more likely to be
able to transfer that knowledge to different scenarios and find help when I
have a problem.

> Given my proposal to use BitBucket as a near term solution for enabling pull request based workflows, it's clear I consider the second argument the more significant of the two.
>
> However, if others consider some short term convenience that may or may not attract additional contributors more important than supporting the broader Python and open source communities (an argument I'm more used to hearing in the ruthlessly commercial environment of Red Hat, rather than in upstream contexts that tend to be less worried about "efficiency at any cost"), I'm not going to expend energy trying to prevent a change I disagree with on principle, but will instead work to eliminate (or significantly reduce) the current expedience argument in GitHub's favour.
>
> As a result, I'm -0 on the PEP, rather than -1 (and will try to stay out of further discussions).
>
> Given this proposal, I'm planning to refocus PEPs 474 & 462 specifically on resolving the CPython core workflow issues, since that will require infrastructure customisation regardless, and heavy customisation of GitHub based infrastructure requires opting in to the use of the GitHub specific APIs that create platform lockin. (Note that the argument in PEP 481 about saving overall infrastructure work is likely spurious - the vast majority of that work will be in addressing the complex CPython workflow requirements, and moving some support repos to GitHub does little to alleviate that

That was specifically about saving the infrastructure work of supporting
Kallithea (or whatever solution we chose). I don’t believe CPython was planning
on using Kallithea (maybe I’m wrong?), so forge.python.org was primarily aimed
at the non-CPython repositories.

> If folks decide they want to migrate the ancillary repos back from GitHub after that other infrastructure work is done, so be it, but if they don't, that's OK too. We're already running heterogeneous infrastructure across multiple services (especially if you also take PyPA into account), so having additional support repos externally hosted isn't that big a deal from a purely practical perspective.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA