[Python-checkins] peps: Switch to proposing a full migration to Git, Github, and Phabricator

donald.stufft python-checkins at python.org
Sun Feb 1 16:26:41 CET 2015


https://hg.python.org/peps/rev/0a92b2d4967b
changeset:   5686:0a92b2d4967b
user:        Donald Stufft <donald at stufft.io>
date:        Sun Feb 01 10:26:15 2015 -0500
summary:
  Switch to proposing a full migration to Git, Github, and Phabricator

files:
  pep-0481.txt |  448 +++++++++++++++++++++++---------------
  1 files changed, 271 insertions(+), 177 deletions(-)


diff --git a/pep-0481.txt b/pep-0481.txt
--- a/pep-0481.txt
+++ b/pep-0481.txt
@@ -1,5 +1,5 @@
 PEP: 481
-Title: Migrate Some Supporting Repositories to Git and Github
+Title: Migrate CPython to Git, Github, and Phabricator
 Version: $Revision$
 Last-Modified: $Date$
 Author: Donald Stufft <donald at stufft.io>
@@ -13,156 +13,305 @@
 Abstract
 ========
 
-This PEP proposes migrating to Git and Github for certain supporting
-repositories (such as the repository for Python Enhancement Proposals) in a way
-that is more accessible to new contributors, and easier to manage for core
-developers. This is offered as an alternative to PEP 474 which aims to achieve
-the same overall benefits but while continuing to use the Mercurial DVCS and
-without relying on a commerical entity.
-
-In particular this PEP proposes changes to the following repositories:
-
-* https://hg.python.org/devguide/
-* https://hg.python.org/devinabox/
-* https://hg.python.org/peps/
-
-
-This PEP does not propose any changes to the core development workflow for
-CPython itself.
+This PEP proposes migrating the repository hosting of CPython and the
+supporting repositories to Git and GitHub. It also proposes adding Phabricator
+as an alternative to GitHub Pull Requests to handle reviewing changes. This
+particular PEP is offered as an alternative to PEP 474 and PEP 462 which aim
+to achieve the same overall benefits but restrict themselves to tools that
+support Mercurial and are completely Open Source.
 
 
 Rationale
 =========
 
-As PEP 474 mentions, there are currently a number of repositories hosted on
-hg.python.org which are not directly used for the development of CPython but
-instead are supporting or ancillary repositories. These supporting repositories
-do not typically have complex workflows or often branches at all other than the
-primary integration branch. This simplicity makes them very good targets for
-the "Pull Request" workflow that is commonly found on sites like Github.
+CPython is an open source project which relies on a number of volunteers
+donating their time. As an open source project it relies on attracting new
+volunteers as well as retaining existing ones in order to continue to have
+a healthy amount of manpower available. In addition to increasing the amount of
+manpower that is available to the project, it also needs to allow for effective
+use of what manpower *is* available.
 
-However whereas PEP 474 proposes to continue to use Mercurial and restricts
-itself to only solutions which are OSS and self-hosted, this PEP expands the
-scope of that to include migrating to Git and using Github.
+The current toolchain of the CPython project is a custom and unique combination
+of tools which mandates a workflow that is similar to one found in a lot of
+older projects, but which is becoming less and less popular as time goes on.
 
-The existing method of contributing to these repositories generally includes
-generating a patch and either uploading them to bugs.python.org or emailing
-them to peps at python.org. This process is unfriendly towards non-comitter
-contributors as well as cumbersome for comitters seeking to accept the patches
-sent by users. In contrast, the Pull Request workflow style enables non
-technical contributors, especially those who do not know their way around the
-DVCS of choice, to contribute using the web based editor. On the committer
-side, the Pull Requests enable them to tell, before merging, whether or not
-a particular Pull Request will break anything. It also enables them to do a
-simple "push button" merge which does not require them to check out the
-changes locally. Another such feature that is useful in particular for docs,
-is the ability to view a "prose" diff. This Github-specific feature enables
-a committer to view a diff of the rendered output which will hide things like
-reformatting a paragraph and show you what the actual "meat" of the change
-actually is.
+The one-off nature of the CPython toolchain and workflow means that any new
+contributor needs to spend time learning the tools and workflow before they
+can start contributing to CPython. Once a new contributor goes through the
+process of learning the CPython workflow they are also unlikely to be able
+to take that knowledge and apply it to future projects they wish to contribute
+to. This acts as a barrier to contribution which will scare off potential new
+contributors.
 
+In addition the tooling that CPython uses is under-maintained, antiquated,
+and it lacks important features that enable committers to more effectively use
+their time when reviewing and approving changes. The fact that it is
+under-maintained means that bugs are likely to last for longer, if they ever
+get fixed, and that it is more likely to go down for extended periods of time.
+The fact that it is antiquated means that it doesn't effectively harness the
+capabilities of the modern web platform. Finally the fact that it lacks several
+important features, such as pre-testing of commits and an automatic merge
+tool, means that committers have to do needless busy work to commit even the
+simplest of changes.
 
-Why Git?
---------
 
-Looking at the variety of DVCS which are available today, it becomes fairly
-clear that git has the largest mindshare. The Open Hub (previously Ohloh)
-statistics [#openhub-stats]_ show that currently 37% of the repositories
-indexed by Open Hub are using git which is second only to SVN (which has 48%),
-while Mercurial has just 2% of the indexed repositories (beating only bazaar
-which has 1%). In additon to the Open Hub statistics, a look at the top 100
-projects on PyPI (ordered by total download counts) shows that within the
-Python space itself, the majority of projects use git.
+Version Control System
+----------------------
 
-=== ========= ========== ====== === ====
-Git Mercurial Subversion Bazaar CVS None
-=== ========= ========== ====== === ====
-62  22        7          4      1   1
-=== ========= ========== ====== === ====
+The first decision that needs to be made is the VCS of the primary server side
+repository. Currently the CPython repository, as well as a number of supporting
+repositories, uses Mercurial. When evaluating the VCS we must consider the
+capabilities of the VCS itself as well as the network effect and mindshare of
+the community around that VCS.
 
+There are really only two options for this: Mercurial and Git. Between the
+two of them the technical capabilities are largely equivalent. For this reason
+this PEP will largely ignore the technical arguments about the VCS and
+will instead focus on the social aspects.
 
-Chosing a DVCS which has the larger mindshare will make it more likely that any
-particular person who has experience with DVCS at all will be able to
-meaningfully contribute without having to learn a new tool.
+It is not possible to get exact numbers for the number of projects or people
+which are using a particular VCS, however we can infer this by looking at
+several sources of information for what VCS projects are using.
 
-In addition to simply making it more likely that any individual will already
-know how to use git, the number of projects and people using it means that the
-resources for learning the tool are likely to be more fully fleshed out.
-When you run into problems, the likelihood that someone else had that problem
-and posted a question and recieved an answer is also far higher.
+The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that 37% of
+the repositories indexed by The Open Hub are using Git (second only to SVN
+which has 48%) while Mercurial has just 2% (beating only Bazaar which has 1%).
+This makes Git just over 18 times as popular as Mercurial on The Open Hub.
 
-Thirdly, by using a more popular tool you also increase your options for
-tooling *around* the DVCS itself. Looking at the various options for hosting
-repositories, it's extremely rare to find a hosting solution (whether OSS or
-commerical) that supports Mercurial but does not support Git. On the flip side,
-there are a number of tools which support Git but do not support Mercurial.
-Therefore the popularity of git increases the flexibility of our options going
-into the future for what toolchain these projects use.
+Another source of information on the popularity of the different VCSs is PyPI
+itself. This source is more targeted at the Python community itself since it
+represents projects developed for Python. Unfortunately PyPI does not have a
+standard location for representing this information, so this requires manual
+processing. If we limit our search to the top 100 projects on PyPI (ordered
+by download counts) we can see that 62% of them use Git, 22% of them use
+Mercurial, and 13% use something else. This makes Git just under 3 times
+as popular as Mercurial for the top 100 projects on PyPI.
 
-Also, by moving to the more popular DVCS, we increase the likelihood that the
-knowledge that the person has learned in contributing to these support
-repositories will transfer to projects outside of the immediate CPython project
-such as to the larger Python community which is primarily using Git hosted on
-Github.
+From these numbers Git is by far the more popular DVCS for open source
+projects, and choosing the more popular VCS has a number of
+benefits.
 
-In previous years there was concern about how well supported git was on Windows
-in comparison to Mercurial. However, git has grown to support Windows as a
-first class citizen. In addition to that, for Windows users who are not well
-acquainted with the Windows command line, there are GUI options as well.
+For new contributors it increases the likelihood that they will have already
+learned the basics of Git as part of working with another project or, if they
+are just now learning Git, that they'll be able to take that knowledge and
+apply it to other projects. Additionally a larger community means more people
+writing how-to guides, answering questions, and writing articles about Git
+which makes it easier for a new user to find answers and information about
+the tool they are trying to learn.
 
+Another benefit is that by nature of having a larger community, there will be
+more tooling written *around* it. This increases options for everything from
+GUI clients and helper scripts to repository hosting.
 
-Why Github?
+
+Repository Hosting
+------------------
+
+This PEP proposes allowing GitHub Pull Requests to be submitted, however GitHub
+does not have a way to submit Pull Requests against a repository that is not
+hosted on GitHub. This PEP also proposes that, in addition to GitHub Pull
+Requests, Phabricator's Differential app can be used to submit proposed
+changes, and Phabricator *does* allow submitting changes against a repository
+that is not hosted on Phabricator.
+
+For this reason this PEP proposes using GitHub as the canonical location of
+the repository with a read-only mirror located in Phabricator. If at some point
+in the future GitHub is no longer desired, then repository hosting can easily
+be moved solely to Phabricator and the ability to accept GitHub Pull
+Requests dropped.
+
+In addition to hosting the repositories on GitHub, a read-only copy of all
+repositories will also be mirrored onto the PSF Infrastructure.
+
+
+Code Review
 -----------
 
-There are a number of software projects or web services which offer
-functionality similar to that of Github. These range from commerical web
-services such as Bitbucket to self-hosted OSS solutions such as Kallithea or
-Gitlab. This PEP proposes that we move these repositories to Github.
+Currently CPython uses a custom fork of Rietveld which has been modified to
+not run on Google App Engine and which can currently only really be maintained
+by one person. In addition it is missing out on features that are
+present in many modern code review tools.
 
-There are two primary reasons for selecting Github: Popularity and
-Quality/Polish.
+This PEP proposes allowing both GitHub Pull Requests and Phabricator to be
+used to propose changes and review code. It suggests both so that contributors
+can select which tool best enables them to submit changes, and reviewers can
+focus on reviewing changes in the tooling they like best.
 
-Github is currently the most popular hosted repository hosting according to
-Alexa, where it currently has a global rank of 121. Much like for Git itself,
-by choosing the most popular tool we gain benefits in increasing the likelihood
-that a new contributor will have already experienced the toolchain, the quality
-and availablity of the help, more and better tooling being built around it, and
-the knowledge transfer to other projects. A look again at the top 100 projects
-by download counts on PyPI shows the following hosting locations:
 
-====== ========= =========== ========= =========== ==========
-GitHub BitBucket Google Code Launchpad SourceForge Other/Self
-====== ========= =========== ========= =========== ==========
-62     18        6           4         3           7
-====== ========= =========== ========= =========== ==========
+GitHub Pull Requests
+~~~~~~~~~~~~~~~~~~~~
 
-In addition to all of those reasons, Github also has the benefit that while
-many of the options have similar features when you look at them in a feature
-matrix the Github version of each of those features tend to work better and be
-far more polished. This is hard to quantify objectively however it is a fairly
-common sentiment if you go around and ask people who are using these services
-often.
+GitHub is a very popular code hosting site and is increasingly becoming the
+primary place people look to contribute to a project. Enabling users to
+contribute through GitHub lets them contribute using tooling that they are
+likely already familiar with and, if they are not, that they are likely to
+be able to apply to another project once they have learned it.
 
-Finally, a reason to choose a web service at all over something that is
-self-hosted is to be able to more efficiently use volunteer time and donated
-resources. Every additional service hosted on the PSF infrastructure by the
-PSF infrastructure team further spreads out the amount of time that the
-volunteers on that team have to spend and uses some chunk of resources that
-could potentially be used for something where there is no free or affordable
-hosted solution available.
+GitHub Pull Requests have a fairly major advantage over the older "submit a
+patch to a bug tracker" model. They allow developers to work completely within
+their VCS using standard VCS tooling, which does not require creating a patch
+file and figuring out what the right location is to upload it to. This lowers
+the barrier for sending a change to be reviewed.
 
-One concern that people do have with using a hosted service is that there is a
-lack of control and that at some point in the future the service may no longer
-be suitable. It is the opinion of this PEP that Github does not currently and
-has not in the past engaged in any attempts to lock people into their platform
-and that if at some point in the future Github is no longer suitable for one
-reason or another, then at that point we can look at migrating away from Github
-onto a different solution. In other words, we'll cross that bridge if and when
-we come to it.
+On the reviewing side, GitHub Pull Requests are far easier to review: they have
+nice syntax highlighted diffs which can operate in either unified or side by
+side views. They allow expanding the context on a diff up to and including the
+entire file. Finally they allow commenting inline and on the pull request as
+a whole and they present that in a nice unified way which will also hide
+comments which no longer apply. GitHub also provides a "rendered diff" view
+which enables easily viewing a diff of rendered markup (such as rst) instead
+of needing to review the diff of the raw markup.
+
+The Pull Request workflow also makes it trivial to enable the ability to
+pre-test a change before actually merging it. Any particular pull request can
+have any number of different types of "commit statuses" applied to it, marking
+the commit (and thus the pull request) as either in a pending, successful,
+errored, or failure state. This makes it easy to see inline if the pull request
+is passing all of the tests, if the contributor has signed a CLA, etc.
+
+Actually merging a GitHub Pull Request is quite simple: a core reviewer only
+needs to press the "Merge" button once the status of every check on the
+Pull Request is green, indicating success.
+
+GitHub also has a good workflow for submitting pull requests to a project
+completely through their web interface. This would enable the Python
+documentation to have "Edit on GitHub" buttons on every page, so that people
+who discover typos or inaccuracies, or who just want to improve the docs they
+are currently reading, can simply hit that button and get an in-browser
+editor that will let them make changes and submit a pull request all from the
+comfort of their browser.
+
+
+Phabricator
+~~~~~~~~~~~
+
+In addition to GitHub Pull Requests this PEP also proposes setting up a
+Phabricator instance and pointing it at the GitHub hosted repositories. This
+will allow utilizing the Phabricator review applications of Differential and
+Audit.
+
+Differential functions similarly to GitHub pull requests except that it
+requires installing the ``arc`` command line tool to upload patches to
+Phabricator.
+
+Whether to enable Phabricator for any particular repository can be chosen on
+a case-by-case basis. This PEP only proposes that it must be enabled for the
+CPython repository; for smaller repositories such as the PEP repository
+it may not be worth the effort.
+
+
+Criticism
+=========
+
+X is not written in Python
+--------------------------
+
+One feature that the current tooling (Mercurial, Rietveld) has is that all of
+the pieces are written primarily in Python. It is this PEP's belief that we
+should focus on the *best* tools for the job and not the *best*
+tools that happen to be written in Python. Volunteer time is a precious
+resource to any open source project and we can best respect and utilize that
+time by focusing on the benefits and downsides of the tools themselves rather
+than what language their authors happened to write them in.
+
+One concern is the ability to modify tools to work for us, however one of
+the goals here is to *not* modify software to work for us and instead adapt
+ourselves to a more standard workflow. This standardization pays off in the
+ability to re-use tools out of the box freeing up developer time to actually
+work on Python itself as well as enabling knowledge sharing between projects.
+
+However if we do need to modify the tooling, Git itself is largely written in
+C, the same as CPython itself. It can also have commands written for it using
+any language, including Python. Phabricator is written in PHP which is a fairly
+common language in the web world and fairly easy to pick up. GitHub itself is
+largely written in Ruby but given that it's not Open Source there is no ability
+to modify it so its implementation language is irrelevant.
+
+
+GitHub is not Free/Open Source
+------------------------------
+
+GitHub is a big part of this proposal and someone who tends more to ideology
+rather than practicality may be opposed to this PEP on those grounds alone. It
+is this PEP's belief that while using entirely Free/Open Source software is an
+attractive idea and a noble goal, valuing the time of the contributors by
+giving them good tooling that is well maintained and that they either already
+know or, if they learn it, can apply to other projects is a more important
+concern than treating whether something is Free/Open Source as a hard
+requirement.
+
+However, history has shown us that sometimes benevolent proprietary companies
+can stop being benevolent. This is hedged against in a few ways:
+
+* We are not utilizing the GitHub Issue Tracker, both because it is not
+  powerful enough for CPython and because, for the primary CPython repository,
+  the ability to take our issues and put them somewhere else if we ever need
+  to leave GitHub would rely on GitHub continuing to allow API access.
+
+* We are utilizing the GitHub Pull Request workflow, however all of those
+  changes live inside of Git. So a mirror of the GitHub repositories can easily
+  contain all of those Pull Requests. We would potentially lose any comments if
+  GitHub suddenly turned "evil", but the changes themselves would still exist.
+
+* We are utilizing the GitHub repository hosting feature, however since this is
+  just Git, moving away from GitHub is as simple as pushing the repository to
+  a different location. Data portability for the repository itself is extremely
+  high.
+
+* We are also utilizing Phabricator to provide an alternative for people who
+  do not wish to use GitHub. This also acts as a fallback option which will
+  already be in place if we ever need to stop using GitHub.
+
+Relying on GitHub comes with a number of benefits beyond just the benefits of
+the platform itself. Since it is a commercially backed venture it has a full
+time staff responsible for maintaining its services. This includes making sure
+they stay up, making sure they stay patched for various security
+vulnerabilities, and further improving the software and infrastructure as time
+goes on.
+
+
+Mercurial is better than Git
+----------------------------
+
+Whether Mercurial or Git is better on a technical level is a highly subjective
+opinion. This PEP does not state whether the mechanics of Git or Mercurial are
+better and instead focuses on the network effect that is available for either
+option. Since this PEP proposes switching to Git this leaves the people who
+prefer Mercurial out, however those users can easily continue to work with
+Mercurial by using the hg-git [#hg-git]_ extension for Mercurial which will
+let it work with a repository which is Git on the server side.
+
+
+CPython Workflow is too Complicated
+-----------------------------------
+
+One sentiment that came out of previous discussions was that the multi branch
+model of CPython was too complicated for GitHub Pull Requests. It is the belief
+of this PEP that this statement is not accurate.
+
+Currently any particular change requires manually creating a patch for 2.7 and
+3.x, which won't change at all in this regard.
+
+If someone submits a fix for the current stable branch (currently 3.4) the
+GitHub Pull Request workflow can be used to create, in the browser, a Pull
+Request to merge the current stable branch into the master branch (assuming
+there are no merge conflicts). If there is a merge conflict that would need to
+be handled locally. This provides an improvement over the current situation
+where the merge must always happen locally.
+
+Finally if someone submits a fix for the current development branch then
+currently this has to be manually applied to the stable branch if it is desired
+to include it there as well. This must also happen locally in the new
+workflow, however for minor changes it could easily be accomplished in the
+GitHub web editor.
+
+Looking at this, I do not believe that *any* system can hide the complexities
+involved in maintaining several long running branches. The only thing that the
+tooling can do is make it as easy as possible to submit changes.
 
 
 Example: Scientific Python
---------------------------
+==========================
 
 One of the key ideas behind the move to both git and Github is that a feature
 of a DVCS, the repository hosting, and the workflow used is the social network
@@ -190,66 +339,11 @@
 workflow less reusable with other projects.
 
 
-Migration
-=========
-
-Through the use of hg-git [#hg-git]_ we can easily convert a Mercurial
-repository to a Git repository by simply pushing the Mercurial repository to
-the Git repository. People who wish to continue to use Mercurial locally can
-then use hg-git going into the future using the new Github URL. However they
-will need to re-clone their repositories as using Git as the server seems to
-trigger a one time change of the changeset ids.
-
-As none of the selected repositories have any tags, branches, or bookmarks
-other than the ``default`` branch the migration will simply map the ``default``
-branch in Mercurial to the ``master`` branch in git.
-
-In addition, since none of the selected projects have any great need of a
-complex bug tracker, they will also migrate their issue handling to using the
-GitHub issues.
-
-In addition to the migration of the repository hosting itself there are a
-number of locations for each particular repository which will require updating.
-The bulk of these will simply be changing commands from the hg equivalent to
-the git equivalent.
-
-In particular this will include:
-
-* Updating www.python.org to generate PEPs using a git clone and link to
-  Github.
-* Updating docs.python.org to pull from Github instead of hg.python.org for the
-  devguide.
-* Enabling the ability to send an email to python-checkins at python.org for each
-  push.
-* Enabling the ability to send an IRC message to #python-dev on Freenode for
-  each push.
-* Migrate any issues for these projects to their respective bug tracker on
-  Github.
-* Use hg-git to provide a read-only mirror on hg.python.org which will enable
-  read-only uses of the hg.python.org instances of the specified repositories
-  to remain the same.
-
-This will restore these repositories to similar functionality as they currently
-have. In addition to this the migration will also include enabling testing for
-each pull request using Travis CI [#travisci]_ where possible to ensure that
-a new PR does not break the ability to render the documentation or PEPs.
-
-
-User Access
-===========
-
-Moving to Github would involve adding an additional user account that will need
-to be managed, however it also offers finer grained control, allowing the
-ability to grant someone access to only one particular repository instead of
-the coarser grained ACLs available on hg.python.org.
-
-
 References
 ==========
 
 .. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`
-.. [#hg-git] `hg-git <https://hg-git.github.io/>`
-.. [#travisci] `Travis CI <https://travis-ci.org/>`
+.. [#hg-git] `Hg-Git Mercurial plugin <https://hg-git.github.io/>`
 
 
 Copyright
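
As a quick check of the popularity ratios quoted in the "Version Control
System" section of the patch, the arithmetic can be reproduced in a few lines
of Python. This is only a sketch: the percentages are the ones quoted in the
PEP text, while the dictionary names and the `popularity_ratio` helper are
made up here for illustration.

```python
# Shares quoted in the PEP text (percent of indexed repositories / projects).
OPEN_HUB = {"git": 37, "svn": 48, "mercurial": 2, "bazaar": 1}
PYPI_TOP_100 = {"git": 62, "mercurial": 22, "other": 13}


def popularity_ratio(shares, a="git", b="mercurial"):
    """Return how many times more popular VCS `a` is than VCS `b`."""
    return shares[a] / shares[b]


if __name__ == "__main__":
    print(f"Open Hub: {popularity_ratio(OPEN_HUB):.1f}x")
    print(f"PyPI top 100: {popularity_ratio(PYPI_TOP_100):.1f}x")
```

The two ratios come out at 18.5 and roughly 2.8, which is consistent with the
"just over 18 times" and "just under 3 times" wording in the PEP.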

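The commit status gating described in the "GitHub Pull Requests" section of
the patch can be sketched roughly as follows. The four states mirror the ones
the PEP lists (pending, successful, errored, failure). The `CommitStatus`
class, the `combined_state` and `can_merge` helpers, the context labels, and
the precedence policy are all assumptions made for this illustration; the
sketch does not call the GitHub API and is not how GitHub itself is
implemented.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    ERROR = "error"
    FAILURE = "failure"


@dataclass(frozen=True)
class CommitStatus:
    context: str  # hypothetical label, e.g. "tests" or "cla"
    state: State


def combined_state(statuses):
    """Collapse several statuses into one state for the pull request.

    Assumed policy for this sketch: an error or failure wins, then any
    pending status, and only if every status succeeded is the result
    SUCCESS. Having no statuses at all is treated as PENDING.
    """
    states = {s.state for s in statuses}
    if State.ERROR in states:
        return State.ERROR
    if State.FAILURE in states:
        return State.FAILURE
    if State.PENDING in states or not states:
        return State.PENDING
    return State.SUCCESS


def can_merge(statuses):
    """Offer the "Merge" button only when every check is green."""
    return combined_state(statuses) is State.SUCCESS
```

With this policy a pull request whose test run succeeded but whose CLA check
is still pending is not yet mergeable, which matches the PEP's description of
pressing "Merge" only once all checks are green.
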
-- 
Repository URL: https://hg.python.org/peps

