Here is the latest version of the PEP. Since no one seems to be bringing up issues of missing steps or incorrect priorities, I think it's time to start work! The initial key todos relate to getting the ancillary repositories moved over. I've taken on responsibility for writing the CLA bot (the pre-existing solutions don't seem to be maintained, or are locked down to specific CLA signing services). The remaining items are:
- Create a 'python-dev' team
- Define commands to move a Mercurial repository to Git
- How to update peps webpages from the future Git repo
- How to update the devguide webpages from the future Git repo
If anyone wants to step forward and help, then please do! I just ask that you keep all of us up-to-date on what's going on. And if people want to work on some other task related to the cpython repo, that's fine as well, but do realize it might be quite a while before your work gets used (if at all, as things could potentially change).
Post-History: 17-Jan-2016, 19-Jan-2016, 23-Jan-2016
Abstract

This PEP outlines the steps required to migrate Python's development
process from Mercurial [#hg]_ as hosted at hg.python.org
[#h.p.o]_ to Git [#git]_ on GitHub [#github]_. Meeting
the minimum goals of this PEP should allow for the development
process of Python to be as productive as it currently is, and meeting
its extended goals should improve it.
Rationale

In 2014, it became obvious that Python's custom development
process was becoming a hindrance. As an example, for an external
contributor to submit a fix for a bug that eventually was committed,
the basic steps were:
1. Open an issue for the bug on bugs.python.org.
2. Check out the CPython source code from hg.python.org.
3. Make the fix.
4. Upload a patch.
5. Have a core developer review the patch using our fork of the
Rietveld code review tool [#rietveld]_.
6. Download the patch to make sure it still applies cleanly.
7. Run the test suite manually.
8. Commit the change manually.
9. If the change was for a bugfix release, merge it into the
in-development branch.
10. Run the test suite manually again.
11. Commit the merge.
12. Push the changes.
This is a very heavy, manual process for core developers. Even in the
simple case, you could only possibly skip the code review step, as you
would still need to build the documentation. This led to patches
languishing on the issue tracker due to core developers not being
able to work through the backlog fast enough to keep up with
submissions. In turn, that led to a side-effect issue of discouraging
outside contribution due to frustration from lack of attention, which
is a dangerous problem for an open source project as it runs counter
to having a viable future for the project. While simply accepting
patches may be easy for an external
contributor, it is as slow and burdensome as it gets for
a core developer to work with.
Hence the decision was made in late 2014 that a move to a new
development process was needed. A request for PEPs
proposing new workflows was made, in the end leading to two:
PEP 481 and PEP 507 proposing GitHub [#github]_ and
GitLab [#gitlab]_, respectively.
The year 2015 was spent off-and-on working on those proposals and
trying to tease out details of what made them different from each
other on the core-workflow mailing list [#core-workflow]_.
PyCon US 2015 also showed that the community was a bit frustrated
with our process due to both cognitive overhead for new contributors
and how long it was taking for core developers to
look at a patch (see the end of Guido van Rossum's
keynote at PyCon US 2015 [#guido-keynote]_ as an example of the
community's frustration).
On January 1, 2016, the decision was made by Brett Cannon to move the
development process to GitHub. The key reasons for choosing GitHub
were:
* Maintaining custom infrastructure has been a burden on volunteers
(e.g., a custom fork of Rietveld [#rietveld]_
that is not being maintained is currently being used).
* The custom workflow is very time-consuming for core developers
(not enough automated tooling built to help support it).
* The custom workflow is a hindrance to external contributors
(it acts as a barrier to entry due to the time required to ramp up
on a development process unique to CPython itself).
* There is no feature differentiating GitLab from GitHub beyond
GitLab being open source.
* Familiarity with GitHub is far higher amongst core developers and
external contributors than with GitLab.
* Our BDFL prefers GitHub (he would be the first person to tell
you that his opinion shouldn't matter, but the person making the
decision felt it was important that the BDFL feel comfortable with
the workflow of his own programming language to encourage his
continued involvement).
There's even already an unofficial image to use to represent the
migration to GitHub [#pythocat]_.
The overarching goal of this migration is to improve the development
process to the extent that a core developer can go from external
contribution submission through all the steps leading to committing
said contribution all from within a browser on a tablet with WiFi
using *some* development process (this does not inherently mean
GitHub's default workflow). All of this will be done in such a way
that if an external contributor chooses not to use GitHub then they
will continue to have that option.
Repositories to Migrate
While hg.python.org [#h.p.o]_ hosts many repositories, there are
only five key repositories that should move:
1. devinabox [#devinabox-repo]_
2. benchmarks [#benchmarks-repo]_
3. peps [#peps-repo]_
4. devguide [#devguide-repo]_
5. cpython [#cpython-repo]_
The devinabox and benchmarks repositories are code-only.
The peps and devguide repositories involve the generation of webpages.
And the cpython repository has special requirements for integrating
with the rest of the development process.
Migration Plan

The migration plan is separated into sections based on what is
required to migrate the repositories listed in the
`Repositories to Migrate`_ section. Completion of requirements
outlined in each section should unblock the migration of the related
repositories. The sections are expected to be completed in order, but
not necessarily the requirements within a section.
Requirements for Code-Only Repositories
Completion of the requirements in this section will allow the
devinabox and benchmarks repositories to move to
GitHub. While devinabox has a sufficiently descriptive name, the
benchmarks repository does not; therefore, it will be given a more
descriptive name as part of the move.
Create a 'python-dev' team
To manage permissions, a 'python-dev' team will be created as part of
the python organization [#github-python-org]_. Any repository that is
moved will have the 'python-dev' team added to it with write
permissions [#github-org-perms]_. Anyone who previously had rights to
manage SSH keys on hg.python.org will become a team maintainer for
the 'python-dev' team.
Define commands to move a Mercurial repository to Git
Since moving to GitHub also entails moving to Git [#git]_, we must
decide what tools and commands we will run to translate a Mercurial
repository to Git. The exact tools and steps to use are an
open issue; see `Tools and commands to move from Mercurial to Git`_.
CLA enforcement

A key part of any open source project is making sure that its source
code can be properly licensed. This requires making sure all people
making contributions have signed a contributor license agreement
(CLA) [#cla]_. Up until now, CLA signing for
contributed code has been enforced by core developers checking
whether someone had an ``*`` by their username on the issue tracker.
This is a manual process that is easy to forget, so the preference is
to start off with automated checking and enforcement of contributors
signing the CLA.
To keep tracking of CLA signing under the direct control of the PSF,
tracking who has signed the PSF CLA will be continued by marking that
fact on a person's issue tracker account. What this
means is that an association will be needed between a person's
issue tracker account and their GitHub account, which
will be done through a new field in a user's issue tracker profile.
This does implicitly require that contributors have both a GitHub
account and an issue tracker account in order to sign the
CLA and contribute through GitHub.
A bot to enforce CLA signing
With an association between someone's GitHub account and their
issue tracker account, which shows whether
someone has signed the CLA, a bot can monitor pull requests on
GitHub and denote whether the contributor has signed the CLA.
If the user has signed the CLA, the bot will add a positive label to
the issue to denote the pull request has no CLA issues (e.g., a green
label stating, "CLA: ✓"). If the contributor has not signed a CLA,
a negative label will be added and the pull request will be blocked
using GitHub's status API (e.g., a red label stating, "CLA: ✗"). If
the contributor's GitHub account cannot be matched to an issue
tracker account at all, the bot will apply
another label (e.g., "CLA: ✗ (no account)"). Using a label for both
positive and negative cases provides a fallback notification if the
bot happens to fail, preventing potential false-positives or
false-negatives. It also allows for an easy way to trigger the bot
again by simply removing a CLA-related label.
If no pre-existing, maintained bot exists that fits our needs, one
will be written from scratch. It will be hosted on Heroku [#heroku]_
and written to target Python 3.5 to act as a showcase for
asynchronous programming. The bot's actual name is an open issue:
`Naming the bots`_.
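The bot's labeling decision can be sketched as a small pure function. This is an illustrative sketch only: the label strings and the mapping of GitHub logins to CLA status are assumptions, since the real bot would query the issue tracker for the linked account and CLA flag.

```python
# Hedged sketch of the CLA bot's labeling decision. The label text and
# the account mapping are hypothetical; the real bot would look up the
# linked issue tracker account and its CLA flag.
CLA_SIGNED = "CLA: \N{CHECK MARK}"
CLA_MISSING = "CLA: \N{BALLOT X}"
NO_ACCOUNT = "CLA: \N{BALLOT X} (no account)"

def cla_label(github_login, accounts):
    """Return the label to apply to a pull request.

    `accounts` maps a GitHub login to True/False for "has signed the
    CLA"; a login absent from the mapping has no linked account.
    """
    if github_login not in accounts:
        return NO_ACCOUNT
    return CLA_SIGNED if accounts[github_login] else CLA_MISSING
```

Keeping the decision logic separate from the GitHub API calls also makes it easy to re-run after a CLA-related label is removed, as described above.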
Requirements for Web-Related Repositories
Due to their use for generating webpages, the
devguide [#devguide-repo]_ and peps [#peps-repo]_ repositories need
their respective processes updated to pull from their new Git
repositories.
The devguide repository might also need to be named
``python-devguide`` to make sure the repository is not ambiguous
when viewed in isolation from the
python organization [#github-python-org]_.
Requirements for the cpython Repository
Obviously the most active and important repository currently hosted
at hg.python.org [#h.p.o]_ is the cpython
repository [#cpython-repo]_. Because of its importance and
high-frequency use, it requires more tooling before being moved to
GitHub compared to the other repositories mentioned in this PEP.
Document steps to commit a pull request
During the process of choosing a new development workflow, it was
decided that a linear history is desired. People preferred having a
single commit representing a single change instead of having a set of
unrelated commits lead to a merge commit that represented a single
change. This means that the convenient "Merge" button in GitHub pull
requests is undesirable, as it creates a merge commit along with all
of the contributor's individual commits (this does not affect the
other repositories where the desire for a linear history doesn't
exist).
Luckily, Git [#git]_ does not require GitHub's workflow and so one can
be chosen which gives us a linear history by using Git's CLI. The
expectation is that all pull requests will be fast-forwarded and
rebased before being pushed to the master repository. This should
give proper attribution to the pull request author in the Git
history. This does have the consequence of losing some GitHub
features such as automatic closing of pull requests, link generation,
etc.
A second set of recommended commands will also be written for
committing a contribution from a patch file uploaded to the issue
tracker. This will also implicitly keep a linear
history, but it will need to be made to have attribution to the patch
creator.
The exact sequence of commands that will be given as guidelines to
core developers is an open issue:
`Git CLI commands for committing a pull request to cpython`_.
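One possible sequence (not the decided-upon commands, which remain the open issue above) is rebase-then-fast-forward. The sketch below demonstrates it against a throwaway repository, with ``pr-branch`` standing in for the contributor's pull request branch:

```shell
# Illustrative only: a linear-history merge of a "pull request" branch,
# exercised in a disposable repository.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q cpython-demo
cd cpython-demo
git config user.email dev@example.com
git config user.name "Core Dev"
main=$(git symbolic-ref --short HEAD)   # default branch name
echo base > file.txt
git add file.txt
git commit -qm "initial commit"
git checkout -qb pr-branch              # the contributor's work
echo fix >> file.txt
git commit -qam "Fix the bug"
git checkout -q "$main"
echo unrelated > other.txt
git add other.txt
git commit -qm "unrelated change that landed in the meantime"
# Rebase the pull request onto the tip of the main branch, then
# fast-forward: no merge commit is created and the contributor
# remains the author of the rebased commit.
git checkout -q pr-branch
git rebase -q "$main"
git checkout -q "$main"
git merge -q --ff-only pr-branch
```

The ``--ff-only`` flag makes the merge fail loudly if the rebase was skipped, which is exactly the property a linear-history policy wants.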
Handling Misc/NEWS

Traditionally the ``Misc/NEWS`` file [#news-file]_ has been problematic
for changes which spanned Python releases. Oftentimes there will be
merge conflicts when merging a change between, e.g., 3.5 and 3.6
that touch only the ``Misc/NEWS`` file. It's so common, in fact, that the
example instructions in the devguide explicitly mention how to
resolve conflicts in the ``Misc/NEWS`` file
[#devguide-merge-across-branches]_. As part of our tool
modernization, working with the ``Misc/NEWS`` file will be
changed so that these merge conflicts no longer occur.

There are competing approaches to solving the
``Misc/NEWS`` problem, which are discussed in an open issue:
`How to handle the Misc/NEWS file`_.
Handling Misc/ACKS

Traditionally the ``Misc/ACKS`` file [#acks-file]_ has been managed
by hand. But thanks to Git supporting an ``author`` value as well as
a ``committer`` value per commit, authorship of a commit can be part
of the history of the code itself.
As such, manual management of ``Misc/ACKS`` will become optional. A
script will be written that will collect all author and committer
names and merge them into ``Misc/ACKS`` with all of the names listed
prior to the move to Git. Running this script will become part of the
release process.
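The core of such a script could look like the sketch below; the helper name is hypothetical, and the real script would first gather names from the Git history (e.g. via ``git log --format='%aN'``) before merging them into the file.

```python
def merge_acks(existing, new_names):
    """Merge author/committer names into the existing Misc/ACKS list.

    Names already present are kept in their original order; new names
    are appended.  Deduplication is by exact string match, so the real
    script would likely need to normalize spellings first.
    """
    seen = set(existing)
    merged = list(existing)
    for name in new_names:
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged
```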
Linking pull requests to issues
Historically, external contributions were attached to an issue on the
issue tracker by uploading the contribution as a patch file. For
changes committed directly by a core developer, specifying an
issue number in the commit message in the format ``Issue #`` at the
start of the message led to a comment being posted to the issue
linking to the commit.
Linking a pull request to an issue
An association between a pull request and an issue is needed to track
when a fix has been proposed. The association needs to be many-to-one
as it can take multiple pull requests to solve a single issue
(technically it should be a many-to-many association for when a
single fix solves multiple issues, but this is fairly rare and issues
can be merged into one using the ``Superseder`` field on the issue
tracker).
Association between a pull request and an issue will be done based on
detecting the regular expression ``[Ii]ssue #(?P<bpo_id>\d+)``. If
this is specified in either the title or in the body of a message on
a pull request, then the connection will be made on the issue
tracker, and a label will also be added to the pull
request to signify that the connection was made successfully. This
could lead to incorrect associations if the wrong issue number is
given or an unrelated issue is merely
referenced, but these should be rare occurrences.
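The matching step can be sketched directly from the regular expression above; only the regex itself comes from this PEP, the surrounding function is illustrative.

```python
import re

# The regular expression specified above for detecting issue references.
ISSUE_RE = re.compile(r"[Ii]ssue #(?P<bpo_id>\d+)")

def find_issue_ids(text):
    """Return every issue ID referenced in a pull request title or
    message body, in order of appearance."""
    return [int(match.group("bpo_id")) for match in ISSUE_RE.finditer(text)]
```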
Notify the issue if the pull request is committed
Once a pull request is closed (merged or not), the issue should be
updated to reflect this fact.
Update linking service for mapping commit IDs to URLs
The linking service currently accepts a revision
ID from either the Subversion or Mercurial copies of the
cpython repo [#cpython-repo]_ and redirects to the URL for that
revision in the Mercurial repository. The URL rewriter will need to
be updated to redirect to the Git repository and to support the new
revision IDs created for the Git repository.
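The updated rewriter might distinguish legacy IDs from Git hashes along these lines; the lookup table and the commit URL shape are assumptions for illustration, not the service's actual implementation.

```python
import re

def redirect_url(rev_id, legacy_map):
    """Map a revision ID to a commit URL in the Git repository (sketch).

    `legacy_map` is a hypothetical table from Subversion/Mercurial
    revision IDs to Git commit hashes, built during the conversion.
    """
    if rev_id in legacy_map:                       # old svn/hg ID
        git_hash = legacy_map[rev_id]
    elif re.fullmatch(r"[0-9a-f]{7,40}", rev_id):  # already a Git hash
        git_hash = rev_id
    else:
        return None
    return "https://github.com/python/cpython/commit/" + git_hash
```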
Just as hg.python.org [#h.p.o]_ currently points to the Mercurial
repository, an equivalent address will need to point to
the Git repository.
Backup of pull request data
Since GitHub [#github]_ is going to be used for code hosting and code
review, those two things need to be backed up. In the case of code
hosting, the backup is implicit as all non-shallow Git [#git]_ clones
contain the full history of the repository, hence there will be many
backups of the repository.
The code review history does not have the same implicit backup
mechanism as the repository itself. That means a daily backup of code
review history should be done so that it is not lost in case of any
issues with GitHub. It also helps guarantee that a migration from
GitHub to some other code review system is feasible were GitHub to
ever become unsuitable.
Deprecate sys._mercurial

Once Python is no longer kept in Mercurial, the ``sys._mercurial``
attribute will need to be changed to return ``('CPython', '', '')``.
An equivalent ``sys._git`` attribute will be added which fulfills the
same use case for Git.
Update the devguide
The devguide will need to be updated with details of the new
workflow. Most likely the work will take place in a separate branch
until the migration actually occurs.
Update PEP 101
The release process will need to be updated as necessary.
Optional, Planned Features
Once the cpython repository [#cpython-repo]_ is migrated, all
repositories will have been moved to GitHub [#github]_ and the
development process should be on equal footing as before. But a key
reason for this migration is to improve the development process,
making it better than it has ever been. This section outlines some
plans on how to improve things.
It should be mentioned that overall feature plans for the issue
tracker -- separate from this
migration -- are tracked on their own wiki page [#tracker-plans]_.
Bot to handle pull request merging
As stated in the section entitled
"`Document steps to commit a pull request`_", the desire is to
maintain a linear history for cpython. Unfortunately,
GitHub's [#github]_ web-based workflow does not support a linear
history. Because of this, a bot should be written to substitute for
GitHub's in-browser commit abilities.
To start, the bot should accept commands to commit a pull request
against a list of branches. This allows for committing a pull request
that fixes a bug in multiple versions of Python.
More advanced features such as a commit queue can come later. This
would linearly apply accepted pull requests and verify that the
commits did not interfere with each other by running the test suite
and backing out commits if the test run failed. To help facilitate
the speed of testing, all patches committed since the last test run
can be applied and run in a single test run as the optimistic
assumption is that the patches will work in tandem. Some mechanism to
re-run the tests in case of test flakiness will be needed, whether it
is from removing a "test failed" label, a web interface for core
developers to trigger another testing event, etc.
Inspiration for, or the basis of, the bot could be taken from
pre-existing bots such as Homu [#homu]_ or Zuul [#zuul]_.
The name given to this bot in order to give it commands is an open
issue: `Naming the bots`_.
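Command parsing for such a bot could start as simply as the sketch below; the bot's name and the command syntax are placeholders, since the actual name is an open issue (`Naming the bots`_).

```python
def parse_merge_command(comment, bot_name="@some-bot"):
    """Extract the target branches from a merge command in a pull
    request comment, e.g. "@some-bot merge 3.5 3.6 default".

    Returns None when the comment is not a merge command for the bot.
    The bot name and syntax are hypothetical.
    """
    parts = comment.split()
    if len(parts) < 3 or parts[0] != bot_name or parts[1] != "merge":
        return None
    return parts[2:]
```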
Continuous integration per pull request
To help speed up pull request approvals, continuous integration
testing should be used. This helps mitigate the need for a core
developer to download a patch simply to run the test suite against
it.
Which free CI service to use is an open issue:
`Choosing a CI service`_.
Test coverage report
Getting an up-to-date test coverage report for Python's standard
library would be extremely beneficial, as such a report currently
takes quite a while to produce.
There are a couple pre-existing services that provide free test
coverage for open source projects. Which option is best is an open
issue: `Choosing a test coverage service`_.
Notifying issues of pull request comments
The current development process does not include notifying an issue
when a review comment is made in
Rietveld [#rietveld]_. It would be nice to fix this so that people
can choose to follow only the issue rather than
GitHub [#github]_ and yet still know when something occurs on GitHub
in terms of review comments on relevant pull requests. Current
thinking is to post a single comment to the
issue when at least one review comment has been made over a certain
period of time (e.g., 15 or 30 minutes). This keeps the email volume
down from notifications while still making sure that those only following
the issue are kept up-to-date.
The issue tracker currently allows logging in
using Google, Launchpad, or OpenID credentials. It would be good to
expand this to GitHub credentials.
Web hooks for re-generating web content
The peps and devguide webpages are generated from files stored in
one of the repositories to be moved as part of this migration. As
such, it would be nice to set up appropriate webhooks to trigger
rebuilding the appropriate web content when the files they are based
on change instead of having to wait for, e.g., a cronjob to trigger.
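Any such webhook receiver should verify that a delivery really came from GitHub before triggering a rebuild. GitHub signs payloads with an HMAC of the request body using a shared secret, delivered in the ``X-Hub-Signature`` header; the check itself needs only the standard library (the surrounding web framework and the rebuild step are left out of this sketch):

```python
import hashlib
import hmac

def signature_is_valid(secret, body, signature_header):
    """Verify a GitHub webhook delivery against its X-Hub-Signature
    header, which has the form "sha1=<hexdigest>"."""
    expected = "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(expected, signature_header)
```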
Link web content back to files that it is generated from
It would be helpful for people who find issues with any of the
documentation that is generated from a file to have a link on each
page which points back to the file on GitHub [#github]_ that stores
the content of the page. That would allow for quick pull requests to
fix simple things such as spelling mistakes.
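Such a link can be built from the repository, branch, and file path alone; GitHub's ``/edit/`` URLs open the in-browser editor, which offers non-committers a fork-and-pull-request flow. The repository and file names below are purely illustrative.

```python
def edit_link(repo, branch, path):
    """Build the GitHub in-browser editor URL for a source file.

    `repo` is in "owner/name" form; opening the /edit/ URL lets a
    reader propose a fix as a pull request without leaving the site.
    """
    return "https://github.com/{}/edit/{}/{}".format(repo, branch, path)
```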
Splitting out parts of the documentation into their own repositories
While certain parts of the documentation are tied to a specific
version of Python and thus
change with the code, other parts are fairly static and are not
tightly bound to the CPython code itself. Some sections of
the documentation fit this category of slow-changing content.
These parts of the documentation could be broken out into their own
repositories to simplify their maintenance and to expand who has
commit rights to them.
It has also been suggested to split out the What's New
documents. That would require deciding whether a workflow could be
developed where it would be difficult to forget to update
What's New (potentially through a label added to PRs, like
"What's New needed").
Backup of Git repositories
While not necessary, it would be good to have official backups of the
various Git repositories for disaster protection. It will be up to
the PSF infrastructure committee to decide if this is worthwhile or
not.
Status

Requirements for migrating the devinabox [#devinabox-repo]_ and
benchmarks [#benchmarks-repo]_ repositories:
* Not started
- `Create a 'python-dev' team`_
- `Define commands to move a Mercurial repository to Git`_
* In progress
- `A bot to enforce CLA signing`_
Repositories whose build steps need updating:
* Not started
- peps [#peps-repo]_
- devguide [#devguide-repo]_
* In progress
Requirements to move over the cpython repo [#cpython-repo]_:
* Not started
- `Document steps to commit a pull request`_
- `Handling Misc/NEWS`_
- `Handling Misc/ACKS`_
- `Linking a pull request to an issue`_
- `Notify the issue if the pull request is committed`_
- `Update linking service for mapping commit IDs to URLs`_
- `Backup of pull request data`_
- `Deprecate sys._mercurial`_
- `Update the devguide`_
- `Update PEP 101`_
* In progress
Optional, planned features:

* Not started
- `Bot to handle pull request merging`_
- `Continuous integration per pull request`_
- `Test coverage report`_
- `Notifying issues of pull request comments`_
- `Web hooks for re-generating web content`_
- `Link web content back to files that it is generated from`_
- `Splitting out parts of the documentation into their own repositories`_
- `Backup of Git repositories`_
* In progress
Open Issues

For this PEP, open issues are ones where a decision needs to be made
as to how to approach or solve a problem. Open issues do not entail
coordination issues such as who is going to write a certain bit of
code.
The fate of hg.python.org

With the code repositories moving over to Git [#git]_, there is no
technical need to keep hg.python.org
[#h.p.o]_ running. Having said
that, some in the community would like to have it stay functioning as
a Mercurial [#hg]_ mirror of the Git repositories. Others have said
that they still want a mirror, but one using Git.
As maintaining hg.python.org
is not necessary, it will be up to the
PSF infrastructure committee to decide if they want to spend the
time and resources to keep it running. They may also choose whether
they want to host a Git mirror on PSF infrastructure.
Depending on the decision reached, other ancillary repositories will
either be forced to migrate or they can choose to simply stay on
hg.python.org.
Tools and commands to move from Mercurial to Git
A decision needs to be made on exactly what tooling and what commands
involving those tools will be used to convert a Mercurial repository
to Git. A suggestion has been made to use one of the pre-existing
conversion tools, but nothing specific has been decided yet.
Git CLI commands for committing a pull request to cpython
Because Git [#git]_ may be a new version control system for core
developers, the commands people are expected to run will need to be
written down. These commands also need to keep a linear history while
giving proper attribution to the pull request author.
Another set of commands will also be necessary for when working with
a patch file uploaded to the issue tracker. In that case a linear
history will be kept implicitly, but care will need to be taken to
give attribution to the patch creator.
How to handle the Misc/NEWS file
There are three competing approaches to handling
``Misc/NEWS`` [#news-file]_. One is to add a news entry for an issue
on the issue tracker, where an issue marked
as "resolved" could not be closed until a news entry is added in the
"news" field in the issue tracker. The benefit of tying the news
entry to the issue is it makes sure that all changes worthy of a news
entry have an accompanying issue. It also makes classifying a news
entry automatic thanks to the Component field of the issue. The
Versions field of the issue also ties the news entry to which Python
releases were affected. A script would be written to query
the issue tracker for news entries and generate
the output that needs to be checked into the code repository. This
approach is agnostic to whether a commit was done by CLI or bot.
A competing approach is to use an individual file per news entry,
containing the text for the entry. In this scenario each feature
release would have its own directory for news entries and a separate
file would be created in that directory that was either named after
the issue it closed or a timestamp value (which prevents collisions).
Merges across branches would have no issue as the news entry file
would still be uniquely named and in the directory of the latest
version that contained the fix. A script would collect all news entry
files no matter what directory they reside in and create an
appropriate news file (the release directory can be ignored as the
mere fact that the file exists is enough to represent that the entry
belongs to the release). Classification can either be done by keyword
in the news entry file itself or by using subdirectories representing
each news entry classification in each release directory (or
classification of news entries could be dropped since critical
information is captured by the "What's New" documents which are
organized). The benefit of this approach is that it keeps the changes
with the code that was actually changed. It also ties the message to
being part of the commit which introduced the change. For a commit
made through the CLI, a script will be provided to help generate the
file. In a bot-driven scenario, the merge bot will have a way to
specify a specific news entry and create the file as part of its
flattened commit (while most likely also supporting using the first
line of the commit message if no specific news entry was specified).
Code for this approach has been written previously for the Mercurial
workflow.
Yet a third option is a merge script to handle the conflicts. This
approach allows for keeping the NEWS file as a single file. It does
run the risk, though, of failure and thus blocking a commit until it
can be manually resolved.
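The file-per-entry approach can be sketched as follows; the directory layout and output formatting here are assumptions, as the actual scheme is part of this open issue.

```python
from pathlib import Path

def build_news(root):
    """Collect per-entry news files into a single NEWS-style text.

    `root` holds one directory per feature release (e.g. "3.6.0");
    every file inside a release directory is a single news entry,
    named after the issue it closed.
    """
    sections = []
    for release_dir in sorted(Path(root).iterdir(), reverse=True):
        if not release_dir.is_dir():
            continue
        entries = [path.read_text().strip()
                   for path in sorted(release_dir.iterdir())]
        section = "What's New in Python {}\n\n{}".format(
            release_dir.name,
            "\n".join("- " + entry for entry in entries))
        sections.append(section)
    return "\n\n".join(sections)
```

Because each entry lives in its own uniquely named file, merging a fix across branches never conflicts in the generated file itself, which is the main advantage claimed for this approach above.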
Naming the bots
As naming things can lead to bikeshedding of epic proportions, Brett
Cannon will choose the final name of the various bots (the name of
the project for the bots themselves can be anything, this is purely
for the name used in giving commands to the bot or the account name).
The names will come from Monty Python, which is only fitting since
Python is named after the comedy troupe. They will most likely come
from 'Monty Python and the Holy Grail' [#holy-grail]_ (which happens
to be how Brett was introduced to Monty Python). Current ideas on
names come from the following sketches:
"Black Knight" sketch [#black-knight-sketch]_:
"Bridge of Death" sketch [#bridge-of-death-sketch]_:
(and that specific spelling; Monty Python is British, after all)
"Killer rabbit" sketch [#killer-rabbit-sketch]_:
"French Taunter" sketch [#french-taunter-sketch]_:
"Constitutional Peasants" sketch [#constitutional-peasants-sketch]_:
"Knights Who Say 'Ni'" sketch [#ni-sketch]_:
From "Monty Python and the Holy Grail" in general:
Choosing a CI service
There are various CI services that provide free support for open
source projects hosted on GitHub [#github]_. Two such examples are
Travis [#travis]_ and Codeship [#codeship]_. Whatever solution is
chosen will need to not time out in the time it takes to execute
Python's test suite. It should optimally provide access to multiple C
compilers for more thorough testing. Network access is also
needed for some tests.
The current CI service for Python is Pypatcher [#pypatcher]_. A
request can be made in IRC to try a patch from the issue tracker.
Choosing a test coverage service
Getting basic test coverage of Python's standard library can be
created simply by using coverage.py [#coverage]_. Getting
thorough test coverage is actually quite tricky, with the details
outlined in the devinabox's README [#devinabox-repo]_. It would be
best if a service could be found that would allow for thorough test
coverage, but it might not be feasible.
Free test coverage services include Coveralls [#coveralls]_, among
others.
Rejected Ideas

Separate Python 2 and Python 3 repositories
It was discussed whether separate repositories for Python 2 and
Python 3 were desired. The thinking was that this would shrink the
overall repository size which benefits people with slow Internet
connections or small bandwidth caps.
In the end it was decided that it was easier logistically to simply
keep all of CPython's history in a single repository.
Commit multi-release changes in bugfix branch first
As the current development process has changes committed in the
oldest branch first and then merged up to the default branch, the
question came up as to whether this workflow should be perpetuated.
In the end it was decided that committing in the newest branch and
then cherry-picking changes into older branches would work best as
most people will instinctively work off the newest branch and it is a
more common workflow when using Git [#git]_.
Cherry-picking is also more bot-friendly for an in-browser workflow.
In the merge-up scenario, if you were to request a bot to do a merge
and it failed, then you would have to make sure to immediately solve
the merge conflicts if you still allowed the main commit, else you
would need to postpone the entire commit until all merges could be
handled. With a cherry-picking workflow, the main commit could
proceed while postponing the merge-failing cherry-picks. This allows
for possibly distributing the work of managing conflicting merges.
Deriving ``Misc/NEWS`` from the commit logs
As part of the discussion surrounding `Handling Misc/NEWS`_, the
suggestion has come up of deriving the file from the commit logs
itself. In this scenario, the first line of a commit message would be
taken to represent the news entry for the change. Some heuristic to
tie in whether a change warranted a news entry would be used, e.g.,
whether an issue number is listed.
This idea has been rejected due to some core developers preferring to
write a news entry separate from the commit message. The argument is
that the first line of a commit message and that of a news entry
have different requirements in terms of brevity, what should be said,
etc.
References

.. [#reasons] Email to core-workflow outlining reasons why GitHub was selected
.. [#benchmarks-repo] Mercurial repository for the Unified Benchmark Suite
.. [#github-org-perms] GitHub repository permission levels
.. [#devguide-merge-across-branches] Devguide instructions on how to merge across branches
.. [#black-knight-sketch] The "Black Knight" sketch from "Monty Python and the Holy Grail"
.. [#bridge-of-death-sketch] The "Bridge of Death" sketch from "Monty Python and the Holy Grail"
.. [#holy-grail] "Monty Python and the Holy Grail" sketches
.. [#killer-rabbit-sketch] "Killer rabbit" sketch from "Monty Python and the Holy Grail"
.. [#french-taunter-sketch] "French Taunter" from "Monty Python and the Holy Grail"
.. [#constitutional-peasants-sketch] "Constitutional Peasants" from "Monty Python and the Holy Grail"
.. [#ni-sketch] "Knights Who Say Ni" from "Monty Python and the Holy Grail"
Copyright

This document has been placed in the public domain.