[core-workflow] My initial thoughts on the steps/blockers of the transition

Donald Stufft donald at stufft.io
Mon Jan 4 20:08:19 EST 2016


> On Jan 4, 2016, at 7:42 PM, Brett Cannon <brett at python.org> wrote:
> 
> So consider this the starting discussion of the PEP that will be the hg.python.org -> GitHub transition PEP that I will be in charge of. Once we have general agreement on the steps necessary I will start the actual PEP and check it in, but I figure there's no point in having a skeleton PEP if we can't agree on the skeleton. :) While I list steps influencing all the repos, I want to focus on the ones stopping any repo from moving over for now, expanding what we worry about to the cpython repo as we knock blockers down, until we move everything over and start adding GitHub perks.
> 
> The way I see it, we have five repos to move: devinabox, benchmarks, peps, devguide, and cpython. I also think that's essentially the order we should migrate them. Some things will need to be universally handled before we transition a single repo, while other things are only a blocker for some of the repos.
> 
> Universal blockers
> ==============
> There are four blockers that must be resolved before we even consider moving a repo over. They can be solved in parallel, but they all need to have a selected solution before we can move even the devinabox repo.
> 
> First, we need to decide how we are going to handle adding all the core devs to GitHub. Are we simply going to add all of them to the python organization, or do we want something like a specific python-dev team that gets added to all of the relevant repos? Basically, I'm not sure how picky we want to be about people who already have organization access on GitHub implicitly getting access to the cpython repo at the end of the day (and the same goes for any of the other repos in the python organization). For tracking names, I figure we will create a file in the devguide where people can check in their GitHub usernames, and I can manually add people as they add themselves to the file.

We should have a team for each logical group of access I think. We can add people to multiple teams, or to just one team. The list of people who can access *all* repos on the Python org is https://caremad.io/s/7Q0wcgOcoE/, everyone else (currently) has restricted access to one or more repositories.

> 
> Second, CLA enforcement. As of right now people go to https://www.python.org/psf/contrib/contrib-form/, fill in the form, and then Ewa gets an email where she manually flips a flag in Roundup. If we want to use a web hook to verify someone has signed a CLA then we need to decide where the ground truth for CLAs is. Do we want to keep using Roundup to manage CLA agreements, and thus add a GitHub field to bugs.python.org profiles plus a web hook or bot that signals whether someone has the flag flipped on bugs.python.org? Or is there some prepackaged service we can use that will keep track of this, which would mean not using Roundup (that might be easier, but depending on the service could require everyone to re-sign)? There's also the issue of supporting people who want to submit code by uploading a patch to bugs.python.org but not use GitHub. Either way I don't want to have to ask everyone who submits a PR what their bugs.python.org username is and then go check that manually.

There is CLAHub (https://www.clahub.com/) but I don’t have any idea how good it is, I just know of its existence.

> 
> Third, how do we want to do the repo conversions? We need to choose the tool(s) and command(s) that we want to use. There was mention of wanting a mapping from hg commit ID to git commit ID. If we have that, we could have a static bugs.python.org/commit/<ID> page with the mapping embedded in some JavaScript: if <ID> matched we would forward to the corresponding GitHub commit page, otherwise blindly forward to GitHub and assume the ID is git-only, giving us a stable URL for commit web views.

This is the tool I used for the demo repo; it seemed to work OK as long as I ran it on Linux: https://github.com/frej/fast-export

It lets you annotate the git commits with the hg hash as a “git note”, though tool support for git notes doesn’t seem to be very good. GitHub doesn’t display them, but they are available from the CLI if you run the right command to ask for the hg hash that a particular git commit came from.
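
That lookup is roughly the following. This is a self-contained sketch: it fabricates a repo and attaches a note by hand rather than running fast-export, and the “hg” notes ref name and the hash value are assumptions/placeholders, not fast-export’s guaranteed output.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
# fabricate a "ported" commit
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "some ported commit"
# attach the originating hg changeset hash as a note in the "hg" namespace
git -c user.name=demo -c user.email=demo@example.invalid \
    notes --ref=hg add -m "d670460b4b4aece5915caf5c68d12f560a9fe3e4" HEAD
# the lookup a core dev would run to recover the hg hash for a git commit:
hg_hash="$(git notes --ref=hg show HEAD)"
echo "$hg_hash"
```

GitHub’s web UI won’t show the note, but anyone with a clone can run the `git notes show` lookup above.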

> 
> Fourth, for the ancillary repos of devinabox, peps, benchmarks, and devguide, do we care if we use the GitHub merge button for PRs or do we want to enforce a linear history with all repos? We just need to decide if we care about linear histories and then we can move forward, since any bot we create won't block us from using GitHub.

Personally for most repositories I would just use the GitHub merge button.

> 
> Those four things are enough to move devinabox over. It probably is enough for the benchmarks suite, but I have an email to speed@ asking if people want to use this opportunity to re-evaluate the benchmark suite and make any changes that will affect repo size (e.g., use pip to pull in the libraries and frameworks used by a benchmark rather than vendoring their code, making the repo much smaller).
> 
> Website-related stuff
> ================
> This also almost gets us the peps repo, but we do need to figure out how to change the website to build from the git checkout rather than an hg one. Same goes for the devguide. It would be great if we can set up web hooks to immediately trigger rebuilds of those portions of the sites instead of having to wait until a cronjob triggers.

We don’t *actually* need to do much here. We could have a cronjob (or webhook-based daemon or something) that used hg-git to do ``hg pull`` from GitHub onto hg.python.org (and probably a mirror on git.python.org too). That would A) allow read-only tooling that is currently pointing at hg.python.org to continue to work unmodified, and B) allow people to interact with our repos, in a read-only fashion, without ever talking to GitHub. Couple that with the ability to still upload patches to the bug tracker and people can still contribute without ever personally sending a packet of data to GitHub.
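
The whole mirror could be as small as something like this untested crontab fragment (it assumes hg-git is installed on the mirror host, and the path is invented):

```shell
# pull from GitHub into the hg mirror every five minutes; hg-git lets a
# plain `hg pull` speak the git protocol (path and URL are placeholders)
*/5 * * * * cd /srv/hg/cpython && hg pull git+https://github.com/python/cpython.git >/dev/null 2>&1
```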

Of course it’d be nice to get the website itself pulling straight from GitHub (possibly also using a webhook-based daemon), though it could also either use hg-git or just switch to git. Either way should work.

> 
> CPython requirements
> =================
> There are six things to work out before we move over cpython. First, do we want to split out Python 2 branches into their own repo? There might be a clone size benefit which obviously is nice for people on slow Internet connections. It also clearly separates out Python 2 from 3 and lets those who prefer to focus on one compared to the other do that more easily. It does potentially make any single fix that spans 2 and 3 require a bit more work since it won't be an intra-clone change. We could also contemplate sub-repos for things like the Doc/ or Tools/ directories (although I doubt it's worth it).

Personally I feel like we should just have all of the branches live in the same repository. I don’t think there’s going to be much to gain by stripping out the other branches, and I think that the downside of trying to work on 2+ repositories is a hefty price to pay. A fresh clone of the demo repo I set up has a .git of 156M.

> 
> Second, do we want all fixes to go into master and then cherry-pick into other branches, or do we want to follow our current practice of going into the active feature branch and then merge into master? I personally prefer the former and I think most everyone else does as well, but I thought it should be at least thought about.

I think it will work best if fixes go into master. I find fewer problems with people writing patches/PRs against the wrong branch that way.

> 
> Third, how to handle Misc/NEWS? We can add a NEWS field to bugs.python.org and then generate the NEWS file from what issues are tied to what version and when they were closed. The other approach is to create a file per NEWS entry in a version-specific directory (Larry created code for hg already for this to give people an idea: http://bugs.python.org/issue18967). Then when we cut a release we run a tool that slurps up all of the relevant files -- which includes files in the directory for the next feature release which represent fixes that were cherry-picked -- and generates the NEWS file for the final release. The per-file approach is bot-friendly and also CLI-friendly, but potentially requires more tooling, and I don't know if people feel news entries should be tied to the issue or live in the repo (although the former assumes people find tweaking Roundup easy :).

I haven’t actually used it yet, but a friend has recently made https://pypi.python.org/pypi/towncrier which might be useful for this.
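
The per-file approach boils down to something like this sketch (the Misc/NEWS.d/<version>/ layout and the file names are invented here, not Larry’s actual scheme):

```shell
set -e
root="$(mktemp -d)"
mkdir -p "$root/Misc/NEWS.d/3.6.0a1"
# one file per news entry, named after the issue it fixes
printf 'bpo-12345: Fixed a crash in the frobnicator.\n' \
    > "$root/Misc/NEWS.d/3.6.0a1/bpo-12345.rst"
printf 'bpo-23456: Sped up widget parsing.\n' \
    > "$root/Misc/NEWS.d/3.6.0a1/bpo-23456.rst"
# at release time, slurp every entry into that version's NEWS section
{
    echo "What's New in Python 3.6.0a1?"
    echo "============================="
    cat "$root"/Misc/NEWS.d/3.6.0a1/*.rst
} > "$root/NEWS"
cat "$root/NEWS"
```

Because each entry is its own file, a bot (or a human on the CLI) can add one without ever hitting a merge conflict in Misc/NEWS itself.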

> 
> Fourth, we need to decide exactly what commands we expect core devs to run initially for committing code. Since we agreed to a linear history we need to document exactly what we expect people to do for a PR to get it into their git repo. This will go into the devguide -- probably will want to start a github branch at some point -- and will act as the commands the bot will want to work off of.

In case folks don’t know, GitHub makes each PR available as a remote ref (refs/pull/<n>/head) that you can fetch and check out directly from a clone; might be useful for this.
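
A self-contained illustration; the local “origin” below stands in for GitHub (which publishes each PR at refs/pull/<n>/head), so the `git fetch` line is the only part you’d actually run against a real remote:

```shell
set -e
tmp="$(mktemp -d)"
# stand-in for GitHub: a remote that publishes a PR under refs/pull/1/head
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "PR #1: fix something"
git -C "$tmp/origin" update-ref refs/pull/1/head HEAD
# a core dev's clone
git init -q "$tmp/clone"
cd "$tmp/clone"
git remote add origin "$tmp/origin"
# fetch the PR into a local branch and inspect it
git fetch -q origin pull/1/head:pr-1
git log -1 --format=%s pr-1
```

From there the core dev can rebase/flatten `pr-1` however the devguide ends up prescribing.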

> 
> Fifth, what to do about Misc/ACKS? Since we are using PRs, even if we flatten them, I believe the PR creators will get credit in the commit as the author while the core dev committing will be flagged as the person doing the merge (someone correct me if I'm wrong because if I am this whole point is silly). With the commits containing credit directly, we can either automatically generate Misc/ACKS like the NEWS file or simply drop it for future contributors and just leave the file for past contributors since git will have kept track for us.

Git allows you to do it either way; by default it tracks the author and the committer separately, so people will get credit, but if someone is just applying a diff then that obviously won’t happen by default (though people can still use the relevant options to make it happen).
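
For illustration (names are placeholders), git records the two identities separately and `git log` can show both:

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
# the PR author and the core dev who lands the change are distinct fields
GIT_AUTHOR_NAME="Jane Contributor" GIT_AUTHOR_EMAIL="jane@example.invalid" \
GIT_COMMITTER_NAME="Core Dev" GIT_COMMITTER_EMAIL="core@example.invalid" \
git commit -q --allow-empty -m "Fix from a PR"
# prints: author=Jane Contributor committer=Core Dev
git log -1 --format='author=%an committer=%cn'
```

So an auto-generated Misc/ACKS could simply be the distinct `%an` values over the history.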

> 
> Sixth, we will need to update our Buildbot fleet.
> 
> This gets us to the bare minimum needed to function.
> 
> Parity with hg.python.org
> ----------------------------------
> For parity, there are some Roundup integrations that will be necessary, like auto-generating links, posting commits to #python-dev on IRC, etc. I'm not sure if people want to block until that is all in place or not. I do think we should make sure there is some web hook that can take an issue # from the title of a PR and automatically post to the corresponding issue on bugs.python.org that the PR exists. If people disagree then feel free to say so.

GitHub has a built-in IRC bot for commits, FWIW. I agree with the issue # bit.
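
The title-parsing half of that web hook is tiny; e.g. (the title convention here is an assumption, nothing has been decided):

```shell
set -e
# extract the first bugs.python.org issue number from a PR title
title="Issue #23456: Fix the frobnicator"
issue="$(printf '%s\n' "$title" | grep -oE '#[0-9]+' | head -n1 | tr -d '#')"
echo "$issue"   # prints: 23456
# a web hook would then post a comment to http://bugs.python.org/issue$issue
```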

> 
> Adding perks
> ==========
> Now we get to some added stuff that we never had on our own infrastructure. :)
> 
> We should wire up CI for all PRs. I don't know if we want to go with Travis, Codeship, or what CI provider, but we should definitely hook it up and fully utilize the resource. This could even include running doctest over the docs, making sure the reST markup is accurate, etc.

I like Travis a lot and I know the folks behind it, I’m sure they’d be happy to help.

> 
> Do we need to set up a web hook to trigger website rebuilds? We should at least have a mirror on Read the Docs that is triggered by web hook so that we have a backup of the documentation (if possible; not sure how custom our Sphinx setup is compared to what they require to work).

I’m sure Eric would be willing to help to make this happen.

> 
> We should try to get test coverage wired up as well per CI. I don't know if coveralls.io or some other provider is best, but we should see what is available and find out if we can use them to either get basic coverage or thorough coverage (read https://hg.python.org/devinabox/file/tip/README#l124 to see what thorough coverage entails, but it does require a checkout of coverage.py).

I prefer codecov, but it shouldn’t be too hard to do. I tried to get Python + C coverage checking in the demo with that, but I failed at making the C coverage work.

> 
> We should build a bot. It must use a Monty Python reference to trigger (black-knight, none-shall-pass, man-from-scene-24, five-questions, what-is-your-quest, what-is-your-favourite-colour, etc.; obviously I'm leaning towards the Black Knight or Bridge of Death scenes from the Holy Grail for inspiration since they deal with blocking you from doing something). It should handle specifying the commit message, what branches to commit to/cherry pick into, and a NEWS entry (if necessary). I don't know if it needs to do anything else as a requirement. It should probably implement a commit queue like Zuul or Homu (and both of those can be considered as the basis of the bot). Also gating commits on passing a test run probably would also be good.
> 
> I'm sure we will want to use some labels and milestones to track what PRs are for what versions, if they are blocked on something, etc.
> 
> ---
> 
> Long email! :) I think that is my current brain dumped in email form. As I said at the beginning, I think we should focus on what is blocking the easiest repos first and then just keep knocking down blockers as we try to move over more repos.
> _______________________________________________
> core-workflow mailing list
> core-workflow at python.org
> https://mail.python.org/mailman/listinfo/core-workflow
> This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
