[core-workflow] My initial thoughts on the steps/blockers of the transition
Nick Coghlan
ncoghlan at gmail.com
Mon Jan 4 22:18:32 EST 2016
On 5 January 2016 at 10:42, Brett Cannon <brett at python.org> wrote:
> So consider this the starting discussion of the PEP that will be the
> hg.python.org -> GitHub transition PEP that I will be in charge of. Once we
> have general agreement on the steps necessary I will start the actual PEP
> and check it in, but I figure there's no point in having a skeleton PEP if we
> can't agree on the skeleton. :) While I list steps influencing all the
> repos, I want to focus on the ones stopping any repo from moving over for
> now, expanding what we worry about to the cpython repo as we knock blockers
> down until we move everything over and start adding GitHub perks.
>
> The way I see it, we have five repos to move: devinabox, benchmarks, peps,
> devguide, and cpython. I also think that's essentially the order we should
> migrate them over. Some things will need to be universally handled before we
> transition a single repo, while other things are only a blocker for some of
> the repos.
>
> Universal blockers
> ==================
> There are four blockers that must be resolved before we even consider moving
> a repo over. They can be solved in parallel, but they all need to have a
> selected solution before we can move even the devinabox repo.
>
> First, we need to decide how we are going to handle adding all the core devs
> to GitHub. Are we simply going to add all of them to the python
> organization, or do we want something like a specific python-dev team
> that gets added to all of the relevant repos? Basically I'm not sure how
> picky we want to be about people who already have organization access on
> GitHub implicitly getting access to the cpython repo at the end
> of the day (and the same goes for any of the other repos in the python
> organization). For tracking names, I figure we will create a file in the
> devguide where people can check in their GitHub usernames and I can manually
> add people as people add themselves to the file.
I think we want at least one group to track CPython commit privileges
and potentially a second to track CLA signatories.
For collecting GitHub username info, I think it makes more sense to
add a new User profile field in Roundup than it does to use a text
file in the devguide:
http://roundup.sourceforge.net/docs/customizing.html#tracker-schema
That way whether or not someone has signed the CLA and whether or not
they have commit privileges is directly associated with both their
Roundup user ID and their GitHub one, but the latter is only linked if
they choose to provide it.
It also means that if we eventually have a Roundup hook submitting
patches on behalf of people (perhaps triggered by a "Create PR" link
on the patch display rather than implicitly), it can set "Author" on
the PR correctly if the author has provided a GitHub username.
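As a concrete (if hypothetical) illustration, that tweak would be a
one-property addition to the tracker's schema.py, which Roundup
executes with names like Class and String already in scope - the
property list below is the classic tracker's, abbreviated, not
bugs.python.org's real schema:

    # schema.py - Roundup injects Class, String, db, etc. when it
    # loads this file, so no imports are needed here.
    user = Class(db, "user",
                 username=String(),
                 realname=String(),
                 # ... the tracker's other existing properties ...
                 github=String())   # new: optional GitHub username
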
We should also consider the question of who needs to be added to the
admin group for the GitHub python organisation.
> Second, CLA enforcement. As of right now people go to
> https://www.python.org/psf/contrib/contrib-form/, fill in the form, and then
> Ewa gets an email where she manually flips a flag in Roundup. If we want to
> use a web hook to verify someone has signed a CLA then we need to decide
> where the ground truth for CLAs is. Do we want to keep using Roundup to
> manage CLA agreements and thus add a GitHub field in bugs.python.org for
> people's profile and a web hook or bot that will signal if someone has the
> flag flipped on bugs.python.org? Or is there some prepackaged service that
> we can use that will keep track of this which would cause us to not use
> Roundup (which might be easier, but depending on the service might require
> everyone to re-sign)? There's also the issue of supporting people who want
> to submit code by uploading a patch to bugs.python.org but not use GitHub.
> Either way I don't want to have to ask everyone who submits a PR what their
> bugs.python.org username is and then go check that manually.
The way kubernetes does this is that googlebot checks if the submitter
has signed the CLA, and if they have it sets a green "cla: yes" flag
on the PR: https://github.com/kubernetes/kubernetes/labels/cla%3A%20yes
If they haven't, then it posts a message asking them to sign it and
applies a red "cla: no" label:
https://github.com/kubernetes/kubernetes/pull/19271#issuecomment-168836357
For us, I think the approach that makes the most sense depends on
whether or not it's easy to query the Roundup user DB based on a
custom field. If it's easy, then I think we should just have the bot
query Roundup to ask "Is this GitHub user id linked to a user account
that has signed the CLA?".
If querying by custom field is a pain, then I think we should instead
have a GitHub group to track CLA signatories and tweak Roundup to add
someone to that group when their signatory status is updated. The bot
would then check the derived GitHub group rather than querying Roundup
directly.
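Either way the bot's core logic stays small. A rough sketch of the
"query Roundup directly" option in Python - the Roundup URL and its
response format are entirely hypothetical, while the label call uses
the real GitHub v3 API:

    import requests

    # Hypothetical endpoint: assumes bugs.python.org can be asked
    # whether a given GitHub username belongs to a CLA signatory.
    ROUNDUP_QUERY = "https://bugs.python.org/cla-check?github={}"

    def has_signed_cla(github_login):
        resp = requests.get(ROUNDUP_QUERY.format(github_login))
        resp.raise_for_status()
        return resp.text.strip() == "yes"   # assumed response format

    def label_pr(repo, number, token, signed):
        # kubernetes-style labels: "cla: yes" / "cla: no"
        url = "https://api.github.com/repos/{}/issues/{}/labels".format(
            repo, number)
        resp = requests.post(url,
                             json=["cla: yes" if signed else "cla: no"],
                             headers={"Authorization": "token " + token})
        resp.raise_for_status()
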
> Fourth, for the ancillary repos of devinabox, peps, benchmarks, and
> devguide, do we care if we use the GitHub merge button for PRs or do we want
> to enforce a linear history with all repos? We just need to decide if we care
> about linear histories and then we can move forward since any bot we create
> won't block us from using GitHub.
Linear history is most useful for bisecting regressions, so I don't
see a major need for it on any of the repos other than the main
CPython one, while for at least the PEPs and the devguide I see a lot
of value in enabling the low friction edit-PR-merge workflow for
submitting small doc fixes (typos, etc).
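Linear history matters there because "git bisect" checks out one
commit at a time while hunting for a regression, and can even drive
the test run itself (the tag and test names below are illustrative):

    git bisect start
    git bisect bad                 # the current checkout is broken
    git bisect good v3.5.0         # last known good point
    git bisect run ./python -m test test_spam

With a merge-heavy history, the set of commits being searched is much
harder to reason about.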
> Website-related stuff
> =====================
> This also almost gets us the peps repo, but we do need to figure out how to
> change the website to build from the git checkout rather than an hg one.
> Same goes for the devguide. It would be great if we can set up web hooks to
> immediately trigger rebuilds of those portions of the sites instead of
> having to wait until a cronjob triggers.
I like Donald's suggestion of setting up a webhook to ensure
hg.python.org remains an up-to-date read-only Mercurial mirror. (Which
suggests another important step: tweaking the configuration of those
repos to block commits as part of the cutover of each repo)
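On the "immediately trigger rebuilds" idea, the receiving end of such
a hook can be tiny. A minimal sketch using Flask, with the build
script path standing in for whatever the cronjob currently runs:

    import subprocess
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/hooks/peps", methods=["POST"])
    def rebuild_peps():
        # GitHub sets X-GitHub-Event on every webhook delivery
        if request.headers.get("X-GitHub-Event") == "push":
            subprocess.Popen(["/path/to/rebuild-peps.sh"])  # placeholder
        return "", 204

A real deployment should also verify GitHub's X-Hub-Signature header
against the shared webhook secret before running anything.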
Something we may also want to consider is whether or not we might be
able to use ReadTheDocs for building and hosting at least some of the
ancillary repos (it would be nice to be able to use full Sphinx markup
when writing PEPs, for example, and the devguide is already a Sphinx
project).
> CPython requirements
> ====================
> There are six things to work out before we move over cpython. First, do we
> want to split out Python 2 branches into their own repo? There might be a
> clone size benefit which obviously is nice for people on slow Internet
> connections.
Shallow clones are going to be more of a benefit there, and I agree
with Donald that splitting the repos would be more trouble than it's
worth.
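For anyone who hasn't used them: a shallow clone is a single option
on the command line, with the URL below just a placeholder for
wherever cpython ends up:

    git clone --depth 1 https://github.com/python/cpython.git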
> We could also contemplate
> sub-repos for things like the Doc/ or Tools/ directories (although I doubt
> it's worth it).
While I'd still like to see the tutorial and the HOWTO guides moved
out to their own version independent repos in the long term, I don't
see any reason to do that as part of the migration to a richer
repository hosting environment - we can consider if and when we want
to tackle it *after* the migration.
> Second, do we want all fixes to go into master and then cherry-pick into
> other branches, or do we want to follow our current practice of going into
> the active feature branch and then merge into master? I personally prefer
> the former and I think most everyone else does as well, but I thought it
> should be at least thought about.
Master+cherry-pick makes sense to me. I have two concrete reasons for
that, one personal and one professional:
* the personal reason is that it means I can effectively ignore all
but the most recent maintenance branch when contributing on my own
time
* the professional reason is that "please backport the fix for issue
#ABC to version X.Y" requests become much easier to handle (even if
X.Y is in security fix only mode), as it aligns with the normal
upstream workflow rather than being an exceptional case
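In day-to-day terms the backport step under that model is just a
cherry-pick (the branch name and hash below are illustrative):

    git checkout 3.5                 # the maintenance branch
    git cherry-pick -x <sha-of-fix>  # the fix already on master

The -x option appends "(cherry picked from commit ...)" to the
backported commit message, which keeps the cross-branch bookkeeping
easy to trace.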
> Third, how to handle Misc/NEWS? We can add a NEWS field to bugs.python.org
> and then generate the NEWS file by what issues are tied to what version and
> when they were closed. The other approach is to create a file per NEWS entry
> in a version-specific directory (Larry created code for hg already for this
> to give people an idea: http://bugs.python.org/issue18967). Then when we cut
> a release we run a tool that slurps up all of the relevant files -- which
> includes files in the directory for the next feature release which represent
> fixes which were cherry picked -- and generates the NEWS file for the final
> release. The per-file approach is bot-friendly and also CLI-friendly, but
> potentially requires more tooling and I don't know if people feel news
> entries should be tied to the issue or in the repo (although that assumes
> people find tweaking Roundup easy :).
I'm generally a fan of loose coupling, and the main virtue I see for
"in the repo" file-based approaches is that it means that everything
you need to generate the release notes remains with the code and
documentation - you're not relying on an external service being
available. It also doesn't rule out the use of a tracker field later -
such a field would just need to be converted into a file in the repo
when preparing the patch.
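To make the file-based idea concrete, the release-time tool could be
as small as this sketch - the per-version directory layout is
hypothetical, and Larry's patch on the linked issue differs in the
details:

    import os

    def build_news(news_dir, version):
        # e.g. build_news("Misc/NEWS.d", "3.6.0") - names assumed
        header = "What's New in Python {}?".format(version)
        entries = []
        version_dir = os.path.join(news_dir, version)
        for name in sorted(os.listdir(version_dir)):
            with open(os.path.join(version_dir, name)) as f:
                entries.append("- " + f.read().strip())
        return "\n".join([header, "=" * len(header), ""] + entries)
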
> Fifth, what to do about Misc/ACKS? Since we are using PRs, even if we
> flatten them, I believe the PR creators will get credit in the commit as the
> author while the core dev committing will be flagged as the person doing the
> merge (someone correct me if I'm wrong because if I am this whole point is
> silly). With the commits containing credit directly, we can either
> automatically generate Misc/ACKS like the NEWS file or simply drop it for
> future contributors and just leave the file for past contributors since git
> will have kept track for us.
I think we'll still need the manual ACKS file to accommodate patches
uploaded to Roundup (especially older ones), so I think the most
useful thing to do here is to have a script that can check the ACKs
file for missing names that appear in the GitHub contributor list.
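That check is only a couple of calls against the real GitHub v3 API
per contributor - the /users lookup is needed because the
contributors list only carries login names, and unauthenticated
requests are heavily rate limited:

    import requests

    def github_names(repo):
        url = "https://api.github.com/repos/{}/contributors".format(repo)
        for c in requests.get(url).json():
            user = requests.get(
                "https://api.github.com/users/" + c["login"]).json()
            yield user.get("name") or c["login"]

    def missing_from_acks(repo, acks_path="Misc/ACKS"):
        with open(acks_path, encoding="utf-8") as f:
            acks = {line.strip() for line in f if line.strip()}
        return [n for n in github_names(repo) if n not in acks]
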
> We should build a bot. It must use a Monty Python reference to trigger
> (black-knight, none-shall-pass, man-from-scene-24, five-questions,
> what-is-your-quest, what-is-your-favourite-colour, etc.; obviously I'm
> leaning towards the Black Knight or Bridge of Death scenes from the Holy
> Grail for inspiration since they deal with blocking you from doing
> something).
I'm still a fan of "blue-no-green" :)
> It should handle specifying the commit message, what branches to
> commit to/cherry pick into, and a NEWS entry (if necessary). I don't know if
> it needs to do anything else as a requirement. It should probably implement
> a commit queue like Zuul or Homu (and both of those can be considered as the
> basis of the bot). Gating commits on a passing test run would probably
> also be good.
Something else worth considering is whether to have one bot or
multiple. With the Kubernetes issue I linked above for example, you
can see that the googlebot handles the CLA question, but there's a
separate k8s-bot that prompts committers to move the issue along (and
what appears to be a third party bot doing integration testing for
Mesosphere).
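Whichever way we split the responsibilities, the trigger-parsing part
of any of these bots is straightforward. A toy sketch, with the
@black-knight command syntax invented purely for illustration:

    import re
    import shlex

    def parse_merge_command(comment):
        # handles e.g.: @black-knight merge branches=3.5,3.6 news="..."
        m = re.match(r"@black-knight\s+merge\s*(.*)", comment)
        if m is None:
            return None
        opts = dict(part.split("=", 1)
                    for part in shlex.split(m.group(1)) if "=" in part)
        opts["branches"] = opts.get("branches", "master").split(",")
        return opts
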
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia