[core-workflow] My initial thoughts on the steps/blockers of the transition

Ezio Melotti ezio.melotti at gmail.com
Tue Jan 5 00:53:54 EST 2016


On Tue, Jan 5, 2016 at 2:42 AM, Brett Cannon <brett at python.org> wrote:
> So consider this the starting discussion of the PEP that will be the
> hg.python.org -> GitHub transition PEP that I will be in charge of. Once we
> have general agreement on the steps necessary I will start the actual PEP
> and check it in, but I figure there's no point in having a skeleton PEP if we
> can't agree on the skeleton. :) While I list steps influencing all the
> repos, I want to focus on the ones stopping any repo from moving over for
> now, expanding what we worry about to the cpython repo as we knock blockers
> down until we move everything over and start adding GitHub perks.
>
> The way I see it, we have 5 repos to move: devinabox, benchmarks, peps,
> devguide, and cpython.

On top of this, there is also the test repo
(https://hg.python.org/test) and all the tracker repos
(https://hg.python.org/tracker/).
I think it would be useful to port the former since it will provide a
place for devs to try things out and experiment (a new test repo could
also be created though).
It would be nice to port the tracker repos too and be consistent with
the others, but it's not a priority.  When we switched to HG they
stayed on SVN until I ported them, so I guess the same thing can be
done (unless R. David or Martin prefer to stick to HG).

> I also think that's essentially the order we should
> migrate them over. Some things will need to be universally handled before we
> transition a single repo, while other things are only a blocker for some of
> the repos.
>
> Universal blockers
> ==============
> There are four blockers that must be resolved before we even consider moving
> a repo over. They can be solved in parallel, but they all need to have a
> selected solution before we can move even the devinabox repo.
>
> First, we need to decide how we are going to handle adding all the core devs
> to GitHub. Are we simply going to add all of them to the python
> organization, or do we want something like a specific python-dev group
> that gets added to all of the relevant repos? Basically I'm not sure how
> picky we want to be about the people who already have organization access on
> GitHub about them implicitly getting access to the cpython repo at the end
> of the day (and the same goes for any of the other repos in the python
> organization). For tracking names, I figure we will create a file in the
> devguide where people can check in their GitHub usernames and I can manually
> add people as people add themselves to the file.
>

I think the current list of core-devs should be converted to a group
and given access to the same repos they have access to now (i.e.
cpython/devguide/peps and possibly others).  Then additional
repo-specific groups can be created in case we want to let specific
contributors work on peps or the devguide.

> Second, CLA enforcement. As of right now people go to
> https://www.python.org/psf/contrib/contrib-form/, fill in the form, and then
> Ewa gets an email where she manually flips a flag in Roundup. If we want to
> use a web hook to verify someone has signed a CLA then we need to decide
> where the ground truth for CLAs is. Do we want to keep using Roundup to
> manage CLA agreements and thus add a GitHub field in bugs.python.org for
> people's profile and a web hook or bot that will signal if someone has the
> flag flipped on bugs.python.org?

This can be done.  We can add a "GitHub" username field to Roundup
users so that we can link the two.
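As a rough illustration of how a hook or bot could use such a field, here
is a minimal sketch.  The TrackerUser record and its attribute names are
hypothetical and do not reflect Roundup's actual schema or API; the only
idea taken from the above is "map a GitHub login to a tracker user and
read the CLA flag".

```python
# Hypothetical sketch: the record layout and names are assumptions,
# not Roundup's real schema.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TrackerUser:
    username: str           # b.p.o username
    github: Optional[str]   # the proposed new "GitHub" username field
    cla_signed: bool        # the flag that is currently flipped by hand


def has_signed_cla(github_login: str,
                   users: Iterable[TrackerUser]) -> Optional[bool]:
    """Return the CLA flag of the tracker user linked to github_login.

    Returns None when no tracker user has registered that GitHub login,
    so the caller can ask the contributor to link the two accounts.
    """
    wanted = github_login.lower()  # GitHub logins are case-insensitive
    for user in users:
        if user.github is not None and user.github.lower() == wanted:
            return user.cla_signed
    return None
```

The three-way result (True, False, None) lets a bot tell "has not signed"
apart from "we don't know who this is on b.p.o".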


> Or is there some prepackaged service that
> we can use that will keep track of this which would cause us to not use
> Roundup (which might be easier but, depending on the service, might require
> everyone to re-sign)? There's also the issue of supporting people who want
> to submit code by uploading a patch to bugs.python.org but not use GitHub.
> Either way I don't want to have to ask everyone who submits a PR what their
> bugs.python.org username is and then go check that manually.
>

This also brings up another problem.
Since the discussion about an issue happens on b.p.o and the PRs are
submitted on GitHub, this means that:
1) users with only a GitHub account have to create a b.p.o account if
they want to comment on the issue (excluding review comments);
2) users with only a b.p.o account have to create a GitHub account if
they want to review a PR;
3) users with both can comment on b.p.o and review on GitHub, but they
might need to login twice.

It would be better if users didn't need to create and use two separate accounts.

> Third, how do we want to do the repo conversions? We need to choose the
> tool(s) and command(s) that we want to use. There was mention of wanting a
> mapping from hg commit ID to git commit ID. If we have that we could have a
> static bugs.python.org/commit/<ID> page that had the mapping embedded in
> some JavaScript and if <ID> matched then we just forward them to the
> corresponding GitHub commit page, otherwise just blindly forward to GitHub
> and assume the ID is git-only, giving us a stable URL for commit web views.
>

As I mentioned on python-committers, we already have
https://hg.python.org/lookup/ .
This is currently used to map SVN->HG (e.g.
https://hg.python.org/lookup/r12345 ), and should be extended to
handle cs ids too.
The b.p.o linkifier can just convert all revision numbers and cs ids
to a https://hg.python.org/lookup/ link and let the lookup page figure
out where to redirect the user.
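To make the idea concrete, here is a minimal sketch of what the lookup
logic could look like once it handles cs ids too.  The two mapping tables,
their contents, and the GitHub URL template are assumptions made for the
example; the real mappings would come from whatever conversion tool gets
chosen.

```python
import re
from typing import Optional

# Hypothetical sample data; real tables would be generated during conversion.
SVN_TO_HG = {"12345": "0123456789ab"}
HG_TO_GIT = {"0123456789ab": "fedcba9876543210fedcba9876543210fedcba98"}

# Assumed location of the converted repo.
GITHUB_COMMIT_URL = "https://github.com/python/cpython/commit/{}"


def resolve(ref: str) -> Optional[str]:
    """Map 'r12345' (SVN) or a hex changeset ID to a commit URL.

    Returns None for unknown SVN revisions and for strings that are
    neither an SVN revision nor a hex ID.
    """
    svn = re.fullmatch(r"r(\d+)", ref)
    if svn:
        hg_id = SVN_TO_HG.get(svn.group(1))
        if hg_id is None:
            return None
        ref = hg_id
    if not re.fullmatch(r"[0-9a-f]{7,40}", ref):
        return None
    # The table is keyed on 12-character short hg IDs.  If the ID is not
    # a known hg changeset, assume it is already a git ID and pass it on.
    git_id = HG_TO_GIT.get(ref[:12], ref)
    return GITHUB_COMMIT_URL.format(git_id)
```

With something like this behind the lookup page, the linkifier never needs
to know which VCS an ID came from.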

> Fourth, for the ancillary repos of devinabox, peps, benchmarks, and
> devguide, do we care if we use the GitHub merge button for PRs or do we want
> to enforce a linear history with all repos? We just need to decide if we care
> about linear histories and then we can move forward since any bot we create
> won't block us from using GitHub.
>
> Those four things are enough to move devinabox over. It probably is enough
> for the benchmarks suite, but I have an email to speed@ asking if people
> want to use this opportunity to re-evaluate the benchmark suite and make any
> changes that will affect repo size (e.g., use pip to pull in the libraries
> and frameworks used by a benchmark rather than vendoring their code, making
> the repo much smaller).
>
> Website-related stuff
> ================
> This also almost gets us the peps repo, but we do need to figure out how to
> change the website to build from the git checkout rather than an hg one.
> Same goes for the devguide. It would be great if we can set up web hooks to
> immediately trigger rebuilds of those portions of the sites instead of
> having to wait until a cronjob triggers.
>

I think we should make hg.python.org read-only but keep it around and
in sync with the GitHub repo (either via cronjobs or hooks).  This
will allow people to contribute using HG in the same way that the
current GitHub clone allows people to contribute using git.  It will
also avoid breaking all the tools that currently use hg.python.org
(and buys us more time to port them if/when needed).

> CPython requirements
> =================
> There are six things to work out before we move over cpython. First, do we
> want to split out Python 2 branches into their own repo? There might be a
> clone size benefit which obviously is nice for people on slow Internet
> connections. It also clearly separates out Python 2 from 3 and lets those
> who prefer to focus on one compared to the other do that more easily. It
> does potentially make any single fix that spans 2 and 3 require a bit more
> work since it won't be an intra-clone change. We could also contemplate
> sub-repos for things like the Doc/ or Tools/ directories (although I doubt
> it's worth it).
>

I think we should keep 2/3 together.  We could split the stdlib from
the rest, but that's a separate issue.

> Second, do we want all fixes to go into master and then cherry-pick into
> other branches, or do we want to follow our current practice of going into
> the active feature branch and then merge into master? I personally prefer
> the former and I think most everyone else does as well, but I thought it
> should be at least thought about.
>

Master first and cherry-picking for older branches sounds good to me,
but I don't know if switching models will have any implications,
especially while going through the history or using tools like bisect.

> Third, how to handle Misc/NEWS? We can add a NEWS field to bugs.python.org
> and then generate the NEWS file by what issues are tied to what version and
> when they were closed. The other approach is to create a file per NEWS entry
> in a version-specific directory (Larry created code for hg already for this
> to give people an idea: http://bugs.python.org/issue18967). Then when we cut
> a release we run a tool that slurps up all of the relevant files -- which
> includes files in the directory for the next feature release which represent
> fixes which were cherry picked -- and generates the NEWS file for the final
> release. The per-file approach is bot-friendly and also CLI-friendly, but
> potentially requires more tooling and I don't know if people feel news
> entries should be tied to the issue or in the repo (although that assumes
> people find tweaking Roundup easy :).
>
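To illustrate the file-per-entry approach, here is a minimal sketch of the
"slurp" step.  The directory layout (one sub-directory per version, one
.txt file per entry) and the header format are assumptions made for the
example; this is not the code attached to issue18967.

```python
from pathlib import Path


def build_news(news_dir: str, version: str) -> str:
    """Concatenate every entry file under news_dir/version into one section.

    Entries are emitted in file-name order; empty files are skipped.
    """
    entries = []
    for path in sorted(Path(news_dir, version).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if text:
            entries.append("- " + text)
    header = "What's New in Python {}?".format(version)
    return "\n".join([header, "=" * len(header), ""] + entries) + "\n"
```

Since every entry lives in its own file, two cherry-picked fixes never
touch the same lines, which is what makes this approach bot-friendly.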
> Fourth, we need to decide exactly what commands we expect core devs to run
> initially for committing code. Since we agreed to a linear history we need
> to document exactly what we expect people to do for a PR to get it into
> their git repo. This will go into the devguide -- probably will want to
> start a github branch at some point -- and will act as the commands the bot
> will want to work off of.
>

I would like to see a complete list of steps from starting to work on
an issue to having it in the repo, at least to understand the new
workflow.  This doesn't have to include all the specific commands, but
at least the basic steps (e.g. after I make a patch, do I commit it and
send a pull request to the main repo, or do I push it to my GitHub
clone and push a button to send the PR?  Do I need to create a branch
before I start working on an issue?).

> Fifth, what to do about Misc/ACKS? Since we are using PRs, even if we
> flatten them, I believe the PR creators will get credit in the commit as the
> author while the core dev committing will be flagged as the person doing the
> merge (someone correct me if I'm wrong because if I am this whole point is
> silly). With the commits containing credit directly, we can either
> automatically generate Misc/ACKS like the NEWS file or simply drop it for
> future contributors and just leave the file for past contributors since git
> will have kept track for us.
>

We could keep updating it for regular patches with no related PR and add
a note about all the other git contributors (possibly with a git
command that lists all authors).
Later on we might decide to have a script that adds all the git
contributors automatically.
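A minimal sketch of what such a script could start from, assuming it only
needs the distinct author names recorded in the commits (the function
names are made up for the example):

```python
import subprocess
from typing import List


def unique_authors(log_output: str) -> List[str]:
    """Return the distinct, non-empty names, sorted case-insensitively."""
    names = {line.strip() for line in log_output.splitlines() if line.strip()}
    return sorted(names, key=str.casefold)


def authors_from_repo(repo_path: str = ".") -> List[str]:
    """List the commit authors of the git repo at repo_path."""
    # `git log --format=%aN` prints one author name per commit,
    # with .mailmap applied.
    result = subprocess.run(
        ["git", "log", "--format=%aN"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return unique_authors(result.stdout)
```

Merging the result into Misc/ACKS would still need some care, since the
same person can show up under several spellings.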

> Sixth, we will need to update our Buildbot fleet.
>

If we keep hg.p.o around and updated, we might not have to do this now
(even though now is better than never).

> This gets us to the bare minimum needed to function.
>
> Parity with hg.python.org
> ----------------------------------
> For parity, there are some Roundup integrations that will be necessary, like
> auto-generating links, posting commits to #python-dev on IRC, etc. I'm not
> sure if people want to block until that is all in place or not. I do think
> we should make sure there is some web hook that can take an issue # from the
> title of a PR and automatically post to the corresponding issue on
> bugs.python.org that the PR exists. If people disagree then feel free to say
> so.
>

FWIW I started adding notes to
https://wiki.python.org/moin/TrackerDevelopmentPlanning to track
everything that needs to be done on the Roundup side.
If you prefer I can later move this to the new PEP, but for now I'm
using it to keep track of all the things that come up in the various
threads.
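For the hook that links a PR to its issue, the title parsing could be as
simple as the sketch below.  The accepted title formats ("Issue #12345",
"issue 12345", or a bare "#12345") are an assumption; whatever convention
we end up documenting in the devguide should drive the pattern.

```python
import re
from typing import Optional

# Assumed conventions: "Issue #12345", "issue 12345" or a bare "#12345".
ISSUE_RE = re.compile(r"(?:issue\s*#?\s*|#)(\d+)", re.IGNORECASE)


def issue_from_title(title: str) -> Optional[int]:
    """Return the first issue number mentioned in a PR title, or None."""
    match = ISSUE_RE.search(title)
    return int(match.group(1)) if match else None
```

A title without a recognizable issue number returns None, which the hook
could use to ask the submitter to add one.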

Best Regards,
Ezio Melotti

> Adding perks
> ==========
> ...

