On Jan 5, 2016, at 1:03 PM, Brett Cannon <brett@python.org> wrote:



On Mon, 4 Jan 2016 at 21:54 Ezio Melotti <ezio.melotti@gmail.com> wrote:
On Tue, Jan 5, 2016 at 2:42 AM, Brett Cannon <brett@python.org> wrote:
> So consider this the starting discussion of the PEP that will be the
> hg.python.org -> GitHub transition PEP that I will be in charge of. Once we
> have general agreement on the steps necessary I will start the actual PEP
> and check it in, but I figure there's no point in having a skeleton PEP if we
> can't agree on the skeleton. :) While I list steps influencing all the
> repos, I want to focus on the ones stopping any repo from moving over for
> now, expanding what we worry about to the cpython repo as we knock blockers
> down until we move everything over and start adding GitHub perks.
>
> The way I see it, we have 5 repos to move: devinabox, benchmarks, peps,
> devguide, and cpython.

On top of this, there is also the test repo
(https://hg.python.org/test) and all the tracker repos
(https://hg.python.org/tracker/).
I think it would be useful to port the former since it will provide a
place for devs to try things out and experiment (a new test repo could
also be created though).
It would be nice to port the tracker repos too and be consistent with
the others, but it's not a priority.  When we switched to HG they
stayed on SVN until I ported them, so I guess the same thing can be
done (unless R. David or Martin prefer to stick to HG).

> I also think that's essentially the order we should
> migrate them over. Some things will need to be universally handled before we
> transition a single repo, while other things are only a blocker for some of
> the repos.
>
> Universal blockers
> ==============
> There are four blockers that must be resolved before we even consider moving
> a repo over. They can be solved in parallel, but they all need to have a
> selected solution before we can move even the devinabox repo.
>
> First, we need to decide how we are going to handle adding all the core devs
> to GitHub. Are we simply going to add all of them to the python
> organization, or do we want something like a specific python-dev team
> that gets added to all of the relevant repos? Basically I'm not sure how
> picky we want to be about the people who already have organization access on
> GitHub about them implicitly getting access to the cpython repo at the end
> of the day (and the same goes for any of the other repos in the python
> organization). For tracking names, I figure we will create a file in the
> devguide where people can check in their GitHub usernames and I can manually
> add people as people add themselves to the file.
>

I think the current list of core-devs should be converted to a group
and given access to the same repos they have access to now (i.e.
cpython/devguide/peps and possibly others).  Then additional
repo-specific groups can be created in case we want to let specific
contributors work on peps or the devguide.

This seems to be the general consensus, so we will create a python-dev team under the python org and add the core devs there.

Just to expand on this, GitHub recently updated their permission model to also allow “outside contributors” to be added to a single repository without adding them to a group (not sure if this would be useful for us or not).

I imagine the python-dev team would have “Write Access” (See https://help.github.com/articles/repository-permission-levels-for-an-organization/) unless we think it needs admin access.

 

> Second, CLA enforcement. As of right now people go to
> https://www.python.org/psf/contrib/contrib-form/, fill in the form, and then
> Ewa gets an email where she manually flips a flag in Roundup. If we want to
> use a web hook to verify someone has signed a CLA then we need to decide
> where the ground truth for CLAs is. Do we want to keep using Roundup to
> manage CLA agreements and thus add a GitHub field in bugs.python.org for
> people's profile and a web hook or bot that will signal if someone has the
> flag flipped on bugs.python.org?

This can be done.  We can add a "GitHub" username field to Roundup
users so that we can link the two.

OK, so it sounds like we will stick with our current CLA signing flow and write our own CLA bot that will query Roundup as to whether someone has signed the CLA or not and then throw up a banner signalling if someone has (not) signed and an appropriate link to the CLA. That will require some Roundup work and the creation of the bot.

We can set a commit status that will show red if the user hasn’t signed the CLA (just like if Travis tests failed or so). No need to use a banner or anything.
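
To make the commit status idea concrete, here is a rough sketch of the body such a bot might send to GitHub's commit status endpoint (POST /repos/<owner>/<repo>/statuses/<sha>). The "state", "description", "target_url" and "context" fields are the ones that endpoint accepts; the function name, the "cla/psf" context label and the idea that the Roundup lookup has already been reduced to a boolean are assumptions made only for illustration.

```python
CLA_FORM_URL = "https://www.python.org/psf/contrib/contrib-form/"


def cla_status_payload(has_signed, cla_url=CLA_FORM_URL):
    """Build the JSON body for a GitHub commit status.

    `has_signed` is assumed to come from a Roundup lookup that is
    not shown here.
    """
    if has_signed:
        state = "success"
        description = "Contributor has signed the CLA"
    else:
        state = "failure"  # shows up red, like a failed CI run
        description = "CLA not signed; see the contributor form"
    return {
        "state": state,
        "description": description,
        "target_url": cla_url,   # link shown next to the status
        "context": "cla/psf",    # hypothetical label for this check
    }
```

The bot would send this payload for the head commit of each PR, so the result appears alongside any CI statuses.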


I should also mention, any bots we create should abstract out the code review tool so that when we change providers again in the future it will be more straightforward to just update some select APIs rather than rewrite every bot we create.
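
One possible shape for that abstraction, as a sketch only: the bot logic talks to a small interface and each provider gets its own implementation. All class and method names here are hypothetical, and the in-memory FakeHost only stands in for a real GitHub-backed class.

```python
from abc import ABC, abstractmethod


class CodeReviewHost(ABC):
    """The minimal surface a bot needs from the code review tool."""

    @abstractmethod
    def pr_author(self, pr_id):
        """Return the username of the person who opened the PR."""

    @abstractmethod
    def set_status(self, pr_id, ok, message):
        """Record a pass/fail status on the PR."""


class FakeHost(CodeReviewHost):
    """In-memory stand-in; a real subclass would call the provider's API."""

    def __init__(self, authors):
        self.authors = authors
        self.statuses = {}

    def pr_author(self, pr_id):
        return self.authors[pr_id]

    def set_status(self, pr_id, ok, message):
        self.statuses[pr_id] = (ok, message)


def check_cla(host, pr_id, signed_users):
    """Provider-independent bot logic: only uses the interface above."""
    ok = host.pr_author(pr_id) in signed_users
    host.set_status(pr_id, ok, "CLA signed" if ok else "CLA not signed")
    return ok
```

Switching providers would then mean writing one new CodeReviewHost subclass instead of touching check_cla() or any other bot.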
 


> Or is there some prepackaged service that
> we can use that will keep track of this which would cause us to not use
> Roundup (which might be easier, but depending on the service require
> everyone to re-sign)? There's also the issue of supporting people who want
> to submit code by uploading a patch to bugs.python.org but not use GitHub.
> Either way I don't want to have to ask everyone who submits a PR what their
> bugs.python.org username is and then go check that manually.
>

This also brings up another problem.
Since the discussion about an issue happens on b.p.o and the PRs are
submitted on GitHub, this means that:
1) users with only a GitHub account have to create a b.p.o account if
they want to comment on the issue (excluding review comments);
2) users with only a b.p.o account have to create a GitHub account if
they want to review a PR;
3) users with both can comment on b.p.o and review on GitHub, but they
might need to login twice.

It would be better if users didn't need to create and use two separate accounts.

If we can add GitHub as a login/creation option for b.p.o accounts then that solves that. But I'm willing to bet a majority of people will already have a GitHub account and we have always required the b.p.o account so #1 is going to be the common case.
 

> Third, how do we want to do the repo conversions? We need to choose the
> tool(s) and command(s) that we want to use. There was mention of wanting a
> mapping from hg commit ID to git commit ID. If we have that we could have a
> static bugs.python.org/commit/<ID> page that had the mapping embedded in
> some JavaScript and if <ID> matched then we just forward them to the
> corresponding GitHub commit page, otherwise just blindly forward to GitHub
> and assume the ID is git-only, giving us a stable URL for commit web views.
>

As I mentioned on python-committers, we already have
https://hg.python.org/lookup/ .
This is currently used to map SVN->HG (e.g.
https://hg.python.org/lookup/r12345 ), and should be extended to
handle cs ids too.
The b.p.o linkifier can just convert all revision numbers and cs ids
to a https://hg.python.org/lookup/ link and let the lookup page figure
out where to redirect the user.
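
As a sketch of what the lookup could do for changeset ids: given a table mapping hg ids to git ids, redirect known hg ids to the matching git commit and pass anything else through as a git id. The target URL pattern, the function name and the in-memory dict are assumptions; the real mapping would come from whatever conversion tool gets picked.

```python
import re

# Assumed destination; the final repo location is not decided here.
GITHUB_COMMIT_URL = "https://github.com/python/cpython/commit/{}"


def resolve_commit(commit_id, hg_to_git):
    """Return the URL to redirect to for an hg or git commit id.

    `hg_to_git` maps full 40-character hg ids to git ids.
    """
    commit_id = commit_id.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{4,40}", commit_id):
        raise ValueError("not a commit id: {!r}".format(commit_id))
    # hg ids are usually quoted in abbreviated form, so match by prefix.
    matches = [git for hg, git in hg_to_git.items()
               if hg.startswith(commit_id)]
    if len(matches) > 1:
        raise ValueError("ambiguous commit id: {!r}".format(commit_id))
    if matches:
        return GITHUB_COMMIT_URL.format(matches[0])
    # Unknown to the mapping: assume it is already a git id.
    return GITHUB_COMMIT_URL.format(commit_id)
```

The existing SVN revision handling (r12345 style) would stay as a separate branch in front of this.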

> Fourth, for the ancillary repos of devinabox, peps, benchmarks, and
> devguide, do we care if we use the GitHub merge button for PRs or do we want
> to enforce a linear history with all repos? We just need to decide if we care
> about linear histories and then we can move forward since any bot we create
> won't block us from using GitHub.
>
> Those four things are enough to move devinabox over. It probably is enough
> for the benchmarks suite, but I have an email to speed@ asking if people
> want to use this opportunity to re-evaluate the benchmark suite and make any
> changes that will affect repo size (e.g., use pip to pull in the libraries
> and frameworks used by a benchmark rather than vendoring their code, making
> the repo much smaller).
>
> Website-related stuff
> ================
> This also almost gets us the peps repo, but we do need to figure out how to
> change the website to build from the git checkout rather than an hg one.
> Same goes for the devguide. It would be great if we can set up web hooks to
> immediately trigger rebuilds of those portions of the sites instead of
> having to wait until a cronjob triggers.
>
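
For the web hook idea, a minimal sketch of the check a rebuild endpoint might make before kicking off a build. GitHub signs each delivery with an HMAC of the request body, sent in the X-Hub-Signature header as "sha1=<hexdigest>", and push events carry the branch in the "ref" field. The function name and the choice to rebuild only on pushes to master are assumptions.

```python
import hashlib
import hmac
import json


def should_rebuild(secret, body, signature_header, branch="master"):
    """Return True if a delivery is authentic and is a push to `branch`.

    `secret` and `body` are bytes; `signature_header` is the value of
    the X-Hub-Signature header, or None if it was missing.
    """
    expected = "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking the signature through timing.
    if not hmac.compare_digest(expected, signature_header or ""):
        return False
    payload = json.loads(body.decode("utf-8"))
    return payload.get("ref") == "refs/heads/" + branch
```

Whatever serves the peps or devguide pages would call this from its hook handler and then run the same build the cronjob runs today.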

I think we should make hg.python.org read-only but keep it around and
in sync with the GitHub repo (either via cronjobs or hooks).  This
will allow people to contribute using HG in the same way that the
current GitHub clone allows people to contribute using git.  It will
also avoid breaking all the tools that currently use hg.python.org
(and buys us more time to port them if/when needed).

That's easy to say, but someone also has to maintain hg.python.org then and we are doing this move partially to try and cut down on the amount of custom infrastructure that we maintain. If people are that worried about others being so averse to using GitHub that they won't even do an anonymous clone from their servers then we can get a Bitbucket or GitLab clone set up, but I would rather try and cut out our repo hosting services if possible (who knows, maybe we can even finally retire svn.python.org thanks to shallow clones or something).

I think a mirror is a different level of involvement than running the main repository. We won’t have to worry about backups, if the server goes down people can still contribute, etc. Most likely it will also be able to be resized to a smaller server as well (but maybe not). I don’t think any of the infra team would be upset at one less server to manage though, so I’m more than happy either way.

 

> CPython requirements
> =================
> There are six things to work out before we move over cpython. First, do we
> want to split out Python 2 branches into their own repo? There might be a
> clone size benefit which obviously is nice for people on slow Internet
> connections. It also clearly separates out Python 2 from 3 and lets those
> who prefer to focus on one compared to the other do that more easily. It
> does potentially make any single fix that spans 2 and 3 require a bit more
> work since it won't be an intra-clone change. We could also contemplate
> sub-repos for things like the Doc/ or Tools/ directories (although I doubt
> it's worth it).
>

I think we should keep 2/3 together.  We could split the stdlib from
the rest, but that's a separate issue.

This seems to be the general consensus, so we will plan to keep cpython as a single repo.
 

> Second, do we want all fixes to go into master and then cherry-pick into
> other branches, or do we want to follow our current practice of going into
> the active feature branch and then merge into master? I personally prefer
> the former and I think most everyone else does as well, but I thought it
> should be at least thought about.
>

Master first and cherry-picking for older branches sounds good to me,
but I don't know if switching models will have any implications,
especially while going through the history or using tools like bisect.

This seems to be the general consensus, so we will plan to cherry pick commits into older branches.
 

> Third, how to handle Misc/NEWS? We can add a NEWS field to bugs.python.org
> and then generate the NEWS file by what issues are tied to what version and
> when they were closed. The other approach is to create a file per NEWS entry
> in a version-specific directory (Larry created code for hg already for this
> to give people an idea: http://bugs.python.org/issue18967). Then when we cut
> a release we run a tool that slurps up all of the relevant files -- which
> includes files in the directory for the next feature release which represent
> fixes which were cherry picked -- and generates the NEWS file for the final
> release. The per-file approach is bot-friendly and also CLI-friendly, but
> potentially requires more tooling and I don't know if people feel news
> entries should be tied to the issue or in the repo (although that assumes
> people find tweaking Roundup easy :).
>
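
To give a feel for the file-per-entry approach, here is a rough sketch of the "slurp" step. It is not Larry's code from the issue above; the directory layout (one sub-directory per version holding one .txt file per entry), the heading text and the function name are all made up for illustration.

```python
from pathlib import Path


def build_news(news_root, version):
    """Combine the per-entry files for `version` into one NEWS section.

    Assumes a hypothetical layout of <news_root>/<version>/*.txt with
    one entry per file.
    """
    entries_dir = Path(news_root) / version
    lines = ["Python {} news".format(version), ""]
    # Sort by file name so the output is stable between runs.
    for entry in sorted(entries_dir.glob("*.txt")):
        # Collapse whitespace so each entry becomes a single bullet.
        text = " ".join(entry.read_text(encoding="utf-8").split())
        if text:
            lines.append("- " + text)
    return "\n".join(lines) + "\n"
```

Because every entry lives in its own file, two PRs adding news entries never touch the same lines, which is what makes this approach friendly to bots and cherry-picks.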
> Fourth, we need to decide exactly what commands we expect core devs to run
> initially for committing code. Since we agreed to a linear history we need
> to document exactly what we expect people to do for a PR to get it into
> their git repo. This will go into the devguide -- probably will want to
> start a github branch at some point -- and will act as the commands the bot
> will want to work off of.
>

I would like to see a complete list of steps from starting to work on
an issue to having it in the repo, at least to understand the new
workflow.  This doesn't have to include all the specific commands, but
at least the basic steps (e.g. after I have made a patch, do I commit it and
send a pull request to the main repo, or do I push it to my GitHub
clone and push a button to send the PR?  Do I need to create a branch
before I start working on an issue?).

There will be a step-by-step guide in the devguide to answer all of this before we make any switch.
 

> Fifth, what to do about Misc/ACKS? Since we are using PRs, even if we
> flatten them, I believe the PR creators will get credit in the commit as the
> author while the core dev committing will be flagged as the person doing the
> merge (someone correct me if I'm wrong because if I am this whole point is
> silly). With the commits containing credit directly, we can either
> automatically generate Misc/ACKS like the NEWS file or simply drop it for
> future contributors and just leave the file for past contributors since git
> will have kept track for us.
>

We could keep updating for regular patches with no related PR and add
a note about all the other GIT contributors (possibly with a git
command that lists all authors).
Later on we might decide to have a script that adds all
the GIT contributors automatically.

This seems to be the general consensus, so we will keep Misc/ACKS around and have a tool that updates it based on git PR commits at release-time.
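
A sketch of what such a tool could look like, under some simplifying assumptions: author names are taken from "git log --format=%aN" (which honours .mailmap), names are compared case-insensitively, and new names are simply appended rather than slotted into the file's existing ordering. The function names are hypothetical.

```python
import subprocess


def git_authors(repo_dir="."):
    """Return the set of author names recorded in the git history."""
    out = subprocess.run(
        ["git", "log", "--format=%aN"],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return {name.strip() for name in out.splitlines() if name.strip()}


def merge_acks(existing_names, new_names):
    """Append authors that are not yet listed, ignoring case."""
    seen = {name.casefold() for name in existing_names}
    added = sorted(
        {name for name in new_names if name.casefold() not in seen},
        key=str.casefold,
    )
    return list(existing_names) + added
```

At release time this would be run as merge_acks(names_from_acks_file, git_authors()), with the result reviewed by hand before it is committed.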
 

> Sixth, we will need to update our Buildbot fleet.
>

If we keep hg.p.o around and updated, we might not have to do this now
(even though now is better than never).

> This gets us to the bare minimum needed to function.
>
> Parity with hg.python.org
> ----------------------------------
> For parity, there are some Roundup integrations that will be necessary, like
> auto-generating links, posting commits to #python-dev on IRC, etc. I'm not
> sure if people want to block until that is all in place or not. I do think
> we should make sure there is some web hook that can take an issue # from the
> title of a PR and automatically posts to the corresponding issue on
> bugs.python.org that the PR exists. If people disagree then feel free to say
> so.
>
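
The issue-number hook could start out as small as the sketch below. The title convention it recognises ("Issue #NNNN" or a bare "#NNNN") is only an assumption, since no convention has been agreed on, and posting the message to bugs.python.org is left out because that depends on the Roundup side.

```python
import re

# Assumed convention: "Issue #12345: ..." or just "#12345" in the title.
ISSUE_RE = re.compile(r"(?:issue\s*)?#(\d+)", re.IGNORECASE)


def issue_number(pr_title):
    """Return the first issue number found in a PR title, or None."""
    match = ISSUE_RE.search(pr_title)
    return int(match.group(1)) if match else None


def tracker_comment(pr_title, pr_url):
    """Return (issue number, message) to post on the tracker, or None."""
    number = issue_number(pr_title)
    if number is None:
        return None
    return number, "Pull request opened: {}".format(pr_url)
```

A PR whose title has no recognisable issue number returns None, at which point a bot could leave a comment asking the author to add one.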

FWIW I started adding notes to
https://wiki.python.org/moin/TrackerDevelopmentPlanning to track
everything that needs to be done on the Roundup side.
If you prefer I can later move this to the new PEP, but for now I'm
using it to keep track of all the things that come up in the various
threads.

Nope, the wiki is fine for that sort of thing.

-Brett
_______________________________________________
core-workflow mailing list
core-workflow@python.org
https://mail.python.org/mailman/listinfo/core-workflow
This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA