[core-workflow] PEP 512: migrating hg.python.org to GitHub

Brett Cannon brett at python.org
Mon Jan 18 12:26:15 EST 2016


[just an FYI to everyone: replying without trimming the PEP will lead to
moderation, so please cut out stuff that doesn't matter when replying]

On Sun, 17 Jan 2016 at 21:37 Ezio Melotti <ezio.melotti at gmail.com> wrote:

> On Mon, Jan 18, 2016 at 4:33 AM, Brett Cannon <brett at python.org> wrote:
> >
> >
> > On Sun, 17 Jan 2016 at 16:30 Ezio Melotti <ezio.melotti at gmail.com> wrote:
> [SNIP]
> >> > Adding GitHub username support to bugs.python.org
> >> > +++++++++++++++++++++++++++++++++++++++++++++++++
> >> > To keep tracking of CLA signing under the direct control of the PSF,
> >> > tracking who has signed the PSF CLA will be continued by marking that
> >> > fact as part of someone's bugs.python.org user profile. What this
> >> > means is that an association will be needed between a person's
> >> > bugs.python.org [#b.p.o]_ account and their GitHub account, which
> >> > will be done through a new field in a user's profile.
> >> >
> >>
> >> We have to decide how to deal with users that don't have a b.p.o
> >> account.
> >> The two options that we discussed in the previous mails are:
> >>   1) require them to create a b.p.o account;
> >>   2) allow them to log in to b.p.o using their github account;
> >> (see also next section)
> >
> >
> > Both require creating an account, it just varies whether they log in using
> > GitHub or not. I don't see how we can avoid that if we are going to continue
> > to own the CLA dataset. Honestly I don't see this as a big issue as we have
> > not seemed to have any issues with people creating accounts up to this
> > point.
> >
>
> It's not a big issue, but there will be people that only have a github
> account, and we will need to explain to them that in order to accept
> their PRs we need them to sign the CLA, and in order to sign the CLA
> we need them to create a b.p.o account and link it to their github
> username.
>

I expect this to be the common case. I wish there were a way to avoid it,
but if we want people to participate on the issue tracker they will need
the account anyway. Plus it's no worse than it is today as I expect it's
already the case most people have a GitHub account but not a b.p.o one.


>
> >>
> >> > Linking a pull request to an issue
> >> > ++++++++++++++++++++++++++++++++++
> >> > An association between a pull request and an issue is needed to track
> >> > when a fix has been proposed. The association needs to be many-to-one
> >> > as it can take multiple pull requests to solve a single issue
> >> > (technically it should be a many-to-many association for when a
> >> > single fix solves multiple issues, but this is fairly rare and issues
> >> > can be merged into one using the ``Superceder`` field on the issue
> >> > tracker).
> >> >
> >> > Association between a pull request and an issue will be done based on
> >> > detecting the regular expression ``[Ii]ssue #(?P<bpo_id>\d+)``. If
> >> > this is specified in either the title or in the body of a message on
> >> > a pull request then a connection will be made on
> >> > bugs.python.org [#b.p.o]_. A label will also be added to the pull
> >> > request to signify that the connection was made successfully. This
> >> > could lead to incorrect associations if the wrong issue is referenced
> >> > or another issue is merely mentioned, but these are rare occasions.
> >> >
> >>
> >> Is there a way to associate the PR to an issue (in case the user
> >> forgot) or change association (in case it got the wrong number) after
> >> the creation of the PR?
> >
> >
> > You tell me. :) I assume any bot we write to handle this will monitor
> > PR-level comments since GitHub doesn't notify on title changes. But we can
> > choose any workflow we want, so if we want it to be an explicit command like
> > `/bot issue 12345` then we can do that instead.
> >
>
> I was asking about the github side.  On b.p.o it's not a problem
> adding PRs either automatically or manually, but I don't know if the
> same can be done from github (it already happens when a commit message
> includes the wrong issue number -- the commit can not be changed and
> the notification is sent to the wrong issue on b.p.o).
> Even if it's not possible, I guess we could still add a comment to
> github with the correct issue number, and add the PR to the issue and
> the people to the nosy list manually.
>

So are you asking about a b.p.o -> GH association, so that you can make the
association on b.p.o and have it show up somehow? If that's your question
then we can add a comment like we do on b.p.o.
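
As a rough sketch of what such a bot could do on the GitHub side, the
snippet below applies the regular expression quoted from the PEP to a pull
request's title and comments. The explicit `/bot issue` pattern and the
function name `find_issue_ids` are assumptions made up for illustration;
nothing like them has been decided.

```python
import re

# Pattern quoted from the PEP draft: matches "Issue #123" or "issue #123".
ISSUE_RE = re.compile(r"[Ii]ssue #(?P<bpo_id>\d+)")

# Hypothetical explicit command form floated in this thread.
COMMAND_RE = re.compile(r"^/bot issue (?P<bpo_id>\d+)\s*$", re.MULTILINE)


def find_issue_ids(title, body=""):
    """Return the b.p.o issue numbers referenced by a PR, without duplicates."""
    text = f"{title}\n{body}"
    ids = []
    for pattern in (ISSUE_RE, COMMAND_RE):
        for match in pattern.finditer(text):
            bpo_id = int(match.group("bpo_id"))
            if bpo_id not in ids:
                ids.append(bpo_id)
    return ids
```

Because the function can be re-run on every new comment, a later comment
naming the correct issue would be picked up after the PR was created.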


> [SNIP]
> >>
> >> > Backup of pull request data
> >> > '''''''''''''''''''''''''''
> >> > Since GitHub [#github]_ is going to be used for code hosting and code
> >> > review, those two things need to be backed up. In the case of code
> >> > hosting, the backup is implicit as all non-shallow Git [#git]_ clones
> >> > contain the full history of the repository, hence there will be many
> >> > backups of the repository.
> >> >
> >>
> >> If possible I would still prefer an "official" backup.  I don't think
> >> we want to go around asking who has the latest copy of the repo in
> >> case something happens to GitHub.
> >
> >
> > I would be shocked if a core developer doesn't have an up-to-date repository
> > (then again I don't expect to have to flee GitHub overnight, giving us time
> > to update). But if you want to make this an optional feature then I'm
> > fine with that (I don't think the PSF infrastructure team will have issue
> > with a regularly updated clone of the repository).
> >
>
> Yes, this is not a particularly realistic issue -- even without an
> official backup we won't lose any code.  It's mostly to have an
> official backup and procedure to restore the repo instead of having to
> figure out what to do when it happens (even if it probably never
> will).  The infra team can decide if this is a reasonable request or
> if it's not worth the extra effort.
>

+1


> [SNIP]
> >> > Test coverage report
> >> > ''''''''''''''''''''
> >> > Getting an up-to-date test coverage report for Python's standard
> >> > library would be extremely beneficial as generating such a report can
> >> > take quite a while to produce.
> >> >
> >> > There are a couple pre-existing services that provide free test
> >> > coverage for open source projects. Which option is best is an open
> >> > issue: `Choosing a test coverage service`_.
> >> >
> >>
> >> Do we want to eventually request that all new code introduced is fully
> >> covered by tests?
> >
> >
> > I have always told sprinters that 80% is a good guideline. I wouldn't ever
> > want a rule, though. Obviously lowering coverage would not be great and
> > should be avoided, but if
> >
>
> Maybe we could use red/yellow/green labels to indicate the coverage
> level.  That alone might lead contributors to aim for the green label
> without having to create and enforce any rule.
>

To give you an idea of at least how Coveralls integration looks,
https://github.com/python-modernize/python-modernize/pull/117


>
> >>
> >> I think having an indication of how much code in a PR is covered by
> >> tests would be useful regardless of the answer to the previous
> >> question.
> >
> >
> > I know Coveralls already does this; don't know about Codecov.
> >
> >>
> >>
> >> >
> >> > Link web content back to files that it is generated from
> >> > ''''''''''''''''''''''''''''''''''''''''''''''''''''''''
> >> > It would be helpful for people who find issues with any of the
> >> > documentation that is generated from a file to have a link on each
> >> > page which points back to the file on GitHub [#github]_ that stores
> >> > the content of the page. That would allow for quick pull requests to
> >> > fix simple things such as spelling mistakes.
> >> >
> >>
> >> Here you are talking about PEPs/devguide/docs.p.o, right?
> >
> >
> > Yes.
> >
>
> FWIW the docs.python.org pages already have a "report a bug" link in
> the sidebar and also in the footer, but they both just redirect to
> https://docs.python.org/3/bugs.html .
>

Yep, which is what made me think that it would be nice if we could send
people directly to the actual page instead of them having to figure it out.

But then again, if we think a lot of drive-by PRs will be doc-based and
from people who have never contributed before, then we will have to wait
for them to sign the CLA anyway, so maybe it isn't worth it? I guess the
direct link to the underlying content is only useful if people who have
signed the CLA will use it the most.


>
> >>
> >> FWIW this problem was supposed to be fixed with pootle, but that
> >> project seems dead (not sure if due to technical reasons, or simply
> >> because no one had time to integrate it).
> >>
> >> > Splitting out parts of the documentation into their own repositories
> >> > ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
> >> > While certain parts of the documentation at https://docs.python.org
> >> > change with the code, other parts are fairly static and are not
> >> > tightly bound to the CPython code itself. The following sections of
> >> > the documentation fit this category of slow-changing,
> >> > loosely-coupled:
> >> >
> >> > * `Tutorial <https://docs.python.org/3/tutorial/index.html>`__
> >> > * `Python Setup and Usage
> >> > <https://docs.python.org/3/using/index.html>`__
> >> > * `HOWTOs <https://docs.python.org/3/howto/index.html>`__
> >> > * `Installing Python Modules
> >> > <https://docs.python.org/3/installing/index.html>`__
> >> > * `Distributing Python Modules
> >> > <https://docs.python.org/3/distributing/index.html>`__
> >> > * `Extending and Embedding
> >> > <https://docs.python.org/3/extending/index.html>`__
> >> > * `FAQs <https://docs.python.org/3/faq/index.html>`__
> >> >
> >> > These parts of the documentation could be broken out into their own
> >> > repositories to simplify their maintenance and to expand who has
> >> > commit rights to them to ease in their maintenance.
> >>
> >> I would still consider these somewhat dynamic (especially between Py 2
> >> and Py 3).
> >> There are other documents that are more version-independent, such as
> >> the whatsnew pages,
> >
> >
> > The What's New docs get updated with changes so that can't be pulled out
> > from the cpython repo (especially if my dream ever comes true of making
> > people be better about making sure that document gets updated with changes
> > that warrant being mentioned there).
> >
>
> The reason to keep them in a separate repo is that they are identical
> copies of the same document.
> If you fix a typo in the whatsnew/2.7, you have to fix the same typo
> in all branches, and in theory the page should always be identical in
> each branch.


Sure, but I try to make it a habit to update What's New at the same time as
committing something that warrants a mention. Pulling that out will become
a bigger chore.

Having said that, if we want core developers to be the ones that author
those kinds of changes then having it be a separate repo won't necessarily
be as critical. We would just probably need to add a "needs What's New
mention" label or something to keep track of what needs to be added so
people don't forget.


> Howtos, FAQs, etc. might differ between major and even minor versions
> as new features are added and old features deprecated, so it makes
> sense to have (slightly) different versions in each branch (unless you
> want to move them outside the cpython repo and still keep separate
> branches).
> For example
> https://docs.python.org/3/faq/programming.html#is-there-an-equivalent-of-c-s-ternary-operator
> had a different answer before 2.5 :)
>

I suspect that when that kind of situation occurs a version-specific
branch would be created and then merged in once a release went out.


>
> Also moving them might make things more complicated (from simply
> having to find/clone another repo to building docs for docs.python.org
> from different sources).
>

It's possible, but I doubt it. We could have a docs.python.org repo that
contains everything but the docs carried in the cpython repo. Or we can
have these other doc repos be Git submodules of cpython that you can check
out if you want, but it isn't necessary to.

What I do know is this idea has come up many times in the past from the
perspective of being able to give out commit rights to the docs much more
readily than with the cpython repo and I think that's a good idea.


> [SNIP]
> >> > Git CLI commands for committing a pull request to cpython
> >> > ---------------------------------------------------------
> >> > Because Git [#git]_ may be a new version control system for core
> >> > developers, the commands people are expected to run will need to be
> >> > written down. These commands also need to keep a linear history while
> >> > giving proper attribution to the pull request author.
> >> >
> >> > Another set of commands will also be necessary for when working with
> >> > a patch file uploaded to bugs.python.org [#b.p.o]_. Here the linear
> >> > history will be kept implicitly, but it will need to make sure to
> >> > keep/add attribution.
> >> >
> >>
> >> Nick Coghlan, Pierre-Yves David (a Mercurial dev), and Shiyao Ma (one
> >> of our GSoC students) have been working on an HG extension that
> >> simplifies interaction with the bug tracker (see the list of patches,
> >> download/apply them, upload new patches):
> >> https://bitbucket.org/introom/hg-cpydev
> >> In a previous email, someone mentioned an alias that allows an easier
> >> interaction with PRs.  Would it make sense to write and distribute an
> >> official git extension that provides extra commands/aliases for this
> >> set of commands?  (I guess the answer depends on how many tasks we
> >> have and how straightforward it is to do with plain git commands.)
> >
> >
> > It quite possibly might be. Otherwise shell commands could be written and
> > kept in Tools/. A major perk IMO with Git over Mercurial is Git Bash comes
> > with Git and that gives you Bash on Windows. That makes writing
> > cross-platform shell scripts to help with this sort of thing easy without
> > leaving Windows users stranded.
> >
>
> Isn't that also possible with HG, since extensions are written in Python?
>

I mean .sh files, not extensions, e.g., we have a shell script in Tools
that calls Git directly.


> (BTW, is it possible/reasonable to write git extensions/hooks in Python?)
>

I believe so. From my understanding, Git doesn't go with an API solution
like Mercurial and instead prefers a way to simply specify a naming scheme
that will call commands on your $PATH. See
https://www.atlassian.com/git/articles/extending-git/ as an example.
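
To make that concrete: Git runs any executable on your $PATH named
`git-<name>` when you type `git <name>`, so a Python script works just as
well as a shell script. Below is a minimal sketch of a hypothetical
`git-cpy-pr` command that fetches a pull request into a local branch. The
command name, the `pr-<number>` branch name and the `origin` remote are all
assumptions for illustration, not anything we have agreed on.

```python
#!/usr/bin/env python3
"""Hypothetical git-cpy-pr helper: on $PATH, `git cpy-pr 42` would run it."""
import subprocess
import sys


def build_fetch_command(pr_number, remote="origin"):
    # GitHub exposes each pull request's head as the ref pull/<number>/head.
    return ["git", "fetch", remote, f"pull/{pr_number}/head:pr-{pr_number}"]


def main(argv):
    # Expect exactly one argument: the pull request number.
    if len(argv) != 2 or not argv[1].isdigit():
        print("usage: git cpy-pr <pull-request-number>", file=sys.stderr)
        return 2
    return subprocess.call(build_fetch_command(int(argv[1])))


# A real git-cpy-pr script would finish with: sys.exit(main(sys.argv))
```

Saved as an executable file named `git-cpy-pr`, this would let someone run
`git cpy-pr 42` and then check out the `pr-42` branch.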


>
> >>
> >>
> >> > How to handle the Misc/NEWS file
> >> > --------------------------------
> >> > There are two competing approaches to handling
> >> > ``Misc/NEWS`` [#news-file]_. One is to add a news entry for issues on
> >> > bugs.python.org [#b.p.o]_. This would mean an issue that is marked
> >> > as "resolved" could not be closed until a news entry is added in the
> >> > "news" field in the issue tracker. The benefit of tying the news
> >> > entry to the issue is it makes sure that all changes worthy of a news
> >> > entry have an accompanying issue. It also makes classifying a news
> >> > entry automatic thanks to the Component field of the issue. The
> >> > Versions field of the issue also ties the news entry to which Python
> >> > releases were affected. A script would be written to query
> >> > bugs.python.org for relevant news entries for a release and to produce
> >> > the output needed to be checked into the code repository. This
> >> > approach is agnostic to whether a commit was done by CLI or bot.
> >> >
> >> > The competing approach is to use an individual file per news entry,
> >> > containg the text for the entry. In this scenario each feature
> >>
> >> Typo: containing
> >>
> >> > release would have its own directory for news entries and a separate
> >> > file would be created in that directory that was either named after
> >> > the issue it closed or a timestamp value (which prevents collisions).
> >> > Merges across branches would have no issue as the news entry file
> >> > would still be uniqeuely named and in the directory of the latest
> >>
> >> Typo: uniquely
> >>
> >> > version that contained the fix. A script would collect all news entry
> >> > files no matter what directory they reside in and create an
> >> > appropriate news file (the release directory can be ignored as the
> >> > mere fact that the file exists is enough to represent that the entry
> >> > belongs to the release). Classification can either be done by keyword
> >> > in the news entry file itself or by using subdirectories representing
> >> > each news entry classification in each release directory (or
> >> > classification of news entries could be dropped since critical
> >> > information is captured by the "What's New" documents which are
> >> > organized). The benefit of this approach is that it keeps the changes
> >> > with the code that was actually changed. It also ties the message to
> >> > being part of the commit which introduced the change. For a commit
> >> > made through the CLI, a script will be provided to help generate the
> >> > file. In a bot-driven scenario, the merge bot will have a way to
> >> > specify a specific news entry and create the file as part of its
> >> > flattened commit (while most likely also supporting using the first
> >> > line of the commit message if no specific news entry was specified).
> >> > Code for this approach has been written previously for the Mercurial
> >> > workflow at http://bugs.python.org/issue18967. There is also a tool
> >> > from the community at https://pypi.python.org/pypi/towncrier.
> >> >
> >>
> >> Does using git (and fast-forward merges, rebases, etc.) still create
> >> conflicts?
> >> (I guess the answer is yes, but perhaps we should double-check.)
> >
> >
> > I believe so, but I guess I could be wrong since I have not explicitly
> > tested it.
> >
> >>
> >>
> >> If it does, there's also a third option: writing a merge script.
> >> I wrote a basic one for hg and it seemed to work decently, perhaps
> >> with git it's even easier.
> >
> >
> > I'll mention it, but I suspect the file-based solution will win out based on
> > the feeling I have gotten from people when this topic has come up before.
> >
>
> I was re-reading the issue, and found two interesting links:
>   1)
> https://mail.python.org/pipermail/python-dev/2014-December/137393.html
> (Pierre-Yves David actually convinced me to write the merge script and
> helped me)
>   2) https://github.com/twisted/newsbuilder (this can be another
> option that can be added to the PEP)
>

Thanks, I'll add them.
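
For anyone trying to picture the file-per-entry approach, here is a minimal
sketch of the collection script the PEP describes. The directory layout
(`<root>/<release>/<classification>/<issue>.rst`), the `.rst` suffix and the
function name `build_news` are assumptions for illustration only; the PEP
does not fix any of them.

```python
from pathlib import Path


def build_news(news_root):
    """Combine one-entry-per-file news items into one text, grouped by classification."""
    sections = {}
    for entry in sorted(Path(news_root).rglob("*.rst")):
        # Assumed layout: <news_root>/<release>/<classification>/<issue>.rst
        # The release directory is ignored, as the PEP text suggests it can be.
        classification = entry.parent.name
        text = entry.read_text(encoding="utf-8").strip()
        sections.setdefault(classification, []).append(text)
    lines = []
    for classification in sorted(sections):
        lines += [classification, "-" * len(classification), ""]
        lines += [f"- {text}" for text in sections[classification]]
        lines.append("")
    return "\n".join(lines)
```

Since every entry lives in its own uniquely named file, two branches adding
entries never touch the same lines, which is the whole point of the approach.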


>
>
> A few additional points that you might want to add to the PEP:
> * A new workflow for releasing Python should also be defined, and PEP
> 101 updated accordingly.
>

I view this as implicit.


> * The devguide also needs to be updated.
>

Implicit, but I can call this out. I assume it will end up with a
github-migration branch for a while to store updates until the migration
occurs.


> * We should decide what to do with all the other repos at hg.python.org


They are all personal repos, so people can do what they want with them.

-Brett