On Mon, Jan 18, 2016 at 4:33 AM, Brett Cannon
On Sun, 17 Jan 2016 at 16:30 Ezio Melotti
wrote:
Repositories to Migrate
=======================
While hg.python.org [#h.p.o]_ hosts many repositories, there are only six key repositories that must move:
1. devinabox [#devinabox-repo]_
2. benchmarks [#benchmarks-repo]_
3. tracker [#tracker-repo]_
I don't think the tracker repo /must/ move (at least not immediately).
I don't think so either, just that it should probably happen at some point, especially if hg.python.org gets shut down.
The tracker repo includes 7 subrepos (our own forks of Roundup, Rietveld, and django-gae2django,
Is that being used by anyone?
gae2django is necessary to run Rietveld. If we remove Rietveld, we won't need gae2django anymore. Roundup is used to run all the other 4 instances.
and the instances for b.p.o, the meta tracker, the Jython tracker,
They might not stick with their issue tracker (the Jython team knows what's going on and they are going to have a discussion over what they will end up doing).
I (and RDM/MvL) usually apply security patches and other minor fixes to the Jython tracker as well, but otherwise no one maintains it. Since it requires basically no maintenance, I wouldn't mind converting and maintaining this repo together with the other tracker instances, like I already did when we moved from SVN to HG.
and the setuptools tracker
Is that being used? https://bitbucket.org/pypa/setuptools/issues seems to be actively used.
I think this exists to keep record of all issues, but it's not actively used (the bitbucket one is used instead). They probably didn't migrate all the issues to bitbucket, so shutting it down would mean losing all the old issues. As for the Jython tracker, migrating and maintaining this repo with the others shouldn't be a problem.
) and migrating it now means that:
1) The total number of repos to be converted will be 12* instead of 5;
2) I (and RDM/MvL) will have to learn a new workflow while updating the tracker for the migration;
* 10 if we exclude Rietveld and gae2django (but that means holding up the migration until they are removed)
I think it would be best to:
1) migrate the other repos;
2) work on the b.p.o/github integration using HG;
3) remove Rietveld;
4) (possibly) migrate the 4 tracker instances to GitHub
That's fine by me. As I said, whether the migration has to happen at all depends on what happens to hg.python.org.
Our Roundup fork is also an HG clone of upstream Roundup (with a separate branch with our changes), so migrating that to Git, while doable, might be problematic. FWIW when we switched from SVN to HG it took over a year and a half before I migrated the tracker repos, and I don't remember anyone being particularly concerned that we were still using SVN or relieved when we eventually moved it to HG. Personally I wouldn't mind if we kept them on HG (RDM/MvL might disagree though), but if it's a burden for the infra-team to keep hg.python.org alive, I would prefer if we moved them later rather than sooner.
Another option, if you want to keep it on Mercurial and hg.python.org doesn't stick around, is to move it to Bitbucket.
True. I already have a semi-official clone of it on Bitbucket at https://bitbucket.org/ezio_melotti/roundup-bpo/overview
4. peps [#peps-repo]_
5. devguide [#devguide-repo]_
6. cpython [#cpython-repo]_
The devinabox, benchmarksm and tracker repositories are code-only.
Typo: benchmarks
The peps and devguide repositories involve the generation of webpages. And the cpython repository has special requirements for integration with bugs.python.org [#b.p.o]_.
Adding GitHub username support to bugs.python.org
+++++++++++++++++++++++++++++++++++++++++++++++++
To keep the tracking of CLA signing under the direct control of the PSF, tracking who has signed the PSF CLA will be continued by marking that fact as part of someone's bugs.python.org user profile. This means an association will be needed between a person's bugs.python.org [#b.p.o]_ account and their GitHub account, which will be done through a new field in a user's profile.
We have to decide how to deal with users that don't have a b.p.o account. The two options that we discussed in the previous mails are:
1) require them to create a b.p.o account;
2) allow them to log in to b.p.o using their GitHub account
(see also next section).
Both require creating an account; the only difference is whether they log in using GitHub or not. I don't see how we can avoid that if we are going to continue to own the CLA dataset. Honestly I don't see this as a big issue, as we don't seem to have had any problems with people creating accounts up to this point.
It's not a big issue, but there will be people who only have a GitHub account, and we will need to explain to them that in order to accept their PRs we need them to sign the CLA, and in order to sign the CLA we need them to create a b.p.o account and link it to their GitHub username.
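A minimal sketch of the check a bot could run against the proposed profile field, assuming the b.p.o profile data is available as simple records. `BpoUser`, the `github` field name and `cla_status` are hypothetical; the third outcome is exactly the GitHub-only case described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BpoUser:
    """Hypothetical slice of a bugs.python.org user profile."""

    username: str
    github: Optional[str]  # proposed new profile field; the name is assumed
    signed_cla: bool


def cla_status(github_login: str, users: List[BpoUser]) -> str:
    """Classify a pull request author by their linked b.p.o account.

    Returns "signed", "not-signed", or "no-bpo-account" (the case where we
    have to ask the contributor to create and link a b.p.o account).
    """
    login = github_login.lower()  # GitHub logins are case-insensitive
    for user in users:
        if user.github is not None and user.github.lower() == login:
            return "signed" if user.signed_cla else "not-signed"
    return "no-bpo-account"
```

A bot could map these three results onto PR labels or a comment asking the author to sign the CLA.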
Linking a pull request to an issue
++++++++++++++++++++++++++++++++++
An association between a pull request and an issue is needed to track when a fix has been proposed. The association needs to be many-to-one as it can take multiple pull requests to solve a single issue (technically it should be a many-to-many association for when a single fix solves multiple issues, but this is fairly rare and issues can be merged into one using the ``Superceder`` field on the issue tracker).
Association between a pull request and an issue will be done based on detecting the regular expression ``[Ii]ssue #(?P<issue>\d+)``. If this is specified in either the title or in the body of a message on a pull request then a connection will be made on bugs.python.org [#b.p.o]_. A label will also be added to the pull request to signify that the connection was made successfully. This could lead to incorrect associations if the wrong issue number is given or another issue is merely referenced, but such cases should be rare.

Is there a way to associate the PR to an issue (in case the user forgot) or change the association (in case it got the wrong number) after the creation of the PR?
You tell me. :) I assume any bot we write to handle this will monitor PR-level comments since GitHub doesn't notify on title changes. But we can choose any workflow we want, so if we want it to be an explicit command like `/bot issue 12345` then we can do that instead.
I was asking about the GitHub side. On b.p.o it's not a problem adding PRs either automatically or manually, but I don't know if the same can be done from GitHub (it already happens when a commit message includes the wrong issue number -- the commit cannot be changed and the notification is sent to the wrong issue on b.p.o). Even if it's not possible, I guess we could still add a comment on GitHub with the correct issue number, and add the PR to the issue and the people to the nosy list manually.
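A sketch of both ideas: automatic detection with the draft's regular expression (the named group is written as `issue` here, which is an assumption), plus the hypothetical `/bot issue 12345` command as a way to fix a missing or wrong association after the PR has been created. The function names are made up for illustration, and fetching titles, bodies and comments from GitHub is left out.

```python
import re
from typing import Set

# The draft PEP's pattern; the group name "issue" is assumed.
ISSUE_RE = re.compile(r"[Ii]ssue #(?P<issue>\d+)")
# Hypothetical explicit command, modelled on "/bot issue 12345".
COMMAND_RE = re.compile(r"^/bot issue (?P<issue>\d+)\s*$", re.MULTILINE)


def linked_issues(title: str, body: str) -> Set[int]:
    """Issue numbers referenced in a pull request's title or body."""
    text = title + "\n" + body
    return {int(m.group("issue")) for m in ISSUE_RE.finditer(text)}


def relink(current: Set[int], comment: str) -> Set[int]:
    """Let an explicit command in a PR comment replace the association."""
    requested = {int(m.group("issue")) for m in COMMAND_RE.finditer(comment)}
    return requested or current  # no command: keep what we had
```

Because the command only ever replaces the set, a wrong automatic match can be corrected from GitHub without touching the PR's title or commits.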
Create https://git.python.org
'''''''''''''''''''''''''''''
Just as hg.python.org [#h.p.o]_ currently points to the Mercurial repository for Python, git.python.org should do the equivalent for the Git repository.
Is this simply a redirect to GitHub or something else? We might also want to encourage the use of git.python.org in the Devguide and elsewhere to make future migrations simpler.
From my understanding it can't be anything but a redirect. But yes, we will encourage using git.python.org to keep the coupling to GitHub loose for when we make this kind of change again in the future.
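If git.python.org really is just a redirect, it needs very little code. A minimal WSGI sketch that preserves the path and query string, assuming `https://github.com/python/cpython` as the target (hypothetical; the final location is not decided here):

```python
# Hypothetical redirect target; the real GitHub location is not settled.
TARGET = "https://github.com/python/cpython"


def app(environ, start_response):
    """Minimal WSGI app: send every request to the same path on TARGET."""
    path = environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    location = TARGET + path + ("?" + query if query else "")
    # 302 (temporary) so clients don't cache the GitHub URL permanently.
    start_response("302 Found", [("Location", location), ("Content-Length", "0")])
    return [b""]
```

A temporary redirect is used deliberately so that browsers don't remember the GitHub location forever, which fits the goal of keeping the coupling loose. For local testing it could be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.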
Backup of pull request data
'''''''''''''''''''''''''''
Since GitHub [#github]_ is going to be used for code hosting and code review, those two things need to be backed up. In the case of code hosting, the backup is implicit as all non-shallow Git [#git]_ clones contain the full history of the repository, hence there will be many backups of the repository.
If possible I would still prefer an "official" backup. I don't think we want to go around asking who has the latest copy of the repo in case something happens to GitHub.
I would be shocked if a core developer didn't have an up-to-date repository (then again, I don't expect to have to flee GitHub overnight, which gives us time to update). But if you want to make this an optional feature then I'm fine with that (I don't think the PSF infrastructure team will have an issue with a regularly updated clone of the repository).
Yes, this is not a particularly realistic issue -- even without an official backup we won't lose any code. It's mostly to have an official backup and procedure to restore the repo instead of having to figure out what to do when it happens (even if it probably never will). The infra team can decide if this is a reasonable request or if it's not worth the extra effort.
The code review history does not have the same implicit backup mechanism as the repository itself. That means a daily backup of code review history should be done so that it is not lost in case of any issues with GitHub. It also helps guarantee that a migration from GitHub to some other code review system is feasible were GitHub to disappear overnight.
Is there an API to export this or do we need to write another bot and decide how to collect/store the reviews?
I don't know; maybe someone with experience can speak up on this topic. I know for a fact that webhooks can be used, but I don't know whether there is a specific API to get a dump.
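For what it's worth, GitHub's REST API does expose review data: `GET /repos/{owner}/{repo}/pulls/comments` lists the inline review comments for every pull request in a repository, while review summaries (`/pulls/{number}/reviews`) and ordinary PR conversation comments (`/issues/comments`) live at separate endpoints. A rough sketch of a daily snapshot of the first kind, assuming an optional API token and a local directory for the dumps; the function names and file naming are hypothetical:

```python
import datetime
import json
import pathlib
import urllib.request
from typing import List, Optional

API = "https://api.github.com"


def review_comments_url(owner: str, repo: str, page: int) -> str:
    """One page (up to 100 items) of review comments across all PRs."""
    return f"{API}/repos/{owner}/{repo}/pulls/comments?per_page=100&page={page}"


def fetch_review_comments(owner: str, repo: str, token: Optional[str] = None) -> List[dict]:
    """Download every review comment, page by page, until a page is empty."""
    comments: List[dict] = []
    page = 1
    while True:
        request = urllib.request.Request(review_comments_url(owner, repo, page))
        request.add_header("Accept", "application/vnd.github+json")
        if token:
            request.add_header("Authorization", f"token {token}")
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            return comments
        comments.extend(batch)
        page += 1


def write_snapshot(comments: List[dict], directory: str,
                   today: Optional[datetime.date] = None) -> pathlib.Path:
    """Store one dated JSON dump, e.g. review-comments-2016-01-18.json."""
    today = today or datetime.date.today()
    path = pathlib.Path(directory) / f"review-comments-{today.isoformat()}.json"
    path.write_text(json.dumps(comments, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Pagination here simply stops at the first empty page; a production job would more likely follow the `Link` response header and respect rate limits.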
Test coverage report
''''''''''''''''''''
Getting an up-to-date test coverage report for Python's standard library would be extremely beneficial, as such a report can take quite a while to generate.
There are a couple of pre-existing services that provide free test coverage for open source projects. Which option is best is an open issue: `Choosing a test coverage service`_.
Do we want to eventually request that all new code introduced is fully covered by tests?
I have always told sprinters that 80% is a good guideline. I wouldn't ever want a rule, though. Obviously lowering coverage would not be great and should be avoided, but if
Maybe we could use red/yellow/green labels to indicate the coverage level. That alone might lead contributors to aim for the green label without having to create and enforce any rule.
I think having an indication of how much code in a PR is covered by tests would be useful regardless of the answer to the previous question.
I know Coveralls already does this; don't know about Codecov.
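Whichever service is chosen, the red/yellow/green idea boils down to computing coverage over the lines a PR changes and bucketing the result. A minimal sketch, assuming changed and executed lines are available as `(filename, line_number)` pairs; the 80% cut-off echoes the sprint guideline above, the 60% one is arbitrary, and the names are hypothetical:

```python
from typing import Set, Tuple

Line = Tuple[str, int]  # (filename, line number)

# Hypothetical thresholds: 80% echoes the sprint guideline, 60% is arbitrary.
GREEN_AT = 80.0
YELLOW_AT = 60.0


def diff_coverage(changed: Set[Line], executed: Set[Line]) -> float:
    """Percentage of a PR's changed lines that the test run executed."""
    if not changed:
        return 100.0  # nothing measurable was changed
    return 100.0 * len(changed & executed) / len(changed)


def coverage_label(percent: float) -> str:
    """Map a coverage percentage onto a red/yellow/green PR label."""
    if percent >= GREEN_AT:
        return "green"
    if percent >= YELLOW_AT:
        return "yellow"
    return "red"
```

Measuring only the changed lines keeps the label about the PR itself rather than about the (much larger) existing code base.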
Link web content back to files that it is generated from
''''''''''''''''''''''''''''''''''''''''''''''''''''''''
It would be helpful for people who find issues with any of the documentation that is generated from a file to have a link on each page which points back to the file on GitHub [#github]_ that stores the content of the page. That would allow for quick pull requests to fix simple things such as spelling mistakes.
Here you are talking about PEPs/devguide/docs.p.o, right?
Yes.
FWIW the docs.python.org pages already have a "report a bug" link in the sidebar and also in the footer, but they both just redirect to https://docs.python.org/3/bugs.html .
FWIW this problem was supposed to be fixed with Pootle, but that project seems dead (not sure if due to technical reasons, or simply because no one had time to integrate it).
Splitting out parts of the documentation into their own repositories
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
While certain parts of the documentation at https://docs.python.org change with the code, other parts are fairly static and are not tightly bound to the CPython code itself. The following sections of the documentation fit this category of slow-changing, loosely coupled content:
* `Tutorial <https://docs.python.org/3/tutorial/index.html>`__
* `Python Setup and Usage <https://docs.python.org/3/using/index.html>`__
* `HOWTOs <https://docs.python.org/3/howto/index.html>`__
* `Installing Python Modules <https://docs.python.org/3/installing/index.html>`__
* `Distributing Python Modules <https://docs.python.org/3/distributing/index.html>`__
* `Extending and Embedding <https://docs.python.org/3/extending/index.html>`__
* `FAQs <https://docs.python.org/3/faq/index.html>`__
These parts of the documentation could be broken out into their own repositories to simplify their maintenance and to broaden who has commit rights to them.
I would still consider these somewhat dynamic (especially between Py 2 and Py 3). There are other documents that are more version-independent, such as the whatsnew pages,
The What's New docs get updated with changes so that can't be pulled out from the cpython repo (especially if my dream ever comes true of making people be better about making sure that document gets updated with changes that warrant being mentioned there).
The reason to keep them in a separate repo is that they are identical copies of the same document. If you fix a typo in the whatsnew/2.7, you have to fix the same typo in all branches, and in theory the page should always be identical in each branch. Howtos, FAQs, etc. might differ between major and even minor versions as new features are added and old features deprecated, so it makes sense to have (slightly) different versions in each branch (unless you want to move them outside the cpython repo and still keep separate branches). For example https://docs.python.org/3/faq/programming.html#is-there-an-equivalent-of-c-s... had a different answer before 2.5 :) Also moving them might make things more complicated (from simply having to find/clone another repo to building docs for docs.python.org from different sources).
the meta information section at the bottom of docs.p.o/index (that includes pages such as https://docs.python.org/3/bugs.html ) and a new page with the current status of the releases proposed in http://bugs.python.org/issue25296
Open Issues
===========
For this PEP, open issues are ones where a decision needs to be made as to how to approach or solve a problem. Open issues do not entail coordination issues such as who is going to write a certain bit of code.
The fate of hg.python.org
-------------------------
With the code repositories moving over to Git [#git]_, there is no technical need to keep hg.python.org [#h.p.o]_ running. Having said that, some in the community would like to have it stay functioning as a Mercurial [#hg]_ mirror of the Git repositories. Others have said that they still want a mirror, but one using Git.
As maintaining hg.python.org is not necessary, it will be up to the PSF infrastructure committee to decide if they want to spend the time and resources to keep it running. They may also choose whether they want to host a Git mirror on PSF infrastructure.
+1 to keep a read-only mirror. Also see above about the tracker repo and our Roundup fork.
Git CLI commands for committing a pull request to cpython
---------------------------------------------------------
Because Git [#git]_ may be a new version control system for core developers, the commands people are expected to run will need to be written down. These commands also need to keep a linear history while giving proper attribution to the pull request author.
Another set of commands will also be necessary for when working with a patch file uploaded to bugs.python.org [#b.p.o]_. Here the linear history will be kept implicitly, but the commands will need to make sure to keep/add attribution.
Nick Coghlan, Pierre-Yves David (a Mercurial dev), and Shiyao Ma (one of our GSoC students) have been working on an HG extension that simplifies interaction with the bug tracker (see the list of patches, download/apply them, upload new patches): https://bitbucket.org/introom/hg-cpydev
In a previous email, someone mentioned an alias that allows easier interaction with PRs. Would it make sense to write and distribute an official Git extension that provides extra commands/aliases for this set of commands? (I guess the answer depends on how many tasks we have and how straightforward they are to do with plain git commands.)
It quite possibly might be. Otherwise shell commands could be written and kept in Tools/. A major perk of Git over Mercurial, IMO, is that Git Bash comes with Git and that gives you Bash on Windows. That makes it easy to write cross-platform shell scripts to help with this sort of thing without leaving Windows users stranded.
Isn't that also possible with HG, since extensions are written in Python? (BTW, is it possible/reasonable to write git extensions/hooks in Python?)
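On the Python question: nothing stops such helpers from being plain Python scripts in Tools/ that shell out to git. A sketch of the two command sequences the PEP asks to have written down, assuming the local `master` branch is the target and the PR is fetched through GitHub's `pull/<n>/head` refs; `pr_commands`, `patch_commands` and `run` are hypothetical helper names:

```python
import subprocess
from typing import List

Command = List[str]


def pr_commands(number: int, base: str = "master", remote: str = "origin") -> List[Command]:
    """Commands to land GitHub PR `number` on `base` with linear history."""
    branch = f"pr-{number}"
    return [
        ["git", "checkout", base],
        ["git", "pull", "--ff-only", remote, base],  # start from an up-to-date base
        # GitHub exposes every PR's head commit as refs/pull/<n>/head.
        ["git", "fetch", remote, f"pull/{number}/head:{branch}"],
        ["git", "checkout", branch],
        # Rebasing keeps each commit's original author, so attribution survives.
        ["git", "rebase", base],
        ["git", "checkout", base],
        # --ff-only refuses to create a merge commit, keeping history linear.
        ["git", "merge", "--ff-only", branch],
        ["git", "push", remote, base],
    ]


def patch_commands(patch_file: str, author: str, message: str) -> List[Command]:
    """Commands to commit a plain diff from b.p.o, crediting its author."""
    return [
        ["git", "apply", "--index", patch_file],
        ["git", "commit", f"--author={author}", "-m", message],
    ]


def run(commands: List[Command]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage would be along the lines of `run(pr_commands(1234))`. After a rebase the core developer becomes the committer while the PR author stays the author; for a plain diff from b.p.o, `git commit --author` restores the attribution that the patch file itself does not carry.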
How to handle the Misc/NEWS file
--------------------------------
There are two competing approaches to handling ``Misc/NEWS`` [#news-file]_. One is to add a news entry for issues on bugs.python.org [#b.p.o]_. This would mean an issue that is marked as "resolved" could not be closed until a news entry is added in the "news" field in the issue tracker. The benefit of tying the news entry to the issue is it makes sure that all changes worthy of a news entry have an accompanying issue. It also makes classifying a news entry automatic thanks to the Component field of the issue. The Versions field of the issue also ties the news entry to which Python releases were affected. A script would be written to query bugs.python.org for relevant news entries for a release and to produce the output needed to be checked into the code repository. This approach is agnostic to whether a commit was done by CLI or bot.
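The first approach needs a script that turns issue data into NEWS text. A minimal sketch, assuming the relevant issues have already been fetched from bugs.python.org as records carrying the Component and Versions fields plus the proposed "news" field (the record layout and `render_news` are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def render_news(issues: Iterable[dict], release: str) -> str:
    """Build NEWS text for `release` from already-fetched issue records.

    Each record is assumed to look like
    {"id": 1234, "components": [...], "versions": [...], "news": "..."}.
    """
    sections: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for issue in issues:
        if release not in issue["versions"] or not issue.get("news"):
            continue  # doesn't affect this release, or no entry written yet
        component = issue["components"][0] if issue["components"] else "Other"
        sections[component].append((issue["id"], issue["news"]))
    lines: List[str] = []
    for component in sorted(sections):
        lines += [component, "-" * len(component), ""]
        lines += [f"- Issue #{num}: {text}" for num, text in sorted(sections[component])]
        lines.append("")
    return "\n".join(lines)
```

Classification comes for free from the Component field and release filtering from the Versions field, which is the benefit the paragraph above describes.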
The competing approach is to use an individual file per news entry, containg the text for the entry. In this scenario each feature
Typo: containing
release would have its own directory for news entries and a separate file would be created in that directory that was either named after the issue it closed or a timestamp value (which prevents collisions). Merges across branches would have no issue as the news entry file would still be uniqeuely named and in the directory of the latest
Typo: uniquely
version that contained the fix. A script would collect all news entry files no matter what directory they reside in and create an appropriate news file (the release directory can be ignored as the mere fact that the file exists is enough to represent that the entry belongs to the release). Classification can either be done by keyword in the news entry file itself or by using subdirectories representing each news entry classification in each release directory (or classification of news entries could be dropped since critical information is captured by the "What's New" documents which are organized). The benefit of this approach is that it keeps the changes with the code that was actually changed. It also ties the message to being part of the commit which introduced the change. For a commit made through the CLI, a script will be provided to help generate the file. In a bot-driven scenario, the merge bot will have a way to specify a specific news entry and create the file as part of its flattened commit (while most likely also supporting using the first line of the commit message if no specific news entry was specified). Code for this approach has been written previously for the Mercurial workflow at http://bugs.python.org/issue18967. There is also a tool from the community at https://pypi.python.org/pypi/towncrier.
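The file-per-entry approach can be sketched as two small helpers: one that writes an entry named after its issue (or a nanosecond timestamp when there is none) into `<release>/[<category>/]`, and one that collects every entry regardless of release directory, taking the category from an optional subdirectory. The directory layout, file suffix and function names are assumptions for illustration:

```python
import pathlib
import time
from typing import Dict, List, Optional


def entry_name(issue: Optional[int] = None) -> str:
    """Name an entry after its issue, or after a timestamp to avoid collisions."""
    if issue is not None:
        return f"issue{issue}.rst"
    return f"{time.time_ns()}.rst"


def add_entry(news_dir: str, release: str, text: str,
              issue: Optional[int] = None, category: Optional[str] = None) -> pathlib.Path:
    """Write one news entry into <news_dir>/<release>/[<category>/]."""
    directory = pathlib.Path(news_dir) / release
    if category:
        directory = directory / category
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / entry_name(issue)
    path.write_text(text.strip() + "\n", encoding="utf-8")
    return path


def collect(news_dir: str) -> Dict[str, List[str]]:
    """Gather all entries, ignoring which release directory they live in."""
    root = pathlib.Path(news_dir)
    grouped: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.rst")):
        parts = path.relative_to(root).parts  # (release, [category,] filename)
        category = parts[1] if len(parts) == 3 else "Uncategorized"
        grouped.setdefault(category, []).append(path.read_text(encoding="utf-8").strip())
    return grouped


def render(grouped: Dict[str, List[str]]) -> str:
    """Produce the combined NEWS text, one section per category."""
    lines: List[str] = []
    for category in sorted(grouped):
        lines += [category, "-" * len(category), ""]
        lines += [f"- {text}" for text in grouped[category]]
        lines.append("")
    return "\n".join(lines)
```

Since every entry is its own uniquely named file, two branches adding entries never touch the same file, which is what makes merges conflict-free in this scheme.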
Does using git (and fast-forward merges, rebases, etc.) still create conflicts? (I guess the answer is yes, but perhaps we should double-check.)
I believe so, but I guess I could be wrong since I have not explicitly tested it.
If it does, there's also a third option: writing a merge script. I wrote a basic one for hg and it seemed to work decently, perhaps with git it's even easier.
I'll mention it, but I suspect the file-based solution will win out based on the feeling I have gotten from people when this topic has come up before.
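For reference, git already ships a built-in `union` merge driver (enabled per path in `.gitattributes`, e.g. `Misc/NEWS merge=union`) that keeps lines from both sides instead of writing conflict markers. A custom driver in the spirit of the merge script mentioned above could instead be registered through `merge.<name>.driver` with the `%O %A %B` placeholders (base, ours, theirs), writing its result back to `%A`. A minimal sketch of the core logic, assuming one entry per line; `union_merge` and `main` are hypothetical names:

```python
import difflib
from typing import List


def union_merge(base: List[str], ours: List[str], theirs: List[str]) -> List[str]:
    """Keep all of `ours`, plus the lines that only `theirs` added."""
    base_set = set(base)
    result: List[str] = []
    matcher = difflib.SequenceMatcher(a=ours, b=theirs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            # Lines absent from the common ancestor are new on their side;
            # put them first, matching NEWS's "newest entry on top" style.
            result.extend(line for line in theirs[j1:j2] if line not in base_set)
        result.extend(ours[i1:i2])  # our side is always kept
    return result


def main(argv: List[str]) -> int:
    """Driver entry point: argv[1:4] are the %O %A %B paths from git."""
    base_path, ours_path, theirs_path = argv[1:4]

    def read(path: str) -> List[str]:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()

    merged = union_merge(read(base_path), read(ours_path), read(theirs_path))
    with open(ours_path, "w", encoding="utf-8") as f:  # git reads the result from %A
        f.write("\n".join(merged) + "\n")
    return 0  # exit status 0 tells git the merge is clean
```

New lines from the other side are placed where they sit relative to shared context, so an entry added at the top of a section stays at the top of that section. It is deliberately naive: deletions made only on the other side are not propagated.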
I was re-reading the issue, and found two interesting links:
1) https://mail.python.org/pipermail/python-dev/2014-December/137393.html (Pierre-Yves David actually convinced me to write the merge script and helped me)
2) https://github.com/twisted/newsbuilder (this can be another option that can be added to the PEP)

A few additional points that you might want to add to the PEP:
* A new workflow for releasing Python should also be defined, and PEP 101 updated accordingly.
* The devguide also needs to be updated.
* We should decide what to do with all the other repos at hg.python.org

Best Regards,
Ezio Melotti
Naming the commit bot
---------------------
As naming things can lead to bikeshedding of epic proportions, Brett Cannon will choose the final name of the commit bot (the name of the project for the bot itself can be anything, this is purely for the name used in giving commands to the bot). The name will come from Monty Python, which is only fitting since Python is named after the comedy troupe. It will most likely come from 'Monty Python and the Holy Grail' [#holy-grail]_ (which happens to be how Brett was introduced to Monty Python). Current ideas on the name include:
"Black Knight" sketch [#black-knight-sketch]_:
* black-knight
* none-shall-pass
* just-a-flesh-wound
"Bridge of Death" sketch [#bridge-of-death-sketch]_:
* bridge-keeper
* man-from-scene-24
* five-questions
* what-is-your-quest
* blue-no-green
* air-speed-velocity
* your-favourite-colour (and that specific spelling; Monty Python is British, after all)
"Killer rabbit" sketch [#killer-rabbit-sketch]_:
* killer-rabbit
* holy-hand-grenade
* 5-is-right-out
"Witch Village" sketch [#witch-village-sketch]_:
* made-of-wood
* burn-her
"French Taunter" sketch [#french-taunter-sketch]_:
* elderberries
* kanigget
"Constitutional Peasants" sketch [#constitutional-peasants-sketch]_:
* dennis
* from-the-masses
* watery-tart
"Knights Who Say Ni" sketch [#ni-sketch]_:
* shrubbery
* ni
From "Monty Python and the Holy Grail" in general:
* brave-sir-robin
FWIW the bot that currently reports HG commits on IRC is called deadparrot.
I'll take that into consideration. :)
-Brett