Backing up GitHub data has been discussed since we migrated to GitHub, and is being tracked here:

TL;DR We'll be using GitHub's new Migrations API to download an archive of CPython's GitHub data. Ernest is helping us get set up with daily backups of the CPython repo, to be stored within The PSF's infrastructure.
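For the curious, the Migrations API flow is: start a migration for a set of repositories, poll until the archive is exported, then download it. The sketch below only builds the initial request; the org, repo list, and token are placeholders, and the exact endpoints and media types should be checked against GitHub's API docs rather than taken from here.

```python
import json
import urllib.request

API = "https://api.github.com"

def start_migration_request(org: str, repos: list, token: str) -> urllib.request.Request:
    """Build the POST that kicks off an org migration (archive export).

    Endpoint and payload follow GitHub's Migrations API; the org name,
    repo list, and token passed in are placeholders, not real credentials.
    """
    body = json.dumps({"repositories": repos, "lock_repositories": False})
    return urllib.request.Request(
        f"{API}/orgs/{org}/migrations",
        data=body.encode(),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )

# Usage (needs a real token and network access):
#   resp = urllib.request.urlopen(start_migration_request("python", ["python/cpython"], TOKEN))
#   migration_id = json.load(resp)["id"]
# then poll GET /orgs/{org}/migrations/{id} until its state is "exported",
# and fetch the tarball from GET /orgs/{org}/migrations/{id}/archive.
```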


On Thu, Jun 7, 2018 at 11:24 AM, Chris Angelico <> wrote:
On Fri, Jun 8, 2018 at 3:33 AM, Chris Barker - NOAA Federal via
Python-Dev <> wrote:
> Any service could change or fail. Period.
> So we shouldn’t want valuable information about Python development
> only in GitHub.
> I don’t know how hard it is to backup / mirror an entire repo — but it
> sure seems like a good idea.

There are two separate concerns here:

1) How do we get a full copy of all of CPython and its change history?

2) How do we get all the non-code content - issues, pull requests, comments?

The first one is trivially easy. *Everyone* who has a clone of the
repository [1] has a full copy of the code and all history, updated
every time 'git pull' is run.
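For a backup that keeps *every* ref (branches, tags, notes), not just whatever branch a contributor happens to track, a mirror clone is the usual tool. Here is a minimal sketch of a refreshable mirror backup; the URL and destination path are illustrative, not the PSF's actual setup.

```python
import subprocess
from pathlib import Path

def backup_repo(url: str, dest: Path) -> None:
    """Mirror-clone a repository, or refresh an existing mirror.

    `git clone --mirror` copies all refs plus the full object history,
    so the result is a complete copy of the repository.
    """
    if (dest / "HEAD").exists():
        # Existing mirror: fetch updates; --prune drops refs deleted upstream.
        subprocess.run(
            ["git", "-C", str(dest), "fetch", "--prune", "origin"],
            check=True,
        )
    else:
        subprocess.run(["git", "clone", "--mirror", url, str(dest)], check=True)

# e.g. backup_repo("https://github.com/python/cpython.git", Path("cpython-backup.git")),
# run daily from cron.
```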

The second one depends on GitHub's exporting facilities; but it also
depends on a definition of what's important. Maybe the PSF doesn't
care if people's comments at the bottoms of commits are lost (not to
be confused with commit messages themselves, which are part of the
repo proper). Or maybe it's important to keep the contents of such
comments, but acceptable to credit them to an email address rather
than link to an actual username. Or whatever. Unlike with the
code/history repo, an imperfect export is still of partial value.
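An imperfect export of that kind is also easy to script against GitHub's ordinary REST API, independent of the Migrations API. The sketch below just builds the URL for one page of a repository's issues (GitHub's issues endpoint also returns pull requests, each carrying a `pull_request` key); the owner/repo values are illustrative, and this is not the PSF's actual tooling.

```python
import urllib.parse

API = "https://api.github.com"

def issues_url(owner: str, repo: str, page: int = 1, per_page: int = 100) -> str:
    """URL for one page of a repo's issues and pull requests.

    state=all includes closed items; callers page through until a
    request returns an empty list.
    """
    query = urllib.parse.urlencode(
        {"state": "all", "per_page": per_page, "page": page}
    )
    return f"{API}/repos/{owner}/{repo}/issues?{query}"

# Usage (network access; a token raises the rate limit):
#   req = urllib.request.Request(issues_url("python", "cpython"),
#                                headers={"Accept": "application/vnd.github+json"})
#   with urllib.request.urlopen(req) as resp:
#       items = json.load(resp)  # issue/PR dicts; each item's `comments_url`
#                                # points at its comment thread
```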


[1] Barring shallow clones, but most people don't do those