[Distutils] Indexing modules in Python distributions

Nick Coghlan ncoghlan at gmail.com
Thu Feb 9 09:20:19 EST 2017

On 8 February 2017 at 19:14, Thomas Kluyver <thomas at kluyver.me.uk> wrote:
> What I'm proposing differs in that it would need to download files from PyPI
> - basically all of them, if we're thorough about it. I imagine that's going
> to involve a lot of data transfer. Do we know what order of magnitude we're
> talking about? Is it so large that we should be thinking of running the
> scanner in the same data centre as the file storage?

Last time I asked Donald about doing things like this, he noted that a
full mirror is ~215 GiB. That was a year or two ago, so I assume the
number has gone up since then, but it should still be in the same
order of magnitude.

From an ecosystem resilience point of view, there's also a lot to be
said for having copies of the full PyPI bulk artifact store in both
AWS S3 (which is where the production PyPI data lives) and in Azure :)


