[Catalog-sig] High-availability for PyPI, mirroring infrastructures?

"Martin v. Löwis" martin at v.loewis.de
Mon Aug 11 20:10:38 CEST 2008

> What is the perspective for addressing this issue?

It all depends on the community.

> What is the current recommended way for building a (private) mirror?

The easiest way is to mirror just the index, not the files. To do so,
mirror the simple index, and poll the change log via XML-RPC for changes.
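
As a rough illustration of the polling step, here is a minimal Python 3
sketch. It assumes the XML-RPC interface offers a changelog(since) call
that takes a UTC timestamp and returns entries whose first field is the
package name; the helper names changed_packages and poll_once are
hypothetical, and the server URL is left for the caller to supply.

```python
import time
import xmlrpc.client


def changed_packages(client, since):
    """Return the sorted, de-duplicated names of packages changed after `since`.

    `client` is any object with a changelog(since) method. Each entry is
    assumed to be a sequence whose first element is the package name.
    """
    entries = client.changelog(since)
    return sorted({entry[0] for entry in entries})


def poll_once(index_url, lookback_seconds=3600):
    """Hypothetical helper: ask the server at `index_url` what changed recently."""
    client = xmlrpc.client.ServerProxy(index_url)
    since = int(time.time()) - lookback_seconds
    return changed_packages(client, since)
```

A mirror would call poll_once periodically and re-fetch the simple index
page of each package it returns. Remembering the timestamp of the last
successful poll, rather than using a fixed look-back window, avoids
missing changes after an outage.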

If you also plan to mirror the files, please consider providing a way of
updating the download counters on the central server. This hasn't been
done before, so an API would need to be designed and implemented first.

> How big (in GB) is PyPI currently?

The files alone take up 3.9 GiB. Disk space for the index itself depends
on how you store it, as the web pages are all generated. The PostgreSQL
dump is 85 MiB.

> If a company would be interested to
> provide a mirroring server, how much diskspace (and bandwidth) would be
> needed?

PyPI currently serves roughly 150 GiB of downloads per month (including
generated content).
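
To put the monthly figure in bandwidth terms, the following Python
snippet converts it to an average sustained rate. It assumes a 30-day
month and perfectly even traffic, so it gives a lower bound on what a
mirror serving the full load would need; real peaks will be higher.

```python
GIB = 2**30  # bytes per GiB

monthly_bytes = 150 * GIB            # figure quoted above
seconds_per_month = 30 * 24 * 3600   # assumption: 30-day month

avg_bytes_per_s = monthly_bytes / seconds_per_month
avg_mbit_per_s = avg_bytes_per_s * 8 / 1e6

print(f"average: {avg_bytes_per_s / 1024:.1f} KiB/s, {avg_mbit_per_s:.2f} Mbit/s")
```

Under these assumptions the average comes to about 0.5 Mbit/s.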

