[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability

"Martin v. Löwis" martin at v.loewis.de
Tue Jun 15 21:46:06 CEST 2010

> PyPI itself has in recent months been mostly maintained by one
> developer: Martin von Loewis.  Projects are underway to enhance PyPI
> in various ways, including a proposal to add external mirroring (PEP
> 381), but these are all far from being finalized or implemented.

That's not at all accurate: PEP 381 is almost completely implemented
in the mirroring tools. Client-side support is missing, but isn't
strictly necessary as users could manually point their setuptools
installation to a mirror.

> While the /simple package listing is currently dynamically created
> from the database in real-time, this is not really needed for normal
> operation. A static copy created every 10-20 minutes would provide the
> same level of service in much the same way.

For normal operation (i.e. on the master copy), this would be
insufficient. In automated build processes, users expect the
packages they upload to be available for *immediate* download.

> Under the proposal the static information stored in PyPI
> (meta-information as well as package download files and documentation)
> is moved to a content delivery network (CDN).

There is a good chance that the PEP 381 implementation will be
completed before that proposal is implemented.

> At the same intervals, another script will scan the package and
> documentation files under /packages for updates and upload any changes
> to the CDN for neartime availability.

Not sure why you wouldn't push every change immediately to the CDN, though.
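
As a rough sketch of what I mean, the upload handler could push each
file as part of the same request instead of leaving it for a periodic
scan. Everything here is an assumption for illustration: store_upload
and the push_to_cdn callback are hypothetical names, not existing PyPI
code, and the callback stands in for whatever upload call the CDN
provides.

```python
import os


def store_upload(root, relpath, data, push_to_cdn):
    """Write an uploaded file under root, then push it to the CDN at once.

    push_to_cdn is a hypothetical callable taking (relpath, data); it
    stands in for the real CDN upload call.
    """
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    # Push right away rather than waiting for the next periodic scan.
    push_to_cdn(relpath, data)
    return path
```

A periodic scan could still run as a safety net for pushes that failed.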

> Cloudfront itself has been around since Nov 2008.

Please add that Amazon considers Cloudfront a beta service.

> The pypi.python.org domain would then have to be set up to map to
> multiple IP addresses via DNS round-robin, one entry for each
> redirection server, e.g.
>   pypi.python.org. IN A
>   pypi.python.org. IN A
>   pypi.python.org. IN A
>   pypi.python.org. IN A

I don't think this works if one of the servers fails (or, worse,
produces a hanging connection). What piece of software would implement 
the fallback to the next machine?
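
To make the concern concrete: with plain round-robin DNS, the fallback
would have to live in every client. Here is a minimal sketch of what
each client would need to do, assuming Python. The names resolve_all
and connect_with_fallback are hypothetical, not part of setuptools or
any existing tool.

```python
import socket


def resolve_all(host, port):
    """Return every IPv4 address the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        # Keep resolver order, drop duplicates.
        if sockaddr not in addresses:
            addresses.append(sockaddr)
    return addresses


def connect_with_fallback(addresses, timeout=5.0,
                          connect=socket.create_connection):
    """Try each address in turn and return the first successful connection.

    The timeout keeps a hanging server from blocking the client forever;
    a refused or timed-out attempt moves on to the next address.
    """
    errors = []
    for addr in addresses:
        try:
            return connect(addr, timeout)
        except OSError as exc:  # includes connection refused and timeouts
            errors.append((addr, exc))
    raise OSError("all servers failed: %r" % (errors,))
```

A client would call connect_with_fallback(resolve_all("pypi.python.org", 80)).
Nothing in the DNS setup itself provides this behaviour.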

