[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability

M.-A. Lemburg mal at egenix.com
Tue Jun 15 23:03:29 CEST 2010

"Martin v. Löwis" wrote:
>> PyPI itself has in recent months been mostly maintained by one
>> developer: Martin von Loewis.  Projects are underway to enhance PyPI
>> in various ways, including a proposal to add external mirroring (PEP
>> 381), but these are all far from being finalized or implemented.
> That's not at all accurate: PEP 381 is almost completely implemented
> in the mirroring tools. 

Which parts of PEP 381 are implemented?

> Client-side support is missing, but isn't
> strictly necessary as users could manually point their setuptools
> installation to a mirror.

That's not a good argument. Users like setuptools because
they can simply run "easy_install stuff" and let it do whatever
it needs to do.

It's important not to require changes on the client side.

>> While the /simple package listing is currently dynamically created
>> from the database in real-time, this is not really needed for normal
>> operation. A static copy created every 10-20 minutes would provide the
>> same level of service in much the same way.
> For normal operation (i.e. on the master copy), this would be really
> insufficient. Users expect, in automated build processes, that the
> packages they upload are available for *immediate* download.

Power users and developers will probably want that, but those
can hook up to the PyPI server directly if they have such a need.

For the majority, waiting 10-20 minutes should be fine.

Note that the push idea is part of the plan, but won't happen
in the initial rollout.
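
To illustrate the static /simple idea, here is a minimal sketch that
renders a snapshot of per-project index pages from a mapping of
project names to release file URLs. The input mapping, the output
layout and the function name write_simple_snapshot are assumptions
for illustration, not actual PyPI code; a cron job could call it
every 10-20 minutes. Each page is written to a temporary file and
then moved into place with os.replace, so clients never see a
half-written page.

```python
import html
import os
import tempfile


def _atomic_write(path, text):
    """Write text to a temp file in the same directory, then rename it over path."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def write_simple_snapshot(projects, out_dir):
    """Render out_dir/index.html plus out_dir/<project>/index.html.

    `projects` is a hypothetical dict {project_name: [file_url, ...]};
    a real script would read it from the PyPI database.
    Returns the number of project pages written.
    """
    os.makedirs(out_dir, exist_ok=True)
    root_links = []
    for name in sorted(projects):
        project_dir = os.path.join(out_dir, name)
        os.makedirs(project_dir, exist_ok=True)
        links = "\n".join(
            '<a href="%s">%s</a><br/>'
            % (html.escape(url), html.escape(url.rsplit("/", 1)[-1]))
            for url in projects[name]
        )
        _atomic_write(os.path.join(project_dir, "index.html"),
                      "<html><body>\n%s\n</body></html>\n" % links)
        root_links.append('<a href="%s/">%s</a><br/>'
                          % (html.escape(name), html.escape(name)))
    _atomic_write(os.path.join(out_dir, "index.html"),
                  "<html><body>\n%s\n</body></html>\n" % "\n".join(root_links))
    return len(projects)
```

The output directory is what would then be uploaded to, or served
from, the CDN; the 10-20 minute delay is simply the interval between
two runs.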

>> Under the proposal the static information stored in PyPI
>> (meta-information as well as package download files and documentation)
>> is moved to a content delivery network (CDN).
> There is a good chance that, before that proposal is implemented,
> the PEP 381 implementation is completed.

Including getting all client-side package tools updated and
deployed to existing users?

>> At the same intervals, another script will scan the package and
>> documentation files under /packages for updates and upload any changes
>> to the CDN for neartime availability.
> Not sure why you wouldn't push every change immediately to the CDN, though.

The proposal aims to avoid changing the PyPI code where possible,
so pushing every change immediately is planned for a later release.
If this can be had without any major changes, we can also add it to phase
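
The periodic scan-and-upload step could look roughly like the sketch
below. It keeps a manifest of (size, mtime) per file from the previous
run and uploads only files that are new or changed. The function names
and the `upload` callable are hypothetical; `upload` stands in for
whatever CDN client would actually be used.

```python
import os


def scan_for_changes(root, previous):
    """Compare each file under `root` with `previous`, a dict
    {relative_path: (size, mtime_ns)} saved by the last scan.

    Returns (changed, current): sorted relative paths that are new or
    modified, and the manifest to save for the next scan.
    """
    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            rel_path = os.path.relpath(full_path, root)
            st = os.stat(full_path)
            signature = (st.st_size, st.st_mtime_ns)
            current[rel_path] = signature
            if previous.get(rel_path) != signature:
                changed.append(rel_path)
    return sorted(changed), current


def sync_to_cdn(root, previous, upload):
    """Upload new or modified files via the hypothetical hook
    upload(rel_path, full_path) and return the new manifest."""
    changed, current = scan_for_changes(root, previous)
    for rel_path in changed:
        upload(rel_path, os.path.join(root, rel_path))
    return current
```

Deleted files are not handled here; a real script would also have to
remove them from the CDN.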

>> Cloudfront itself has been around since Nov 2008.
> Please add that Amazon considers Cloudfront as a beta service.

I don't think that makes a difference. The "beta" term is
a web 2.0 marketing term, nothing more. But I'll add it anyway.

>> The pypi.python.org domain would then have to be set up to map to
>> multiple IP addresses via DNS round-robin, one entry for each
>> redirection server, e.g.
>>   pypi.python.org. IN A
>>   pypi.python.org. IN A
>>   pypi.python.org. IN A
>>   pypi.python.org. IN A
> I don't think this works if one of the servers fails (or, worse,
> produces a hanging connection). What piece of software would implement
> the fallback to the next machine?

AFAIK, the package tools don't currently implement any kind of fail-over.

While this would be good to have and provide a better
user experience, it's not required. The user would just need
to restart the command and then get a new server IP address
to try - just like you do in a web browser if a page doesn't
load. That's still a lot better than not being able to download
anything at all.
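
For illustration, client-side fail-over across round-robin DNS entries
could be as simple as resolving all A records for the host and trying
each address in turn with a short timeout. This is a sketch, not code
from setuptools or easy_install; the function name
connect_with_failover and the 5-second default timeout are assumptions.

```python
import socket


def connect_with_failover(host, port, timeout=5.0):
    """Try every IPv4 address `host` resolves to until one accepts a
    TCP connection; re-raise the last error if all of them fail."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    last_error = OSError("no addresses found for %r" % host)
    for _family, _type, _proto, _canonname, sockaddr in infos:
        try:
            # The timeout stays set on the returned socket, so later reads
            # from a server that accepts but then hangs also time out
            # instead of blocking forever.
            return socket.create_connection(sockaddr, timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error
```

A dead server then costs at most `timeout` seconds before the next
address is tried, which automates what the user would otherwise do by
re-running the command.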

The alternative would be a proxy setup, which in turn introduces
a single point of failure (unless you set up an HA cluster).

The mirror PEP shares this problem with the cloud proposal.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Jun 15 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
2010-07-19: EuroPython 2010, Birmingham, UK                33 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
