[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability

Tarek Ziadé ziade.tarek at gmail.com
Tue Jun 15 20:38:58 CEST 2010

On Tue, Jun 15, 2010 at 7:34 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> So I think it would be better to focus on PEP 381, and make those
>> existing mirrors comply with it. And maybe work on the legal issues
>> you've mentioned
> That can all happen in parallel.

I really doubt it.

You have come up with a cloud proposal and want it to be funded by the PSF.

Your proposal is basically a proprietary mirroring system, and it competes
with the mirroring protocol we wanted to build, based on the existing
mirrors the community has.

So far I don't see any advantage in a cloud-based mirror managed by the PSF,
compared to a round-robin of community mirrors.

Given the lack of time and resources we have had to finish the work, this
means that if your proposal is accepted, it will get done, whereas PEP 381
will stay as it is today.

So if you want this to happen in parallel, funding should also be granted
to build the implementation of PEP 381 (in z3c.pypimirror I guess).

>> Not at all because the registered mirrors would be in the DNS round robin,
>> and the clients would just have to switch to another mirror if a mirror
>> is down. (that's explained in PEP 381)
> Someone would still have to provide the system administration for
> those servers and also make sure that the servers do actually provide
> up-to-date snapshots. DNS round-robin will help with finding the
> servers, not with the other aspects.
> Something the PEP should focus a bit more on is the freshness
> guarantee of the mirror data. It currently puts this
> important detail into the hands of the client software,
> so every package tool will have to find its own way of
> determining whether to use a mirror or not.
> Another important feature missing from the PEP is data consistency.
> Since a client tool would only communicate with one mirror, it
> will ultimately have to trust the information on that server,
> including the MD5 sums. This makes it rather easy to manipulate
> data on the servers (not by the admins, but by hackers manipulating
> those servers).

Your PyPI cloud infrastructure could be hacked as well.

The mirrors are trusted because they are registered manually and
managed by people in the community whom we trust.
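
Besides trust in the operators, a client can also check consistency itself:
download the file from a mirror, but fetch the MD5 sum from the central PyPI
server, so a compromised mirror cannot serve tampered files undetected. A
minimal sketch in Python (the function name is illustrative, not a real
client API):

```python
import hashlib

def verify_download(package_data, md5_from_master):
    """Compare mirror-served bytes against the MD5 sum from the master."""
    return hashlib.md5(package_data).hexdigest() == md5_from_master

# The checksum would come from the central server, not from the mirror
# that served the file; here it is hard-coded for illustration.
verify_download(b"hello", "5d41402abc4b2a76b9719d911017c592")
```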

>> Such a decentralized system is far more reliable than any centralized
>> system, and won't cost anything to the PSF.
> We'll see :-)

Hehe, not sure what you mean here. Has the PSF already voted yes on your
proposal?  ;)

>>> The latter is what the proposal is all about: we're outsourcing
>>> the administration and monitoring to a service provider.
>> Having a better PyPI server is of course a good idea, don't get me wrong.
>> But it doesn't really solve anything at this point.
> Obviously I have a different opinion, otherwise I wouldn't have
> written the proposal :-)

Well, technically the problem is already solved by the existing mirrors we have
in the community: when PyPI is down, other servers can take over.
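
The failover PEP 381 describes is simple on the client side: try each
registered mirror in turn and move on when one is unreachable. A minimal
sketch, assuming a pluggable fetch function (the mirror names and helpers
here are illustrative, not the real registered mirrors):

```python
class MirrorError(Exception):
    """Raised when a mirror cannot serve the request."""

def fetch_from_mirrors(path, mirrors, fetch):
    """Try each mirror in turn; return the first successful response."""
    last_error = None
    for mirror in mirrors:
        try:
            return fetch(mirror, path)
        except MirrorError as exc:
            last_error = exc  # mirror down: fall through to the next one
    raise MirrorError("all mirrors failed (last error: %s)" % last_error)

# Fake fetcher simulating the first mirror being down:
def fake_fetch(mirror, path):
    if mirror == "a.pypi.example.org":
        raise MirrorError("connection refused")
    return "served %s from %s" % (path, mirror)
```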

I have no doubt you can enhance the main PyPI server
and bring its uptime close to 100% by putting in money and time.

But having a documented protocol and a library that anyone with a spare
server can use to provide a mirror will always beat your cloud system, for the
reasons I've already mentioned.

Putting all the eggs in the same basket (PSF+Amazon?) can't be as reliable
as a distributed network of mirrors.


Tarek Ziadé | http://ziade.org

More information about the Catalog-SIG mailing list