[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability

M.-A. Lemburg mal at egenix.com
Tue Jun 15 19:34:42 CEST 2010

Tarek Ziadé wrote:
> On Tue, Jun 15, 2010 at 6:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Alexis Métaireau wrote:
>>> Hello,
>>> Firstly, as Tarek said in another thread, I'm afraid this kills PEP 381,
>>> which is about building a mirroring infrastructure.
>>> Having an infrastructure hosted on a cloud platform may be comfortable, and
>>> is probably needed to have a system running 24/7, but
>>> we need to make sure it remains possible to create new public mirrors
>>> outside the Amazon (or whatever) cloud infrastructure.
>> The proposal doesn't prevent that. However, please note that
>> setting up public mirrors not under PSF control has its own
>> set of (legal) problems, which the PSF hosted cloud setup avoids.
> Mirrors already exist out there, so unless you ban them (which would
> be a really bad idea)
> setting up a cloud will not fix any legal issue if you think there's a
> legal issue.
> In any case, you can't prevent people from creating mirrors even if you
> say it's illegal. Moreover, having mirrors provided by the community
> is way better than relying on one single entity (the PSF) for this.
> (if we think "decentralized")
> So I think it would be better to focus on PEP 381, and make those
> existing mirrors comply with it. And maybe work on the legal issues
> you've mentioned.

That can all happen in parallel.

>>> On Tue, Jun 15, 2010 at 1:49 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> PyPI is currently run from a single server hosted in The Netherlands
>>>> (ximinez.python.org).  This server is run by a very small team of sys
>>>> admins.
>>> As Martin von Löwis said, this already exists. "a.mirrors.pypi.python.org
>>>  and b.mirrors.pypi.python.org are already there and could be used by
>>> clients". Martin, could you explain to us (apologies if this has already
>>> been done somewhere) how things work right now? Is it possible to rely on
>>> the existing work rather than using a cloud system? What infrastructure is
>>> already in place?
>> In order to use those two servers, you'd still need to implement
>> the redirection changes or client side tool changes and, what's
>> more important, you'd need to administer and monitor those servers
>> 24/7 to achieve similar uptime.
> Not at all because the registered mirrors would be in the DNS round robin,
> and the clients would just have to switch to another mirror if a mirror
> is down. (that's explained in PEP 381)

Someone would still have to provide the system administration for
those servers and also make sure that the servers do actually provide
up-to-date snapshots. DNS round-robin will help with finding the
servers, not with the other aspects.
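
To make the distinction concrete, here is a minimal sketch of the
client-side failover the PEP relies on. The function name and the
injectable `fetch` callable are my own assumptions for illustration,
not anything the PEP prescribes. Note what it does and does not do:
it skips a mirror that is down, but it happily returns whatever a
reachable mirror serves, stale or not.

```python
from typing import Callable, Iterable, Optional
from urllib.request import urlopen


def _default_fetch(url: str) -> bytes:
    # Plain HTTP GET; urllib raises URLError (an OSError subclass) on failure.
    with urlopen(url, timeout=10) as response:
        return response.read()


def fetch_with_failover(
    mirrors: Iterable[str],
    path: str,
    fetch: Callable[[str], bytes] = _default_fetch,
) -> Optional[bytes]:
    """Try each mirror host in order; return the first successful response.

    Hypothetical helper: returns None if every mirror fails.
    """
    for host in mirrors:
        url = "http://%s/%s" % (host, path.lstrip("/"))
        try:
            return fetch(url)
        except OSError:
            # Mirror unreachable or returned an HTTP error; try the next one.
            continue
    return None
```

A client would call it with the mirror host names it got from DNS,
e.g. `fetch_with_failover(["a.mirrors.pypi.python.org",
"b.mirrors.pypi.python.org"], "/simple/")`.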

Something the PEP should focus a bit more on is the freshness
guarantee of the mirror data. It currently puts this
important detail into the hands of the client software,
so every package tool will have to find its own way of
determining whether to use a mirror or not.
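
As a rough illustration of the kind of check each tool would end up
writing, assume the client has somehow obtained the time of the
mirror's last synchronization as a `datetime`. How that timestamp is
published, and the one-hour threshold below, are assumptions of this
sketch; every tool would likely pick its own.

```python
from datetime import datetime, timedelta


def mirror_is_fresh(
    last_sync: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=1),  # arbitrary example threshold
) -> bool:
    """Return True if the mirror synchronized within max_age before now."""
    age = now - last_sync
    # A timestamp in the future points to clock skew or a bogus value,
    # so it is treated as stale rather than fresh.
    return timedelta(0) <= age <= max_age
```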

Another important feature missing from the PEP is data consistency.
Since a client tool would only communicate with one mirror, it
will ultimately have to trust the information on that server,
including the MD5 sums. This makes it rather easy to manipulate
data on the servers (not by the admins, but by attackers who have
compromised those servers).
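
A checksum only helps if it comes from somewhere other than the
server that delivered the file. A minimal sketch, assuming the client
can get the expected MD5 from a source it trusts (for example the
central index; where it comes from is outside this sketch, and the
function name is hypothetical):

```python
import hashlib


def matches_trusted_digest(data: bytes, trusted_md5_hex: str) -> bool:
    """Check downloaded bytes against an MD5 hex digest from a trusted source.

    If the digest was fetched from the same mirror as the data, a
    compromised mirror can change both and this check proves nothing.
    """
    actual = hashlib.md5(data).hexdigest()
    # Normalize case and surrounding whitespace before comparing.
    return actual == trusted_md5_hex.strip().lower()
```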

Having digitally signed packages, like you do on many Linux repository
servers, would solve this issue, but would also require a complete
verification infrastructure on the client side.

You don't need any of this with the cloud caching approach.

> Such a decentralized system is far more reliable than any centralized
> system, and won't cost anything to the PSF.

We'll see :-)

>> The latter is what the proposal is all about: we're outsourcing
>> the administration and monitoring to a service provider.
> Having a better PyPI server is of course a good idea, don't get me wrong.
> But it doesn't really solve anything at this point.

Obviously I have a different opinion, otherwise I wouldn't have
written the proposal :-)

Marc-Andre Lemburg

