[Distutils] Fwd: The state of PyPI

Jim Fulton jim at zope.com
Tue Sep 27 17:35:49 CEST 2011

On Tue, Sep 27, 2011 at 8:40 AM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> On Tue, Sep 27, 2011 at 2:27 PM, Jim Fulton <jim at zope.com> wrote:
> ...
>> I understand where you're coming from but, ..
> Sorry, I don't understand what you imply here.

I understand why you don't want to rely on a proprietary solution.

>> I think it's saner to rely on proven technology
>> than to invent our own protocol. NIH?
> Ah sorry I misunderstood then. I thought CloudFront was a proprietary
> platform, with its own protocol.

It's a reverse proxy.  You point it at S3 and at a web server and it caches.
Of course, it has aspects that are specific to its implementation.

> If you're saying that we can move away from CloudFront at any time and
> have the same feature elsewhere, then it's perfect.

If we move to something else, *some* changes will be necessary,
but we can certainly move. I agree, it's perfect. ;)

> If you're saying that CloudFront is proven technology and that we
> should not worry about relying on them, then I think we can do better
> for the community than to get locked in to this, and continue to work on
> an open protocol where everyone can participate by providing a spare
> server.  But maybe that's just me?

It's nice to have a hobby. :)

But I don't want to have to update buildout *just* because of an itch
to have a custom protocol.

> Most of the mirroring protocol was inspired by Perl's CPAN btw.


> But the use case is usually: PyPI is down, we fall back to a mirror. I
> don't think it's more complicated than this.

I don't agree, on multiple levels.  PyPI is often up but slow.
It's also in the wrong place.  A CDN should provide better performance,
reliability and locality.

A client has to:

- try PyPI
- fall back to "last"
- if that's down, decide what other indexes to check
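
As a rough sketch of those steps (the function name, the host labels and
the is_up probe are all made up for illustration; this is not code from
buildout or distutils, and trying the remaining indexes in list order is
just one possible policy):

```python
from typing import Callable, Iterable, Optional


def choose_index(main: str, last: str, others: Iterable[str],
                 is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable index, or None if nothing responds.

    Order tried: the main index, then the "last" mirror, then the rest.
    is_up is a caller-supplied probe (hypothetical), e.g. an HTTP HEAD
    request with a short timeout.
    """
    for host in (main, last, *others):
        if is_up(host):
            return host
    return None
```

Note that is_up only answers "down or not"; it says nothing about an
index that is up but slow.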

I don't see how having timestamps helps unless you know
what the current timestamp is, or unless you say that you'll reject
a mirror whose timestamp is more than some period in the past.
It's not clear what this time delta should be and, in any case,
the client needs to first validate a mirror by checking its timestamp.
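
To make the time-delta question concrete, here is a minimal sketch of
the "reject a stale mirror" check. The 24-hour limit and the names are
assumptions for illustration only; nothing in this thread settles what
the limit should be.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoff, chosen arbitrarily for the example.
MAX_AGE = timedelta(hours=24)


def mirror_is_fresh(last_modified: datetime, now: datetime,
                    max_age: timedelta = MAX_AGE) -> bool:
    """True if the mirror's timestamp is no older than max_age."""
    return now - last_modified <= max_age


now = datetime(2011, 9, 27, 12, 0, tzinfo=timezone.utc)
print(mirror_is_fresh(now - timedelta(hours=1), now))  # within the cutoff
print(mirror_is_fresh(now - timedelta(days=2), now))   # past the cutoff
```

The client still has to fetch each mirror's timestamp before it can run
this check, which is one more round trip per mirror.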

I think this protocol is going to be hard to get right.

>> - It either requires extra DNS calls or relies too heavily on the last
>> mirror, which is probably the least reliable.
> Once you have the list, I don't think you require an extra call.
> see http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py

It has to make extra DNS calls to resolve the other mirror names to IPs.
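
A sketch of what I mean, assuming (as I read the scheme) that mirrors
are named with single letters under a shared domain and that the name
behind "last" tells you the highest letter. The helper names and the
injected resolve callable are hypothetical; this is not the code in
mirrors.py.

```python
import string
from typing import Callable, Dict, List


def mirror_names(last_canonical: str) -> List[str]:
    """Expand e.g. 'c.pypi.python.org' into the a, b and c host names.

    Only single-letter prefixes are handled in this sketch.
    """
    prefix, domain = last_canonical.split(".", 1)
    letters = string.ascii_lowercase
    if len(prefix) != 1 or prefix not in letters:
        raise ValueError("unsupported mirror prefix: %r" % prefix)
    return ["%s.%s" % (l, domain) for l in letters[:letters.index(prefix) + 1]]


def resolve_all(names: List[str],
                resolve: Callable[[str], str]) -> Dict[str, str]:
    """Resolve every name; this costs one lookup per mirror."""
    return {name: resolve(name) for name in names}
```

Building the list of names is free, but passing something like
socket.gethostbyname as resolve shows the cost: one DNS lookup for each
mirror the client actually wants to contact.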

>> Life is short. We don't have to invent this ourselves.
> Ah well, yeah -- Not sure what you are proposing right now.
> If you imply that everything should be solved on server-side, and that
> we should not have mirroring

I think we should pick a good CDN and use it.


Jim Fulton
