[Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

Donald Stufft donald at stufft.io
Tue May 28 14:20:25 CEST 2013


On May 28, 2013, at 5:04 AM, Christian Theune <ct at gocept.com> wrote:

> Hi,
> 
> 
> On 27. May 2013, at 10:41 PM, Donald Stufft <donald at stufft.io> wrote:
>> Just to assure folks: I do consider mirroring a first-class citizen and an important feature.
> 
> Thanks for that acknowledgement. Let's sort out what to do now - this is becoming urgent for me as the author of the currently recommended mirroring tool for public mirrors and as an operator of a mirror that is being relied upon.
> 
> I agree with Holger's points.
> 
> I don't think the mirroring is completely backwards right now. I agree there's been an incomplete PEP that's been hanging around too long. 
> 
> My current client implementation is pretty simple and has had reliable semantics until now.
> 
> A couple of things I noticed in the discussion that I'd like to point out:
> 
> - We mirror simple pages because the PEP requires us to - this is part of the existing validation approach. I can drop that to get mirrors not to rely on simple pages from the CDN but then authentication of the simple pages will be broken.
> 
> - Release files are replaced all the time.
> 
> The semantics that I'd like to keep for the mirrors are these:
> 
> When I get a changelog for serial X and I start copying simple pages and files, then I (as a mirror) promise my clients that I have incorporated *at least* all changes up until serial X (but maybe also partial changes from X+n).
> 
> I'm afraid that the mirrors' data are now inconsistent - we can repair that once we have a stable mirroring approach again, but until then people will start getting annoyed again.
> 
> I'm also concerned that I don't really have time to follow up on what's happening with TUF regarding mirroring on top of what happened regarding the CDN. My feeling is that this will result in more firefighting.
> 
> So - what's the next step that can happen ASAP?

Options:

1) When mirroring, retain N minutes' worth of old serials and redo them. Mirroring is idempotent, so you can repeat it with no negative side effects. Conditional HTTP requests should also be supported to minimize bandwidth.
2) Wait a few seconds after fetching the change log to begin processing.
3) Use front.python.org with the pypi.python.org Host header, with the caveat that this is not guaranteed to be stable in the long term.
4) ???
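
As a rough illustration of the conditional requests mentioned in 1) and the Host header trick in 3), here is a minimal sketch using only the Python standard library. The helper name conditional_request and the /simple/example/ path are made up for illustration, and the request is only built, not sent. Whether the server honours ETag or Last-Modified validators is an assumption you would need to check.

```python
import urllib.request


def conditional_request(url, etag=None, last_modified=None, host=None):
    """Build (but do not send) a GET request carrying cache validators.

    etag and last_modified are values remembered from a previous response.
    host, if given, overrides the Host header (as in option 3).
    """
    req = urllib.request.Request(url)
    if etag:
        # Server may answer 304 Not Modified if the entity tag still matches.
        req.add_header("If-None-Match", etag)
    if last_modified:
        # Same idea, keyed on the previously seen Last-Modified date.
        req.add_header("If-Modified-Since", last_modified)
    if host:
        # Connect to one endpoint while presenting another name in Host.
        req.add_header("Host", host)
    return req
```

When such a request is actually sent with urllib.request.urlopen, a 304 response surfaces as an HTTPError with code 304, so a mirror would catch that and keep its existing copy.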

Of these, 1) is the most likely to give you the best results within the constraints of HTTP. All it takes is for someone to run your mirroring script behind a caching proxy, and even pre-CDN you'd have had the exact situation we have now.
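
To make option 1) concrete, here is a small sketch of a replay window. It assumes the changelog is available as (serial, timestamp, package) tuples and that fetching a package is idempotent. The function names and the tuple layout are hypothetical, not what any existing mirroring client does.

```python
def packages_to_sync(changelog, last_serial, now, replay_seconds):
    """Pick packages that are new or recent enough to be worth redoing.

    changelog: iterable of (serial, timestamp, package) tuples.
    last_serial: highest serial processed by the previous run.
    now, replay_seconds: define the window of old entries to replay.
    """
    cutoff = now - replay_seconds
    todo = set()
    for serial, timestamp, package in changelog:
        # New entries are always processed; entries inside the replay
        # window are redone in case an earlier fetch saw stale data.
        if serial > last_serial or timestamp >= cutoff:
            todo.add(package)
    return sorted(todo)


def sync(changelog, last_serial, now, replay_seconds, fetch):
    """Fetch every selected package and return the new high-water serial."""
    for package in packages_to_sync(changelog, last_serial, now, replay_seconds):
        fetch(package)  # safe to repeat because mirroring is idempotent
    return max((serial for serial, _, _ in changelog), default=last_serial)
```

Because repeating a fetch has no negative side effects, the only cost of a generous window is extra requests, which conditional HTTP requests would keep cheap.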

Mirroring is in a bad state because it comes (and always has) with absolutely no guarantees of consistency. You dismiss the issue of picking up partial changes from serial N+1, but that is a serious problem. If you fetch up to serial N of package1, whose released version at that point is 1.0, and then you fetch serial N+2 of package2, which has a hard requirement on package1 1.1 (released in serial N+1), you now have packages that are not installable via your mirror because of inconsistent state.
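
The scenario above can be modelled in a few lines. This is only a toy: the mirror is a dict of package name to the set of versions it holds, requirements are exact version pins, and the 2.0 version number for package2 is made up for the example.

```python
def missing_requirements(mirror, requirements):
    """List hard requirements that the mirror cannot satisfy.

    mirror: {package: set of versions present on the mirror}
    requirements: {(package, version): [(dep_package, dep_version), ...]}
    Returns (package, version, dep_package, dep_version) tuples.
    """
    missing = []
    for (pkg, ver), deps in sorted(requirements.items()):
        if ver not in mirror.get(pkg, set()):
            continue  # the mirror does not serve this release at all
        for dep, dep_ver in deps:
            if dep_ver not in mirror.get(dep, set()):
                missing.append((pkg, ver, dep, dep_ver))
    return missing


# package1 was copied at serial N (only 1.0), package2 at serial N+2.
mirror = {"package1": {"1.0"}, "package2": {"2.0"}}
requirements = {("package2", "2.0"): [("package1", "1.1")]}
broken = missing_requirements(mirror, requirements)
```

Here broken contains one entry for package2, and it stays that way until the mirror redoes package1 and picks up 1.1.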

If someone comes up with a better option that doesn't require a large rearchitecture of the storage code in PyPI, I'm happy to review and deploy it.

> 
> Christian
> 
> -- 
> Christian Theune · ct at gocept.com
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · Tel +49 345 1229889-7
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
