[Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

holger krekel holger at merlinux.eu
Tue May 28 16:36:20 CEST 2013


On Tue, May 28, 2013 at 10:23 -0400, Donald Stufft wrote:
> On May 28, 2013, at 8:20 AM, Donald Stufft <donald at stufft.io> wrote:
> 
> > 
> > On May 28, 2013, at 5:04 AM, Christian Theune <ct at gocept.com> wrote:
> > 
> >> Hi,
> >> 
> >> 
> >> On 27. May 2013, at 10:41 PM, Donald Stufft <donald at stufft.io> wrote:
> >>> Just to assure folks. I do consider Mirroring a first class citizen and an important feature.
> >> 
> >> Thanks for that acknowledgement. Let's sort out what to do now - this is becoming urgent for me as the author of the currently recommended mirroring tool for public mirrors and as an operator of a mirror that is being relied upon.
> >> 
> >> I agree with Holger's points.
> >> 
> >> I don't think the mirroring is completely backwards right now. I agree there's been an incomplete PEP that's been hanging around too long. 
> >> 
> >> My current client implementation is pretty simple and has had reliable semantics until now.
> >> 
> >> A couple of things I noticed in the discussion that I'd like to point out:
> >> 
> >> - We mirror simple pages because the PEP requires us to - this is part of the existing validation approach. I can drop that to get mirrors not to rely on simple pages from the CDN but then authentication of the simple pages will be broken.
> >> 
> >> - Release files are replaced all the time.
> >> 
> >> The semantics that I'd like to keep for the mirrors are these:
> >> 
> >> When I get a changelog for serial X and I start copying simple pages and files, then I (as a mirror) promise my clients that I have incorporated *at least* all changes up until serial X (but maybe also partial changes from X+n).
> >> 
> >> I'm afraid that the mirrors' data is now inconsistent - we can repair that once we have a stable mirroring approach again, but until then people will start getting annoyed again. 
> >> 
> >> I'm also concerned that I don't really have time to follow up on what's happening with TUF regarding mirroring on top of what happened regarding the CDN. My feeling is that will result in more fire fighting.
> >> 
> >> So - what's the next step that can happen ASAP?
> > 
> > Options:
> > 
> > 1) When mirroring, retain N minutes' worth of old serials and redo them. Mirroring is idempotent: you can repeat it with no negative side effects. Conditional HTTP requests should also be supported to minimize the bandwidth.
> > 2) Wait a few seconds after fetching the change log to begin processing.
> > 3) Use front.python.org with the pypi.python.org HOST header with the caveat this is not guaranteed to be stable in the long term.
> > 4) ???
> > 
> 
> Option 4: We add the expected hash of the simple page to the change log. Mirror clients can then assert that their state is consistent.
> 
> They should probably also assert the file hashes that are in the simple index. 

yes, i also thought of option 4.  Is that easy to implement on the side of pypi?
If we checksum the simple page, we need idempotent generation of simple pages
and a stable ordering to begin with -- which is probably a good idea anyway.
It doesn't need to be version-ordering, just some consistent ordering.
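
A minimal sketch of what option 4 asks for, under stated assumptions: the
page markup, the helper names (render_simple_page, page_digest,
mirror_is_consistent) and the choice of SHA-256 are all hypothetical and not
what PyPI does. The point is only that sorting the entries makes the rendered
bytes reproducible, so a digest published in the changelog can be compared
against the page a mirror fetched.

```python
import hashlib


def render_simple_page(project, files):
    """Render a simple page so that the same set of files gives the same bytes.

    `files` is an iterable of (filename, md5) pairs. The markup below is
    illustrative only; it is not PyPI's real page layout.
    """
    lines = ["<html><head><title>Links for %s</title></head><body>" % project]
    # Any consistent ordering works; plain lexical sorting is used here.
    for filename, md5 in sorted(set(files)):
        lines.append('<a href="%s#md5=%s">%s</a><br/>' % (filename, md5, filename))
    lines.append("</body></html>")
    return "\n".join(lines).encode("utf-8")


def page_digest(page_bytes):
    """Digest that the changelog entry would carry (SHA-256 is an assumption)."""
    return hashlib.sha256(page_bytes).hexdigest()


def mirror_is_consistent(fetched_page, expected_digest):
    """True if the page the mirror fetched matches the changelog's digest."""
    return page_digest(fetched_page) == expected_digest
```

A mirror that gets a mismatch would treat the page as stale (for example a
cached copy from the CDN) and fetch it again later instead of publishing it.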

As mentioned in the other mail, for the short-term i'd go for 3) once Noah
and you confirm you are not going to kill it before we have settled on
a new solution (maybe option 4). 
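
For illustration, option 3 could look roughly like this on the client side.
This is a sketch under assumptions: the helper name build_origin_request and
the https default are mine, and as noted in the thread the front.python.org
endpoint is not guaranteed to stay around. The request is only constructed
here, not sent.

```python
import urllib.request


def build_origin_request(path, origin="front.python.org",
                         public_host="pypi.python.org", scheme="https"):
    """Build a request that goes to the origin but names the public host.

    The connection is made to `origin`, while the Host header carries
    `public_host` so that the server selects the PyPI site.
    The scheme default is an assumption, not something the thread specifies.
    """
    url = "%s://%s%s" % (scheme, origin, path)
    return urllib.request.Request(url, headers={"Host": public_host})
```

Passing the result to urllib.request.urlopen() would send it. With https,
certificate checks are done against the origin host name, which may or may
not match what the server presents.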

best,
holger


> > Of them, 1) is more likely to give you the best results within the constraints of HTTP. All it takes is someone to run your mirroring script behind a caching proxy and pre-CDN you'd have the exact situation we have now.
> > 
> > Mirroring is in a bad state because it comes (and always has) with absolutely no guarantees of consistency. You dismiss the issue of picking up changes from serial N+1, but that is a serious problem. If you fetch up to serial N of package1, whose released version is 1.0, and then you fetch serial N+2 of package2, which has a hard requirement on package1 1.1 (released in serial N+1), you now have packages that are not installable via your mirror because of inconsistent state.
> > 
> > If someone comes up with a better option that doesn't require a large re-architecture of the storage code in PyPI, I'm happy to review and deploy it.
> > 
> >> 
> >> Christian
> >> 
> >> -- 
> >> Christian Theune · ct at gocept.com
> >> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> >> http://gocept.com · Tel +49 345 1229889-7
> >> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> > 
> > 
> > -----------------
> > Donald Stufft
> > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> > 
> > _______________________________________________
> > Distutils-SIG maillist  -  Distutils-SIG at python.org
> > http://mail.python.org/mailman/listinfo/distutils-sig
