[Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)
holger krekel
holger at merlinux.eu
Mon May 27 20:54:53 CEST 2013
On Mon, May 27, 2013 at 13:50 -0400, Donald Stufft wrote:
> On May 27, 2013, at 12:39 PM, Donald Stufft <donald at stufft.io> wrote:
>
> >
> > On May 27, 2013, at 8:08 AM, holger krekel <holger at merlinux.eu> wrote:
> >
> >> Hi Noah, Donald, (CC also Richard, Christian),
> >>
> >> I just checked with a test package and think we might have a cache
> >> consistency / changelog API problem. It took me a while, but here is
> >> the basic issue: I uploaded a test package, the changelog API reports it has
> >> changed, and when I then go to its simple page, the new release
> >> file shows up some of the time and not at other times.
> >>
> >> Tools like bandersnatch, pep381 and devpi-server (and probably others)
> >> use PyPI's changelog API to determine if there are changes. It seems
> >> those changes are signalled faster than they become consistently accessible
> >> through the CDN. This can lead to inconsistent mirrors because, by the time
> >> the CDN has the files, there is no further change event. Such mirrors
> >> are run by companies in-house, so I think it's a real problem.
> >>
> >> Even without mirroring there can be problems because installs are not
> >> directly repeatable: "pip install XYZ>=2.0" can give you first 2.0.1,
> >> then 2.0.0 a minute later. I had hoped that a particular IP address
> >> would see things consistently.
> >>
> >> I am not familiar with Fastly's caching properties -- can they notify
> >> about the fact that a page/file is consistently up-to-date everywhere?
> >> Or can the cache be globally invalidated for a particular page/file?
> >> Any other ideas?
> >>
> >> If customizing the Fastly setup is not possible, and perhaps also for the short term,
> >> is there, or could there be, a special location provided by pypi.python.org which
> >> the above tools could use to get at the actual non-cached data? We
> >> could then maybe mitigate the problem through updates of the respective tools.
> >> That would at least solve the problem for one of my customers, I think.
> >>
> >> best,
> >> holger
> >>
> >>
> >> On Sun, May 26, 2013 at 10:34 -0700, Noah Kantrowitz wrote:
> >>> </farnsworth>
> >>>
> >>> but seriously, at long last today it was my honor to throw the DNS switch to move PyPI to the Fastly caching CDN. I would like to thank Donald Stufft for doing much of the heavy lifting on the PyPI side, and to Fastly for graciously offering to host us. What does this mean for everyone? Well the biggest change is PyPI should get a whole lot faster. There are two major downsides however. There will now be a delay of several minutes in some cases between updating a package and having it be installable, and download counts will now be even more incorrect than they were before. The PyPI admins are discussing what to do about download counts long-term, but for now we all feel that the performance and availability benefits outweigh the loss. If anyone has any questions, or hears anything about issues with PyPI please don't hesitate to contact me.
> >>>
> >>> --Noah
> >>>
> >>
> >>
> >>
> >>> _______________________________________________
> >>> Distutils-SIG maillist - Distutils-SIG at python.org
> >>> http://mail.python.org/mailman/listinfo/distutils-sig
> >>
> >
> > I mentioned it on twitter but might as well mention it here as well.
> >
> > Currently there is no invalidation going on. The effect on the mirroring was unanticipated and I'm currently getting the invalidation API set up within PyPI.
> >
> > -----------------
> > Donald Stufft
> > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> >
>
>
>
> /simple/ pages should now be invalidated immediately when a new package is released.
Thanks, Donald. Looking at the implementation, I wonder what happens if,
after ``self._conn.commit()``, a changelog API call arrives and returns the
changes, and a client uses it to retrieve the changed pages before the Fastly
purge takes place. Isn't that still a potential race condition, or am I missing something?
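
To make the suspected window concrete, here is a minimal, self-contained sketch of the ordering in question. It is not PyPI's or Fastly's code: ``Origin``, ``Cdn``, ``mirror_sync`` and the package name are hypothetical stand-ins, and the cache is modelled as a plain dict that only refreshes after a purge.

```python
class Origin:
    """Hypothetical stand-in for the PyPI database plus its changelog."""

    def __init__(self):
        self.simple_pages = {}  # project name -> list of release files
        self.changelog = []     # list of (serial, project name)

    def commit_release(self, project, filename):
        # After this call the release is in the "database" and the
        # changelog entry is already visible to changelog clients.
        self.simple_pages.setdefault(project, []).append(filename)
        self.changelog.append((len(self.changelog) + 1, project))

    def changes_since(self, serial):
        return [entry for entry in self.changelog if entry[0] > serial]


class Cdn:
    """Hypothetical cache in front of the origin; refreshes only after purge."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get_simple_page(self, project):
        if project not in self.cache:
            files = self.origin.simple_pages.get(project, [])
            self.cache[project] = list(files)  # store a snapshot
        return list(self.cache[project])

    def purge(self, project):
        self.cache.pop(project, None)


def mirror_sync(origin, cdn, last_serial, mirror):
    """Changelog-driven sync; returns the new last-seen serial."""
    for serial, project in origin.changes_since(last_serial):
        mirror[project] = cdn.get_simple_page(project)
        last_serial = serial
    return last_serial


def demo():
    origin = Origin()
    cdn = Cdn(origin)
    mirror = {}

    # First release: commit, purge, then the mirror syncs. All consistent.
    origin.commit_release("example-pkg", "example-pkg-2.0.0.tar.gz")
    cdn.purge("example-pkg")
    serial = mirror_sync(origin, cdn, 0, mirror)

    # Second release: the commit is visible in the changelog,
    # but the purge has not happened yet.
    origin.commit_release("example-pkg", "example-pkg-2.0.1.tar.gz")
    serial = mirror_sync(origin, cdn, serial, mirror)  # inside the window
    seen_in_window = list(mirror["example-pkg"])

    cdn.purge("example-pkg")                           # purge arrives late
    serial = mirror_sync(origin, cdn, serial, mirror)  # no new change events
    return seen_in_window, mirror["example-pkg"], cdn.get_simple_page("example-pkg")
```

In this sketch the mirror consumes the change event while the cache still serves the old page, so it ends up with only the 2.0.0 file, and no later event prompts it to look again even though the CDN serves both files after the purge.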
best,
holger