[Catalog-sig] PyPI replication project

Ian Bicking ianb at colorstudy.com
Fri Oct 10 20:34:33 CEST 2008


Martin v. Löwis wrote:
>> Mirrors help every other packaging system.  So it stands to reason
>> that it would help pypi too.  I think since many zope people have been
>> using mirrors instead of using pypi directly... pypi has been more
>> available.  It's running lots better for other reasons too... but less
>> load is probably also nice for pypi :)
> 
> I'm fine with people operating their own mirrors. I just don't think
> it can be made *invisible* to users that they use a mirror. In the
> mirroring systems for Linux distributions, for example, people have
> to explicitly select which mirror they want to use (and accept that
> the mirror may lag behind by a day or so).

I vaguely remember CPAN doing something like having machine-readable 
lists of mirrors, and those lists are available at a couple reliable 
locations, and those locations are hardcoded into the tool.
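
As a rough sketch of that idea (not how CPAN actually does it): the tool
carries a short hardcoded list of locations, tries each in turn, and
parses a plain-text mirror list. The example.org URLs, the one-URL-per-line
file format, and all function names below are made up for illustration.

```python
import urllib.request

# Hypothetical hardcoded locations of the machine-readable mirror list.
MIRROR_LIST_URLS = [
    "https://a.example.org/mirrors.txt",
    "https://b.example.org/mirrors.txt",
]


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_mirror_list(text):
    """Parse an assumed format: one mirror URL per line, '#' for comments."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


def load_mirrors(urls=MIRROR_LIST_URLS, fetch=fetch_text):
    """Return the mirror list from the first location that answers."""
    for url in urls:
        try:
            return parse_mirror_list(fetch(url))
        except OSError:
            # Network failures (including urllib.error.URLError) are
            # OSError subclasses; fall through to the next location.
            continue
    return []
```

Passing `fetch` as a parameter is only there so the fallback logic can be
tried out without touching the network.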

That doesn't speak to how well updated the mirror is, but I think some 
Linux distributions have clever solutions to that aspect too.

If some component of the system was built in a push manner (i.e., as a 
static file), and that file was kept synced between a couple reliable 
servers (I don't think it's really important if one of these servers is 
a couple seconds out of date), then we'd have something fairly reliable. 
So... the static file(s) could be a list of mirrors, and maybe a 
last-modified time for the entire system. Then you could pick a mirror 
and check its own last-modified time against the one in the mirror list 
to see if the mirror was fully up-to-date.  The problem there is that a 
mirror might be out of date, but not in a way you care about (e.g., some 
package was uploaded that you don't care about).  And there I vaguely 
remember someone talking about a more clever algorithm where you could 
tell if the mirror was up to date for the packages you care about.
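
To make the two checks concrete, here is a minimal sketch. It assumes the
static file carries a last-modified time per package (which is more than
the single system-wide time described above) and that each mirror reports
when it last synced. This is not the "more clever algorithm" mentioned
above, just the simplest version of the idea; all names are hypothetical.

```python
from datetime import datetime, timezone


def is_fully_current(system_last_modified, mirror_last_synced):
    """True if the mirror synced at or after the last change anywhere."""
    return mirror_last_synced >= system_last_modified


def is_current_for(wanted, package_last_modified, mirror_last_synced):
    """True if no wanted package changed after the mirror's last sync.

    Packages missing from the index are ignored, since there is nothing
    for the mirror to be behind on.
    """
    return all(
        package_last_modified[name] <= mirror_last_synced
        for name in wanted
        if name in package_last_modified
    )


# Made-up data: "bar" was uploaded after the mirror last synced.
package_times = {
    "foo": datetime(2008, 10, 10, 9, 0, tzinfo=timezone.utc),
    "bar": datetime(2008, 10, 10, 18, 0, tzinfo=timezone.utc),
}
system_time = max(package_times.values())
mirror_synced = datetime(2008, 10, 10, 12, 0, tzinfo=timezone.utc)

print(is_fully_current(system_time, mirror_synced))          # False
print(is_current_for(["foo"], package_times, mirror_synced))  # True
print(is_current_for(["bar"], package_times, mirror_synced))  # False
```

So a mirror that fails the system-wide check can still be good enough for
someone who only wants "foo".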

But, if mirrors are pinged about updates, they should really be able to 
keep up to date quickly, as most packages are small and new releases 
happen at a rate more like every couple hours.

Sorry... this is more speculation than based on actual knowledge, but I 
think there are feasible ways to do these things.

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

