[Catalog-sig] PyPI replication project
ianb at colorstudy.com
Fri Oct 10 20:34:33 CEST 2008
Martin v. Löwis wrote:
>> Mirrors help every other packaging system. So it stands to reason
>> that it would help pypi too. I think since many zope people have been
>> using mirrors instead of using pypi directly... pypi has been more
>> available. It's running lots better for other reasons too... but less
>> load is probably also nice for pypi :)
> I'm fine with people operating their own mirrors. I just don't think
> it can be made *invisible* to users that they use a mirror. In the
> mirroring systems for Linux distributions, for example, people have
> to explicitly select which mirror they want to use (and accept that
> the mirror may lag behind by a day or so).
I vaguely remember CPAN doing something like this: it publishes
machine-readable lists of mirrors at a couple of reliable locations,
and those locations are hardcoded into the tool.
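
To make that concrete, here is a minimal sketch of a client that tries a
few hardcoded locations in order and uses the first mirror list it can
load. The URLs, the one-URL-per-line file format, and the names
`MIRROR_LIST_URLS`, `parse_mirror_list` and `load_mirrors` are all
assumptions for illustration; this is not how CPAN or PyPI actually do
it. The `fetch` callable is passed in so the download mechanism stays
out of the picture.

```python
# Hypothetical well-known locations; placeholder URLs, not real services.
MIRROR_LIST_URLS = [
    "http://a.example.org/mirrors.txt",
    "http://b.example.org/mirrors.txt",
]


def parse_mirror_list(text):
    """Parse an assumed format: one mirror URL per line, '#' for comments."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


def load_mirrors(fetch, locations=MIRROR_LIST_URLS):
    """Return the mirror list from the first location that can be fetched.

    `fetch` is any callable taking a URL and returning the file's text,
    raising OSError when the location is unreachable.
    """
    for url in locations:
        try:
            return parse_mirror_list(fetch(url))
        except OSError:
            continue  # this location is down, try the next one
    raise RuntimeError("no mirror list location was reachable")
```

Since the list is just a static file, the reliable servers only have to
serve it, and the tool only needs those few addresses built in.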
That doesn't speak to how well updated the mirror is, but I think some
Linux distributions have clever solutions to that aspect too.
If some component of the system were built in a push manner (i.e., as a
static file), and that file was kept synced between a couple reliable
servers (I don't think it's really important if one of these servers is
a couple seconds out of date), then we'd have something fairly reliable.
So... the static file(s) could be a list of mirrors, and maybe a
last-modified time for the entire system. Then you could pick a mirror
and check its last-modified time against the one in the mirror list to
see if the mirror was fully up-to-date. The problem there is that
mirrors might be
out of date, but not in a way you care about (i.e., some package is
uploaded that you don't care about). And there I vaguely remember
someone talking about a more clever algorithm where you could tell if
the mirror was up to date for the packages you care about.
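
A rough sketch of both checks, under assumptions of my own: timestamps
are plain Unix seconds, and the static file would also carry a
per-package last-changed time (`master_changes` below). The function
names are hypothetical, and the second check is only one possible way
to do the per-package comparison, not the algorithm I half remember.

```python
def is_fully_current(master_last_modified, mirror_last_synced):
    """Coarse check: has the mirror synced since the last change anywhere?"""
    return mirror_last_synced >= master_last_modified


def is_current_for(packages, master_changes, mirror_last_synced):
    """Finer check: is the mirror current for just the packages we want?

    `master_changes` maps package name -> time of its last change on the
    master. Packages missing from the map are treated as never changed.
    """
    return all(
        master_changes.get(name, 0) <= mirror_last_synced
        for name in packages
    )
```

With `master_changes = {"foo": 100, "bar": 200}` and a mirror last
synced at 150, the coarse check fails (the system changed at 200), but
the mirror is still usable for someone who only wants "foo".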
But, if mirrors are pinged about updates, they should really be able to
keep up to date quickly, as most packages are small and new releases
happen at a rate of roughly one every couple of hours.
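
As a toy illustration of the ping idea: a mirror that refreshes a single
package whenever it is told that package changed. The `Mirror` class,
its `on_ping` method, and the shape of the notification (package name
plus change time) are all hypothetical; nothing here reflects an
existing PyPI interface.

```python
class Mirror:
    """Toy mirror that refreshes one package per notification."""

    def __init__(self, fetch_package):
        self.fetch_package = fetch_package  # callable: name -> package data
        self.packages = {}                  # name -> locally stored data
        self.synced_at = {}                 # name -> change time last applied

    def on_ping(self, name, changed_at):
        """Handle a notice that `name` changed on the master at `changed_at`.

        Returns True if the package was re-fetched, False if the ping was
        a duplicate or older than what the mirror already has.
        """
        if self.synced_at.get(name, -1) >= changed_at:
            return False
        self.packages[name] = self.fetch_package(name)
        self.synced_at[name] = changed_at
        return True
```

Each ping costs one small download, so the mirror never has to rescan
the whole index to stay current.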
Sorry... this is more speculation than based on actual knowledge, but I
think there are feasible ways to do these things.
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org