[Catalog-sig] PyPI mirror selection

"Martin v. Löwis" martin at v.loewis.de
Mon Sep 20 11:09:06 CEST 2010


I started a library that tools can use to select a PyPI mirror, see

https://svn.python.org/packages/trunk/pypi/tools/mirrorlib.py

It currently only deals with mirror selection, but will be extended
to deal with mirror validation and key rollover as well.

For mirror selection, the objective is to find a mirror that is both
fast and current.

1. The caller specifies a maximum acceptable response time
   for an HTTP request (to /last-modified), and a maximum acceptable
   age. The caller can also specify whether a.pypi.python.org should
   be included in the scan or not. Finally, the caller can specify
   a timeout for slow mirrors.

2. The library contacts the mirrors in order, interleaving DNS lookups
   with connecting to the mirrors whose IP addresses have been computed.
   No threads are created in that process.

3. If a mirror is found that meets the requirements, it is returned;
   this might mean that not all mirrors have been contacted.

4. If no mirror is found that meets the requirements, the library
   contacts all mirrors. When the slow-mirrors timeout has passed, the
   youngest (most recently synchronized) of all responding mirrors is
   returned.

5. If all mirrors are slow, the first one responding is returned.

6. If none respond, ValueError is raised (which will happen after
   the TCP connection timeout).
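
The decision in steps 3 to 6 can be sketched as a pure function over
already-collected probe results. This is not the code in mirrorlib.py;
it is a minimal offline model that leaves out the networking of step 2.
The names Probe and choose_mirror, and the host names other than
a.pypi.python.org, are made up for illustration. A response_time of
None is assumed to mean "never responded".

```python
from typing import NamedTuple, Optional, Sequence


class Probe(NamedTuple):
    """Result of probing one mirror (hypothetical structure)."""
    host: str
    response_time: Optional[float]  # seconds; None if the mirror never responded
    age: Optional[float]            # seconds since last sync; None if no response


def choose_mirror(probes: Sequence[Probe],
                  max_response: float = 1.0,
                  max_age: float = 30 * 60,
                  slow_timeout: float = 5.0) -> str:
    """Pick a mirror from probes listed in enumeration order."""
    # Step 3: the first mirror in order that is both fast and current.
    for p in probes:
        if (p.response_time is not None
                and p.response_time <= max_response
                and p.age <= max_age):
            return p.host

    # Step 4: nobody qualified; take the youngest mirror among those
    # that responded before the slow-mirrors timeout.
    in_time = [p for p in probes
               if p.response_time is not None
               and p.response_time <= slow_timeout]
    if in_time:
        return min(in_time, key=lambda p: p.age).host

    # Step 5: all mirrors are slow; take the first one to respond.
    responded = [p for p in probes if p.response_time is not None]
    if responded:
        return min(responded, key=lambda p: p.response_time).host

    # Step 6: nothing responded at all.
    raise ValueError("no mirror responded")


probes = [
    Probe("a.pypi.python.org", 2.0, 60.0),    # current but too slow
    Probe("b.example.org", 0.3, 7200.0),      # fast but stale
    Probe("c.example.org", 0.2, 300.0),       # fast and current
]
print(choose_mirror(probes))
```

In the real library the scan stops as soon as a qualifying mirror
answers, so the later mirrors may never be contacted; the model above
only reproduces the choice, not the timing.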

For specific parameters, I found the following defaults useful:
- the acceptable mirror age defaults to 30min. Within this time,
  all mirrors should have synchronized; otherwise, they are
  considered down. The only exception is when the master
  is down: then all mirrors will age together.
- the acceptable response time defaults to 1s. All mirrors
  should be able to respond within this time, and the library will
  then use the first one, in enumeration order, that does so.
  Specifying 0.1s might also be useful in some applications;
  this will rule out slower mirrors (in particular, GAE).
- the slow mirrors timeout defaults to 5s. If the master is
  down, and some mirror is slow, this will be the time until
  selection completes (with the mirror that claims to have the
  latest copy).
- on the other hand, if the master is down, and all mirrors respond
  to the TCP connect quickly (either accepting or refusing the
  connection), then the library will quickly pick the newest mirror.
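
To make the effect of these defaults concrete, here is a small sketch
of the "fast and current" test with the values above. SelectionParams
and meets_requirements are hypothetical names chosen for this example,
not the interface of mirrorlib.py, and the sample response times are
invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionParams:
    """Defaults as listed above, in seconds (hypothetical container)."""
    max_response: float = 1.0     # acceptable response time
    max_age: float = 30 * 60      # acceptable mirror age (30min)
    slow_timeout: float = 5.0     # slow-mirrors timeout


def meets_requirements(params: SelectionParams,
                       response_time: float, age: float) -> bool:
    """True if a mirror is both fast enough and current enough."""
    return response_time <= params.max_response and age <= params.max_age


default = SelectionParams()
strict = SelectionParams(max_response=0.1)

# A mirror answering in 0.4s that synchronized 10 minutes ago:
print(meets_requirements(default, 0.4, 10 * 60))  # accepted with the 1s default
print(meets_requirements(strict, 0.4, 10 * 60))   # ruled out with 0.1s
# The same mirror, but last synchronized an hour ago:
print(meets_requirements(default, 0.4, 60 * 60))  # too old
```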

If there are any questions, feel free to ask.

Regards,
Martin



