[Catalog-sig] PyPI mirror selection
"Martin v. Löwis"
martin at v.loewis.de
Mon Sep 20 11:09:06 CEST 2010
I started a library that tools can use to select a PyPI mirror, see
It currently only deals with mirror selection, but will be extended
to deal with mirror validation and key rollover as well.
For mirror selection, the objective is to find a mirror that is both
fast and current.
1. The caller specifies a maximum acceptable response time
for an HTTP request (to /last-modified), and a maximum acceptable
age. The caller can also specify whether a.pypi.python.org should
be included in the scan or not. Finally, the caller can specify
a timeout for slow mirrors.
2. The library contacts the mirrors in order, interleaving DNS lookups
with connecting to the mirrors whose IP addresses have been computed.
No threads are created in that process.
3. If a mirror is found that meets the requirements, it is returned;
this might mean that not all mirrors have been contacted.
4. If no mirror is found that meets the requirements, the library
contacts all mirrors. When the slow-mirrors timeout has passed, the
youngest of all responding mirrors is returned.
5. If all mirrors are slow, the first one responding is returned.
6. If none respond, ValueError is raised (which will happen after
the TCP connection timeout).
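As a rough illustration of steps 3 to 6 (this is not the library's
actual code), the decision logic could be sketched in Python as
follows. The names Probe, select_mirror, max_response, max_age and
slow_timeout are made up for this example, and the network side (DNS
interleaving, the HTTP request to /last-modified) is replaced by
measurements that are assumed to be already available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """Hypothetical record of one mirror's answer to /last-modified."""
    host: str
    response_time: Optional[float]  # seconds; None if it never answered
    age: float = float("inf")       # seconds since the mirror last synced


def select_mirror(probes: List[Probe],
                  max_response: float = 1.0,
                  max_age: float = 30 * 60,
                  slow_timeout: float = 5.0) -> str:
    # Step 3: the first mirror, in enumeration order, that is both
    # fast and current wins; later mirrors need not be examined.
    for p in probes:
        if (p.response_time is not None
                and p.response_time <= max_response
                and p.age <= max_age):
            return p.host

    # Step 4: nobody met the requirements. Among the mirrors that
    # answered before the slow-mirrors timeout, take the youngest.
    in_time = [p for p in probes
               if p.response_time is not None
               and p.response_time <= slow_timeout]
    if in_time:
        return min(in_time, key=lambda p: p.age).host

    # Step 5: all mirrors are slow, so take the first one to respond.
    responded = [p for p in probes if p.response_time is not None]
    if responded:
        return min(responded, key=lambda p: p.response_time).host

    # Step 6: no mirror responded at all.
    raise ValueError("no mirror responded")
```

For example, select_mirror([Probe("a", 0.5, 3600.0), Probe("b", 0.2,
60.0)]) skips "a" because it is older than 30min and returns "b".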
For specific parameters, I found the following defaults useful:
- the acceptable mirror age defaults to 30min. Within this time,
all mirrors should have synchronized; otherwise, they are
considered down. The only exception is when the central mirror
is down; then the mirrors will all age.
- the acceptable response time defaults to 1s. All mirrors
should be able to respond within this time, and the library will
then use the one that responds first, in enumeration order.
Specifying 0.1s might also be useful in some applications;
this will rule out slower mirrors (in particular, GAE).
- the slow mirrors timeout defaults to 5s. If the master is
down, and some mirror is slow, this will be the time until
selection completes (with the mirror that claims to have the
newest data).
- OTOH, if the master is down, and all mirrors respond to the
TCP connect quickly (either accepting or refusing the connection),
then it will quickly pick the newest mirror.
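To make the defaults above concrete, here is a small Python sketch
that collects them in one place. SelectionDefaults and its field
names are hypothetical and chosen only for this example; they are not
taken from the library. The stricter variant uses the 0.1s response
time mentioned above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SelectionDefaults:
    """Hypothetical container for the defaults described above."""
    max_age: float = 30 * 60     # acceptable mirror age: 30min, in seconds
    max_response: float = 1.0    # acceptable response time: 1s
    slow_timeout: float = 5.0    # slow-mirrors timeout: 5s


DEFAULTS = SelectionDefaults()

# Stricter variant: only mirrors answering within 0.1s qualify as fast.
# The other two values are carried over unchanged.
STRICT = replace(DEFAULTS, max_response=0.1)
```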
If there are any questions, feel free to ask.