[Catalog-sig] Mirror list detection/construction - PEP 381

Paul Nasrat pnasrat at google.com
Wed Jul 21 14:10:50 CEST 2010


I was looking through PEP 381, which says the following about mirror
list construction for clients:

Clients that are browsing PyPI should be able to use alternative
mirrors, by getting the list of the mirrors using `last.pypi.python.org`.

Code example::

    >>> import socket
    >>> socket.gethostbyname_ex('last.pypi.python.org')[0]
    'h.pypi.python.org'

My reading of this is that the intent is for a client to be able to
resolve this name to find the last mirror, e.g. h, zz, etc. Smart
clients can then use this information to work out the closest/fastest
mirror, etc.
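
To make that concrete, here is a minimal sketch of how a client could
expand the last mirror name into a full candidate list. It assumes the
naming sequence a, b, ..., z, aa, ab, ... under pypi.python.org. The
helper names `mirror_prefixes` and `mirrors_up_to` are illustrative only
and do not come from the PEP; whether `a` belongs in the list is left to
the caller.

```python
import itertools
import string


def mirror_prefixes():
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... (assumed naming order)."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def mirrors_up_to(last_host, domain="pypi.python.org"):
    """Return every mirror hostname from 'a' up to and including last_host."""
    suffix = "." + domain
    if not last_host.endswith(suffix):
        raise ValueError("not a mirror name under %s: %r" % (domain, last_host))
    last_prefix = last_host[:-len(suffix)]
    # Only purely alphabetic, non-empty prefixes appear in the sequence,
    # so validating here guarantees the loop below terminates.
    if not last_prefix or any(c not in string.ascii_lowercase for c in last_prefix):
        raise ValueError("unexpected mirror prefix: %r" % last_prefix)
    hosts = []
    for prefix in mirror_prefixes():
        hosts.append(prefix + suffix)
        if prefix == last_prefix:
            return hosts


print(mirrors_up_to("c.pypi.python.org"))
```

Note that this only helps once the last mirror name has been obtained
reliably, which is the problem described next.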

As documented, this is not a robust way to resolve the name. On OS X I
get the following:

    >>> import socket
    >>> socket.gethostbyname_ex('last.pypi.python.org')
    ('pypi.websushi.org',
     ['last.pypi.python.org', 'd.pypi.python.org'],
     ['88.198.109.79'])

This resolves to the actual host rather than the intermediate alias,
although the alias is preserved in the alias list. Whilst discussing
this on #distutils we discovered that some resolvers behave quite
differently:

Python 2.6.5 on Windows Server 2008 R2:

    >>> socket.gethostbyname_ex('last.pypi.python.org')
    ('pypi.websushi.org', [], ['88.198.109.79'])
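
A client could cope with the first two behaviours by inspecting the
canonical name and the alias list together, rather than trusting
element [0]. The sketch below does that on the result tuple itself, so
it needs no network access. The names `MIRROR_RE` and
`last_mirror_from_result` are illustrative only. It cannot recover
anything from the Windows result, where the alias has been dropped, so
it returns None there.

```python
import re

QUERY_NAME = "last.pypi.python.org"
# Mirror names are assumed to be a purely alphabetic label under
# pypi.python.org, e.g. 'd.pypi.python.org' or 'zz.pypi.python.org'.
MIRROR_RE = re.compile(r"^[a-z]+\.pypi\.python\.org$")


def last_mirror_from_result(result):
    """Pick the mirror alias out of a socket.gethostbyname_ex() result.

    Returns the mirror hostname, or None if the resolver did not
    report it anywhere in the tuple.
    """
    hostname, aliases, _addresses = result
    for name in [hostname] + list(aliases):
        name = name.lower()
        if name == QUERY_NAME:
            # 'last' is alphabetic too, so skip the name we queried.
            continue
        if MIRROR_RE.match(name):
            return name
    return None


# The two results quoted in this message:
osx = ('pypi.websushi.org',
       ['last.pypi.python.org', 'd.pypi.python.org'],
       ['88.198.109.79'])
windows = ('pypi.websushi.org', [], ['88.198.109.79'])

print(last_mirror_from_result(osx))
print(last_mirror_from_result(windows))
```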

Given how fragile this is, it seems we might want to consider an
alternative mirror list discovery mechanism. Talking with Alexis, I
learned he has already implemented mirror list construction for
distutils2:

http://bitbucket.org/ametaireau/distutils2/src/tip/src/distutils2/index/mirrors.py

There should be a reliable way for clients to construct the list.
Possible mechanisms include a static page to scrape, a known URL with a
redirect, DNS SRV records, or something else.
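
As an illustration of the static page option only, here is a sketch of
a parser for a hypothetical plain-text list with one hostname per line
and `#` comments. No such page or format exists today; both the format
and the `parse_mirror_list` name are assumptions made for the example.

```python
def parse_mirror_list(text):
    """Parse a hypothetical one-hostname-per-line mirror list.

    Blank lines and '#' comments are ignored; duplicates are dropped
    while preserving the order in which mirrors first appear.
    """
    mirrors = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip().lower()
        if host and host not in mirrors:
            mirrors.append(host)
    return mirrors


sample = """\
# hypothetical mirror list
b.pypi.python.org
c.pypi.python.org  # second mirror

b.pypi.python.org
"""
print(parse_mirror_list(sample))
```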

Thoughts?

Paul

