[Distutils] The Simple API - What URLs are "supported"

Donald Stufft donald at stufft.io
Thu Sep 18 02:59:41 CEST 2014


Right now pip (and setuptools, where this behavior originated) does this sort
of dance when looking for things on the PyPI simple index. This isn't the
actual code, though:

    name, _, version = "foo==1.0".partition("==")
    page = None

    # request_url() stands in for an HTTP GET that returns nothing on a 404.
    if version:  # First look at a versioned url if ==
        page = request_url(
            "https://pypi.python.org/simple/" + name + "/" + version
        )

    if not page:  # If we don't have something, look for unversioned
        page = request_url(
            "https://pypi.python.org/simple/" + name
        )

    if not page:  # Finally, look at the /simple/ index itself
        page = request_url("https://pypi.python.org/simple/")

    # From here, look at the page to discover things.


As far as I can tell a lot of this is largely historical.

The /simple/{name}/{version}/ pages come, I think, from a time when there
wasn't a simple index and packages would sometimes need to go to
/pypi/foo/version/ in order to actually get a list of things. However, we now
do have the simple API, and AFAICT the /simple/ API does not have, and never
has had, a response for /simple/{name}/{version}/. This always 404s and falls
back to /simple/{name}/. I would like to consider this URL unsupported in pip
and remove the check for it. That will reduce the number of needless HTTP
requests by one per pinned version.
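
To make the saving concrete, here is a rough sketch of the URLs the dance
above would try, in order, for a pinned requirement, with and without the
versioned lookup. The helper name `candidate_urls` and its `try_versioned`
flag are made up for illustration; this is not pip's code, and it only
understands "name" and "name==version" strings.

```python
SIMPLE = "https://pypi.python.org/simple/"


def candidate_urls(requirement, try_versioned=True):
    """Return the index URLs that would be tried, in order.

    Hypothetical helper, not pip's actual code; it only handles
    "name" and "name==version" requirement strings.
    """
    name, _, version = requirement.partition("==")
    urls = []
    if version and try_versioned:
        # The versioned page; per the text above this always 404s.
        urls.append(SIMPLE + name + "/" + version)
    urls.append(SIMPLE + name)  # the per-project page
    urls.append(SIMPLE)  # the full index, used as a last resort
    return urls


before = candidate_urls("foo==1.0")
after = candidate_urls("foo==1.0", try_versioned=False)
print(len(before) - len(after))  # how many lookups are saved per pin
```

For an unpinned requirement such as "foo" the two lists are identical, so only
pinned requirements benefit.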

Does anyone know of anything this will break?

The other thing that happens is that if /simple/{name}/ 404s, pip will fall
back to /simple/. This is done so that if someone mistypes a name in a way
that is still considered equivalent after normalization is applied, instead
of a 404 they get the /simple/ page and the tooling can discover the name from
there.

If you remember, a little while ago I changed PyPI so that it considers the
normalized form of the name the "canonical" name for the simple index. This
means that tooling will be able to know ahead of time whether a project called,
say, "Django" should be requested with /simple/Django/ or /simple/django/.
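
As a rough illustration of what "normalized" means here, the sketch below
assumes the rule is: lowercase the name and collapse runs of "-", "_" and "."
into a single "-". That is the rule later written down in PEP 503; whether it
matches PyPI's implementation exactly at the time of this message is an
assumption, and `normalize_name` and `simple_url` are illustrative names only.

```python
import re


def normalize_name(name):
    """Normalize a project name.

    Assumed rule (the one later specified in PEP 503): lowercase, with
    runs of "-", "_" and "." collapsed into a single "-".
    """
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_url(name, index="https://pypi.python.org/simple/"):
    """Build the per-project URL from the normalized name."""
    return index + normalize_name(name) + "/"


print(simple_url("Django"))
```

With a rule like this, a client can compute the one URL to request without
first asking the index how the project spells its name.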

What I would like to do now is remove the fallback to /simple/. If we fall back
to that, it is a 2.1MB download, which is a fairly big deal and can slow down a
``pip install`` quite significantly. I have a PR against bandersnatch which
will make bandersnatch generate a /simple/{name}/ URL where name is the
normalized form, and another PR against pip which will cause it to always
request the normalized form. When both of these land, it would mean that the
only times pip will fall back to /simple/ are:

1. If someone is using a *new* pip (with the PR merged) but with a mirror that
   doesn't support the normalized form of the URLs (old bandersnatch,
   pep381client, maybe others?)
2. If someone typed ``pip install <foo>`` where <foo> is a thing that doesn't
   actually exist.
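
A minimal sketch of what a lookup without the /simple/ fallback could look
like, under stated assumptions: `find_project_page`, `ProjectNotFound` and the
`fetch` callable are hypothetical names, the normalization rule is the assumed
one later specified in PEP 503, and a plain dict stands in for the index so
the example runs without network access. It is not the code from either PR.

```python
import re

INDEX = "https://pypi.python.org/simple/"


class ProjectNotFound(Exception):
    """Raised when the index has no page for the normalized name."""


def find_project_page(name, fetch):
    """Request only /simple/<normalized-name>/, never the full index.

    `fetch` is a hypothetical callable that returns the page body, or
    None on a 404. The normalization rule is an assumption.
    """
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    page = fetch(INDEX + normalized + "/")
    if page is None:
        # Either the project does not exist, or the mirror does not
        # serve normalized URLs. Both cases stop here instead of
        # downloading the whole /simple/ page.
        raise ProjectNotFound(name)
    return page


# A fake index standing in for a mirror that serves normalized URLs.
fake_index = {INDEX + "django/": "<html>links for Django</html>"}
print(find_project_page("Django", fake_index.get))
```

Both cases in the list above end up in the `ProjectNotFound` branch: the
client reports an error rather than fetching the full index.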

Would anyone have any complaints if pip stopped falling back to /simple/ once
the bandersnatch PR is merged and released?

Furthermore, does anyone have any problems with narrowing the "supported URLs"
of the "simple API" to /simple/ and /simple/<normalized-name>/, and making
fetching /simple/ optional?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
