On Aug 28, 2014, at 6:09 PM, Donald Stufft <donald@stufft.io> wrote:


On Aug 28, 2014, at 2:58 PM, Donald Stufft <donald@stufft.io> wrote:

Right now the “canonical” page for a particular project on PyPI is whatever the
author happened to name their package (e.g. Django). This requires PyPI to have
some "smarts" so that it can redirect things like /simple/django/ to
/simple/Django/ otherwise someone doing ``pip install django`` would fall back
to a much worse behavior.

If this redirect doesn't happen, then pip will issue a request for just
/simple/ and look for a link that, when both sides are normalized, compares
equal to the name it's looking for. It will then follow the link, get
/simple/Django/ and everything works... Except it doesn't. The problem here
comes from the external link classification that we have now. Pip sees the
link to /simple/Django/ as an external link (because it lacks the required
rels) and the installation finally fails.
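
To illustrate the fallback lookup described above, here is a minimal Python
sketch. The normalization rule is not spelled out in this message, so
``normalize`` assumes the rule later standardized in PEP 503 (runs of "-",
"_" and "." collapse to a single "-", then lowercase). ``find_project_link``
and the sample names are hypothetical and are not pip's actual code.

```python
import re


def normalize(name):
    # Assumption: PEP 503-style normalization, not necessarily what pip 1.5 used.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_project_link(requested, index_links):
    """Return the first name listed on /simple/ that matches after normalizing
    both sides, or None if nothing matches."""
    wanted = normalize(requested)
    for link in index_links:
        if normalize(link) == wanted:
            return link
    return None


# Hypothetical excerpt of the names linked from /simple/.
links = ["Django", "Flask", "zope.interface"]
match = find_project_link("django", links)
```

The cost described above comes from having to download the full /simple/
listing just to run this comparison.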

The /simple/ case rarely happens when installing from PyPI itself because of
the redirect; however, it happens quite often when someone is attempting to
install from a mirror instead. Even when everything works correctly, the
penalty for not knowing exactly what name to type is at least one extra HTTP
request, one of which (/simple/) requires pulling down a 2.1MB file.

To fix this I'm going to modify PyPI so that it uses the normalized name in
the /simple/ URL and redirects everything else to the normalized name. I'm
also going to submit a PR to bandersnatch so that it will use normalized names
for its directories and such as well. With these two changes, the client side
will know ahead of time exactly what form the server expects any given name to
be in. That in turn allows a change in pip to pre-normalize all names, which
will make the interaction with mirrors better and will reduce the number of
HTTP requests that a single ``pip install`` needs to make.
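
As a rough sketch of what pre-normalizing on the client could look like, the
Python below builds the project URL directly from the normalized name, so no
redirect or /simple/ listing is needed. The normalization rule is an
assumption (the PEP 503-style rule), and ``simple_url`` and the
``mirror.example`` host are hypothetical names used only for illustration.

```python
import re


def normalize(name):
    # Assumption: PEP 503-style normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_url(index_url, project):
    """Build the /simple/ page URL for a project using its normalized name."""
    return "{}/{}/".format(index_url.rstrip("/"), normalize(project))


# Both spellings resolve to the same URL without an extra request.
url_a = simple_url("https://mirror.example/simple", "Django")
url_b = simple_url("https://mirror.example/simple/", "django")
```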

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig

Hm, so here’s the problem.

I have this implemented and deployed to TestPyPI, it works great!

However, the next step is to make the change to bandersnatch so that it saves
things using their normalized name instead of their "proper" name. Once that
happens, anyone using pip 1.5 won't be able to install anything from that
mirror unless its name is specified as the normalized name (e.g.
``pip install Django`` will fail without --allow-unverified but
``pip install django`` will work). This would be fixed with pip 1.6 (since
it would know to "normalize" the name before fetching the URL).

The same thing will occur if we make the change in pip first: it would
normalize names, so you'd need to use --allow-unverified for everything
because it would act as if you typed ``pip install django`` instead of
``pip install Django``.

To my knowledge, this will *only* affect pip 1.5.x.

So the only way forward I can see to make this change, which I think is a good
change and will remove a big "gotcha" from using a mirror, is to coordinate
a release of bandersnatch that coincides with pip 1.6, and tell people they
need to upgrade in lockstep.

Does anyone have any other ideas?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Just thought of this: if the normalized name doesn’t match the "real" name,
then add entries for both. That way pip 1.5 continues to work, and so does
pip 1.6+.
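
One way the "entries for both" idea might look on the mirror side is sketched
below in Python. This is not bandersnatch's actual code: ``directory_names``
is a hypothetical helper, and the normalization rule is assumed to be the
PEP 503-style one.

```python
import re


def normalize(name):
    # Assumption: PEP 503-style normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


def directory_names(real_name):
    """Return the /simple/ entries a mirror would write for one project:
    the normalized name, plus the "real" name when the two differ."""
    normalized = normalize(real_name)
    if normalized == real_name:
        return [normalized]
    return [normalized, real_name]


entries = directory_names("Django")
```

With both entries present, a client that requests /simple/Django/ and one
that requests /simple/django/ would each find a page.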

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA