[Distutils] Handling Case/Normalization Differences

Donald Stufft donald at stufft.io
Fri Aug 29 00:09:51 CEST 2014


> On Aug 28, 2014, at 2:58 PM, Donald Stufft <donald at stufft.io> wrote:
> 
> Right now the “canonical” page for a particular project on PyPI is whatever the
> author happened to name their package (e.g. Django). This requires PyPI to have
> some "smarts" so that it can redirect things like /simple/django/ to
> /simple/Django/; otherwise someone doing ``pip install django`` would fall
> back to a much worse behavior.
> 
> If this redirect doesn't happen, then pip will issue a request for just
> /simple/ and look for a link that, when both sides are normalized, compares
> equal to the name it's looking for. It will then follow the link, get
> /simple/Django/ and everything works... Except it doesn't. The problem here
> comes from the external link classification that we have now. Pip sees the
> link to /simple/Django/ as an external link (because it lacks the required
> rels) and the installation finally fails.
> 
> The /simple/ case rarely happens when installing from PyPI itself because of
> the redirect; however, it happens quite often when someone is attempting to
> install from a mirror instead. Even when everything works correctly, the
> penalty for not knowing exactly what name to type is at least one extra HTTP
> request, and the /simple/ request requires pulling down a 2.1MB file.
> 
> To fix this I'm going to modify PyPI so that it uses the normalized name in
> the /simple/ URL and redirects everything else to the normalized name. I'm
> also going to submit a PR to bandersnatch so that it will use normalized names
> for its directories and such as well. These two changes will make it so that
> the client side will know ahead of time exactly what form the server expects
> any given name to be in. This will allow pip to pre-normalize all names,
> which will make the interaction with mirrors better and reduce the number of
> HTTP requests that a single ``pip install`` needs to make.
> 
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Hm, so here’s the problem.

I have this implemented and deployed to TestPyPI, and it works great!

However, the next step is to make the change to bandersnatch so that it saves
things using their normalized name instead of their "proper" name. Once that
happens, anyone using pip 1.5 won't be able to install anything from that
mirror unless its name is specified in normalized form (e.g.
``pip install Django`` will fail without --allow-unverified but
``pip install django`` will work). This would be fixed with pip 1.6 (since
it would know to "normalize" the name before fetching the URL).
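
As a rough sketch of what "normalizing the name before fetching the URL"
could look like on the client side (this is not pip's actual code; the rule
used here is the one later written down in PEP 503, and ``simple_url`` and the
mirror host are made up for illustration):

```python
import re


def normalize(name):
    # Assumed rule (the one PEP 503 later specified): collapse runs of
    # "-", "_" and "." into a single "-", then lowercase the result.
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_url(index_base, name):
    # Hypothetical helper: build the project page URL from the
    # normalized name, so the client never has to guess the casing.
    return "{}/simple/{}/".format(index_base.rstrip("/"), normalize(name))


# Hypothetical mirror host, used only for illustration.
print(simple_url("https://mirror.example.org", "Django"))
print(simple_url("https://mirror.example.org", "django"))
```

Whichever spelling the user types, the client would then ask for the same
page.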

The same thing will occur if we make the change in pip first: it would
normalize names, so against a mirror that hasn't been updated you'd need to
use --allow-unverified for everything, because it would act as if you typed
``pip install django`` instead of ``pip install Django``.
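
The two failure cases above, plus the two combinations that work, can be laid
out with a small model. This is a simplification, not how pip or bandersnatch
is implemented: it assumes the mirror serves one case-sensitive directory per
project and only checks whether the first /simple/<name>/ request lands on
that directory. ``direct_hit`` is a hypothetical helper, and the normalization
rule is again the one later specified in PEP 503.

```python
import re


def normalize(name):
    # Assumed rule (PEP 503 style): collapse "-", "_", "." runs and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def direct_hit(typed, proper, client_normalizes, mirror_normalizes):
    """Return True if the first /simple/<name>/ request matches the
    directory name the mirror actually stores."""
    requested = normalize(typed) if client_normalizes else typed
    stored = normalize(proper) if mirror_normalizes else proper
    return requested == stored


# Walk every combination of old/new client and old/new mirror.
for client_normalizes in (False, True):
    for mirror_normalizes in (False, True):
        for typed in ("Django", "django"):
            hit = direct_hit(typed, "Django", client_normalizes, mirror_normalizes)
            print(
                "client normalizes={!s:5} mirror normalizes={!s:5} "
                "typed={:6} direct hit={}".format(
                    client_normalizes, mirror_normalizes, typed, hit
                )
            )
```

In this model, only the combination where both the client and the mirror
normalize lands on the page directly for every spelling.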

To my knowledge, this *only* will affect pip 1.5.x.

So the only way forward I can see to make this change, which I think is a good
change and will remove a big "gotcha" from using a mirror, is to coordinate
a release of bandersnatch that coincides with pip 1.6, and tell people they
need to upgrade in lockstep.

Does anyone have any other ideas?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
