Handling Case/Normalization Differences

Right now the “canonical” page for a particular project on PyPI is whatever the author happened to name their package (e.g. Django). This requires PyPI to have some "smarts" so that it can redirect things like /simple/django/ to /simple/Django/; otherwise someone doing ``pip install django`` would fall back to a much worse behavior.
If this redirect doesn't happen, then pip will issue a request for just /simple/ and look for a link that, when both sides are normalized, compares equal to the name it's looking for. It will then follow the link, get /simple/Django/ and everything works... Except it doesn't. The problem here comes from the external link classification that we have now. Pip sees the link to /simple/Django/ as an external link (because it lacks the required rels) and the installation finally fails.
The /simple/ case rarely happens when installing from PyPI itself because of the redirect; however, it happens quite often when someone is attempting to install from a mirror instead. Even when everything works correctly, the penalty for not knowing exactly what name to type in is at least one extra HTTP request, one of which (/simple/) requires pulling down a 2.1MB file.
To fix this I'm going to modify PyPI so that it uses the normalized name in the /simple/ URL and redirects everything else to the non-normalized name. I'm also going to submit a PR to bandersnatch so that it will use normalized names for its directories and such as well. These two changes will make it so that the client side will know ahead of time exactly what form the server expects any given name to be in. This will allow a change in pip to happen which will pre-normalize all names, which will make the interaction with mirrors better and will reduce the number of HTTP requests that a single ``pip install`` needs to make.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
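For illustration only, here is a minimal Python sketch of the idea in this message: the client computes the normalized name up front, and the server treats /simple/<normalized>/ as the canonical page. The normalization rule used here (lowercase, runs of "-", "_" and "." collapsed to a single "-") is an assumption, and ``normalize``, ``simple_url`` and ``resolve`` are hypothetical names, not actual PyPI or pip functions.

```python
import re


def normalize(name):
    """Assumed rule: lowercase, and collapse runs of '-', '_', '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_url(name):
    """URL a pre-normalizing client would request directly (hypothetical helper)."""
    return "/simple/{}/".format(normalize(name))


def resolve(path, projects):
    """Hypothetical server-side lookup returning (status, location).

    The normalized spelling is served directly; any other spelling of a
    known project is redirected to it; unknown names are a 404.
    """
    requested = path.strip("/").split("/")[-1]
    canonical = normalize(requested)
    if canonical not in {normalize(p) for p in projects}:
        return 404, None
    if requested == canonical:
        return 200, path
    return 301, "/simple/{}/".format(canonical)


if __name__ == "__main__":
    registered = ["Django"]
    print(simple_url("Django"))
    print(resolve("/simple/Django/", registered))
    print(resolve("/simple/django/", registered))
```

With this shape a client that normalizes first never needs the redirect or the /simple/ index at all, which is where the saved HTTP requests would come from.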

On Aug 28, 2014, at 2:58 PM, Donald Stufft <donald@stufft.io> wrote:
To fix this I'm going to modify PyPI so that it uses the normalized name in the /simple/ URL and redirects everything else to the non-normalized name. I'm also going to submit a PR to bandersnatch so that it will use normalized names for its directories and such as well. These two changes will make it so that the client side will know ahead of time exactly what form the server expects any given name to be in. This will allow a change in pip to happen which will pre-normalize all names, which will make the interaction with mirrors better and will reduce the number of HTTP requests that a single ``pip install`` needs to make.
Hm, so here’s the problem.
I have this implemented and deployed to TestPyPI, and it works great!
However, the next step is to make the change to bandersnatch so that it saves things using their normalized name instead of using their "proper" name. Doing this will mean that everyone using pip 1.5 won't be able to install anything from that mirror unless its name is specified as the normalized name (e.g. ``pip install Django`` will fail without --allow-unverified but ``pip install django`` will work). This would be fixed with pip 1.6 (since it would know to "normalize" the name before fetching the URL).
The same thing will occur if we make the change in pip first: it would normalize names, so you'd need to use --allow-unverified for everything because it would act as if you typed ``pip install django`` instead of ``pip install Django``.
To my knowledge, this will *only* affect pip 1.5.x.
So the only way forward I can see to make this change, which I think is a good change and will remove a big "gotcha" from using a mirror, is to coordinate a release of bandersnatch that coincides with pip 1.6, and tell people they need to upgrade in lockstep.
Does anyone have any other ideas?
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
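The lockstep problem described above can be sketched as a small compatibility matrix. This is an illustrative model only: it assumes a static mirror that matches directory names exactly and case-sensitively, it assumes the same normalization rule as a stand-in for whatever pip and bandersnatch actually use, and ``direct_hit`` is a hypothetical helper. A miss means the client falls back to the /simple/ index and runs into the external-link problem, hence the need for --allow-unverified.

```python
import re


def normalize(name):
    """Assumed rule: lowercase, and collapse runs of '-', '_', '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def direct_hit(typed, real, client_normalizes, mirror_normalizes):
    """True if the first request lands on an existing mirror directory."""
    requested = normalize(typed) if client_normalizes else typed
    stored = normalize(real) if mirror_normalizes else real
    # Static file server: the path either exists exactly or it does not.
    return requested == stored


if __name__ == "__main__":
    clients = (("pip 1.5", False), ("pip 1.6", True))
    mirrors = (("proper-name mirror", False), ("normalized mirror", True))
    for client, c_norm in clients:
        for mirror, m_norm in mirrors:
            ok = direct_hit("Django", "Django", c_norm, m_norm)
            print("{:8} + {:18} -> {}".format(client, mirror, "direct hit" if ok else "fallback"))
```

Only the two matching combinations get a direct hit for ``pip install Django``, which is why changing either side alone breaks the other.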

On Aug 28, 2014, at 6:09 PM, Donald Stufft <donald@stufft.io> wrote:
So the only way forward I can see to make this change, which I think is a good change and will remove a big "gotcha" from using a mirror, is to coordinate a release of bandersnatch that coincides with pip 1.6, and tell people they need to upgrade in lockstep.
Does anyone have any other ideas?
Just thought of this: if the normalized name doesn’t match the "real" name, then add entries for both. This will make it so that both pip 1.5 and pip 1.6+ continue to work.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
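A rough sketch of the "add entries for both" idea, under the same assumptions as before (an assumed normalization rule, a static mirror with exact-match paths). ``mirror_entries`` is a hypothetical helper, not bandersnatch code; it only shows which directory names a mirror would publish for one project.

```python
import re


def normalize(name):
    """Assumed rule: lowercase, and collapse runs of '-', '_', '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def mirror_entries(real_name):
    """Directory names to publish: the normalized one, plus the real one if it differs."""
    normalized = normalize(real_name)
    if normalized == real_name:
        return [normalized]
    return [normalized, real_name]


if __name__ == "__main__":
    entries = mirror_entries("Django")
    print(entries)
    # A client that requests the name as typed and one that normalizes first
    # both find a directory on the first request.
    print("Django" in entries, normalize("Django") in entries)
```

The cost is one extra directory per project whose real name is not already in normalized form.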

Naive question- does pip send over a UserAgent (or something) that contains a version number the server can use to determine which behavior to default to?
That would allow a deprecation cycle of N months or so that will let people upgrade from 1.5 to 1.6. We could then watch usage of 1.5 decrease over time until it's a non-factor.
On Thu, Aug 28, 2014 at 3:26 PM, Donald Stufft <donald@stufft.io> wrote:
Just thought of this: if the normalized name doesn’t match the "real" name, then add entries for both. This will make it so that both pip 1.5 and pip 1.6+ continue to work.

Since pip 1.4 it does, yes. However, the problem here is that bandersnatch mirrors are typically hosted by plain static web servers and don’t require any sort of runtime logic.
On Aug 28, 2014, at 6:39 PM, Joe Smith <yasumoto7@gmail.com> wrote:
Naive question- does pip send over a UserAgent (or something) that contains a version number the server can use to determine which behavior to default to?
That would allow a deprecation cycle of N months or so that will let people upgrade from 1.5 to 1.6. We could then watch usage of 1.5 decrease over time until it's a non-factor.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Ah, I didn't think of that- good point. +1 to your suggested approach.
On Thu, Aug 28, 2014 at 3:41 PM, Donald Stufft <donald@stufft.io> wrote:
Since pip 1.4 it does, yes. However, the problem here is that bandersnatch mirrors are typically hosted by plain static web servers and don’t require any sort of runtime logic.

On 29 Aug 2014 08:27, "Donald Stufft" <donald@stufft.io> wrote:
Just thought of this: if the normalized name doesn’t match the "real" name, then add entries for both. This will make it so that both pip 1.5 and pip 1.6+ continue to work.
Having bandersnatch mirrors publish under both names sounds like a good approach. Then the pip 1.6 release notes can just be explicit that using older mirrors will need the extra option - earlier versions won't have a problem.
Cheers, Nick.

On Thu, Aug 28, 2014 at 14:58 -0400, Donald Stufft wrote:
To fix this I'm going to modify PyPI so that it uses the normalized name in the /simple/ URL and redirects everything else to the non-normalized name.
Of course you mean redirecting everything to the normalized name.
I'm also going to submit a PR to bandersnatch so that it will use normalized names ...
devpi-server also broke and I did a hotfix release today. Older installs will still have a problem, though (not all companies run the newest version all the time). Apart from the fact I was on vacation and on business travels, the notice for that breaking change was only one day, which I think is a bit too quick. I'd really appreciate it if you sent a mail to Christian for bandersnatch and me for devpi before such changes happen, and with a bit more reasonable lead time.
Besides, I think it's a good change in principle.
best and thanks, holger

On Sep 1, 2014, at 4:53 PM, holger krekel <holger@merlinux.eu> wrote:
devpi-server also broke and I did a hotfix release today. Older installs will still have a problem, though (not all companies run the newest version all the time). Apart from the fact I was on vacation and on business travels, the notice for that breaking change was only one day, which I think is a bit too quick. I'd really appreciate it if you sent a mail to Christian for bandersnatch and me for devpi before such changes happen, and with a bit more reasonable lead time.
I can only really reply to this with https://xkcd.com/1172/.
This shouldn’t have been a breaking change; anyone following the HTTP spec dealt with it just fine. As far as I can tell, the only reason it broke devpi was an assertion in the code that was asserting against an implementation detail, an implementation detail that I changed.
I’m sorry it broke devpi and that it happened at a time when you were on vacation, but honestly I don’t think it’s reasonable to expect every little thing to have to be run past a list of people. Due to the undocumented nature of these tools, people have put a lot of (also undocumented) assumptions into their code, many of which simply depend on implementation details. I try to test my changes against what I can, in this case pip, setuptools, and bandersnatch, but I can’t test against everything.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

FWIW, as a community member it doesn't seem unreasonable to me to expect that a certain amount of advance notice be given for changes like this, *especially* given that the tools are undocumented. Also, there's a difference between notifying people and "running it by" people (for permission). I think Holger is just asking for enough notice, which shouldn't slow you down like getting sign-off would, say.
--Chris
On Mon, Sep 1, 2014 at 4:07 PM, Donald Stufft <donald@stufft.io> wrote:
On Sep 1, 2014, at 4:53 PM, holger krekel <holger@merlinux.eu> wrote:
On Thu, Aug 28, 2014 at 14:58 -0400, Donald Stufft wrote:
Right now the “canonical” page for a particular project on PyPI is whatever the author happened to name their package (e.g. Django). This requires PyPI to have some "smarts" so that it can redirect things like /simple/django/ to /simple/Django/ otherwise someone doing ``pip install django`` would fall back to a much worse behavior.
If this redirect doesn't happen, then pip will issue a request for just /simple/ and look for a link that, when both sides are normalized, compares equal to the name it's looking for. It will then follow the link, get /simple/Django/ and everything works... Except it doesn't. The problem here comes from the external link classification that we have now. Pip sees the link to /simple/Django/ as an external link (because it lacks the required rels) and the installation finally fails.
The /simple/ case rarely happens when installing from PyPI itself because of the redirect, however it happens quite often when someone is attempting to instal from a mirror instead. Even when everything works correctly the penality for not knowing exactly what name to type in results in at least 1 extra http request, one of which (/simple/) requires pulling down a 2.1MB file.
To fix this I'm going to modify PyPI so that it uses the normalized name in the /simple/ URL and redirects everything else to the non-normalized name.
Of course you mean redirecting everything to the normalized name.
I'm also going to submit a PR to bandersnatch so that it will use normalized names ...
devpi-server also broke and I did a hotfix release today. Older installs will still have a problem, though (not all companies run the newest version all the time). Apart from the fact i was on vacation and on business travels, the notice for that breaking change was only one day which i think is a bit too quick. I'd really appreciate if you send a mail to Christian for bandersnatch and me for devpi before such changes happen and with a bit more reasonable ahead time.
Besides, i think it's a good change in principle.
best and thanks, holger
I can only really reply to this with https://xkcd.com/1172/.
This shouldn’t have been a breaking change, anyone following the HTTP spec dealt with this change just fine. As far as I can tell the only reason it broke devpi was because of an assertion in the code that was asserting against an implementation detail, an implementation detail that I changed.
I’m sorry it broke devpi and that it happened at a time when you were on vacation, but honestly I don’t think it’s reasonable to expect every little thing to have to be run past a list of people. Due to the undocumented nature of these tools people have put a lot of (also undocumented) assumptions into their code, many of which are simply depending on implementation details. I try to test my changes against what I can, in this case pip, setuptools, and bandersnatch, but I can’t test against everything.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Changes like what exactly? This was a fairly minor change which is why there wasn't more notice.
On Sep 1, 2014, at 7:44 PM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
FWIW, as a community member it doesn't seem unreasonable to me to expect that a certain amount of advance notice be given for changes like this, *especially* given that the tools are undocumented.

I don't know exactly. I'd say a change that in your judgment you think has a non-trivial chance of breaking existing tools. Holger is probably in a better position to say. I was just speaking in support of his request, which seemed reasonable to me. --Chris On Mon, Sep 1, 2014 at 5:03 PM, Donald Stufft <donald@stufft.io> wrote:
Changes like what exactly? This was a fairly minor change which is why there wasn't more notice.
On Sep 1, 2014, at 7:44 PM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
FWIW, as a community member it doesn't seem unreasonable to me to expect that a certain amount of advance notice be given for changes like this, *especially* given that the tools are undocumented.

On Mon, Sep 1, 2014, at 08:15 PM, Chris Jerdonek wrote:
I don't know exactly. I'd say a change that in your judgment you think has a non-trivial chance of breaking existing tools. Holger is probably in a better position to say. I was just speaking in support of his request, which seemed reasonable to me.
--Chris
Which is exactly my point. This change was minor. It didn't break anything but devpi and it wouldn't have broken devpi to my knowledge except for an assert statement that wasn't particularly needed. I already give notice (and discussion, often times even PEPs) for any change that I believe to be breaking. Wanting more is wanting notice on every single change on the off chance someone somewhere might have some dependency on any random implementation detail.

On Mon, Sep 1, 2014 at 7:15 PM, Donald Stufft <donald@stufft.io> wrote:
On Mon, Sep 1, 2014, at 08:15 PM, Chris Jerdonek wrote:
I don't know exactly. I'd say a change that in your judgment you think has a non-trivial chance of breaking existing tools. Holger is probably in a better position to say. I was just speaking in support of his request, which seemed reasonable to me.
--Chris
Which is exactly my point. This change was minor. It didn't break anything but devpi and it wouldn't have broken devpi to my knowledge except for an assert statement that wasn't particularly needed.
I already give notice (and discussion, often times even PEPs) for any change that I believe to be breaking. Wanting more is wanting notice on every single change on the off chance someone somewhere might have some dependency on any random implementation detail.
If you don't have a good sense of what changes might break existing tools and don't want to notify people, one possibility is to build in a delay between committing to the repo and deploying to production. Interested folks could monitor commits to the repo -- giving them a chance to ask questions and update their tools if necessary. --Chris

On 2 September 2014 12:54, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Mon, Sep 1, 2014 at 7:15 PM, Donald Stufft <donald@stufft.io> wrote:
I already give notice (and discussion, often times even PEPs) for any change that I believe to be breaking. Wanting more is wanting notice on every single change on the off chance someone somewhere might have some dependency on any random implementation detail.
If you don't have a good sense of what changes might break existing tools and don't want to notify people, one possibility is to build in a delay between committing to the repo and deploying to production. Interested folks could monitor commits to the repo -- giving them a chance to ask questions and update their tools if necessary.
That will pick up noise from internal or web only changes that don't affect the programmatic APIs. Ideally, we'd have an integration environment where tests for pip, bandersnatch and devpi were all automatically run against pypi commits before they went live, but that's rather a lot of work to set up. Until we have such a system, we may continue to see occasional incidents like this one. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Sep 01, 2014 at 19:07 -0400, Donald Stufft wrote:
I can only really reply to this with https://xkcd.com/1172/. This shouldn’t have been a breaking change, anyone following the HTTP spec dealt with this change just fine. As far as I can tell the only reason it broke devpi was because of an assertion in the code that was asserting against an implementation detail, an implementation detail that I changed.
Right, the assertion was there to ensure pypi's "realname" and devpi's internal "realname" of a project are the same. This check is now relaxed. FWIW I'd prefer if we just said in all pypi APIs (http and xmlrpc/json) that a project name is always kept in canonical form, i.e. you can maybe register "HeLlo_World" but it just means "hello-world" next time someone asks for it. What is the relevance of the "realname" anyway? Do you keep "realnames" in warehouse?
I’m sorry it broke devpi and that it happened at a time when you were on vacation, but honestly I don’t think it’s reasonable to expect every little thing to have to be run past a list of people. Due to the undocumented nature of these tools people have put a lot of (also undocumented) assumptions into their code, many of which are simply depending on implementation details. I try to test my changes against what I can, in this case pip, setuptools, and bandersnatch, but I can’t test against everything.
Thanks for all your work and eagerness to improve things. I think it's safe to assume that any change in PyPI's pip/bandersnatch/devpi facing http API has potential for disruption even if some http specification says otherwise -- at least until we have some specification of how tool/pypi interactions work. best, holger

On Sep 2, 2014, at 5:36 AM, holger krekel <holger@merlinux.eu> wrote:
Right, the assertion was there to ensure pypi's "realname" and devpi's internal "realname" of a project are the same. This check is now relaxed.
FWIW I'd prefer if we just said in all pypi APIs (http and xmlrpc/json) that a project name is always kept in canonical form, i.e. you can maybe register "HeLlo_World" but it just means "hello-world" next time someone asks for it. What is the relevance of the "realname" anyway? Do you keep "realnames" in warehouse?
As of right now we do, although I think it’s likely that Warehouse will end up with the normalized name being used as the “identifier” for a project and the name that an author typed in being used as the “display name”.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 28 August 2014 19:58, Donald Stufft <donald@stufft.io> wrote:
To fix this I'm going to modify PyPI so that it uses the normalized name in the /simple/ URL and redirects everything else to the non-normalized name. I'm also going to submit a PR to bandersnatch so that it will use normalized names for its directories and such as well. These two changes will make it so that the client side will know ahead of time exactly what form the server expects any given name to be in. This will allow a change in pip to happen which will pre-normalize all names which will make the interaction with mirrors better and will reduce the number of HTTP requests that a single ``pip install`` needs to make.
Just to clarify, this means that if I want to find the simple index page for a distribution, without hitting redirects, I should first normalise the project name (so "Django" becomes "django") and then request https://pypi.python.org/simple/<normalised_name>/ (with a slash on the end). Is that correct? It seems to match what I see in practice (in particular, the version without a terminating slash redirects to the version with a terminating slash). The JSON API has the opposite behaviour - https://pypi.python.org/pypi/Django/json redirects to https://pypi.python.org/pypi/django/json. Should that not be changed to match? Will it be? Paul
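The lookup described in this message could be sketched roughly as follows. This is only an illustration: the helper name simple_index_url is hypothetical, the project name is assumed to have been normalized already, and the base URL is the one quoted in the message above.

```python
from urllib.parse import quote

# Base URL of the simple index as quoted in the thread.
SIMPLE_BASE = "https://pypi.python.org/simple/"


def simple_index_url(normalized_name, base=SIMPLE_BASE):
    """Build the project page URL for an already-normalized name.

    The trailing slash is included because the form without it is
    reported in the thread to redirect to the form with it.
    """
    return base.rstrip("/") + "/" + quote(normalized_name, safe="") + "/"


print(simple_index_url("django"))
```

A client that builds the URL this way should not need the redirect for either the name form or the trailing slash, provided the index really does serve the normalized form.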

On 30 September 2014 15:25, Paul Moore <p.f.moore@gmail.com> wrote:
Just to clarify, this means that if I want to find the simple index page for a distribution, without hitting redirects, I should first normalise the project name (so "Django" becomes "django") and then request https://pypi.python.org/simple/<normalised_name>/ (with a slash on the end). Is that correct? It seems to match what I see in practice (in particular, the version without a terminating slash redirects to the version with a terminating slash).
The JSON API has the opposite behaviour - https://pypi.python.org/pypi/Django/json redirects to https://pypi.python.org/pypi/django/json. Should that not be changed to match? Will it be?
One further thought. Where is the definition of how to normalise a name? I could probably dig through the pip sources and find it, but it would be nice if it were documented somewhere. From experiment, it seems like lowercase, and with hyphens rather than underscores, is the definition. Does PyPI allow names not allowed by http://legacy.python.org/dev/peps/pep-0426/#name and if it does, how are they normalised? In case it's not obvious, I'm writing a client for the PyPI API, and these questions are coming out of that process. Paul. PS The Python wiki has pages for the XMLRPC and JSON API. Any objections to me adding a page for the simple API? (The obvious objection being that it's documented somewhere else, and I should just put a pointer to the real documentation...) Paul

On Sep 30, 2014, at 11:14 AM, Paul Moore <p.f.moore@gmail.com> wrote:
One further thought. Where is the definition of how to normalise a name? I could probably dig through the pip sources and find it, but it would be nice if it were documented somewhere. From experiment, it seems like lowercase, and with hyphens rather than underscores, is the definition. Does PyPI allow names not allowed by http://legacy.python.org/dev/peps/pep-0426/#name and if it does, how are they normalised?
In case it's not obvious, I'm writing a client for the PyPI API, and these questions are coming out of that process.
Paul.
PS The Python wiki has pages for the XMLRPC and JSON API. Any objections to me adding a page for the simple API? (The obvious objection being that it's documented somewhere else, and I should just put a pointer to the real documentation...)
Paul
PyPI follows PEP 426, I think we even include the confusables support. Generally the normalization is done with pkg_resources.safe_name(…).lower(). I don’t think there’s any reason not to document it, setuptools has its routine documented but that doesn’t have everything that the /simple/ API supports documented since it’s really documentation for what setuptools does. The URL redirect for the json endpoint was made to match what happens with /pypi/django/. Lately I’ve been thinking that maybe we should just use the normalized form in URLs always and use the author provided name for display purposes. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
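For readers who want to try the normalization without importing setuptools, it can be approximated with the standard library. This is a minimal sketch, assuming that pkg_resources.safe_name replaces each run of characters other than ASCII letters, digits and "." with a single "-"; the function name normalize_name is hypothetical and is not part of PyPI, pip or setuptools, and the sketch ignores the confusables handling mentioned above.

```python
import re

# Runs of anything other than ASCII letters, digits and "." are
# collapsed to one "-" (assumed to mirror pkg_resources.safe_name).
_UNSAFE_RUN = re.compile(r"[^A-Za-z0-9.]+")


def normalize_name(name):
    """Approximate pkg_resources.safe_name(name).lower()."""
    return _UNSAFE_RUN.sub("-", name).lower()


for raw in ("Django", "HeLlo_World", "zope.interface"):
    print(raw, "->", normalize_name(raw))
```

Under that assumption, "HeLlo_World" and "hello-world" map to the same string, which matches the behaviour observed by experiment earlier in the thread (lowercase, with hyphens rather than underscores).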
participants (6)
- Chris Jerdonek
- Donald Stufft
- holger krekel
- Joe Smith
- Nick Coghlan
- Paul Moore