What to do about the PyPI mirrors
Hi all,
I've just been contacted by someone who's set up a new public mirror of PyPI and would like it integrated into the mirror ecosystem.
I think it's probably time we thought about how to demote the mirrors:
- they cause problems with security (being under the python.org domain causes various issues including inability to use HTTPS and cookie issues)
- they're no longer necessary thanks to the CDN work
So, things to do:
- links and information on PyPI itself can be removed
- tools that use mirrors still need to be able to but mention of using public mirrors is probably something to demote
These are just rough thoughts that occurred to me just now.
Richard
On Jul 24, 2013, at 10:38 PM, Richard Jones wrote:
Hi all,
I've just been contacted by someone who's set up a new public mirror of PyPI and would like it integrated into the mirror ecosystem.
I think it's probably time we thought about how to demote the mirrors:
- they cause problems with security (being under the python.org domain causes various issues including inability to use HTTPS and cookie issues)
- they're no longer necessary thanks to the CDN work
So, things to do:
- links and information on PyPI itself can be removed
- tools that use mirrors still need to be able to but mention of using public mirrors is probably something to demote
These are just rough thoughts that occurred to me just now.
+1. As envoy of the infrastructure team, we would like to formally retire the [a-z].pypi.python.org names. Anyone with an existing mirror should be encouraged to continue maintaining it, but it will be for their own use (or the use of their company/internal network).
--Noah
On Jul 25, 2013, at 2:14 AM, Noah Kantrowitz <noah@coderanger.net> wrote:
+1, as envoy of infrastructure team we would like to formally retire the [a-z].pypi.python.org names. Anyone with an existing mirror should be encouraged to continue maintaining it, but it will be for their own use (or the use of their company/internal network).
--Noah
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
+1 as well. 2/6 of the mirrors are gone already, and I don't think anyone actually implemented the mirror authenticity protocol. So afaik installing from mirrors is completely insecure: it's all via HTTP, and moving to HTTPS would require getting SSL certs for each specific mirror, which isn't likely to be something that we can do.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Just a quick note though to be clear. I still plan on supporting (and even improving) the ability to mirror PyPI. I think it's an important ability to have especially for companies, or projects like OpenStack who use a mirror for their massive CI infrastructure. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Can we close the loop on this? Ideally I think any public mirrors should need to register their own domain name. We can either maintain a list of unofficial mirrors, or Ken Cochrane has been doing a good job I think of keeping a list (as well as tracking some basic stats) at http://pypi-mirrors.org/ so maybe we can just point people to that as the list of mirrors? Ideally we should get all of them off the *.python.org namespace. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Aug 3, 2013, at 5:17 PM, Donald Stufft wrote:
Can we close the loop on this? Ideally I think any public mirrors should need to register their own domain name. We can either maintain a list of unofficial mirrors, or Ken Cochrane has been doing a good job I think of keeping a list (as well as tracking some basic stats) at http://pypi-mirrors.org/ so maybe we can just point people to that as the list of mirrors?
Ideally we should get all of them off the *.python.org namespace.
As the one with the finger on the not-the-metaphorical button, I think we should say that two (2) months from now, on October 1st 2013, the [a-g].pypi.python.org DNS names will all be redirected to front.python.org and another two months beyond that (2013-12-01) they will all be deleted (along with last.pypi.python.org). That seems like a very generous deprecation schedule, especially given that all that needs to change is some domain registrations.
--Noah
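The schedule proposed in the message above can be written down as a small lookup, which makes the two phases easier to reason about. This is only a sketch of the proposal as worded there: the constant and function names are made up for illustration, and it assumes, as the message reads, that last.pypi.python.org is only deleted and not redirected in the first phase.

```python
import string
from datetime import date

# Host names taken from the proposal: a..g plus the "last" marker record.
MIRROR_HOSTS = [f"{letter}.pypi.python.org" for letter in string.ascii_lowercase[:7]]
LAST_HOST = "last.pypi.python.org"

# Dates as stated in the proposal (hypothetical constant names).
REDIRECT_DATE = date(2013, 10, 1)
DELETE_DATE = date(2013, 12, 1)


def mirror_status(host, today):
    """Return the proposed state of a legacy mirror name on a given day."""
    if host != LAST_HOST and host not in MIRROR_HOSTS:
        raise ValueError(f"{host!r} is not one of the legacy mirror names")
    if today >= DELETE_DATE:
        return "deleted"
    if today >= REDIRECT_DATE and host != LAST_HOST:
        return "redirected to front.python.org"
    return "unchanged"


if __name__ == "__main__":
    for host in MIRROR_HOSTS + [LAST_HOST]:
        print(host, "->", mirror_status(host, date(2013, 10, 15)))
```

Keeping the dates in one place means a revised schedule only needs the two constants changed.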
On Aug 4, 2013, at 3:14 AM, Noah Kantrowitz <noah@coderanger.net> wrote:
As the one with the finger on the not-the-metaphorical button, I think we should say that two (2) months from now, on October 1st 2013, the [a-g].pypi.python.org DNS names will all be redirected to front.python.org and another two months beyond that (2013-12-01) they will all be deleted (along with last.pypi.python.org). That seems like a very generous deprecation schedule, especially given that all that needs to change is some domain registrations.
--Noah
Personally I +1 this proposal. It's been nearly 10 days with basically zero response of any kind, and no response in the negative.
The only change I'd possibly make is to change the deletion period to some period of time after the pip 1.5 release.
5 days ago my branch to remove mirroring support from pip was merged into pip's develop branch. I don't see any direct support for mirroring in setuptools nor do I see any in buildout, so I think it makes sense to hold off on the final deletion until after the only one of the 3 major installers that seems to have any direct support for mirrors has released a version without it, plus a bit of lead time for people to switch.
So I guess revised I'd say in roughly two months on Oct 1st the [a-g].pypi.python.org DNS names will be redirected to front.python.org and then roughly 2 months after pip has released version 1.5 with the removal of the mirroring support they will be deleted along with last.pypi.python.org.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 4 August 2013 18:48, Donald Stufft <donald@stufft.io> wrote:
5 days ago my branch to remove mirroring support from pip was merged into pip's develop branch. I don't see any direct support for mirroring in setuptools nor do I see any in buildout, so I think it makes sense to hold off on the final deletion until after the only one of the 3 major installers that seems to have any direct support for mirrors has released a version without it, plus a bit of lead time for people to switch.
So I guess revised I'd say in roughly two months on Oct 1st the [a-g].pypi.python.org DNS names will be redirected to front.python.org and then roughly 2 months after pip has released version 1.5 with the removal of the mirroring support they will be deleted along with last.pypi.python.org.
Sounds good to me. You may not love this next part though: since the mirror naming scheme was originally defined in a PEP (http://www.python.org/dev/peps/pep-0381/#mirror-listing-and-registering), I think its retirement should also be published that way. The PEP doesn't need to be long, it should just say that PyPI will continue to support the mirroring protocol (and perhaps even mention that the current preferred mirroring tool is bandersnatch rather than pep381client), but the CDN is considered to replace the public mirror network and those DNS names will all be retired. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Aug 4, 2013, at 5:13 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Sounds good to me. You may not love this next part though: since the mirror naming scheme was originally defined in a PEP (http://www.python.org/dev/peps/pep-0381/#mirror-listing-and-registering), I think its retirement should also be published that way.
The PEP doesn't need to be long, it should just say that PyPI will continue to support the mirroring protocol (and perhaps even mention that the current preferred mirroring tool is bandersnatch rather than pep381client), but the CDN is considered to replace the public mirror network and those DNS names will all be retired.
OTOH that PEP was never accepted and is currently a draft but if you really think it should be a PEP I can write one up.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 4 August 2013 19:18, Donald Stufft <donald@stufft.io> wrote:
OTOH that PEP was never accepted and is currently a draft
Alas, the nominal status of the packaging PEPs isn't always a great guide to their real world usage :P
but if you really think it should be a PEP I can write one up.
I think we're at a point where overcommunicating is definitely better than the alternative :) In this case, what I suggest we do is put the new PEP to Accepted with a Replaces: header for 381, and then set 381 to Superseded. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Okies I can write one up.
On Aug 4, 2013, at 5:45 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think we're at a point where overcommunicating is definitely better than the alternative :)
In this case, what I suggest we do is put the new PEP to Accepted with a Replaces: header for 381, and then set 381 to Superseded.
Here's my PEP for Deprecating and Removing the Official Public Mirrors

Its source is at: https://github.com/dstufft/peps/blob/master/mirror-removal.rst

Abstract
========

This PEP provides a path to deprecate and ultimately remove the official public mirroring infrastructure for `PyPI`_. It does not propose the removal of mirroring support in general.

Rationale
=========

The PyPI mirroring infrastructure (defined in `PEP381`_) provides a means to mirror the content of PyPI used by the automatic installers. It also provides a method for autodiscovery of mirrors and a consistent naming scheme.

There are a number of problems with the official public mirrors:

* They give control over a \*.python.org domain name to a third party, allowing that third party to set or read cookies on the pypi.python.org and python.org domain names.
* The use of a sub domain of pypi.python.org means that the mirror operators will never be able to get a certificate of their own, and giving them one for a python.org domain name is unlikely to happen.
* They are often out of date, most often by several hours to a few days, but regularly several days and even months.
* With the introduction of the CDN on PyPI the public mirroring infrastructure is not as important as it once was, as the CDN is also a globally distributed network of servers which will function even if PyPI is down.
* Although there are provisions in place for it, there is currently no known installer which uses the authenticity checks discussed in `PEP381`_, which means that any download from a mirror is subject to attack by a malicious mirror operator. Furthermore, due to the lack of TLS, any download from a mirror is also subject to a MITM attack.
* They have only ever been implemented by one installer (pip), and its implementation, besides being insecure, has serious issues with performance and is slated for removal with its next release (1.5).

Due to the number of issues, some of them very serious, and the CDN, which more or less provides much of the same benefits, this PEP proposes to first deprecate and then remove the public mirroring infrastructure. The ability to mirror and the method of mirroring will not be affected, and the existing public mirrors are encouraged to acquire their own domains to host their mirrors on if they wish to continue hosting them.

Plan for Deprecation & Removal
==============================

Immediately upon acceptance of this PEP, documentation on PyPI will be updated to reflect the deprecated nature of the official public mirrors and will direct users to external resources like http://www.pypi-mirrors.org/ to discover unofficial public mirrors if they wish to use one.

On October 1st, 2013, roughly 2 months from the date of this PEP, the DNS names of the public mirrors ([a-g].pypi.python.org) will be changed to point back to PyPI, which will be modified to accept requests from those domains. At this point in time the public mirrors will be considered deprecated.

Then, roughly 2 months after the release of the first version of pip to have mirroring support removed (currently slated for pip 1.5), the DNS entries for [a-g].pypi.python.org and last.pypi.python.org will be removed and PyPI will no longer accept requests at those domains.

Unofficial Public or Private Mirrors
====================================

The mirroring protocol will continue to exist as defined in `PEP381`_ and people are encouraged to utilize it to host unofficial public and private mirrors if they so desire. For operators of unofficial public or private mirrors the recommended mirroring client is `Bandersnatch`_.

.. _PyPI: https://pypi.python.org/
.. _PEP381: http://www.python.org/dev/peps/pep-0381/
.. _Bandersnatch: https://pypi.python.org/pypi/bandersnatch

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
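The cookie concern in the rationale follows from how browsers scope cookies by domain. The sketch below is a simplified version of the domain-match rule from RFC 6265, not a complete cookie implementation: it ignores public-suffix checks, IP addresses and host-only cookies, and the function names are invented for this illustration.

```python
def domain_matches(request_host, cookie_domain):
    """Simplified RFC 6265 domain-match: exact match or dot-separated suffix."""
    request_host = request_host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().lstrip(".")
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


def may_set_cookie_for(origin_host, cookie_domain):
    """A host may set a cookie whose Domain attribute covers the host itself."""
    return domain_matches(origin_host, cookie_domain)


if __name__ == "__main__":
    # A third-party mirror under python.org can scope a cookie to python.org ...
    print(may_set_cookie_for("f.pypi.python.org", "python.org"))
    # ... and a browser would then send that cookie to PyPI as well.
    print(domain_matches("pypi.python.org", "python.org"))
    # A mirror on its own domain has no such reach.
    print(may_set_cookie_for("pypi-mirrors.org", "python.org"))
```

Under this rule a mirror hosted on its own registered domain cannot set or receive cookies scoped to python.org, which is the effect the PEP is after.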
On 5 August 2013 08:25, Donald Stufft <donald@stufft.io> wrote:
Here's my PEP for Deprecating and Removing the Official Public Mirrors
Its source is at: https://github.com/dstufft/peps/blob/master/mirror-removal.rst
Donald's proposal is now PEP 449: http://www.python.org/dev/peps/pep-0449/ Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi,
looks like I'm late to the party to figure out that I'm going to be hurt again.
I'd like to suggest explicitly considering what is going to break due to this and how much work you are forcefully inflicting on others.
My whole experience around packaging (distribute/setuptools) and mirroring/CDN this year puts the cost for my company somewhere between 10k and 20k EUR just for keeping up with the breakage those changes incur. It might be that we're wonderfully stupid (..enough to contribute) and all of this causes no headaches for anybody else ….
Overall, guessing that the packaging infrastructure is used by probably multiple thousands of companies, I'd expect that at least 100 of them might be experiencing problems like us. Juggling arbitrary numbers, I can see that we're inflicting around a million EUR of cost that nobody asked for.
More specific statements below.
On 2013-08-04 22:25:01 +0000, Donald Stufft said:
Here's my PEP for Deprecating and Removing the Official Public Mirrors
Its source is at: https://github.com/dstufft/peps/blob/master/mirror-removal.rst
Abstract
========
This PEP provides a path to deprecate and ultimately remove the official public mirroring infrastructure for `PyPI`_. It does not propose the removal of mirroring support in general.
-1 - maybe I don't have the right to speak up on CDN usage, but personally I feel it's a bad idea to delegate overall PyPI availability exclusively to a commercial third party. It's OK for me that we're using them to improve PyPI availability, but completely putting our faith in their hands, doesn't sound right to me.
Rationale
=========
The PyPI mirroring infrastructure (defined in `PEP381`_) provides a means to mirror the content of PyPI used by the automatic installers. It also provides a method for autodiscovery of mirrors and a consistent naming scheme.
There are a number of problems with the official public mirrors:
* They give control over a \*.python.org domain name to a third party, allowing that third party to set or read cookies on the pypi.python.org and python.org domain name.
Agreed, that's a problem.
* The use of a sub domain of pypi.python.org means that the mirror operators will never be able to get a certificate of their own, and giving them one for a python.org domain name is unlikely to happen.
Agreed.
* They are often out of date, most often by several hours to a few days, but regularly several days and even months.
That's something that the mirroring infrastructure should have been constructed for. I completely agree that the way the mirroring was established was way sub-optimal. I think we can do better.
* With the introduction of the CDN on PyPI the public mirroring infrastructure is not as important as it once was as the CDN is also a globally distributed network of servers which will function even if PyPI is down.
Well, now we have one more breakage point, which keeps annoying me. This argument is not completely true. They may be getting better over time, but we have invested heavily to accommodate the breakage - that needs to be balanced with some benefit in the near future.
* Although there are provisions in place for it, there is currently no known installer which uses the authenticity checks discussed in `PEP381`_, which means that any download from a mirror is subject to attack by a malicious mirror operator. Furthermore, due to the lack of TLS, any download from a mirror is also subject to a MITM attack.
Again, I think that was a mistake during the introduction of the mirroring infrastructure: too few people, too confusing PEP.
* They have only ever been implemented by one installer (pip), and its implementation, besides being insecure, has serious issues with performance and is slated for removal with its next release (1.5).
Only if you consider the mirror auto-discovery protocol. I'm not sure whether using DNS was such a smart move. A simple HTTP request to find mirrors would have been nice. I think we can still do that.
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
We are also thinking about providing system-level default configuration to hint tools like pip and setuptools to a different default index that is closer from a network perspective. From a customer perspective this should be "PyPI".
I'd like to avoid breakage. Again, if you don't let me choose where to spend my time, I'd rather invest the time I need for cleaning up the breakage into something constructive.
The indices are in active use. f.pypi.python.org is seeing between 150 and 300GB of traffic per month, with patterns ranging widely over the last month. This traffic does not come from gocept's internal use.
Due to the number of issues, some of them very serious, and the CDN which more or less provides much of the same benefits this PEP proposes to first deprecate and then remove the public mirroring infrastructure. The ability to mirror and the method of mirroring will not be affected and the existing public mirrors are encouraged to acquire their own domains to host their mirrors on if they wish to continue hosting them.
The biggest benefit of the mirroring infrastructure is that it is intended to be de-centralized. As a community member I can step up and take over responsibility for the availability, performance, and security of a mirror. With the CDN, as a community member I have to completely submit to whatever the CDN does and rely on contacting another community member, who hopefully will be with us for a long time and stay in good contact with the CDN for us. That's centralization and I don't like that a bit.
Plan for Deprecation & Removal
==============================
Immediately upon acceptance of this PEP documentation on PyPI will be updated to reflect the deprecated nature of the official public mirrors and will direct users to external resources like http://www.pypi-mirrors.org/ to discover unofficial public mirrors if they wish to use one.
On October 1st, 2013, roughly 2 months from the date of this PEP, the DNS names of the public mirrors ([a-g].pypi.python.org) will be changed to point back to PyPI which will be modified to accept requests from those domains. At this point in time the public mirrors will be considered deprecated.
Then, roughly 2 months after the release of the first version of pip to have mirroring support removed (currently slated for pip 1.5) the DNS entries for [a-g].pypi.python.org and last.pypi.python.org will be removed and PyPI will no longer accept requests at those domains.
Oh great. That means in about 4 months I have to go through *any installation that my company maintains* and sift through whether we're still referencing f.pypi.python.org anywhere. Can I write a check?
Unofficial Public or Private Mirrors
====================================
The mirroring protocol will continue to exist as defined in `PEP381`_ and people are encouraged to utilize it to host unofficial public and private mirrors if they so desire. For operators of unofficial public or private mirrors the recommended mirroring client is `Bandersnatch`_.
Thanks for the recommendation.
Instead of this dance breaking many things yet again, I'd love if we could find a way forward keeping the infrastructure. Some ideas:
- Take control of *.pypi.python.org back
- Record other public names of the mirrors
- Use 301 redirects to send old installations over to the new mirror names.
- Make it easier for community members to help maintain the list of mirrors.
- Make a better (faster) removal policy of mirrors if the owners are not responsive.
- Make it easier for other community members to set up and maintain mirrors.
I'm happy to improve bandersnatch where needed.
Lastly, again, and I might be getting on everyone's nerves: why does it seem that other communities have figured this out much more simply, with less hassle, and with no significant changes for years, while we need to keep changing stuff over and over and over and break things over and over and over?
It's really hard for me to write this mail without cussing - the situation is very frustrating: the community dynamics seem to "want to move forward" where they from my perspective "wander left and right and break stuff like a drunken elephant driving a tank through the Louvre".
Christian
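On the question of checking installations for leftover references to names such as f.pypi.python.org, a scan along the following lines may help. It is a rough sketch, not an existing tool: the function names and the list of file patterns are assumptions chosen for illustration, and a real audit would need to cover whatever configuration formats a given deployment uses.

```python
import re
from pathlib import Path

# Matches a.pypi.python.org .. g.pypi.python.org and last.pypi.python.org,
# but not plain pypi.python.org or longer host names that merely end the same way.
LEGACY_MIRROR_RE = re.compile(
    r"(?<![\w.-])(?:[a-g]|last)\.pypi\.python\.org\b", re.IGNORECASE
)


def find_legacy_mirror_refs(text):
    """Return (line_number, hostname) pairs for each legacy mirror name in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LEGACY_MIRROR_RE.finditer(line):
            hits.append((lineno, match.group(0).lower()))
    return hits


def scan_tree(root, patterns=("*.cfg", "*.conf", "*.ini", "*.txt")):
    """Yield (path, line_number, hostname) for matching files under root."""
    for pattern in patterns:  # hypothetical selection of config-like files
        for path in Path(root).rglob(pattern):
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable entry: skip rather than abort the scan
            for lineno, host in find_legacy_mirror_refs(text):
                yield path, lineno, host


if __name__ == "__main__":
    for path, lineno, host in scan_tree("."):
        print(f"{path}:{lineno}: {host}")
```

Running it from the root of a deployment prints one line per reference, which gives a first estimate of how much manual cleanup the retirement of the names would involve.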
Two more things:
why is the CDN not suffering from the security problems you describe for the mirrors?
a) Fastly seems to be the one owning the certificate for pypi.python.org. What?!?
b) What does stop Fastly from introducing incorrect/rogue code in package downloads?
Christian
On Aug 5, 2013, at 11:11 PM, Christian Theune <ct@gocept.com> wrote:
Two more things:
why is the CDN not suffering from the security problems you describe for the mirrors?
a) Fastly seems to be the one owning the certificate for pypi.python.org. What?!?
They have a delegated SAN for it, which digicert (the CA) authorizes with the domain contact (the board in this case).
b) What does stop Fastly from introducing incorrect/rogue code in package downloads?
Basically this one boils down to personal trust from me to the Fastly team combined with the other companies using them being very reputable. At the end of the day, there is not currently any cryptographic mechanism preventing Fastly from doing bad things. --Noah
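For readers wondering what a cryptographic check on the client side could look like, here is a minimal sketch. It assumes the expected SHA-256 digest reaches you through a channel you already trust independently of the download path, and the function names are hypothetical. It does not describe anything PyPI, pip, or Fastly actually provide.

```python
import hashlib
import hmac


def digest_matches(data, expected_hex):
    """Compare the SHA-256 of downloaded bytes against a pinned hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


def file_digest_matches(path, expected_hex, chunk_size=1 << 16):
    """Same check for a file on disk, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

A check like this only moves the trust question to wherever the pinned digest comes from, which is the same point made in the reply: somebody still has to be trusted.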
On Aug 6, 2013, at 2:31 AM, Noah Kantrowitz <noah@coderanger.net> wrote:
Basically this one boils down to personal trust from me to the Fastly team combined with the other companies using them being very reputable. At the end of the day, there is not currently any cryptographic mechanism preventing Fastly from doing bad things.
To further expand on this answer, you need to trust *someone*. If we cut out Fastly here you could say, well, what prevents Dyn Inc (the DNS host) from simply redirecting the DNS to a different host? What prevents OSUOL from simply accessing the machines stored there and doing bad things (™)? Hell, how many people here know the entire infrastructure team and have personally decided to trust them? At the end of the day you need to pick and choose who you trust. Right now we're working on narrowing down the number of people trusted. The Python Infrastructure team has decided it is willing to extend trust to Fastly to cover PyPI, the same as it was willing to extend trust to Dyn, OSUOL, and even the members of the Infra team. That being said, narrowing the list of people you need to trust is an ongoing goal, and one that isn't going to stop with limiting the number of parties able to publish at various python.org domain names who don't need to be. We're not in a particularly well-off position yet, but we are getting better all the time.
--Noah
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Hi, Thanks for all the feedback, I'll calm down a bit and ponder a more structured reply. However, you're responding to the technicalities. I didn't see any consideration of the user pain. It seems irrelevant. Almost like arguing with the TSA about taking off your shoes. f.pypi.python.org is going to go away. And *everyone* using it needs to change it. Manually - or else. Other communities, like the Linux distributions, have been doing simple, file-based stuff for ages. They did not learn from us, and AFAICT we didn't learn from them? My overall feeling is that we're telling the story to the outside world that they should not rely on Python. We can break anything, will find a good technical argument, and be done. If we're getting so much better with all the changes, then this goodness should be available to anyone already invested in the platform. Currently this seems to be: hurt the people who are with us now, so we can get more new ones when they leave. Sorry for the sarcasm, as promised, I'll come back with a more structured technical response later - need to go and calm down. Christian -- Christian Theune · ct@gocept.com gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany http://gocept.com · Tel +49 345 1229889-7 Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
On 6 August 2013 16:59, Christian Theune <ct@gocept.com> wrote:
Hi,
Thanks for all the feedback, I'll calm down a bit and ponder some more structured reply.
However, you're responding to the technicalities. I didn't see any consideration to the user pain. It seems irrelevant. Almost like arguing with the TSA about taking off your shoes.
User pain is the only reason for not making the change tomorrow. People need time to adjust, or to propose alternative solutions.
f.pypi.python.org is going to go away. And *everyone* using it needs to change it. Manually - or else.
Delegating subdomains of python.org without a contractual relationship in place was a fundamental mistake. It should never have happened. We can either admit "We screwed up and set up a seriously flawed mirroring system" and take steps to fix it, or we can leave the HTTP-only mirrors open as a security hole forever. One means by which I could see an f.pypi.python.org DNS record being left in place indefinitely is if the TUF folks are able to come up with a scheme for offering end-to-end security for the *existing* PyPI metadata, *and* the TUF metadata is mirrored by bandersnatch *and* the TUF client side integrity checks are invoked by pip. In that case, the security argument regarding the lack of TLS on the subdomains would be rendered moot, and the backwards compatibility argument for keeping it active would win. Another potential alternative might be for Gocept to approach the PSF about getting an SSL certificate for that domain, ensuring pip and setuptools both support HSTS, and then switching that mirror over to using HSTS (so even configurations hardcoded to use http://f.pypi.python.org will still get a validated secure connection). Both of those approaches would close the security hole, while leaving the domain in place. If upgrading pip and easy_install clients is a more acceptable solution than updating affected configurations to use a different domain name, then these are certainly options we should discuss. The only option which I consider completely out of the question is leaving f.pypi.python.org (or any other *.pypi.python.org subdomain) in place indefinitely as an insecure HTTP-only endpoint.
Other communities, like the Linux distributions, are doing simple, file-based stuff for ages. They did not learn from us, and AFAICT we didn't learn from them?
This case *is* a matter of us learning from other mirroring systems: none of them are based on delegating subdomains to third parties, they're all based on lists of mirror URLs, and some mechanism for retrieving that list. However, as long as the flawed way remains blessed as the official mirroring network, it's difficult for an alternative model to gain any traction. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
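To make the "list of mirror URLs" model concrete, here is a minimal Python sketch of reading such a list. The file format (one URL per line, `#` comments) and the choice to keep only HTTPS entries are assumptions made for this example; the thread does not define a format, and `parse_mirror_list` is a hypothetical name.

```python
from urllib.parse import urlsplit


def parse_mirror_list(text):
    """Return the HTTPS mirror URLs listed one per line in `text`."""
    mirrors = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = urlsplit(line)
        # Assumption for this sketch: entries without TLS are ignored.
        if parts.scheme == "https" and parts.netloc:
            mirrors.append(line.rstrip("/") + "/")  # normalise the trailing slash
    return mirrors
```

How the list itself is retrieved and authenticated is the harder part, and this sketch deliberately leaves it open.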
One means by which I could see an f.pypi.python.org DNS record being
left in place indefinitely is if the TUF folks are able to come up with a scheme for offering end-to-end security for the *existing* PyPI metadata, *and* the TUF metadata is mirrored by bandersnatch *and* the TUF client side integrity checks are invoked by pip. In that case, the security argument regarding the lack of TLS on the subdomains would be rendered moot, and the backwards compatibility argument for keeping it active would win.
It seems like you've been reading our minds (or at least our mailing list)! Thanks, Justin
On Aug 6, 2013, at 8:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 August 2013 16:59, Christian Theune <ct@gocept.com> wrote:
Hi,
Thanks for all the feedback, I'll calm down a bit and ponder some more structured reply.
However, you're responding to the technicalities. I didn't see any consideration to the user pain. It seems irrelevant. Almost like arguing with the TSA about taking off your shoes.
User pain is the only reason for not making the change tomorrow. People need time to adjust, or to propose alternative solutions.
f.pypi.python.org is going to go away. And *everyone* using it needs to change it. Manually - or else.
Delegating subdomains of python.org without a contractual relationship in place was a fundamental mistake. It should never have happened. We can either admit "We screwed up and set up a seriously flawed mirroring system" and take steps to fix it, or we can leave the HTTP-only mirrors open as a security hole forever.
One means by which I could see an f.pypi.python.org DNS record being left in place indefinitely is if the TUF folks are able to come up with a scheme for offering end-to-end security for the *existing* PyPI metadata, *and* the TUF metadata is mirrored by bandersnatch *and* the TUF client side integrity checks are invoked by pip. In that case, the security argument regarding the lack of TLS on the subdomains would be rendered moot, and the backwards compatibility argument for keeping it active would win.
It would be rendered moot as far as any tooling that was updated to work with it (such as pip etc.) is concerned; however, the browser-level attacks would still be in play. Solving those would essentially require giving mirror operators an SSL certificate for N.pypi.python.org (which still does not preclude a malicious mirror operator or a compromised mirror from being used to steal logins on PyPI).
Another potential alternative might be for Gocept to approach the PSF about getting an SSL certificate for that domain, ensuring pip and setuptools both support HSTS, and then switching that mirror over to using HSTS (so even configurations hardcoded to use http://f.pypi.python.org will still get a validated secure connection).
FWIW this doesn't make it secure unless they change their configuration to point to https://f.pypi.python.org/simple instead of http://f.pypi.python.org/simple.* Programmatic libraries typically don't support HSTS (as HSTS is primarily used to prevent attacks that don't typically apply to command line clients), and certainly none of the existing tools support it. So, given that we'd be relying on the redirect that upgrades the connection to HTTPS, an attacker could simply return HTML instead of the redirect to HTTPS. Hence why it makes sense to remove the subdomain: either way, users will need to edit their existing configurations if they intend to be secure, and moving to a different domain will prevent the in-browser attacks as well. * This is also true of anyone who has hard coded a URL to http://pypi.python.org/simple/; however, there's no reasonable way to fix that.
Both of those approaches would close the security hole, while leaving the domain in place. If upgrading pip and easy_install clients is a more acceptable solution than updating affected configurations to use a different domain name, then these are certainly options we should discuss.
The only option which I consider completely out of the question is leaving f.pypi.python.org (or any other *.pypi.python.org subdomain) in place indefinitely as an insecure HTTP-only endpoint.
Other communities, like the Linux distributions, are doing simple, file-based stuff for ages. They did not learn from us, and AFAICT we didn't learn from them?
This case *is* a matter of us learning from other mirroring systems: none of them are based on delegating subdomains to third parties, they're all based on lists of mirror URLs, and some mechanism for retrieving that list. However, as long as the flawed way remains blessed as the official mirroring network, it's difficult for an alternative model to gain any traction.
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 7 August 2013 01:58, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 8:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One means by which I could see an f.pypi.python.org DNS record being left in place indefinitely is if the TUF folks are able to come up with a scheme for offering end-to-end security for the *existing* PyPI metadata, *and* the TUF metadata is mirrored by bandersnatch *and* the TUF client side integrity checks are invoked by pip. In that case, the security argument regarding the lack of TLS on the subdomains would be rendered moot, and the backwards compatibility argument for keeping it active would win.
It would be rendered moot as far as any tooling that was updated to work with it (such as pip etc) however the browser level attacks would still be in play. If those can be solved essentially require giving mirror operators a SSL certificate for N.pypi.python.org (which still does not preclude a malicious mirror operator or a compromised mirror from being used to steal logins on PyPI).
We're not talking about "mirror operators" in an abstract sense any more: we're talking specifically about the possibility of a formal relationship with Gocept to keep the f.pypi.python.org domain alive indefinitely. It isn't Gocept's fault that the upstream mirroring system design was broken, so the burden is on us to work with Christian to come up with a transition plan that is acceptable to both sides. That may mean retiring the subdomain and Gocept changing the configuration at affected sites, or it may mean Gocept negotiating a more formal relationship with the PSF to continue operating f.pypi.python.org in particular. Both options need to be on the table from the upstream side, giving Gocept a chance to assess the alternatives (i.e. change existing sites to point to a new domain, or accept that some existing sites will remain insecure until they have been upgraded, just as we have to accept that old clients will remain insecure when accessing the main site over HTTP).
Another potential alternative might be for Gocept to approach the PSF about getting an SSL certificate for that domain, ensuring pip and setuptools both support HSTS, and then switching that mirror over to using HSTS (so even configurations hardcoded to use http://f.pypi.python.org will still get a validated secure connection).
FWIW this doesn't make it secure unless they change their configuration to point to https://f.pypi.python.org/simple instead of http://f.pypi.python.org/simple.*
Programmatic libraries typically don't support HSTS (as HSTS it primarily used to prevent attacks that don't typically apply to command line clients). And certainly none of the existing tools support it. So given that we'd be relying on the redirect that upgrades the connection to HTTPS an attacker could simply return HTML instead of the redirect to HTTPS.
Yeah, I realised this flaw after posting. An alternative hack that would allow the problem to be solved through an "upgrade pip/easy_install" solution rather than an "ensure all configs have been edited" approach would be a simple URL translation map that converts legacy "http://f.pypi.python.org" references to a new URL.
Hence why it makes sense to remove, because either way users will need to edit their existing configurations if they intend to be secure and moving to a different domain will prevent the in browser attacks as well.
* This is also true of anyone who has hard coded an url to http://pypi.python.org/simple/ however there's no reasonable way to fix that.
Hardcoded references to http://f.pypi.python.org/simple/ aren't *that* different from hardcoded references to the main site. The only addition is the inclusion of Gocept in the chain of trust. Given that Christian wrote the now recommended mirroring client, trusting Christian/Gocept is fairly unavoidable at this point :) We don't have a hard deadline for fixing this on the upstream side - it's in the "important but not currently urgent" category. If we can get the active legacy mirrors down to just f.pypi.python.org that will be solid progress, and then we can work out a specific arrangement for that last mirror which works for Gocept as well. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
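The client-side "URL translation map" suggested above could be sketched in Python as follows. This is only an illustration of the idea, not code from pip or setuptools; the replacement host `mirror.example.org` and the names `LEGACY_INDEX_HOSTS` and `translate_index_url` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical map: legacy hostname -> (scheme, host) that should be used instead.
LEGACY_INDEX_HOSTS = {"f.pypi.python.org": ("https", "mirror.example.org")}


def translate_index_url(url):
    """Rewrite a configured index URL if it points at a known legacy mirror name."""
    parts = urlsplit(url)
    target = LEGACY_INDEX_HOSTS.get((parts.hostname or "").lower())
    if target is None:
        return url  # not a legacy name: leave the configuration untouched
    scheme, host = target
    # Any explicit port on the legacy URL is dropped along with the old host.
    return urlunsplit((scheme, host, parts.path, parts.query, parts.fragment))
```

Because the rewrite happens before any request is made, it does not depend on an HTTP redirect that an attacker on the network could replace.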
On Aug 6, 2013, at 9:10 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 August 2013 01:58, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 8:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One means by which I could see an f.pypi.python.org DNS record being left in place indefinitely is if the TUF folks are able to come up with a scheme for offering end-to-end security for the *existing* PyPI metadata, *and* the TUF metadata is mirrored by bandersnatch *and* the TUF client side integrity checks are invoked by pip. In that case, the security argument regarding the lack of TLS on the subdomains would be rendered moot, and the backwards compatibility argument for keeping it active would win.
It would be rendered moot as far as any tooling that was updated to work with it (such as pip etc) however the browser level attacks would still be in play. If those can be solved essentially require giving mirror operators a SSL certificate for N.pypi.python.org (which still does not preclude a malicious mirror operator or a compromised mirror from being used to steal logins on PyPI).
We're not talking about "mirror operators" in an abstract sense any more: we're talking specifically about the possibility of a formal relationship with Gocept to keep the f.pypi.python.org domain alive indefinitely.
It isn't Gocept's fault that the upstream mirroring system design was broken, so the burden is on us to work with Christian to come up with a transition plan that is acceptable to both sides.
I recognize this. The problem is that Gocept isn't the sole user of their mirror. If they were (or we knew the set of users who were) then we could approach each one of them.
That may mean retiring the subdomain and Gocept changing the configuration at affected sites, or it may mean Gocept negotiating a more formal relationship with the PSF to continue operating f.pypi.python.org in particular. Both options need to be on the table from the upstream side, giving Gocept a chance to assess the alternatives (i.e. change existing sites to point to a new domain, or accept that some existing sites will remain insecure until they have been upgraded, just as we have to accept that old clients will remain insecure when accessing the main site over HTTP).
Gocept can't make that decision for installations other than their own. Christian said that his mirror is seeing 150-300GB of traffic besides the traffic Gocept generates. Some of that is coming from ``pip --use-mirrors``, I'm sure, but some of it is also coming from other people who have hardcoded f.pypi.python.org in their config. We don't know who they are, and we don't have a good way of informing them so that they can give informed consent to using an insecure transport.
Another potential alternative might be for Gocept to approach the PSF about getting an SSL certificate for that domain, ensuring pip and setuptools both support HSTS, and then switching that mirror over to using HSTS (so even configurations hardcoded to use http://f.pypi.python.org will still get a validated secure connection).
FWIW this doesn't make it secure unless they change their configuration to point to https://f.pypi.python.org/simple instead of http://f.pypi.python.org/simple.*
Programmatic libraries typically don't support HSTS (as HSTS it primarily used to prevent attacks that don't typically apply to command line clients). And certainly none of the existing tools support it. So given that we'd be relying on the redirect that upgrades the connection to HTTPS an attacker could simply return HTML instead of the redirect to HTTPS.
Yeah, I realised this flaw after posting. An alternative hack that would allow the problem to be solved through an "upgrade pip/easy_install" solution rather than an "ensure all configs have been edited" approach would be a simple URL translation map that converts legacy "http://f.pypi.python.org" references to a new URL.
Hence why it makes sense to remove, because either way users will need to edit their existing configurations if they intend to be secure and moving to a different domain will prevent the in browser attacks as well.
* This is also true of anyone who has hard coded an url to http://pypi.python.org/simple/ however there's no reasonable way to fix that.
Hardcoded references to http://f.pypi.python.org/simple/ aren't *that* different from hardcoded references to the main site. The only addition is the inclusion of Gocept in the chain of trust. Given that Christian wrote the now recommended mirroring client, trusting Christian/Gocept is fairly unavoidable at this point :)
The main difference is that hard coded references to PyPI are unlikely. The clients defaulted to that, so there was no reason to point to it in your configuration. This is consistent with the traffic we see coming into PyPI and the decline of HTTP connections to /simple/.
We don't have a hard deadline for fixing this on the upstream side - it's in the "important but not currently urgent" category. If we can get the active legacy mirrors down to just f.pypi.python.org that will be solid progress, and then we can work out a specific arrangement for that last mirror which works for Gocept as well.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
How about building a deprecation period into the tooling? pip 1.5+ could warn users who are using *.pypi.python.org of the error of their ways, encourage them to switch to the new system, and give a date of total removal. After removal the code could also be removed from pip 1.x+. - Michael On Tue, Aug 6, 2013 at 8:36 PM, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 9:10 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 August 2013 01:58, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 8:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One means by which I could see an f.pypi.python.org DNS record being left in place indefinitely is if the TUF folks are able to come up with a scheme for offering end-to-end security for the *existing* PyPI metadata, *and* the TUF metadata is mirrored by bandersnatch *and* the TUF client side integrity checks are invoked by pip. In that case, the security argument regarding the lack of TLS on the subdomains would be rendered moot, and the backwards compatibility argument for keeping it active would win.
It would be rendered moot as far as any tooling that was updated to work with it (such as pip etc) however the browser level attacks would still be in play. If those can be solved essentially require giving mirror operators a SSL certificate for N.pypi.python.org (which still does not preclude a malicious mirror operator or a compromised mirror from being used to steal logins on PyPI).
We're not talking about "mirror operators" in an abstract sense any more: we're talking specifically about the possibility of a formal relationship with Gocept to keep the f.pypi.python.org domain alive indefinitely.
It isn't Gocept's fault that the upstream mirroring system design was broken, so the burden is on us to work with Christian to come up with a transition plan that is acceptable to both sides.
I recognize this. The problem is that Gocept isn't the sole user of their mirror. If they were (or we knew the set of users who were) then we could approach each one of them.
That may mean retiring the subdomain and Gocept changing the configuration at affected sites, or it may mean Gocept negotiating a more formal relationship with the PSF to continue operating f.pypi.python.org in particular. Both options need to be on the table from the upstream side, giving Gocept a chance to assess the alternatives (i.e. change existing sites to point to a new domain, or accept that some existing sites will remain insecure until they have been upgraded, just as we have to accept that old clients will remain insecure when accessing the main site over HTTP).
Gocept can't make that decision for installations other than their own. Christian said that his mirror is seeing 150-300GB of traffic besides the traffic Gocept generates. Some of that is coming from ``pip --use-mirrors`` I'm sure, but some of it is also coming from other people who have hardcoded f.pypi.python.org in their config for whom we don't know who they are and we don't have a good way of informing them so they can make an informed consent on using an insecure transport.
Another potential alternative might be for Gocept to approach the PSF about getting an SSL certificate for that domain, ensuring pip and setuptools both support HSTS, and then switching that mirror over to using HSTS (so even configurations hardcoded to use http://f.pypi.python.org will still get a validated secure connection).
FWIW this doesn't make it secure unless they change their configuration to point to https://f.pypi.python.org/simple instead of http://f.pypi.python.org/simple.*
Programmatic libraries typically don't support HSTS (as HSTS is primarily used to prevent attacks that don't typically apply to command line clients). And certainly none of the existing tools support it. So given that we'd be relying on the redirect that upgrades the connection to HTTPS an attacker could simply return HTML instead of the redirect to HTTPS.
Yeah, I realised this flaw after posting. An alternative hack that would allow the problem to be solved through an "upgrade pip/easy_install" solution rather than an "ensure all configs have been edited" approach would be a simple URL translation map that converts legacy "http://f.pypi.python.org" references to a new URL.
Hence why it makes sense to remove, because either way users will need to edit their existing configurations if they intend to be secure and moving to a different domain will prevent the in browser attacks as well.
* This is also true of anyone who has hard coded an url to http://pypi.python.org/simple/ however there's no reasonable way to fix that.
Hardcoded references to http://f.pypi.python.org/simple/ aren't *that* different from hardcoded references to the main site. The only addition is the inclusion of Gocept in the chain of trust. Given that Christian wrote the now recommended mirroring client, trusting Christian/Gocept is fairly unavoidable at this point :)
The main difference is hard coded references to PyPI is unlikely. The clients defaulted to that so there was no reason to point to it in your configuration. This is representative of the traffic we see coming into PyPI and the decline of HTTP connections to /simple/.
We don't have a hard deadline for fixing this on the upstream side - it's in the "important but not currently urgent" category. If we can get the active legacy mirrors down to just f.pypi.python.org that will be solid progress, and then we can work out a specific arrangement for that last mirror which works for Gocept as well.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Aug 6, 2013, at 9:50 PM, Michael Merickel <mmericke@gmail.com> wrote:
How about building a deprecation period into the tooling? pip 1.5+ could warn users who are using *.pypi.python.org of the error in their ways and encourage them to switch to the new system and gives a date of total removal. After removal the code could also be removed from pip 1.x+.
- Michael
pip 1.5 already warns if you use ``--use-mirrors`` or ``--mirrors``. I suppose a warning could be added if you use -i N.pypi.python.org as well. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
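A warning for `-i N.pypi.python.org` could be sketched along these lines in Python. This is not pip's actual code; the function names and the warning text are made up for the example, and the pattern assumes the single-letter `[a-z].pypi.python.org` names mentioned earlier in the thread.

```python
import re
import warnings
from urllib.parse import urlsplit

# Single-letter mirror names under pypi.python.org, e.g. f.pypi.python.org.
LEGACY_MIRROR_HOST = re.compile(r"[a-z]\.pypi\.python\.org")


def is_legacy_mirror(index_url):
    """Return True if the index URL points at a legacy public mirror name."""
    if "://" not in index_url:
        index_url = "//" + index_url  # let urlsplit treat a bare host as a netloc
    host = urlsplit(index_url).hostname or ""
    return LEGACY_MIRROR_HOST.fullmatch(host) is not None


def warn_if_legacy_mirror(index_url):
    """Emit a warning for legacy mirror names; return whether one was emitted."""
    if is_legacy_mirror(index_url):
        warnings.warn(
            "%s is a legacy public mirror name that is being retired; "
            "please switch to a different index URL." % index_url,
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

The main index `pypi.python.org` itself does not match the pattern, so default configurations would stay silent.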
On 08/06/2013 09:59 PM, Donald Stufft wrote:
On Aug 6, 2013, at 9:50 PM, Michael Merickel <mmericke@gmail.com> wrote:
How about building a deprecation period into the tooling? pip 1.5+ could warn users who are using *.pypi.python.org of the error in their ways and encourage them to switch to the new system and gives a date of total removal. After removal the code could also be removed from pip 1.x+.
- Michael
pip 1.5 already warns if you use ``--use-mirrors`` or ``--mirrors``. I suppose a warning could be added if you use -i N.pypi.python.org as well.
Does anyone use anything other than pip to download from N.pypi.python.org?
On Aug 6, 2013, at 10:11 PM, Trishank Karthik Kuppusamy <tk47@students.poly.edu> wrote:
On 08/06/2013 09:59 PM, Donald Stufft wrote:
On Aug 6, 2013, at 9:50 PM, Michael Merickel <mmericke@gmail.com> wrote:
How about building a deprecation period into the tooling? pip 1.5+ could warn users who are using *.pypi.python.org of the error in their ways and encourage them to switch to the new system and gives a date of total removal. After removal the code could also be removed from pip 1.x+.
- Michael
pip 1.5 already warns if you use ``--use-mirrors`` or ``--mirrors``. I suppose a warning could be added if you use -i N.pypi.python.org as well.
Does anyone use anything other than pip to download from N.pypi.python.org?
Yes. Other tooling can be pointed to N.pypi.python.org by specifying them as an index. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Aug 6, 2013, at 5:22 AM, Nick Coghlan wrote:
On 6 August 2013 16:59, Christian Theune <ct@gocept.com> wrote:
Hi,
Thanks for all the feedback, I'll calm down a bit and ponder some more structured reply.
However, you're responding to the technicalities. I didn't see any consideration to the user pain. It seems irrelevant. Almost like arguing with the TSA about taking off your shoes.
User pain is the only reason for not making the change tomorrow. People need time to adjust, or to propose alternative solutions.
My reasoning for picking 4 months total for the migration is that an individual user switching their mirror hostnames is a relatively quick process (maybe a few days in a really big case), and anyone who doesn't hear about this change within a few months is highly unlikely to learn about it in a longer period of time. Humans are generally deadline-driven, so moving the deadline back doesn't get us much except moving the conversion work back with it. Basically I think that past the 6-8 week mark we are just hitting the long tail in terms of actual benefit to users, and it is better to just break the system and force them to notice they need to fix things (since one reason for doing this is that the current system is unsafe, and allowing that to exist for another year is not really on my list). --Noah
On Mon, Aug 05, 2013 at 23:31 -0700, Noah Kantrowitz wrote:
On Aug 5, 2013, at 11:11 PM, Christian Theune <ct@gocept.com> wrote:
Two more things:
why is the CDN not suffering from the security problems you describe for the mirrors?
a) Fastly seems to be the one owning the certificate for pypi.python.org. What?!?
They have a delegated SAN for it, which digicert (the CA) authorizes with the domain contact (the board in this case).
b) What does stop Fastly from introducing incorrect/rogue code in package downloads?
Basically this one boils down to personal trust from me to the Fastly team combined with the other companies using them being very reputable. At the end of the day, there is not currently any cryptographic mechanism preventing Fastly from doing bad things.
The problem is not so much trusting individuals but that the companies in question are based in the US. If its government wants to temporarily serve backdoored packages to select regions, they could silently force Fastly to do it. I guess the only way around this is to work with pypi- and eventually author/maintainer-signatures and verification. best, holger
On Aug 5, 2013, at 11:56 PM, holger krekel <holger@merlinux.eu> wrote:
On Mon, Aug 05, 2013 at 23:31 -0700, Noah Kantrowitz wrote:
On Aug 5, 2013, at 11:11 PM, Christian Theune <ct@gocept.com> wrote:
Two more things:
why is the CDN not suffering from the security problems you describe for the mirrors?
a) Fastly seems to be the one owning the certificate for pypi.python.org. What?!?
They have a delegated SAN for it, which digicert (the CA) authorizes with the domain contact (the board in this case).
b) What does stop Fastly from introducing incorrect/rogue code in package downloads?
Basically this one boils down to personal trust from me to the Fastly team combined with the other companies using them being very reputable. At the end of the day, there is not currently any cryptographic mechanism preventing Fastly from doing bad things.
The problem is not so much trusting individuals but that the companies in question are based in the US. If its government wants to temporarily serve backdoored packages to select regions, they could silently force Fastly to do it. I guess the only way around this is to work with pypi- and eventually author/maintainer-signatures and verification.
No, I have carefully selected whom I trust to work with on the PSF infrastructure. I can promise you there is a 100% chance that the head of Fastly would sooner shut down the company than allow a government interdiction of any kind. I extend this trust to Dyn and OSL as well, and I do not do so lightly. --Noah
On Aug 6, 2013, at 2:56 AM, holger krekel <holger@merlinux.eu> wrote:
On Mon, Aug 05, 2013 at 23:31 -0700, Noah Kantrowitz wrote:
On Aug 5, 2013, at 11:11 PM, Christian Theune <ct@gocept.com> wrote:
Two more things:
why is the CDN not suffering from the security problems you describe for the mirrors?
a) Fastly seems to be the one owning the certificate for pypi.python.org. What?!?
They have a delegated SAN for it, which digicert (the CA) authorizes with the domain contact (the board in this case).
b) What does stop Fastly from introducing incorrect/rogue code in package downloads?
Basically this one boils down to personal trust from me to the Fastly team combined with the other companies using them being very reputable. At the end of the day, there is not currently any cryptographic mechanism preventing Fastly from doing bad things.
The problem is not so much trusting individuals but that the companies in question are based in the US. If its government wants to temporarily serve backdoored packages to select regions, they could silently force Fastly to do it. I guess the only way around this is to work with pypi- and eventually author/maintainer-signatures and verification.
PyPI is hosted in the US. Anything the Government could do to Fastly it could do to OSUOL where PyPI is hosted. The solution to that is signature validation but I think it's premature to worry too much about that when there are lower hanging fruit that don't require the US Government deciding to backdoor packages.
best, holger _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Quoting holger krekel <holger@merlinux.eu>:
The problem is not so much trusting individuals but that the companies in question are based in the US. If its government wants to temporarily serve backdoored packages to select regions, they could silently force Fastly to do it. I guess the only way around this is to work with pypi- and eventually author/maintainer-signatures and verification.
Both are actually in place, just not widely used. Each simple page gets a pypi signature, in /serversig, which makes it possible to validate that a mirror or the CDN has the copy that is also on the master. For author signatures, PGP has been available for quite some time. As with any author signature, you then need to convince yourself that the key actually belongs to the author. Regards, Martin
On Aug 6, 2013, at 3:03 AM, martin@v.loewis.de wrote:
Quoting holger krekel <holger@merlinux.eu>:
The problem is not so much trusting individuals but that the companies in question are based in the US. If its government wants to temporarily serve backdoored packages to select regions, they could silently force Fastly to do it. I guess the only way around this is to work with pypi- and eventually author/maintainer-signatures and verification.
Both are actually in place, just not widely used. Each simple page gets a pypi signature, in /serversig, which makes it possible to validate that a mirror or the CDN has the copy that is also on the master.
Unless I'm forgetting something there's no real way to get the server key without going through Fastly, and even if there was Fastly could just hijack an upload (and murder their entire business in the process).
Regards, Martin
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Quoting Donald Stufft <donald@stufft.io>:
Unless I'm forgetting something there's no real way to get the server key without going through Fastly
You should have a copy of the server key upfront, on your disk. You can still get it directly from PyPI with an HTTP request to pypi.int.python.org/serverkey.
and even if there was Fastly could just hijack an upload (and murder their entire business in the process).
Couldn't you also use pypi.int.python.org for uploading? Regards, Martin
On Aug 6, 2013, at 3:29 AM, martin@v.loewis.de wrote:
Quoting Donald Stufft <donald@stufft.io>:
Unless I'm forgetting something there's no real way to get the server key without going through Fastly
You should have a copy of the server key upfront, on your disk.
You can still get it directly from PyPI with an HTTP request to pypi.int.python.org/serverkey.
and even if there was Fastly could just hijack an upload (and murder their entire business in the process).
Couldn't you also use pypi.int.python.org for uploading?
Regards, Martin
pypi.int.python.org is not a public name and has no promise on existing tomorrow. Even if it was it's HTTP only and thus now you have an attacker who can substitute his own key for the server key and his own serversig for packages downloaded over HTTP from a mirror. The same thing applies to uploading, so you remove the possibility of Fastly attacking you and open up the much wider chance that a MITM would attack you. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
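To make the "copy of the server key on disk" idea concrete, here is a minimal sketch of key pinning: a fetched key is accepted only if its fingerprint matches one recorded earlier. The use of SHA-256, the function names, and the placeholder key bytes are all assumptions for illustration; PEP 381 does not prescribe any of them, and the pin is only as trustworthy as the channel it was first obtained over.

```python
import hashlib
import hmac


def fingerprint(key_bytes: bytes) -> str:
    """Return the hex SHA-256 digest of the raw key bytes."""
    return hashlib.sha256(key_bytes).hexdigest()


def key_matches_pin(fetched_key: bytes, pinned_fingerprint: str) -> bool:
    """Compare a freshly fetched key against a fingerprint stored on disk."""
    return hmac.compare_digest(fingerprint(fetched_key), pinned_fingerprint.lower())


# Placeholder bytes standing in for a real key file; not an actual PyPI key.
trusted_copy = b"-----BEGIN PUBLIC KEY-----\nplaceholder\n-----END PUBLIC KEY-----\n"
pin = fingerprint(trusted_copy)

# A key swapped in by someone on the network path no longer matches the pin.
tampered = trusted_copy.replace(b"placeholder", b"attacker")
```

With a pin in place, a key substituted over plain HTTP would be rejected, but this does nothing for the upload path, which is the other half of the objection above.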
On Aug 6, 2013, at 2:09 AM, Christian Theune <ct@gocept.com> wrote:
Hi,
looks like I'm late to the party to figure out that I'm going to be hurt again.
I'd like to suggest explicitly considering what is going to break due to this and how much work you are forcefully inflicting on others. My whole experience around the packaging (distribute/setuptools) and mirroring/CDN changes this year puts the cost for my company somewhere between 10k-20k EUR just for keeping up with the breakage those changes incur. It might be that we're wonderfully stupid (..enough to contribute) and all of this causes no headaches for anybody else... Overall, guessing that the packaging infrastructure is used by probably multiple thousands of companies, I'd expect that at least 100 of them might be experiencing problems like ours. Juggling arbitrary numbers, I can see that we're inflicting around a million EUR of cost that nobody asked for.
More specific statements below.
On 2013-08-04 22:25:01 +0000, Donald Stufft said:
Here's my PEP for Deprecating and Removing the Official Public Mirrors
Its source is at: https://github.com/dstufft/peps/blob/master/mirror-removal.rst
Abstract
========

This PEP provides a path to deprecate and ultimately remove the official public mirroring infrastructure for `PyPI`_. It does not propose the removal of mirroring support in general.
-1 - maybe I don't have the right to speak up on CDN usage, but personally I feel it's a bad idea to delegate overall PyPI availability exclusively to a commercial third party. It's OK for me that we're using them to improve PyPI availability, but completely putting our faith in their hands, doesn't sound right to me.
Hm. Maybe I wasn't clear here? The mirrors don't go away, the only thing that goes away is the *.pypi.python.org names and the DNS discovery protocol.
Rationale
=========

The PyPI mirroring infrastructure (defined in `PEP381`_) provides a means to mirror the content of PyPI used by the automatic installers. It also provides a method for autodiscovery of mirrors and a consistent naming scheme.
There are a number of problems with the official public mirrors:
* They give control over a \*.python.org domain name to a third party, allowing that third party to set or read cookies on the pypi.python.org and python.org domain name.
Agreed, that's a problem.
* The use of a sub domain of pypi.python.org means that the mirror operators will never be able to get a certificate of their own, and giving them one for a python.org domain name is unlikely to happen.
Agreed.
* They are often out of date, most often by several hours to a few days, but regularly several days and even months.
That's something that the mirroring infrastructure should have been constructed for. I completely agree that the way the mirroring was established was way sub-optimal. I think we can do better.
Better mirroring protocol is on my TODO list as well but isn't particularly related to this PEP except that the poor protocol certainly influences how useful the global mirrors can be.
* With the introduction of the CDN on PyPI the public mirroring infrastructure is not as important as it once was as the CDN is also a globally distributed network of servers which will function even if PyPI is down.
Well, now we have one breakage point more which keeps annoying me. This argument is not completely true. They may be getting better over time but we have invested heavily to accommodate the breakage - that needs to be balanced with some benefit in the near future.
Can you expand further what you mean here? I don't believe I understand what you're saying.
* Although there are provisions in place for it, there is currently no known installer which uses the authenticity checks discussed in `PEP381`_, which means that any download from a mirror is subject to attack by a malicious mirror operator; furthermore, due to the lack of TLS, any download from a mirror is also subject to a MITM attack.
Again, I think that was a mistake during the introduction of the mirroring infrastructure: too few people, too confusing PEP.
See above about a new protocol being a TODO item for me, will likely be done in Warehouse.
* They have only ever been implemented by one installer (pip), and its implementation, besides being insecure, has serious issues with performance and is slated for removal with its next release (1.5).
Only if you consider the mirror auto-discovery protocol. I'm not sure whether using DNS was such a smart move. A simple HTTP request to find mirrors would have been nice. I think we can still do that.
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
The auto detection is the main thing going away. You'll still be able to hand pick a mirror, it will just have a domain name owned by the mirror operator instead of one owned under *.pypi.python.org.
We are also thinking about providing system-level default configuration to hint tools like PIP and setuptools to a different default index that is closer from a network perspective. From a customer perspective this should be "PyPI".
I'd like to avoid breakage. Again, if you don't let me choose where to spend my time, I'd rather invest the time I need for cleaning up the breakage into something constructive.
The indices are in active use. f.pypi.python.org is seeing between 150-300GB of traffic per month, the patterns widely ranging over the last month. This is traffic that is not used internally from gocept.
Due to the number of issues, some of them very serious, and the CDN which more or less provides much of the same benefits this PEP proposes to first deprecate and then remove the public mirroring infrastructure. The ability to mirror and the method of mirroring will not be affected and the existing public mirrors are encouraged to acquire their own domains to host their mirrors on if they wish to continue hosting them.
The biggest benefit of the mirroring infrastructure is that it is intended to be de-centralized. As a community member I can step up and take over responsibility of availability, performance, and security of a mirror.
As a community member I have to completely submit to whatever the CDN does, and rely on contacting another community member who hopefully will be with us for a long time and stay in good contact with the CDN for us. That's centralization and I don't like that a bit.
Just to reiterate this doesn't remove the concept of mirroring at all, but it removes them from living under the PSF banner to living under the banner of the mirror operators.
Plan for Deprecation & Removal
==============================

Immediately upon acceptance of this PEP documentation on PyPI will be updated to reflect the deprecated nature of the official public mirrors and will direct users to external resources like http://www.pypi-mirrors.org/ to discover unofficial public mirrors if they wish to use one.
On October 1st, 2013, roughly 2 months from the date of this PEP, the DNS names of the public mirrors ([a-g].pypi.python.org) will be changed to point back to PyPI which will be modified to accept requests from those domains. At this point in time the public mirrors will be considered deprecated.
Then, roughly 2 months after the release of the first version of pip to have mirroring support removed (currently slated for pip 1.5) the DNS entries for [a-g].pypi.python.org and last.pypi.python.org will be removed and PyPI will no longer accept requests at those domains.
Oh great. That means in about 4 months I have to go through *any installation that my company maintains* and sift through whether we're still referencing f.pypi.python.org anywhere.
Can I write a check?
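For anyone facing the same audit, a rough sketch of how the search could be automated follows. The file patterns, function names, and the choice to scan only text-like config files are assumptions, not anything the PEP specifies; the a-g range simply follows the names listed in the PEP text.

```python
import re
from pathlib import Path

# Matches the single-letter mirror names and last.pypi.python.org.
# The a-g range follows the PEP text; widen it if other letters were in use.
MIRROR_RE = re.compile(r"\b(?:[a-g]|last)\.pypi\.python\.org\b", re.IGNORECASE)


def find_mirror_references(text: str) -> list[tuple[int, str]]:
    """Return (line_number, hostname) pairs for every mirror name in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MIRROR_RE.finditer(line):
            hits.append((lineno, match.group(0).lower()))
    return hits


def scan_tree(root: Path, patterns=("*.cfg", "*.conf", "*.ini", "*.txt")):
    """Yield (path, line_number, hostname) for matching files under root."""
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, host in find_mirror_references(text):
                yield path, lineno, host
```

Running `scan_tree(Path("/etc"))` or pointing it at a deployment checkout would list each file and line that still names one of the old mirrors; references to plain pypi.python.org are deliberately not reported.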
Unofficial Public or Private Mirrors
====================================

The mirroring protocol will continue to exist as defined in `PEP381`_ and people are encouraged to utilize it to host unofficial public and private mirrors if they so desire. For operators of unofficial public or private mirrors the recommended mirroring client is `Bandersnatch`_.
Thanks for the recommendation.
Instead of this dance breaking many things yet again, I'd love if we could find a way forward keeping the infrastructure.
Some ideas:
- Take control of *.pypi.python.org back - Record other public names of the mirrors
It's planned that an externally maintained list (at least for the time being) will be used for recording the public names of the mirrors. Most likely http://pypi-mirrors.org/ which also includes some information to be able to select which mirror you'd like to use and is already recommended on the mirroring page.
- Use 301 redirects to send old installations over to the new mirror names.
This could be done, though it'd need to be cleared with Infra as far as us serving redirects goes. However, we could lengthen the timeframe to give more time to handle the migration, and as a mirror operator you can handle redirecting from N.pypi.python.org to your new domain until control is taken back. At some point, though, the hammer needs to come down on the N.pypi.python.org names because a long-term goal is requiring TLS on all of python.org.
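As an illustration of the operator-side redirect discussed here, the following is a minimal sketch using only the standard library. The target host `pypi.example.org`, the port, and the helper names are hypothetical; a real mirror would more likely configure this in its existing web server, and whether older installers follow the redirect would need testing.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical operator-owned HTTPS host; replace with the mirror's real name.
NEW_BASE = "https://pypi.example.org"


def redirect_target(path: str, base: str = NEW_BASE) -> str:
    """Map a request path on the old name to the same path on the new host."""
    if not path.startswith("/"):
        path = "/" + path
    return base.rstrip("/") + path


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a permanent redirect to the new host."""

    def _redirect(self):
        self.send_response(301)
        self.send_header("Location", redirect_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port: int = 8080) -> None:
    """Run the redirector; the default port is an arbitrary choice for testing."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling `serve()` and requesting `/simple/pip/` would return a 301 whose `Location` header keeps the path and query string intact, so existing index URLs keep resolving while clients are migrated.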
- Make it easier for community members to help maintain the list of mirrors.
This is what part of the goals of offloading mirror listing to something like pypi-mirrors.org would be (as well as any other site that wants to maintain a list).
- Make a better (faster) removal policy of mirrors if the owners are not responsive.
This becomes up to the decentralized sites to handle their own policy of what constitutes a reasonable removal policy.
- Make it easier for other community members to set up and maintain mirrors. I'm happy to improve bandersnatch where needed.
This is partially handled by the future TODO of a better protocol, as well as pushing the maintenance of lists onto the community itself.
Lastly, again, and I might be getting on everyone's nerves.
Why does it seem that other communities have figured this out much more simply, with less hassle, and with no significant changes for years, while we need to keep changing stuff over and over and over and break things over and over and over?
Other communities had the benefit of learning from our mistakes and a lot of the breakages in this area have been closing security holes that still exist in those other communities.
It's really hard for me to write this mail without cussing - the situation is very frustrating: the community dynamics seem to "want to move forward" where they from my perspective "wander left and right and break stuff like a drunken elephant driving a tank through the Louvre".
Christian
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
-1 - maybe I don't have the right to speak up on CDN usage, but personally I feel it's a bad idea to delegate overall PyPI availability exclusively to a commercial third party.
Well, it's been done, and it was always a better idea than the way mirrors were implemented.
It's OK for me that we're using them to improve PyPI availability, but completely putting our faith in their hands, doesn't sound right to me.
We must put our faith in somebody's hands with regards to PyPI. That hasn't changed.
That's something that the mirroring infrastructure should have been constructed for. I completely agree that the way the mirroring was established was way sub-optimal. I think we can do better.
Only by building our own CDN. We won't do better than the ones that exist.
Well, now we have one breakage point more which keeps annoying me.
We do? How?
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
I agree that this is probably the best choice, and you can still do that.
I'd like to avoid breakage. Again, if you don't let me choose where to spend my time, I'd rather invest the time I need for cleaning up the breakage into something constructive.
The only breakage I can see in this proposal is that the [a-z] dns names go away. That would take four months. I think perhaps that's a bit short. I don't see why we can't keep them around for much longer. A way to find mirrors is needed, but perhaps not automatic, but for when pypi goes down. //Lennart
On Aug 6, 2013, at 2:36 AM, Lennart Regebro <regebro@gmail.com> wrote:
The only breakage I can see in this proposal is that the [a-z] dns names go away. That would take four months. I think perhaps that's a bit short. I don't see why we can't keep them around for much longer.
I think we're all willing to increase the time :) The current timeframe was somewhat arbitrarily suggested by Noah and I just rolled with it for a first draft of the PEP, figuring that if it was too short someone would (hopefully!) speak up. The other breakage is people relying on --use-mirrors in pip, but that is a no-op in the upcoming pip 1.5, and the side effect of removing these names is that older versions of pip simply won't get mirroring support (their mirroring support still hits PyPI itself).
A way to find mirrors is needed, but perhaps not automatic, but for when pypi goes down.
Thank you for calling this out, I forgot to include in the PEP that moving the mirroring listing off of PyPI itself means that we increase the chances that the listing will be available if PyPI itself happens to be down. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Tue, Aug 06, 2013 at 08:36 +0200, Lennart Regebro wrote:
Well, now we have one breakage point more which keeps annoying me.
We do? How?
Christian, Donald and I invested considerable debugging time, repeatedly, to accommodate Fastly/CDN issues. It required multiple rounds of changes to bandersnatch, devpi and the pypi.python.org source code. Apart from that there have been intermittent install/cache-inconsistency failures. Due to the fast response times of the people involved most of these issues didn't last for too long but the CDN did introduce a new breakage point. holger
Well, now we have one breakage point more which keeps annoying me.
We do? How?
Installations that mention a specific mirror in their configuration file (such as f.pypi.python.org) will break when the DNS name is removed.
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
I agree that this is probably the best choice, and you can still do that.
See above. He did that, and the PyPI maintainers will break it. Regards, Martin
On Aug 6, 2013, at 3:20 AM, martin@v.loewis.de wrote:
Well, now we have one breakage point more which keeps annoying me.
We do? How?
Installations that mention a specific mirror in their configuration file (such as f.pypi.python.org) will break when the DNS name is removed.
I don't see how this is relevant to his statement? He was talking about the CDN and I was asking him to clarify.
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
I agree that this is probably the best choice, and you can still do that.
See above. He did that, and the PyPI maintainers will break it.
I don't think anyone's claimed that removing the names won't break things for people who directly referenced them, but it's an important step that we do that. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 6 August 2013 17:30, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 3:20 AM, martin@v.loewis.de wrote:
See above. He did that, and the PyPI maintainers will break it.
I don't think anyone's claimed that removing the names won't break things for people who directly referenced them, but it's an important step that we do that.
Right, but I think it's one where we can offer responsive mirror maintainers a generous time frame. We're down to only 5 mirrors using the *.pypi.python.org naming scheme anyway, so we should probably include contacting the maintainers directly in the transition plan. That makes the process:

- we immediately stop handing out any new *.pypi.python.org mirror names (this has effectively happened already, the PEP will just be making it official)
- the operators of the 5 current *.pypi.python.org mirrors are contacted directly, informing them of the plan to deprecate and remove those domain names, and offering the choice of two alternatives:
  1. After 2 months (or earlier if requested), the domain name is redirected to the PyPI CDN and the mirror is effectively retired. 2 months after the release of pip 1.5, the name is removed entirely
  2. The mirror operator establishes a 301 redirect to a HTTPS capable domain name they control and negotiates the time frame for retirement and removal of the *.pypi.python.org domain record with the PSF infrastructure team
- after two months, last.pypi.python.org and any *.pypi.python.org mirror names which didn't request option 2 above are redirected to the CDN
- two months after the release of pip 1.5, last.pypi.python.org and any *.pypi.python.org mirror names which didn't request option 2 above are removed from the DNS
- the exact time frames for option 2 above will be worked out individually with the mirror operators that request it (that would be at least Christian for f.pypi.python.org, and perhaps some of the other mirror operators if they also choose option 2)

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Aug 6, 2013, at 3:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 August 2013 17:30, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 3:20 AM, martin@v.loewis.de wrote:
See above. He did that, and the PyPI maintainers will break it.
I don't think anyone's claimed that removing the names won't break things for people who directly referenced them, but it's an important step that we do that.
Right, but I think it's one where we can offer responsive mirror maintainers a generous time frame. We're down to only 5 mirrors using the *.pypi.python.org naming scheme anyway, so we should probably include contacting the maintainers directly in the transition plan.
That makes the process:
- we immediately stop handing out any new *.pypi.python.org mirror names (this has effectively happened already, the PEP will just be making it official) - the operators of the 5 current *.pypi.python.org mirrors are contacted directly, informing them of the plan to deprecate and remove those domain names, and offering the choice of two alternatives:
Minor point but it's 4 mirrors. The a mirror is simply an alias for PyPI itself which leaves, c, e, f, g.
1. After 2 months (or earlier if requested), the domain name is redirected to the PyPI CDN and the mirror is effectively retired. 2 months after the release of pip 1.5, the name is removed entirely 2. The mirror operator establishes a 301 redirect to a HTTPS capable domain name they control and negotiates the time frame for retirement and removal of the *.pypi.python.org domain record with the PSF infrastructure team
- after two months, last.pypi.python.org and any *.pypi.python.org mirror names which didn't request option 2 above are redirected to the CDN - two months after the release of pip 1.5, last.pypi.python.org and any *.pypi.python.org mirror names which didn't request option 2 above are removed from the DNS - the exact time frames for option 2 above will be worked out individually with the mirror operators that request it (that would be at least Christian for f.pypi.python.org, and perhaps some of the other mirror operators if they also choose option 2)
It's probably simpler to just lengthen the timeframe and allow early opt in to having the N.pypi.python.org redirected back to PyPI (Minor point, it doesn't actually go directly through the CDN because the CDN is configured to require SSL). I would much rather have the details laid out in the PEP than have the Infra team being placed in the line of fire. I think it would even be reasonable to not have a forced redirect to the CDN and instead say in N amount of time the DNS entries will be removed, and allow mirror operators to ask us to redirect their N.pypi.python.org back to the CDN if they've felt their migration is complete before N amount of time happens. The big question then becomes what is a reasonable value for N amount of time, the original proposal essentially used 4 months for no real reason. Would 6 months be better? 8? I think making this window _too_ long doesn't really do anything except delay the inevitable and the window should be decided on for what's a reasonable amount of time for people to move away from pointing directly at the N.pypi.python.org not delaying the need to do it until a later date.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 6 August 2013 17:59, Donald Stufft <donald@stufft.io> wrote:
It's probably simpler to just lengthen the timeframe and allow early opt in to having the N.pypi.python.org redirected back to PyPI (Minor point, it doesn't actually go directly through the CDN because the CDN is configured to require SSL).
I would much rather have the details laid out in the PEP than have the Infra team being placed in the line of fire. I think it would even be reasonable to not have a forced redirect to the CDN and instead say in N amount of time the DNS entries will be removed, and allow mirror operators to ask us to redirect their N.pypi.python.org back to the CDN if they've felt their migration is complete before N amount of time happens.
Sounds good to me.
The big question then becomes what is a reasonable value for N amount of time, the original proposal essentially used 4 months for no real reason. Would 6 months be better? 8? I think making this window _too_ long doesn't really do anything except delay the inevitable and the window should be decided on for what's a reasonable amount of time for people to move away from pointing directly at the N.pypi.python.org not delaying the need to do it until a later date.
I believe Christian gets to define reasonable on this point :) Plucking a date out of the air, though, why not: July 1, 2014, with 6 month, 3 month and 1 month warnings sent to the operators of mirrors that haven't yet been redirected back to PyPI. That's nearly 11 months away, and hopefully other changes will have settled down by then. If the mirror operators are happy their transition is complete before then, cool, otherwise they have a hard deadline to work with. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia On 6 August 2013 17:59, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 3:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 August 2013 17:30, Donald Stufft <donald@stufft.io> wrote:
On Aug 6, 2013, at 3:20 AM, martin@v.loewis.de wrote:
See above. He did that, and the PyPI maintainers will break it.
I don't think anyones claimed that removing the names won't break things for people who directly referenced them, but it's an important step that we do that.
Right, but I think it's one where we can offer responsive mirror maintainers a generous time frame. We're down to only 5 mirrors using the *.pypi.python.org naming scheme anyway, so we should probably include contacting the maintainers directly in the transition plan.
That makes the process:
- we immediately stop handing out any new *.pypi.python.org mirror names (this has effectively happened already, the PEP will just be making it official) - the operators of the 5 current *.pypi.python.org mirrors are contacted directly, informing them of the plan to deprecate and remove those domain names, and offering the choice of two alternatives:
Minor point but it's 4 mirrors. The a mirror is simply an alias for PyPI itself which leaves, c, e, f, g.
1. After 2 months (or earlier if requested), the domain name is redirected to the PyPI CDN and the mirror is effectively retired. 2 months after the release of pip 1.5, the name is removed entirely 2. The mirror operator establishes a 301 redirect to a HTTPS capable domain name they control and negotiates the time frame for retirement and removal of the *.pypi.python.org domain record with the PSF infrastructure team
- after two months, last.pypi.python.org and any *.pypi.python.org mirror names which didn't request option 2 above are redirected to the CDN - two months after the release of pip 1.5, last.pypi.python.org and any *.pypi.python.org mirror names which didn't request option 2 above are removed from the DNS - the exact time frames for option 2 above will be worked out individually with the mirror operators that request it (that would be at least Christian for f.pypi.python.org, and perhaps some of the other mirror operators if they also choose option 2)
It's probably simpler to just lengthen the timeframe and allow early opt in to having the N.pypi.python.org redirected back to PyPI (Minor point, it doesn't actually go directly through the CDN because the CDN is configured to require SSL).
I would much rather have the details laid out in the PEP than have the Infra team being placed in the line of fire. I think it would even be reasonable to not have a forced redirect to the CDN and instead say in N amount of time the DNS entries will be removed, and allow mirror operators to ask us to redirect their N.pypi.python.org back to the CDN if they've felt their migration is complete before N amount of time happens.
The big question then becomes what is a reasonable value for N amount of time, the original proposal essentially used 4 months for no real reason. Would 6 months be better? 8? I think making this window _too_ long doesn't really do anything except delay the inevitable and the window should be decided on for what's a reasonable amount of time for people to move away from pointing directly at the N.pypi.python.org not delaying the need to do it until a later date.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 06.08.13 09:59, Donald Stufft wrote:
The big question then becomes what a reasonable value for N is. The original proposal essentially used 4 months for no real reason. Would 6 months be better? 8? I think making this window _too_ long doesn't really do anything except delay the inevitable. The window should be chosen based on what's a reasonable amount of time for people to move away from pointing directly at N.pypi.python.org, not on delaying the need to do it until a later date.
Assuming the main breakage comes from people having hard-coded the mirror names in configuration files: Why not leave the *.pypi names available "forever" (ten years), all pointing to the master?
Regards, Martin
On Aug 6, 2013, at 6:45 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Assuming the main breakage comes from people having hard-coded the mirror names in configuration files: Why not leave the *.pypi names available "forever" (ten years), all pointing to the master?
The major reason (for me; Noah might have others as Infra lead) is that they have never been available via TLS, so everyone using them hard-coded is using them hard-coded as HTTP. A lot of those people likely don't realize that by using them they are risking a man-in-the-middle attack. So by continuing to support them we are essentially continuing to enable a grossly insecure setting, with the very likely case being that the folks vulnerable to it have not made an informed decision to do so and instead have merely done what they thought was best practice. Ensuring that the transport is safe is one of my primary goals right now.
A secondary (but minor) reason is simply one of logistics. Throughout the various migrations as things on PyPI settled, the names that do point back to PyPI have randomly become broken, sometimes for weeks or months. It's easy to miss checking that all of them continue to work, and I believe that it's better to have a clean break than to half-ass support for those names.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Aug 5, 2013, at 11:09 PM, Christian Theune <ct@gocept.com> wrote:
Hi,
looks like I'm late to the party to figure out that I'm going to be hurt again.
I'd like to suggest explicitly considering what is going to break due to this and how much work you are forcefully inflicting on others. I estimate that my whole experience with packaging (distribute/setuptools) and mirroring/CDN this year has cost my company somewhere between 10k and 20k EUR just for keeping up with the breakage those changes incur. It might be that we're wonderfully stupid (..enough to contribute) and all of this causes no headaches for anybody else... Overall, guessing that the packaging infrastructure is used by probably multiple thousands of companies, I'd expect that at least 100 of them might be experiencing problems like ours. Juggling arbitrary numbers I can see that we're inflicting around a million EUR of cost that nobody asked for.
More specific statements below.
On 2013-08-04 22:25:01 +0000, Donald Stufft said:
Here's my PEP for Deprecating and Removing the Official Public Mirrors
Its source is at: https://github.com/dstufft/peps/blob/master/mirror-removal.rst
Abstract
========
This PEP provides a path to deprecate and ultimately remove the official public mirroring infrastructure for `PyPI`_. It does not propose the removal of mirroring support in general.
-1 - maybe I don't have the right to speak up on CDN usage, but personally I feel it's a bad idea to delegate overall PyPI availability exclusively to a commercial third party. It's OK for me that we're using them to improve PyPI availability, but completely putting our faith in their hands, doesn't sound right to me.
Rationale
=========
The PyPI mirroring infrastructure (defined in `PEP381`_) provides a means to mirror the content of PyPI used by the automatic installers. It also provides a method for autodiscovery of mirrors and a consistent naming scheme.
There are a number of problems with the official public mirrors:
* They give control over a \*.python.org domain name to a third party, allowing that third party to set or read cookies on the pypi.python.org and python.org domain name.
Agreed, that's a problem.
* The use of a sub domain of pypi.python.org means that the mirror operators will never be able to get a certificate of their own, and giving them one for a python.org domain name is unlikely to happen.
Agreed.
* They are often out of date, most often by several hours to a few days, but regularly several days and even months.
That's something that the mirroring infrastructure should have been constructed for. I completely agree that the way the mirroring was established was way sub-optimal. I think we can do better.
* With the introduction of the CDN on PyPI the public mirroring infrastructure is not as important as it once was as the CDN is also a globally distributed network of servers which will function even if PyPI is down.
Well, now we have one more breakage point, which keeps annoying me. This argument is not completely true. They may be getting better over time, but we have invested heavily to accommodate the breakage; that needs to be balanced with some benefit in the near future.
To be clear, the CDN and other server-side improvements are not a hard-HA replacement like a local company mirror. You are exactly the use case that can and should be using a mirror for your own use. We are doing _nothing_ that disrupts this use case and will support it exactly as before.
* Although there are provisions in place for it, there is currently no known installer which uses the authenticity checks discussed in `PEP381`_, which means that any download from a mirror is subject to attack by a malicious mirror operator. Furthermore, due to the lack of TLS, any download from a mirror is also subject to a MITM attack.
Again, I think that was a mistake during the introduction of the mirroring infrastructure: too few people, too confusing PEP.
* They have only ever been implemented by one installer (pip), and its implementation, besides being insecure, has serious issues with performance and is slated for removal with its next release (1.5).
Only if you consider the mirror auto-discovery protocol. I'm not sure whether using DNS was such a smart move. A simple HTTP request to find mirrors would have been nice. I think we can still do that.
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
We are also thinking about providing system-level default configuration to hint tools like PIP and setuptools to a different default index that is closer from a network perspective. From a customer perspective this should be "PyPI".
I'd like to avoid breakage. Again, if you don't let me choose where to spend my time, I'd rather invest the time I need for cleaning up the breakage into something constructive.
The indices are in active use. f.pypi.python.org is seeing between 150-300GB of traffic per month, the patterns widely ranging over the last month. This is traffic that is not used internally from gocept.
Due to the number of issues, some of them very serious, and the CDN which more or less provides much of the same benefits this PEP proposes to first deprecate and then remove the public mirroring infrastructure. The ability to mirror and the method of mirroring will not be affected and the existing public mirrors are encouraged to acquire their own domains to host their mirrors on if they wish to continue hosting them.
The biggest benefit of the mirroring infrastructure is that it is intended to be de-centralized. As a community member I can step up and take over responsibility of availability, performance, and security of a mirror.
As a community member I have to completely submit to whatever the CDN does and contacting another community member who hopefully will be with us for a long time and stay in good contact with the CDN for us. That's centralization and I don't like that a bit.
Plan for Deprecation & Removal
==============================
Immediately upon acceptance of this PEP documentation on PyPI will be updated to reflect the deprecated nature of the official public mirrors and will direct users to external resources like http://www.pypi-mirrors.org/ to discover unofficial public mirrors if they wish to use one.
On October 1st, 2013, roughly 2 months from the date of this PEP, the DNS names of the public mirrors ([a-g].pypi.python.org) will be changed to point back to PyPI which will be modified to accept requests from those domains. At this point in time the public mirrors will be considered deprecated.
Then, roughly 2 months after the release of the first version of pip to have mirroring support removed (currently slated for pip 1.5) the DNS entries for [a-g].pypi.python.org and last.pypi.python.org will be removed and PyPI will no longer accept requests at those domains.
Oh great. That means in about 4 months I have to go through *any installation that my company maintains* and sift through whether we're still referencing f.pypi.python.org anywhere.
Yes, sorry, no matter how we procedurally handle this shutdown, this will be required. From doing this myself more times than I care to remember, I've not found the actual time difference between ~one month and a year+ mattering much, but having to do it at all is unlikely to change, just when and where.
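For anyone facing the same audit, here is a rough sketch of how the search for hard-coded mirror names could be scripted. It is only an illustration: the function names and the list of file patterns are made up for this example, and the regular expression assumes the official names are a single letter or "last" in front of .pypi.python.org, as described in the PEP.

```python
import re
from pathlib import Path

# Matches the official mirror names discussed in this thread: single-letter
# hosts such as f.pypi.python.org, plus last.pypi.python.org. The main
# pypi.python.org name itself does not match.
MIRROR_HOST = re.compile(r"\b(?:[a-z]|last)\.pypi\.python\.org\b", re.IGNORECASE)


def find_mirror_references(text):
    """Return (line_number, line) pairs that mention an official mirror name."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if MIRROR_HOST.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root, patterns=("*.cfg", "*.ini", "*.conf", "*.txt")):
    """Yield (path, line_number, line) for every hit under root.

    The file patterns are a guess at where index URLs tend to live
    (buildout.cfg, pip.conf, setup.cfg, requirements files); adjust to taste.
    """
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable entry: skip rather than abort the scan
            for number, line in find_mirror_references(text):
                yield path, number, line
```

Looping over `scan_tree("/path/to/deployments")` and printing each hit would give a list of files to fix. Since every such reference is necessarily plain HTTP, each hit is also a place where the index URL should move to an HTTPS one.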
Can I write a check?
Unofficial Public or Private Mirrors
====================================
The mirroring protocol will continue to exist as defined in `PEP381`_ and people are encouraged to utilize it to host unofficial public and private mirrors if they so desire. For operators of unofficial public or private mirrors the recommended mirroring client is `Bandersnatch`_.
Thanks for the recommendation.
Instead of this dance breaking many things yet again, I'd love if we could find a way forward keeping the infrastructure.
Some ideas:
- Take control of *.pypi.python.org back
- Record other public names of the mirrors
- Use 301 redirects to send old installations over to the new mirror names.
- Make it easier for community members to help maintain the list of mirrors.
- Make a better (faster) removal policy of mirrors if the owners are not responsive.
- Make it easier for other community members to set up and maintain mirrors. I'm happy to improve bandersnatch where needed.
Between now and the first DNS change, I would absolutely recommend that any current public mirrors redirect users to their new domain name if they intend to have one, and we'll do whatever we can to help make users aware of the switch. I would rather have a clear timeline with fewer steps than add another stage where we (PSF) are issuing redirects to non-PSF servers.
Very very +1 on the easier bandersnatch-ing though. I really would love to see more mirrors out there; I just don't want them associated with PyPI or python.org, and I don't want pip to be trying to auto-discover them.
I am also hoping that pypi-mirrors.org will continue to operate as a community project (side note, I would be happy to assist with hosting for it if Ken reads this list and if that's a concern of his) and that the mirror operators can develop policies for things like this. I defer to Nick and Ken if they would like distutils-sig to be involved in that process, but as it stands Ken can apply whatever rules he wants to his mirror list.
--Noah
On Aug 6, 2013, at 2:49 AM, Noah Kantrowitz <noah@coderanger.net> wrote:
I am also hoping that pypi-mirrors.org will continue to operate as a community project (side note, I would be happy to assist with hosting for it if Ken reads this list and if that's a concern of his) and that the mirror operators can develop policies for things like this.
Additionally if anyone else wants to maintain a list like this I think it would be more than appropriate to link to it in addition to pypi-mirrors.org on the page about mirroring. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Mon, Aug 05, 2013 at 23:49 -0700, Noah Kantrowitz wrote:
On Aug 5, 2013, at 11:09 PM, Christian Theune <ct@gocept.com> wrote: (...) Between now and the first DNS change, I would absolutely recommend any current public mirrors to redirect users to their new domain name if they intend to have one, and we'll do whatever we can to help make users aware of the switch. I would rather have a clear timeline with fewer steps than add another stage where we (PSF) are issuing redirects to non-PSF servers. Very very +1 on the easier bandersnatch-ing though, I really would love to see more mirrors out there, I just don't want them associated with PyPI or python.org, and I don't want pip to be trying to auto-discover them.
PyPI mirrors _are_ associated with PyPI and pypi.python.org. (Why) do you want to flatly rule out pip/pypi.python.org support for managing mirrors?
The Perl CPAN mirroring provides this nice little machine-readable file:
http://www.cpan.org/indices/mirrors.json
and a Python equivalent could be consumed by pip, I guess.
best, holger
On Aug 6, 2013, at 12:10 AM, holger krekel <holger@merlinux.eu> wrote:
On Mon, Aug 05, 2013 at 23:49 -0700, Noah Kantrowitz wrote:
On Aug 5, 2013, at 11:09 PM, Christian Theune <ct@gocept.com> wrote: (...) Between now and the first DNS change, I would absolutely recommend any current public mirrors to redirect users to their new domain name if they intend to have one, and we'll do whatever we can to help make users aware of the switch. I would rather have a clear timeline with fewer steps than add another stage where we (PSF) are issuing redirects to non-PSF servers. Very very +1 on the easier bandersnatch-ing though, I really would love to see more mirrors out there, I just don't want them associated with PyPI or python.org, and I don't want pip to be trying to auto-discover them.
PyPI mirrors _are_ associated with PyPI and pypi.python.org. (Why) do you want to flatly rule out pip/pypi.python.org support for managing mirrors?
The perl CPAN mirroring provides this nice little machine-readable file:
http://www.cpan.org/indices/mirrors.json
and a python-equivalent could be consumed by pip, i guess.
Because at this time there is no Python package installer that can install from a public mirror in a way that makes me comfortable supporting it as an official resource. This could be addressed in pip by verifying the /simple signatures, but this mostly precludes improved mirroring mechanisms like that used by Crate. More to the point, I as the head of infrastructure am responsible for *.python.org, but if there is an issue with a mirror, be it downtime, server compromise, or anything else, me and my team can't do anything to fix that. This is, again, not a situation I am comfortable with. --Noah
On Tue, Aug 6, 2013 at 9:10 AM, holger krekel <holger@merlinux.eu> wrote:
PyPI mirrors _are_ associated with PyPI and pypi.python.org. (Why) do you want to flatly rule out pip/pypi.python.org support for managing mirrors?
Automatic mirror discovery opens extra security holes until we have found some way to tighten up the security in general. Once we have a way of verifying packages that works and that doesn't rely on the mirror you are using, we could add it back. Indeed, just having a JSON list makes sense.
//Lennart
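To make the JSON-list idea a bit more concrete, below is a minimal sketch of how a client might read such a list. No such file exists for PyPI, and the schema used here (a top-level "mirrors" array of objects with a "url" key) is invented purely for illustration; it is not CPAN's format either. Note that this only covers discovery and does nothing for the package verification problem mentioned above.

```python
import json
from urllib.parse import urlsplit


def usable_mirrors(document):
    """Return mirror URLs from a JSON mirror list, keeping only HTTPS ones.

    The schema (a top-level "mirrors" array of objects with a "url" key) is
    hypothetical and exists only for this sketch.
    """
    data = json.loads(document)
    urls = []
    for entry in data.get("mirrors", []):
        url = entry.get("url", "")
        parts = urlsplit(url)
        # Plain HTTP mirrors are skipped: without TLS the download is open
        # to tampering in transit, which is the concern raised in this thread.
        if parts.scheme == "https" and parts.hostname:
            urls.append(url)
    return urls
```

Filtering on the scheme is a design choice for the sketch, not something anyone has proposed as policy; a real client would also need some way to trust the list itself.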
On 6 August 2013 16:09, Christian Theune <ct@gocept.com> wrote:
Hi,
looks like I'm late to the party to figure out that I'm going to be hurt again.
That's why I asked for this to be put through the PEP process: to give it more visibility, and provide more opportunity for people potentially affected to have a chance to comment and offer alternatives. Giving third parties the opportunity to read python.org cookies indefinitely isn't an option. Everything else is negotiable.
I'd like to suggest explicitly considering what is going to break due to this and how much work you are forcefully inflicting on others. I estimate that my whole experience with packaging (distribute/setuptools) and mirroring/CDN this year has cost my company somewhere between 10k and 20k EUR just for keeping up with the breakage those changes incur. It might be that we're wonderfully stupid (..enough to contribute) and all of this causes no headaches for anybody else... Overall, guessing that the packaging infrastructure is used by probably multiple thousands of companies, I'd expect that at least 100 of them might be experiencing problems like ours. Juggling arbitrary numbers I can see that we're inflicting around a million EUR of cost that nobody asked for.
More specific statements below.
On 2013-08-04 22:25:01 +0000, Donald Stufft said:
Here's my PEP for Deprecating and Removing the Official Public Mirrors
Its source is at: https://github.com/dstufft/peps/blob/master/mirror-removal.rst
Abstract
========
This PEP provides a path to deprecate and ultimately remove the official
public mirroring infrastructure for `PyPI`_. It does not propose the removal
of mirroring support in general.
-1 - maybe I don't have the right to speak up on CDN usage, but personally I feel it's a bad idea to delegate overall PyPI availability exclusively to a commercial third party. It's OK for me that we're using them to improve PyPI availability, but completely putting our faith in their hands, doesn't sound right to me.
Would you be happier if it said "the current incarnation of the public mirroring infrastructure"? I have no objections to somebody proposing a *new* less broken mirroring process.
That's something that the mirroring infrastructure should have been constructed for. I completely agree that the way the mirroring was established was way sub-optimal. I think we can do better.
As noted above, this PEP is about killing off the *current* public mirroring system as being irredeemably broken. If that inspires somebody to come up with a more sensible alternative, so much the better.
* With the introduction of the CDN on PyPI the public mirroring infrastructure
is not as important as it once was as the CDN is also a globally distributed
network of servers which will function even if PyPI is down.
Well, now we have one more breakage point, which keeps annoying me. This argument is not completely true. They may be getting better over time, but we have invested heavily to accommodate the breakage; that needs to be balanced with some benefit in the near future.
That's why explicit mirror usage is still supported and recommended.
* Although there are provisions in place for it, there is currently no known
installer which uses the authenticity checks discussed in `PEP381`_, which
means that any download from a mirror is subject to attack by a malicious
mirror operator. Furthermore, due to the lack of TLS, any download from a
mirror is also subject to a MITM attack.
Again, I think that was a mistake during the introduction of the mirroring infrastructure: too few people, too confusing PEP.
Which is why *this* incarnation of it needs to go away.
* They have only ever been implemented by one installer (pip), and its
implementation, besides being insecure, has serious issues with performance
and is slated for removal with its next release (1.5).
Only if you consider the mirror auto-discovery protocol. I'm not sure whether using DNS was such a smart move. A simple HTTP request to find mirrors would have been nice. I think we can still do that.
And can be done regardless of what happens to the current system.
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
Which will be unaffected for anyone not relying on a pypi.python.org subdomain.
We are also thinking about providing system-level default configuration to hint tools like PIP and setuptools to a different default index that is closer from a network perspective. From a customer perspective this should be "PyPI".
I'd like to avoid breakage. Again, if you don't let me choose where to spend my time, I'd rather invest the time I need for cleaning up the breakage into something constructive.
The indices are in active use. f.pypi.python.org is seeing between 150-300GB of traffic per month, the patterns widely ranging over the last month. This is traffic that is not used internally from gocept.
I think it would be suitable for the PEP to include an escape clause for maintainers of a domain to request that the PSF infrastructure team keep their subdomain active for longer than the general timeframe proposed, with a 301 redirect to a new host. This will need to be worked out between the infrastructure team and the maintainers of the specific instance.
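As an illustration of what the redirect side of such an escape clause might look like for a mirror operator, here is a small sketch using only the Python standard library. The target host name is a placeholder, the helper names are invented for this example, and a production mirror would more likely configure the redirect in its existing web server rather than run something like this.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical new home for the mirror; an operator would substitute a
# domain they control that is reachable over HTTPS.
NEW_BASE = "https://mirror.example.org"


def redirect_target(path, new_base=NEW_BASE):
    """Map a request path on the old name to the same path on the new host."""
    if not path.startswith("/"):
        path = "/" + path
    return new_base.rstrip("/") + path


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a permanent redirect to the new host."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", redirect_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    # HEAD requests get the same headers and no body.
    do_HEAD = do_GET
```

Serving it would be a matter of passing `RedirectHandler` to `http.server.HTTPServer` and calling `serve_forever()`. Whether older installers follow a redirect from HTTP to HTTPS is something each operator would need to verify against the clients they actually see.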
The biggest benefit of the mirroring infrastructure is that it is intended to be de-centralized.
As a community member I can step up and take over responsibility of availability, performance, and security of a mirror.
And, indeed, that is still fully supported. What's going away is the delegation of pypi.python.org subdomains and the associated mirror auto-discovery system. There is no near term plan to create a replacement.
As a community member I have to completely submit to whatever the CDN does and contacting another community member who hopefully will be with us for a long time and stay in good contact with the CDN for us. That's centralization and I don't like that a bit.
Strictly speaking, you're submitting to the PSF infrastructure team, who manage the relationship with Fastly. Those interested in joining the infrastructure SIG can sign up here: http://mail.python.org/mailman/listinfo/infrastructure
Then, roughly 2 months after the release of the first version of pip to have
mirroring support removed (currently slated for pip 1.5) the DNS entries for
[a-g].pypi.python.org and last.pypi.python.org will be removed and PyPI will
no longer accept requests at those domains.
Oh great. That means in about 4 months I have to go through *any installation that my company maintains* and sift through whether we're still referencing f.pypi.python.org anywhere.
Can I write a check?
I think it makes sense for maintainers of particular mirrors to request a stay of execution until their traffic logs show everything coming in under an updated FQDN.
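One way an operator might judge when that point has been reached is to count requests per host name in the access logs. The sketch below assumes a log layout where each line starts with the requested host, optionally followed by ":port"; that layout, like the function names, is an assumption for the example, and real logs would need their own parsing.

```python
from collections import Counter


def requests_per_host(lines):
    """Count requests by the host name found in the first field of each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Drop an optional ":port" suffix and normalise case.
        host = fields[0].split(":", 1)[0].lower()
        counts[host] += 1
    return counts


def old_name_share(counts, old_name):
    """Fraction of all counted requests that still arrive under old_name."""
    total = sum(counts.values())
    return counts[old_name] / total if total else 0.0
```

A share that stays near zero over a few weeks would be a reasonable signal that the old name can be retired, though the threshold is a judgment call for the operator and the infrastructure team.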
Some ideas:
- Take control of *.pypi.python.org back
- Record other public names of the mirrors
- Use 301 redirects to send old installations over to the new mirror names.
I think it makes sense for mirror maintainers to be able to request this process over the default handling (redirection to the PyPI CDN)
- Make it easier for community members to help maintain the list of mirrors.
- Make a better (faster) removal policy of mirrors if the owners are not responsive.
For these two points, I think having the PEP cover an addition and removal process for http://www.pypi-mirrors.org/ might make sense (assuming Ken is amenable to the idea).
- Make it easier for other community members to set up and maintain mirrors. I'm happy to improve bandersnatch where needed.
Lastly, again, and I might be getting on everyone's nerves.
Why does it seem that other communities have figured this out much more simply, with less hassle, and with no significant changes for years, while we need to keep changing stuff over and over and over and break things over and over and over?
Because the current structure of PyPI is fundamentally flawed, and we're still suffering the consequences more than a decade later. A software distribution index server should be a static filesystem that contains all the necessary metadata (including signatures) and can be mirrored with rsync. PyPI is far from being that :P
Perl gets credit for CPAN, but something I only realised recently is that they probably deserve more credit for PAUSE, which is the *upload* side of CPAN. Much of the CPAN metadata is derived directly from the distributed software by PAUSE rather than relying on client side tools. That means CPAN can publish new metadata just by upgrading PAUSE - they don't need to worry about how people are doing the uploads. Also, CPAN, like Linux distro trees, can be mirrored with rsync rather than needing a custom client. It's much easier to maintain backwards compatibility when the only required server API is the ability to serve static files.
The only things that have changed recently are that:
- the rubygems.org compromise has made it obvious that sticking our heads in the sand and trusting the fact that there are easier targets out there to protect us is no longer an adequate answer
- we've made the decision to try to fix the underlying brokenness rather than living with it forever
- we have people willing to do the work to make that happen
It's really hard for me to write this mail without cussing - the situation is very frustrating: the community dynamics seem to "want to move forward" where they from my perspective "wander left and right and break stuff like a drunken elephant driving a tank through the Louvre".
Hence the PEP process. That's the opportunity for those that are more aware of the backwards compatibility issues to provide a course correction to those of us that are concerned with starting to close the many and varied security vulnerabilities in the PyPI ecosystem. Cheers, Nick.
On Aug 6, 2013, at 12:01 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 August 2013 16:09, Christian Theune <ct@gocept.com> wrote:
Hi,
looks like I'm late to the party to figure out that I'm going to be hurt again.
That's why I asked for this to be put through the PEP process: to give it more visibility, and provide more opportunity for people potentially affected to have a chance to comment and offer alternatives. Giving third parties the opportunity to read python.org cookies indefinitely isn't an option.
Everything else is negotiable.
I'd like to suggest explicitly considering what is going to break due to this and how much work you are forcefully inflicting on others. I estimate that my whole experience with packaging (distribute/setuptools) and mirroring/CDN this year has cost my company somewhere between 10k and 20k EUR just for keeping up with the breakage those changes incur. It might be that we're wonderfully stupid (..enough to contribute) and all of this causes no headaches for anybody else... Overall, guessing that the packaging infrastructure is used by probably multiple thousands of companies, I'd expect that at least 100 of them might be experiencing problems like ours. Juggling arbitrary numbers I can see that we're inflicting around a million EUR of cost that nobody asked for.
More specific statements below.
On 2013-08-04 22:25:01 +0000, Donald Stufft said:
Here's my PEP for Deprecating and Removing the Official Public Mirrors
Its source is at: https://github.com/dstufft/peps/blob/master/mirror-removal.rst
Abstract
========
This PEP provides a path to deprecate and ultimately remove the official
public mirroring infrastructure for `PyPI`_. It does not propose the removal
of mirroring support in general.
-1 - maybe I don't have the right to speak up on CDN usage, but personally I feel it's a bad idea to delegate overall PyPI availability exclusively to a commercial third party. It's OK for me that we're using them to improve PyPI availability, but completely putting our faith in their hands, doesn't sound right to me.
Would you be happier if it said "the current incarnation of the public mirroring infrastructure"? I have no objections to somebody proposing a *new* less broken mirroring process.
That's something that the mirroring infrastructure should have been constructed for. I completely agree that the way the mirroring was established was way sub-optimal. I think we can do better.
As noted above, this PEP is about killing off the *current* public mirroring system as being irredeemably broken. If that inspires somebody to come up with a more sensible alternative, so much the better.
* With the introduction of the CDN on PyPI the public mirroring infrastructure
is not as important as it once was as the CDN is also a globally distributed
network of servers which will function even if PyPI is down.
Well, now we have one more breakage point, which keeps annoying me. This argument is not completely true. They may be getting better over time, but we have invested heavily to accommodate the breakage; that needs to be balanced with some benefit in the near future.
That's why explicit mirror usage is still supported and recommended.
* Although there are provisions in place for it, there is currently no known
installer which uses the authenticity checks discussed in `PEP381`_, which
means that any download from a mirror is subject to attack by a malicious
mirror operator. Furthermore, due to the lack of TLS, any download from a
mirror is also subject to a MITM attack.
Again, I think that was a mistake during the introduction of the mirroring infrastructure: too few people, too confusing PEP.
Which is why *this* incarnation of it needs to go away.
* They have only ever been implemented by one installer (pip), and its
implementation, besides being insecure, has serious issues with performance
and is slated for removal with its next release (1.5).
Only if you consider the mirror auto-discovery protocol. I'm not sure whether using DNS was such a smart move. A simple HTTP request to find mirrors would have been nice. I think we can still do that.
And can be done regardless of what happens to the current system.
Also, not everyone wants or needs auto-detection the way that the protocol describes it. I personally just hand-pick a mirror (my own, hah) and keep using that.
Which will be unaffected for anyone not relying on a pypi.python.org subdomain.
We are also thinking about providing system-level default configuration to hint tools like PIP and setuptools to a different default index that is closer from a network perspective. From a customer perspective this should be "PyPI".
I'd like to avoid breakage. Again, if you don't let me choose where to spend my time, I'd rather invest the time I need for cleaning up the breakage into something constructive.
The indices are in active use. f.pypi.python.org is seeing between 150-300GB of traffic per month, the patterns widely ranging over the last month. This is traffic that is not used internally from gocept.
I think it would be suitable for the PEP to include an escape clause for maintainers of a domain to request that the PSF infrastructure team keep their subdomain active for longer than the general timeframe proposed, with a 301 redirect to a new host. This will need to be worked out between the infrastructure team and the maintainers of the specific instance.
The biggest benefit of the mirroring infrastructure is that it is intended to be de-centralized.
As a community member I can step up and take over responsibility of availability, performance, and security of a mirror.
And, indeed, that is still fully supported. What's going away is the delegation of pypi.python.org subdomains and the associated mirror auto-discovery system. There is no near term plan to create a replacement.
As a community member I have to completely submit to whatever the CDN does and contacting another community member who hopefully will be with us for a long time and stay in good contact with the CDN for us. That's centralization and I don't like that a bit.
Strictly speaking, you're submitting to the PSF infrastructure team, who manage the relationship with Fastly. Those interested in joining the infrastructure SIG can sign up here: http://mail.python.org/mailman/listinfo/infrastructure
All members of the Infra Staff team have access to the Fastly admin panel, and Fastly is aware of all of us as authorized to work on the PSF account. Bus factor of 4 is definitely not perfect, but as the one that is responsible for safeguarding such things, I am okay with it for now.
Then, roughly 2 months after the release of the first version of pip to have
mirroring support removed (currently slated for pip 1.5) the DNS entries for
[a-g].pypi.python.org and last.pypi.python.org will be removed and PyPI will
no longer accept requests at those domains.
Oh great. That means in about 4 months I have to go through *any installation that my company maintains* and sift through whether we're still referencing f.pypi.python.org anywhere.
Can I write a check?
I think it makes sense for maintainers of particular mirrors to request a stay of execution until their traffic logs show everything coming in under an updated FQDN.
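
The audit described above (finding installations that still reference a retiring mirror name) could be partly automated. The following is only a rough sketch: the hostname pattern comes from the names listed in the proposal ([a-g].pypi.python.org and last.pypi.python.org), while the function names and the file patterns scanned are assumptions made for illustration, not anything pip or setuptools provides.

```python
import re
from pathlib import Path

# Hostnames slated for removal according to the quoted proposal:
# [a-g].pypi.python.org and last.pypi.python.org.
LEGACY_MIRROR = re.compile(r"\b(?:[a-g]|last)\.pypi\.python\.org\b", re.IGNORECASE)


def find_legacy_mirrors(text):
    """Return the sorted, de-duplicated legacy mirror hostnames found in text."""
    return sorted({m.group(0).lower() for m in LEGACY_MIRROR.finditer(text)})


def scan_tree(root, patterns=("*.cfg", "*.conf", "*.ini", "*.txt")):
    """Yield (path, hostnames) for files under root that mention a legacy mirror.

    The glob patterns are only a guess at where index URLs tend to live
    (buildout.cfg, pip.conf, setup.cfg, requirements files).
    """
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable entry: skip rather than abort the scan
            hits = find_legacy_mirrors(text)
            if hits:
                yield path, hits
```

Running `for path, hosts in scan_tree("/srv/deployments"): print(path, hosts)` (the directory is a placeholder) would list the files that still need updating. It cannot see hostnames baked into scripts or images outside the scanned patterns, so server-side traffic logs remain the more reliable signal.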
Some ideas:
- Take control of *.pypi.python.org back
- Record other public names of the mirrors
- Use 301 redirects to send old installations over to the new mirror names.
I think it makes sense for mirror maintainers to be able to request this process over the default handling (redirection to the PyPI CDN)
- Make it easier for community members to help maintain the list of mirrors.
- Make a better (faster) removal policy of mirrors if the owners are not responsive.
For these two points, I think having the PEP cover an addition and removal process for http://www.pypi-mirrors.org/ might make sense (assuming Ken is amenable to the idea).
- Make it easier for other community members to set up and maintain mirrors. I'm happy to improve bandersnatch where needed.
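
The 301 redirect idea in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of how the PSF infrastructure would implement it: the mapping table, the placeholder target host pypi-mirror.example.org, and the names `MIRROR_REDIRECTS`, `redirect_target` and `RedirectHandler` are all hypothetical. Names without a registered new host fall back to the default handling discussed in this thread (redirection to PyPI itself).

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from retired mirror names to the hosts their
# maintainers registered; the right-hand side is a placeholder.
MIRROR_REDIRECTS = {
    "f.pypi.python.org": "https://pypi-mirror.example.org",
}
# Default handling: anything not listed above is sent to PyPI itself.
DEFAULT_TARGET = "https://pypi.python.org"


def redirect_target(host, path):
    """Build the Location value for a request made to a retired mirror name."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    base = MIRROR_REDIRECTS.get(hostname, DEFAULT_TARGET)
    return base + path


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a permanent redirect."""

    def do_GET(self):
        location = redirect_target(self.headers.get("Host", ""), self.path)
        self.send_response(301)  # permanent redirect
        self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET
```

Serving it with `http.server.HTTPServer(("", 8080), RedirectHandler).serve_forever()` is enough to try it locally. Whether older installer versions follow such redirects, in particular across an HTTP to HTTPS change, would need to be checked per tool.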
Lastly, again, and I might be getting on everyone's nerves.
Why does it seem that other communities have figured this out much simpler, with less hassle, and with no significant changes for years and we need to keep changing stuff over and over and over and break things over and over and over?
Because the current structure of PyPI is fundamentally flawed, and we're still suffering the consequences more than a decade later. A software distribution index server should be a static filesystem that contains all the necessary metadata (including signatures) and can be mirrored with rsync. PyPI is far from being that :P
Perl gets credit for CPAN, but something I only realised recently is that they probably deserve more credit for PAUSE, which is the *upload* side of CPAN. Much of the CPAN metadata is derived directly from the distributed software by PAUSE rather than relying on client side tools. That means CPAN can publish new metadata just by upgrading PAUSE - they don't need to worry about how people are doing the uploads.
Also, CPAN, like Linux distro trees, can be mirrored with rsync rather than needing a custom client. It's much easier to maintain backwards compatibility when the only required server API is the ability to serve static files.
I will fight any attempt to do this with every fiber of my being. This kind of "dumb server" API means that any metadata indexing or searching either needs to be precomputed or implemented in a much more intelligent client. This is already somewhat the case with pip, and as someone that has to deal with multiple client implementations it makes me very sad that I can't just call a REST endpoint to know what will be installed when I do a thing. This is neither here nor there, but I wanted to stake out my grounds so I can growl when people get too close :)
The only things that have changed recently are that:
- the rubygems.org compromise has made it obvious that sticking our heads in the sand and trusting the fact that there are easier targets out there to protect us is no longer an adequate answer
- we've made the decision to try to fix the underlying brokenness rather than living with it forever
- we have people willing to do the work to make that happen
There is also finally much closer coordination between the whole stack of the packaging teams, which means that changes that once took years can now happen in a day or two. This definitely manifests as a possibly frustrating rate of changes compared to previously. --Noah
On 6 August 2013 17:13, Noah Kantrowitz <noah@coderanger.net> wrote:
Also, CPAN, like Linux distro trees, can be mirrored with rsync rather than needing a custom client. It's much easier to maintain backwards compatibility when the only required server API is the ability to serve static files.
I will fight any attempt to do this with every fiber of my being. This kind of "dumb server" API means that any metadata indexing or searching either needs to be precomputed or implemented in a much more intelligent client. This is already somewhat the case with pip, and as someone that has to deal with multiple client implementations it makes me very sad that I can't just call a REST endpoint to know what will be installed when I do a thing. This is neither here nor there, but I wanted to stake out my grounds so I can growl when people get too close :)
I agree having a smart server is good, I just think exposing a dumb, easy to mirror, signed data store is good, too :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Aug 06, 2013 at 17:19 +1000, Nick Coghlan wrote:
On 6 August 2013 17:13, Noah Kantrowitz <noah@coderanger.net> wrote:
Also, CPAN, like Linux distro trees, can be mirrored with rsync rather than needing a custom client. It's much easier to maintain backwards compatibility when the only required server API is the ability to serve static files.
I will fight any attempt to do this with every fiber of my being. This kind of "dumb server" API means that any metadata indexing or searching either needs to be precomputed or implemented in a much more intelligent client. This is already somewhat the case with pip, and as someone that has to deal with multiple client implementations it makes me very sad that I can't just call a REST endpoint to know what will be installed when I do a thing. This is neither here nor there, but I wanted to stake out my grounds so I can growl when people get too close :)
I agree having a smart server is good, I just think exposing a dumb, easy to mirror, signed data store is good, too :)
FWIW I think CPAN is structured such that search sites operate on mirrored data. The master server thus can remain dumb. Sounds like a good recipe to me. It's a bit sad but I think even now we are struggling to meet CPAN's architecture and ease-of-use, let alone improve on it. holger
On Aug 6, 2013, at 3:24 AM, holger krekel <holger@merlinux.eu> wrote:
FWIW I think CPAN is structured such that search sites operate on mirrored data. The master server thus can remain dumb. Sounds like a good recipe to me.
It's a bit sad but I think even now we are struggling to meet CPAN's architecture and ease-of-use, let alone improve on it.
Changes like this aren't off the table in the future. Right now a lot of the work is being done to bring some level of sanity to what is already there and then make it possible to iterate and hash out what we want a python package index to look like. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Quoting Nick Coghlan <ncoghlan@gmail.com>:
On 6 August 2013 16:09, Christian Theune <ct@gocept.com> wrote:
Hi,
looks like I'm late to the party to figure out that I'm going to be hurt again.
That's why I asked for this to be put through the PEP process: to give it more visibility, and provide more opportunity for people potentially affected to have a chance to comment and offer alternatives. Giving third parties the opportunity to read python.org cookies indefinitely isn't an option.
Define "third party". There are a number of organisations other than the PSF that can read python.org cookies. As Noah explains, it's a matter of trust. Noah chooses to trust Fastly, I choose to trust Christian Theune. We both have then imposed our trust on the community. In any case, I consider the cookie issue a red herring. Mirror operators could only steal cookies if users actually pointed their web browsers to the mirrors. They typically don't, since they use setuptools or pip, which doesn't even have access to the cookies. And, if a mirror operator actually does request cookies, there is a high risk in being caught in doing so. If that happens, the mirror operator will not only lose the mirror, but also lose community trust. Regards, Martin
On Aug 6, 2013, at 3:15 AM, martin@v.loewis.de wrote:
Quoting Nick Coghlan <ncoghlan@gmail.com>:
On 6 August 2013 16:09, Christian Theune <ct@gocept.com> wrote:
Hi,
looks like I'm late to the party to figure out that I'm going to be hurt again.
That's why I asked for this to be put through the PEP process: to give it more visibility, and provide more opportunity for people potentially affected to have a chance to comment and offer alternatives. Giving third parties the opportunity to read python.org cookies indefinitely isn't an option.
Define "third party". There are a number of organisations other than the PSF that can read python.org cookies.
As Noah explains, it's a matter of trust. Noah chooses to trust Fastly, I choose to trust Christian Theune. We both have then imposed our trust on the community.
Sure, but there's also a matter of the *number* of people trusted; each new person to trust is another potential pain point. There's really no requirement to have the mirrors hosted on N.pypi.python.org. The fact they do is a legacy issue that can be corrected with a much better story for reliability and security.
In any case, I consider the cookie issue a red herring. Mirror operators could only steal cookies if users actually pointed their web browsers to the mirrors. They typically don't, since they use setuptools or pip, which doesn't even have access to the cookies. And, if a mirror operator actually does request cookies, there is a high risk in being caught in doing so. If that happens, the mirror operator will not only lose the mirror, but also lose community trust.
The cookie issue is very serious because it does not require someone to knowingly point their browser at N.pypi.python.org. A mirror operator could simply inline an image tag in a package; someone views the package page and automatically makes a request to N.pypi.python.org, which is sent the cookie, and a script on N.pypi.python.org can read it. As for the claim that there is a high risk of being caught: there isn't really. It would be very easy to do this near silently.
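
To illustrate why a mirror subdomain can receive such a cookie at all, here is a simplified sketch of the domain-matching rule browsers apply (RFC 6265, section 5.1.3). It assumes the cookie in question carries a Domain attribute covering python.org or pypi.python.org, which this thread does not spell out; a host-only cookie without a Domain attribute would not normally be sent to subdomains. The function name is made up for the example, and the IP-address exclusion from the RFC is left out.

```python
def domain_matches(request_host, cookie_domain):
    """Simplified RFC 6265 domain-match; ignores the IP-address exclusion."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # Match on the exact host, or on any subdomain of the cookie's domain.
    return host == domain or host.endswith("." + domain)


# Under the assumption above, a request to a mirror subdomain carries the cookie.
print(domain_matches("f.pypi.python.org", "python.org"))
# An unrelated host that merely ends in the same letters does not match.
print(domain_matches("evilpython.org", "python.org"))
```

With these rules, moving mirrors off the python.org domain removes them from the set of hosts a python.org-scoped cookie would ever be sent to.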
Regards, Martin
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Christian Theune <ct <at> gocept.com> writes:
It's really hard for me to write this mail without cussing - the situation is very frustrating: the community dynamics seem to "want to move forward" where they from my perspective "wander left and right and break stuff like a drunken elephant driving a tank through the Louvre".
Le Louvre has a rather large inner yard, so it may actually be doable - the elephant will have to duck at times, though. Regards Antoine.
participants (13)
- "Martin v. Löwis"
- Antoine Pitrou
- Christian Theune
- Donald Stufft
- holger krekel
- Justin Cappos
- Lennart Regebro
- martin@v.loewis.de
- Michael Merickel
- Nick Coghlan
- Noah Kantrowitz
- Richard Jones
- Trishank Karthik Kuppusamy