PEP449 - Removal of the PyPI Mirror Auto Discovery and Naming Scheme
Ok round two of this PEP. I've made a number of modifications to it:

* Changed the title to be more relevant to the actual proposal.
* Cleared up language to make it obvious that this was explicitly the naming and auto discovery being removed and not the mirrors in general.
* Removed pointless distinctions between "Official" and "Unofficial" mirrors.
* Removed comments about staleness and points about the CDN which were not directly related to this PEP, as the PEP doesn't attempt to solve issues with the mirroring protocol, just the discovery and naming scheme.
* Incorporated feedback from the thread and simplified the proposal as well as extended the time frame.
* Responded to some of the feedback from the thread that wasn't incorporated.

I've explicitly CC'd Christian to this email so he can hopefully see it easier.

Abstract
========

This PEP provides a path to deprecate and ultimately remove the auto discovery of PyPI mirrors as well as the hard coded naming scheme which requires delegating a domain name under pypi.python.org to a third party.

Rationale
=========

The PyPI mirroring infrastructure (defined in `PEP381`_) provides a means to mirror the content of PyPI used by the automatic installers. It also provides a method for auto discovery of mirrors and a consistent naming scheme.

There are a number of problems with the auto discovery protocol and the naming scheme:

* They give control over a \*.python.org domain name to a third party, allowing that third party to set or read cookies on the pypi.python.org and python.org domain names.
* The use of a sub domain of pypi.python.org means that the mirror operators will never be able to get an SSL certificate of their own, and giving them one for a python.org domain name is unlikely to happen.
* The auto discovery uses an unauthenticated protocol (DNS).
* The lack of a TLS certificate on these domains means that clients can not be sure that they have not been a victim of DNS poisoning or a MITM attack.
* The auto discovery protocol was designed to enable a client to automatically select a mirror for use. This is no longer a requirement because the CDN that PyPI is now using is a globally distributed network of servers which will automatically select one close to the client without any effort on the client's part.
* The auto discovery protocol and use of the consistent naming scheme has only ever been implemented by one installer (pip), and its implementation, besides being insecure, has serious issues with performance and is slated for removal with its next release (1.5).
* While there are provisions in `PEP381`_ that would solve *some* of these issues for a dedicated client, it would not solve the issues that affect a user's browser. Additionally these provisions have not been implemented by any installer to date.

Due to the number of issues, some of them very serious, and the CDN which provides most of the benefit of the auto discovery and consistent naming scheme, this PEP proposes to first deprecate and then remove the [a..z].pypi.python.org names for mirrors and the last.pypi.python.org name for the auto discovery protocol. The ability to mirror and the method of mirroring will not be affected and will continue to exist as written in `PEP381`_. Operators of existing mirrors are encouraged to acquire their own domains and certificates to use for their mirrors if they wish to continue hosting them.

Plan for Deprecation & Removal
==============================

Immediately upon acceptance of this PEP, documentation on PyPI will be updated to reflect the deprecated nature of the official public mirrors and will direct users to external resources like http://www.pypi-mirrors.org/ to discover unofficial public mirrors if they wish to use one.

Mirror operators, if they wish to continue operating their mirror, should acquire a domain name to represent their mirror and, if they are able, a TLS certificate. Once they have acquired a domain they should redirect their assigned N.pypi.python.org domain name to their new domain. On Feb 15th, 2014 the DNS entries for [a..z].pypi.python.org and last.pypi.python.org will be removed. At any time prior to Feb 15th, 2014 a mirror operator may request that their domain name be reclaimed by PyPI and pointed back at the master.

Why Feb 15th, 2014
------------------

The most critical decision of this PEP is the final cut off date. If the date is too soon then it needlessly punishes people by forcing them to drop everything to update their deployment scripts. If the date is too far away then the extended period of time does not help with the migration effort and merely puts off the migration until a later date.

The date of Feb 15th, 2014 has been chosen because it is roughly 6 months from the date of the PEP. This should ensure a lengthy period of time to enable people to update their deployment procedures to point to the new domain names without merely padding the cut off date.

Why the DNS entries must be removed
-----------------------------------

While it would be possible to simply reclaim the domain names used for mirrors and direct them back at PyPI, in order to prevent users from needing to update configurations to point away from those domains, this has a number of issues.

* Anyone who currently has these names hard coded in their configuration has them hard coded as HTTP. This means that by allowing these names to continue resolving we make it simple for a MITM operator to attack users by rewriting the redirect to HTTPS prior to giving it to the client.
* The overhead of maintaining several domains pointing at PyPI has proved troublesome for the small number of N.pypi.python.org domains that have already been reclaimed. They often get mis-configured when things change on the service, which often leaves them broken for months at a time until somebody notices. By leaving them in we leave users of these domains open to random breakages which are less likely to get caught or noticed.
* People using these domains have explicitly chosen to use them for one reason or another. One such reason may be because they do not wish to deploy from a host located in a particular country. If these domains continue to resolve but do not point at their existing locations we have silently removed this choice from the existing users of those domains.

That being said, removing the entries *will* require users who have modified their configuration to either point back at the master (PyPI) or select a new mirror name to point at. This is regarded as a regrettable requirement to protect PyPI itself and the users of the mirrors from the attacks outlined above or, at the very least, require them to make an informed decision about the insecurity.

Public or Private Mirrors
=========================

The mirroring protocol will continue to exist as defined in `PEP381`_ and people are encouraged to host public and private mirrors if they so desire. The recommended mirroring client is `Bandersnatch`_.

.. _PyPI: https://pypi.python.org/
.. _PEP381: http://www.python.org/dev/peps/pep-0381/
.. _Bandersnatch: https://pypi.python.org/pypi/bandersnatch

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
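The cookie concern in the first Rationale bullet can be illustrated with a rough sketch of cookie domain matching. This is a simplified reading of the RFC 6265 rule, assuming no public-suffix or IP-address checks; the function names are hypothetical and are not taken from the PEP or from any installer.

```python
def domain_match(host, cookie_domain):
    """Simplified RFC 6265-style domain match (no public-suffix or IP checks)."""
    host = host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def can_set_cookie(setting_host, cookie_domain):
    # A host may set a cookie for any parent domain that it domain-matches.
    return domain_match(setting_host, cookie_domain)


def is_sent_to(request_host, cookie_domain):
    # A browser sends the cookie to every host that domain-matches its Domain.
    return domain_match(request_host, cookie_domain)


mirror = "f.pypi.python.org"  # a delegated mirror name from the old scheme
for scope in ("pypi.python.org", "python.org"):
    print(scope, can_set_cookie(mirror, scope), is_sent_to("pypi.python.org", scope))
```

Under these assumptions a delegated mirror name can scope a cookie to a parent domain that browsers then send to PyPI itself, while a mirror on its own domain cannot.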
Heh, this got a "Usenet nod" from me. I wanted to mention this in a talk tomorrow, so I posted the updated draft in the PEPs repo. Hopefully the February timeframe is generous enough for Christian to either talk to the PSF about a cert for f.pypi.python.org, or else migrate to a different domain name. I think the exact date is the main open question now - I believe the PEP makes a solid case for why the status quo can't continue indefinitely. Cheers, Nick.
On Aug 16, 2013, at 10:31 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Heh, this got a "Usenet nod" from me. I wanted to mention this in a talk tomorrow, so I posted the updated draft in the PEPs repo.
Hopefully the February timeframe is generous enough for Christian to either talk to the PSF about a cert for f.pypi.python.org, or else migrate to a different domain name. I think the exact date is the main open question now - I believe the PEP makes a solid case for why the status quo can't continue indefinitely.
Cheers, Nick.
I forgot that still needed updated, thanks! I know one person has said to me privately they plan on reviewing it still, but I haven't seen anything from Christian. Other than that I've not seen a lot of response so either people haven't noticed it or they were usenet nodding as well :D ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Oh FWIW pip got a CVE assigned earlier today for --use-mirrors which is effectively an implementation of what this PEP is removing. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 17. Aug2013, at 4:45 AM, Donald Stufft <donald@stufft.io> wrote:
On Aug 16, 2013, at 10:31 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Heh, this got a "Usenet nod" from me. I wanted to mention this in a talk tomorrow, so I posted the updated draft in the PEPs repo.
Hopefully the February timeframe is generous enough for Christian to either talk to the PSF about a cert for f.pypi.python.org, or else migrate to a different domain name. I think the exact date is the main open question now - I believe the PEP makes a solid case for why the status quo can't continue indefinitely.
Cheers, Nick.
I forgot that still needed updated, thanks!
I know one person has said to me privately they plan on reviewing it still, but I haven't seen anything from Christian.
Other than that I've not seen a lot of response so either people haven't noticed it or they were usenet nodding as well :D
Sorry for the stealth mode. The long thread has been sitting in my Inbox while the last week was very busy and we're hosting a sprint at our offices right now. I hope I will get around to reading and responding early next week. Christian -- Christian Theune · ct@gocept.com gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany http://gocept.com · Tel +49 345 1229889-7 Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
On Aug 10, 2013, at 9:07 PM, Donald Stufft <donald@stufft.io> wrote:
[snip]
I guess I'm going to ask for some pronouncement on this? It's been two weeks with no real feedback. FWIW tangentially related to this proposal, g.pypi.python.org is now 16 days out of date. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Sat, Aug 24, 2013 at 16:47 -0400, Donald Stufft wrote:
On Aug 10, 2013, at 9:07 PM, Donald Stufft <donald@stufft.io> wrote:
[snip]
I guess I'm going to ask for some pronouncement on this? It's been two weeks with no real feedback.
FWIW tangentially related to this proposal, g.pypi.python.org is now 16 days out of date.
Being one of the people who wanted to but didn't feedback (still in vacation, writing from a camping place with ssh/mutt and lousy connectivity FWIW):

- the PEP claims that PEP381 mirroring protocol continues to exist. But are the statements in http://www.python.org/dev/peps/pep-0381/#statistics-page still valid, i.e. does pypi.python.org still crawl mirrors for statistics when the PEP449-DNS removal happens? Also PEP381 has seen some modifications and enhancements before and after the CDN introduction.

- relatedly, I'd suggest to clarify that this PEP does at least not preclude further PEPs or attempts to introduce other means than DNS to manage PyPI mirrors (one where mirror availability is stored at and queried via a python.org address). Ideally, it should already incorporate a procedure to register mirrors and to list them at a web page.

- maybe a "future work" section could list these issues.

I guess one underlying question is how much we want to rely on the CDN mid/long-term. Its introduction was not discussed in a PEP but it is mentioned e.g. in PEP449 as a reason to shutdown mirror management infrastructure.

That all being said, i am otherwise ok with PEP449 as DNS seems indeed the wrong way to handle mirror management.

best, holger
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
On Aug 27, 2013, at 7:10 AM, holger krekel <holger@merlinux.eu> wrote:
Being one of the people who wanted to but didn't feedback (still in vacation, writing from a camping place with ssh/mutt and lousy connectivity FWIW):
Ah! I hope you're having fun :)
- the PEP claims that PEP381 mirroring protocol continues to exist. But are the statements in http://www.python.org/dev/peps/pep-0381/#statistics-page still valid, i.e. does pypi.python.org still crawl mirrors for statistics when the PEP449-DNS removal happens? Also PEP381 has seen some modifications and enhancements before and after the CDN introduction.
PyPI hasn't crawled mirrors for statistics for 3 months? or so. Ever since download counts first got shut off and it has never been restored. Personally I have no plans to restore it. It caused needless complication and the download counts from the mirrors aren't that important IMO. The download counts are already inaccurate and are primarily useful as a form of relative comparison so the additional numbers do not aid much in the form of relativeness only in absolute counts (which as stated are already widely inaccurate).

Perhaps PEP381 should be updated to take into account the new abilities added for mirrors that have happened recently (the Serials, the Headers etc). It also probably makes sense to update it since PyPI is no longer fetching statistics from the mirrors. I don't think we need a new mirror PEP though as the mirroring protocol is mostly the same and the enhancements exist primarily as a means of getting a more accurate mirror.

I *do* have plans down the road to introduce a new mirroring protocol but that is a ways out still as there are other things higher up on my todo list.
- relatedly, I'd suggest to clarify that this PEP does at least not preclude further PEPs or attempts to introduce other means than DNS to manage PyPI mirrors (one where mirror availability is stored at and queried via a python.org address). Ideally, it should already incorporate a procedure to register mirrors and to list them at a web page.
I don't see this PEP as precluding anything else. Currently it points to http://pypi-mirrors.org/ as the place to locate new mirrors from in a manual fashion. I'm not too concerned with an automatic discovery protocol since the only installer as far as I'm aware that even used the existing one was pip which is removing that support in 1.5 anyways. That being said I'm not opposed to a new PEP introducing a different scheme but I probably won't be the architect of it and I can make a small update to it that it doesn't preclude further PEPs if that would make people feel more comfortable.
- maybe a "future work" section could list these issues.
I guess one underlying question is how much we want to rely on the CDN mid/long-term. Its introduction was not discussed in a PEP but it is mentioned e.g. in PEP449 as a reason to shutdown mirror management infrastructure.
Personally I see the CDN as the best option for the bulk of people wanting to install from a *public* mirror. There are of course situations where you might want to install from a different public mirror (China being a big one). I see mirrors mostly being useful for smaller use cases now. However I have no plans or desire to make the public mirrors go away other than the existing DNS names (and only then because of security concerns).
That all being said, i am otherwise ok with PEP449 as DNS seems indeed the wrong way to handle mirror management.
Awesome good to hear. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Howdi,

On 17. Aug2013, at 7:55 AM, Christian Theune <ct@gocept.com> wrote:
Sorry for the stealth mode. The long thread has been sitting in my Inbox while the last week was very busy and we're hosting a sprint at our offices right now.
I hope I will get around to reading and responding early next week.
Well, almost.

Picking the pieces back up, I'll stop bickering now [1].

I'm grateful for the offer to include gocept in a more formally trusted relationship. However, I don't think that adds more value and just defers paying back technical debt.

I'll be a bit more aggressive with the migration and plan to do the following things:

In the next days:

- Establish a new primary name for our mirror (probably pypi.rzob.gocept.net or something similar)
- Inform our internal team that they should update f.pypi.python.org to the new name
- Make f.pypi.python.org serve 301 redirects to the new name

In two weeks:

- Make f.pypi.python.org serve 410 Gone

This should get your plan back on track. I'm happy with the proposal otherwise, thanks for taking the time to keep me informed!

Christian

PS: Not sure whether I missed anything important in the meantime … ;)

[1] I still feel some need for personal interaction confirming or disproving some of my thoughts, but I don't think it's worthwhile to do that on a mailinglist. Personally I feel a bit stupid for the amount of pain we experience with the packaging right now - especially as we try to stay in touch and contribute. I don't mind the work in general but it would be nice to have less pain … ?

-- Christian Theune · gocept gmbh & co. kg flyingcircus.io · operations as a service Forsterstraße 29 · 06112 Halle (Saale) · Tel +49 345 1229889-7
On 27. Aug2013, at 5:22 PM, Christian Theune <ct@gocept.com> wrote:
Howdi,
On 17. Aug2013, at 7:55 AM, Christian Theune <ct@gocept.com> wrote:
Sorry for the stealth mode. The long thread has been sitting in my Inbox while the last week was very busy and we're hosting a sprint at our offices right now.
I hope I will get around to reading and responding early next week.
Well, almost.
Picking the pieces back up, I'll stop bickering now [1].
I'm grateful for the offer to include gocept in a more formally trusted relationship. However, I don't think that adds more value and just defers paying back technical debt.
I'll be a bit more aggressive with the migration and plan to do the following things:
In the next days
- Establish a new primary name for our mirror (probably pypi.rzob.gocept.net or something similar)
The new primary name is "pypi.gocept.com". This already existed before.
- Inform our internal team that they should update f.pypi.python.org to the new name
- Make f.pypi.python.org serve 301 redirects to the new name
It does this now. I will also add a valid SSL certificate in the next minutes. What's your take on enforcing SSL e.g. via redirects? Christian -- Christian Theune · gocept gmbh & co. kg flyingcircus.io · operations as a service Forsterstraße 29 · 06112 Halle (Saale) · Tel +49 345 1229889-7
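The two-phase retirement described above (301 redirects first, 410 Gone later) could be sketched as a small WSGI application. This is only an illustration under stated assumptions: `make_retirement_app` and `probe` are hypothetical names, the `/simple/pip/` path is a made-up request, and nothing here reflects how gocept actually configured their servers.

```python
def make_retirement_app(new_base, retired=False):
    """Return a WSGI app for a mirror hostname that is being retired."""
    def app(environ, start_response):
        if retired:
            # Phase 2: the old name no longer serves or redirects content.
            body = b"410 Gone: this mirror name is retired; update your index URL.\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Phase 1: redirect to the new name, preserving path and query string.
        location = new_base.rstrip("/") + environ.get("PATH_INFO", "/")
        if environ.get("QUERY_STRING"):
            location += "?" + environ["QUERY_STRING"]
        start_response("301 Moved Permanently", [
            ("Location", location),
            ("Content-Length", "0"),
        ])
        return [b""]
    return app


def probe(app, path):
    """Call the app directly (no network socket) and return (status, headers)."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    app({"REQUEST_METHOD": "GET", "PATH_INFO": path, "QUERY_STRING": ""}, start_response)
    return seen["status"], seen["headers"]


print(probe(make_retirement_app("https://pypi.gocept.com"), "/simple/pip/"))
print(probe(make_retirement_app("https://pypi.gocept.com", retired=True), "/simple/pip/"))
```

For a local trial the returned app could be served with `wsgiref.simple_server.make_server` from the standard library.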
On 8/28/13 8:37 AM, Christian Theune wrote:
I will also add a valid SSL certificate in the next minutes. What's your take on enforcing SSL e.g. via redirects?
I am not an expert, but I guess this depends on who is enforcing the SSL redirection. If someone untrusted can be a man-in-the-middle between your clients and http://pypi.gocept.com, then this man-in-the-middle should be able to redirect your HTTP-only clients anywhere else. I would venture that the best thing to do, if feasible, is to get your clients to point strictly to https://pypi.gocept.com and test that pip >= 1.3 verifies the SSL connection.
On 28. Aug2013, at 4:03 PM, Trishank Karthik Kuppusamy <tk47@students.poly.edu> wrote:
On 8/28/13 8:37 AM, Christian Theune wrote:
I will also add a valid SSL certificate in the next minutes. What's your take on enforcing SSL e.g. via redirects?
I am not an expert, but I guess this depends on who is enforcing the SSL redirection. If someone untrusted can be a man-in-the-middle between your clients and http://pypi.gocept.com, then this man-in-the-middle should be able to redirect your HTTP-only clients anywhere else.
Right. It doesn't add any security on its own, but it's a way that people can discover you're using SSL. :) I'll have to read up on how to do HSTS actually …
I would venture that the best thing to do, if feasible, is to get your clients to point strictly to https://pypi.gocept.com and test that pip >= 1.3 verifies the SSL connection.
Right. Christian -- Christian Theune · gocept gmbh & co. kg flyingcircus.io · operations as a service Forsterstraße 29 · 06112 Halle (Saale) · Tel +49 345 1229889-7
On 08/28/2013 12:09 PM, Christian Theune wrote:
Right. It doesn't add any security on its own, but it's a way that people can discover you're using SSL. :) I'll have to read up on how to do HSTS actually …
That was my next question. Does pip honour HSTS? I could be wrong, but I do not think so...
On 29 Aug 2013 03:17, "Trishank Karthik Kuppusamy" <tk47@students.poly.edu> wrote:
On 08/28/2013 12:09 PM, Christian Theune wrote:
Right. It doesn't add any security on its own, but it's a way that people can discover you're using SSL. :) I'll have to read up on how to do HSTS actually …
That was my next question. Does pip honour HSTS? I could be wrong, but I do not think so...
It's likely worth checking with Donald and Noah how the SSL enforcement on PyPI itself is set up. I believe the aim was just to ensure browsers are always using HTTPS, while switching other tools to SSL still requires client side updates. Cheers, Nick.
On 08/28/2013 07:05 PM, Nick Coghlan wrote:
It's likely worth checking with Donald and Noah how the SSL enforcement on PyPI itself is set up. I believe the aim was just to ensure browsers are always using HTTPS, while switching other tools to SSL still requires client side updates.
Makes perfect sense to me!
On Aug 28, 2013, at 7:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 29 Aug 2013 03:17, "Trishank Karthik Kuppusamy" <tk47@students.poly.edu> wrote:
On 08/28/2013 12:09 PM, Christian Theune wrote:
Right. It doesn't add any security on its own, but it's a way that people can discover you're using SSL. :) I'll have to read up on how to do HSTS actually …
That was my next question. Does pip honour HSTS? I could be wrong, but I do not think so...
It's likely worth checking with Donald and Noah how the SSL enforcement on PyPI itself is set up. I believe the aim was just to ensure browsers are always using HTTPS, while switching other tools to SSL still requires client side updates.
Cheers, Nick.
pip does not respect HSTS. It would be somewhat nice if it did, but the primary purpose of HSTS is to protect against SSL downgrade attacks and against users' own error in entering http:// instead of https://. It's less important in a tool like pip where https should be hardcoded. Its use would essentially work to remove user error if they accidentally enter a http:// url instead of a https:// url (which isn't a bad thing).

HTTP on PyPI always redirects idempotent methods to HTTPS and it includes HSTS, but it does generally require client side updates to switch to HTTPS (in part because it requires client side updates to even validate SSL).

As somebody else said, redirecting HTTP to HTTPS is a nice signal to users that they should be using HTTPS, but it doesn't actually protect users, as someone in a MITM position can intercept the redirect and just return content instead.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
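A minimal sketch of the client-side point made above: the scheme decision has to happen before any network request, since a redirect received over plain HTTP can already have been tampered with. The helper names and the idea of a locally configured set of HTTPS-only hosts are assumptions for illustration; this is not pip's behaviour or API.

```python
from urllib.parse import urlsplit


def upgrade_known_hosts(url, https_only_hosts):
    """HSTS-like rewrite: switch http:// to https:// for known hosts, locally.

    Simplified assumption: URLs with an explicit port are left untouched.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.port is None and parts.hostname in https_only_hosts:
        return parts._replace(scheme="https").geturl()
    return url


def require_https(url):
    """Refuse a non-HTTPS index URL instead of trusting a server redirect."""
    if urlsplit(url).scheme != "https":
        raise ValueError("refusing non-HTTPS index URL: %r" % (url,))
    return url


known = {"pypi.python.org"}  # hypothetical locally configured host list
print(upgrade_known_hosts("http://pypi.python.org/simple/", known))
try:
    require_https("http://f.pypi.python.org/simple/")
except ValueError as exc:
    print(exc)
```

The upgrade step only removes user error for hosts the client already knows about; rejecting plain HTTP outright is the stricter of the two choices.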
Hi,

On 27. Aug2013, at 5:22 PM, Christian Theune <ct@gocept.com> wrote:
I'll be a bit more aggressive with the migration and plan to do the following things: […] In two weeks:
- Make f.pypi.python.org serve 410 Gone
This is done now. f.pypi.python.org does not serve any meaningful content any longer. I'd suggest keeping the DNS around for a few more weeks so people get to see the 410 and fix their setups. Christian -- Christian Theune · gocept gmbh & co. kg flyingcircus.io · operations as a service Forsterstraße 29 · 06112 Halle (Saale) · Tel +49 345 1229889-7
On 12 Sep 2013 17:24, "Christian Theune" <ct@gocept.com> wrote:
Hi,
On 27. Aug2013, at 5:22 PM, Christian Theune <ct@gocept.com> wrote:
I'll be a bit more aggressive with the migration and plan to do the
following things:
[…] In two weeks:
- Make f.pypi.python.org serve 410 Gone
This is done now. f.pypi.python.org does not serve any meaningful content any longer.
I'd suggest keeping the DNS around for a few more weeks so people get to see the 410 and fix their setups.
Thanks for working through this Christian! Cheers, Nick.
Christian
-- Christian Theune · gocept gmbh & co. kg flyingcircus.io · operations as a service Forsterstraße 29 · 06112 Halle (Saale) · Tel +49 345 1229889-7
participants (5)
- Christian Theune
- Donald Stufft
- holger krekel
- Nick Coghlan
- Trishank Karthik Kuppusamy