RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

Hi,

I wrote a PEP based on the previous thread "Backport ssl.MemoryBIO on Python 2.7?". Thanks to Cory Benfield, Alex Gaynor and Nick Coghlan, who helped me write it!

HTML version: https://www.python.org/dev/peps/pep-0546/

Inline version below.

Victor

PEP: 546
Title: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-May-2017

Abstract
========

Backport the ssl.MemoryBIO and ssl.SSLObject classes from Python 3 to Python 2.7 to enhance the overall security of Python 2.7.

Rationale
=========

While Python 2.7 is getting closer to its end-of-support date (scheduled for 2020), it is still used on production systems and the Python community is still responsible for its security. This PEP will help facilitate the future adoption of :pep:`543` across all supported Python versions, which will improve security for both Python 2 and Python 3 users.

This PEP does NOT propose a general exception for backporting new features to Python 2.7 - every new feature proposed for backporting will still need to be justified independently. In particular, it will need to be explained why relying on an independently updated backport on the Python Package Index instead is not an acceptable solution.

PEP 543
-------

:pep:`543` defines a new TLS API for Python which would enhance Python security by giving Python applications access to the native TLS implementations on Windows and macOS, instead of using OpenSSL. A side effect is that it gives access to the system trust store and certificates installed locally by system administrators, enabling Python applications to use "company certificates" without having to modify each application, and so to correctly validate TLS certificates (instead of having to ignore or bypass TLS certificate validation).

For practical reasons, Cory Benfield would like to first implement an I/O-less class similar to ssl.MemoryBIO and ssl.SSLObject for :pep:`543`, and to provide a second class based on the first one to use sockets or file descriptors. This design would help to structure the code to support more backends and simplify testing and auditing, as well as implementation. Later, optimized classes using sockets or file descriptors directly may be added for performance.

While :pep:`543` defines an API, the PEP would only make sense if it comes with at least one complete and good implementation. The first implementation would ideally be based on the ``ssl`` module of the Python standard library, as this is shipped to all users by default and can be used as a fallback implementation in the absence of anything more targeted.

If this backport is not performed, the only baseline implementation that could be used would be pyOpenSSL. This is problematic, however, because of the interaction with pip, which is shipped with CPython on all supported versions.

requests, pip and ensurepip
---------------------------

There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO. In the absence of a Python 2.7 backport, Requests is required to basically use the same solution that Twisted currently does: namely, a mandatory dependency on `pyOpenSSL <https://pypi.python.org/pypi/pyOpenSSL>`_.

The `pip <https://pip.pypa.io/>`_ program has to embed all its dependencies for practical reasons: namely, that it cannot rely on any other installation method being present.
Since pip depends on requests, it means that it would have to embed a copy of pyOpenSSL. That would imply substantial usability pain to install pip. Currently, pip doesn't support embedding C extensions which must be compiled on each platform and so require a C compiler.

Since Python 2.7.9, Python embeds a copy of pip both for default installation and for use in virtual environments via the new ``ensurepip`` module. If pip ends up bundling PyOpenSSL, then CPython will end up bundling PyOpenSSL. Only backporting ``ssl.MemoryBIO`` and ``ssl.SSLObject`` would avoid the need to embed pyOpenSSL, and would fix the bootstrap issue (python -> ensurepip -> pip -> requests -> MemoryBIO).

Changes
=======

Add ``MemoryBIO`` and ``SSLObject`` classes to the ``ssl`` module of Python 2.7. The code will be backported and adapted from the master branch (Python 3).

The backport also significantly reduces the size of the Python 2/Python 3 difference of the ``_ssl`` module, which makes maintenance easier.

Links
=====

* :pep:`543`
* `[backport] ssl.MemoryBIO <https://bugs.python.org/issue22559>`_: Implementation of this PEP written by Alex Gaynor (first version written in October 2014)
* :pep:`466`

Discussions
===========

* `[Python-Dev] Backport ssl.MemoryBIO on Python 2.7? <https://mail.python.org/pipermail/python-dev/2017-May/147981.html>`_ (May 2017)

Copyright
=========

This document has been placed in the public domain.
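[For readers unfamiliar with the classes being backported, here is a minimal sketch of the I/O-less handshake pattern they enable. This is the Python 3.5+ API that the PEP proposes to backport; the hostname and buffer size are illustrative and error handling is omitted.]

    import socket
    import ssl

    hostname = "pypi.python.org"
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # ciphertext read from the network goes in here
    outgoing = ssl.MemoryBIO()   # ciphertext to send to the network comes out here
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)

    # The application, not the ssl module, moves bytes between the BIO pair
    # and whatever transport is in use -- here, a plain blocking socket.
    sock = socket.create_connection((hostname, 443))
    while True:
        try:
            tls.do_handshake()
            break
        except ssl.SSLWantReadError:
            # Flush handshake bytes to the wire, then feed the reply back in.
            sock.sendall(outgoing.read())
            incoming.write(sock.recv(4096))
    sock.sendall(outgoing.read())   # flush the final handshake records
    print(tls.version())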

Jython 2.7.1 is about to be released, with full support for upstream pip (9.0.1) and the corresponding vendored libraries, including requests. However, this proposed new feature for CPython 2.7, and its usage, will likely break pip on Jython 2.7.x going forward, given that future versions of pip will depend on requests requiring MemoryBIO. Or am I wrong in this analysis?

This means we have to get back on the 2.7 development treadmill just as we were about to focus on finally working on Jython 3 (https://github.com/jython/jython3 previews this work).

Given that this proposed new feature is for 2.7 to support event loop usage and not a security fix, I'm -1 on this change. In particular, it runs counter to the justification policy stated in PEP 466.

- Jim

2017-05-31 17:45 GMT+02:00 Jim Baker <jim.baker@python.org>:
Hum, it seems like the PEP 546 abstract is incomplete. The final goal of the PEP is to make Python 3 more secure, thanks to all the goodness of PEP 543. PEP 546 tries to explain why Python 2.7 is blocking the adoption of PEP 543 in practice.

Victor

So we have two distinct changes that are proposed here:

1. Support alternative implementations of TLS instead of OpenSSL. In particular this will enable the use of system trust stores for certificates.

2. Implement ABCs and concrete classes to support MemoryBIO, etc., from 3.7.

Supporting system trust stores is a valid security fix for 2.7, and I have no problem with such changes as long as they are narrowed to this specific change. But I object to a completely new feature being added to 2.7 to support the implementation of event loop SSL usage. This feature cannot be construed as a security fix, and therefore does not qualify as a feature that can be added to CPython 2.7 at this point in its lifecycle.

The argument that implementing such new features for 2.7 will improve their adoption for Python 3 is a red herring. We could enumerate many such features, but https://www.python.org/dev/peps/pep-0404/#upgrade-path is rather clear here.

- Jim

On May 31, 2017, at 02:09 PM, Jim Baker wrote:
The other problem with this is that there isn't just one CPython 2.7, there are many. It's likely that most people get their Python from whatever distribution/package manager they use. Looking at active Ubuntu/Debian releases, you can see Python 2.7s from 2.7.3 to 2.7.13. Few if any of those will get the new feature backported, so that makes it difficult to rely on it being present.

I agree with Jim that it makes sense to backport security fixes. Usually those are more well contained, and thus easier to cherry pick into stable releases. New features are much tougher to justify, given the potential for regressions.

Cheers,
-Barry

On Wed, 31 May 2017 14:09:20 -0600 Jim Baker <jim.baker@python.org> wrote:
I agree with this sentiment. Also see comments by Ben Darnell and others here: https://github.com/python/peps/pull/272#pullrequestreview-41388700

Moreover, I think that a 2.7 policy decision shouldn't depend on whatever future plans there are for Requests. The slippery slope of relaxing the maintenance policy on 2.7 has reached absurd extremes. If Requests is to remain 2.7-compatible, it's up to Requests to do the necessary work to do so.

Regards

Antoine.

2017-06-01 10:57 GMT+02:00 Antoine Pitrou <solipsis@pitrou.net>:
If Requests is to remain 2.7-compatible, it's up to Requests to do the necessary work to do so.
In practice, CPython does include Requests in ensurepip. Because of that, Requests cannot use any C extension, and so CPython 2.7's ensurepip holds back the evolution of Requests even on Python 3.7. Is my rationale broken somehow?

The root issue is to get a very secure TLS connection in pip to download packages from pypi.python.org. On CPython 3.6, we made multiple small steps to include more and more features in the stdlib ssl module, but I understand that the lack of root certificate authorities (CA) on Windows and macOS is still a major blocker for pip. That's why pip uses Requests, which uses certifi (Mozilla's bundled root certificate authorities).

pip and so Requests are part of the current success of the Python community. I disagree that Requests' practical issues are not our problem.

Moreover, the PEP 546 rationale doesn't only include Requests, but also the important PEP 543, to make CPython 3.7 more secure in the long term. Do you also disagree on the need for PEP 546 (the backport) to make PEP 543 (the new TLS API) feasible in practice?

Victor

On 01/06/2017 at 11:13, Victor Stinner wrote:
That's why pip uses Requests, which uses certifi (Mozilla's bundled root certificate authorities).
pip could use certifi without using Requests. My guess is that Requests is used mostly because it eases coding.
pip and so Requests are part of the current success of the Python community.
pip is, but I'm not convinced about Requests. If Requests didn't exist, people (including pip's developers) would use another HTTP-fetching library, they wouldn't switch to Go or Ruby.
Do you also disagree on the need for PEP 546 (the backport) to make PEP 543 (the new TLS API) feasible in practice?
Yes, I disagree. We needn't backport that new API to Python 2.7. Perhaps it's time to be reasonable: Python 2.7 has been in bugfix-only mode for a very long time. Python 3.6 is out. We should move on. Regards Antoine.

On Thu, Jun 1, 2017 at 7:23 PM, Antoine Pitrou <antoine@python.org> wrote:
But it is in *security fix* mode for at least another three years (ish). Proper use of TLS certificates is a security question. How hard would it be for the primary codebase of Requests to be written to use a C extension, but with a fallback *for pip's own bootstrapping only* that provides one single certificate - the authority that signs pypi.python.org? The point of the new system is that back-ends can be switched out; a stub back-end that authorizes only one certificate would theoretically be possible, right? Or am I completely misreading which part needs C? ChrisA

On Thu, 1 Jun 2017 19:50:22 +1000 Chris Angelico <rosuav@gmail.com> wrote:
Why are you bringing up "proper use of TLS certificates"? Python 2.7 doesn't need another backport for that. The certifi package is available for Python 2.7 and can be integrated simply with the existing ssl module.

Regards

Antoine.
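[For reference, the integration Antoine alludes to is essentially a one-liner, assuming the third-party certifi package is installed; ssl.create_default_context is available on 2.7.9+ and 3.4+.]

    import ssl
    import certifi

    # Point the stdlib ssl module at certifi's Mozilla CA bundle; the
    # resulting context verifies certificates and hostnames.
    ctx = ssl.create_default_context(cafile=certifi.where())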

On Thu, Jun 1, 2017 at 8:01 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
As stated in this thread, OS-provided certificates are not handled by that. For instance, if a local administrator distributes a self-signed cert for the intranet server, web browsers will use it, but pip will not. ChrisA

On Thu, 1 Jun 2017 20:05:48 +1000 Chris Angelico <rosuav@gmail.com> wrote:
That's true. But: 1) pip could grow a config entry to set an alternative or additional CA path 2) it is not a "security fix", as not being able to recognize privately-signed certificates is not a security breach. It's a new feature Regards Antoine.

No it can’t. Exporting the Windows or macOS security store to a big file of PEM is a security vulnerability because the macOS and Windows security stores expect to work with their own certificate chain building algorithms. OpenSSL builds chains differently, and disregards some metadata that Windows and macOS store, which means that cert validation will work differently than in the system store. This can lead to pip accepting a cert marked as “untrusted for SSL”, for example, which would be pretty bad. Cory

On 01/06/2017 at 12:23, Cory Benfield wrote:
No it can’t.
OpenSSL builds chains differently, and disregards some metadata that Windows and macOS store, which means that cert validation will work differently than in the system store. This can lead to pip accepting a cert marked as “untrusted for SSL”, for example, which would be pretty bad.
Are you claiming that OpenSSL certificate validation is insecure and shouldn't be used at all? I have never heard that claim before. Regards Antoine.

Of course I’m not. I am claiming that using OpenSSL certificate validation with root stores that are not intended for OpenSSL can be. This is because trust of a certificate is non-binary. For example, consider WoSign. The Windows TLS implementation will distrust certificates that chain up to WoSign as a root certificate that were issued after October 21 2016. This is not something that can currently be represented as a PEM file. Therefore, the person exporting the certs needs to choose: should that be exported or not? If it is, then OpenSSL will happily trust it even in situations where the system trust store would not. More generally, macOS allows the administrator to configure graduated trust: that is, to override whether or not a root should be trusted for certificate validation in some circumstances. Again, exporting this to a PEM does not persist this information. Cory

On Thu, 1 Jun 2017 11:45:14 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
I am claiming that using OpenSSL certificate validation with root stores that are not intended for OpenSSL can be. This is because trust of a certificate is non-binary. For example, consider WoSign. The Windows TLS implementation will distrust certificates that chain up to WoSign as a root certificate that were issued after October 21 2016. This is not something that can currently be represented as a PEM file. Therefore, the person exporting the certs needs to choose: should that be exported or not? If it is, then OpenSSL will happily trust it even in situations where the system trust store would not.
I was not talking about exporting the whole system CA as a PEM file, I was talking about adding an option for system administrators to configure an extra CA certificate to be recognized by pip.
More generally, macOS allows the administrator to configure graduated trust: that is, to override whether or not a root should be trusted for certificate validation in some circumstances. Again, exporting this to a PEM does not persist this information.
How much of this is relevant to pip? Regards Antoine.

Generally speaking system administrators aren’t wild about this option, as it means that they can only add to the trust store, not remove from it. So, while possible, it’s not a complete solution to this issue. I say this because the option *already* exists, at least in part, via the REQUESTS_CA_BUNDLE environment variable, and we nonetheless still get many complaints from system administrators.
More generally, macOS allows the administrator to configure graduated trust: that is, to override whether or not a root should be trusted for certificate validation in some circumstances. Again, exporting this to a PEM does not persist this information.
How much of this is relevant to pip?
Depends. If the design goal is “pip respects the system administrator”, then the answer is “all of it”. An administrator wants to be able to configure their system trust settings. Ideally they want to do this once, and once only, such that all applications on their system respect it. Cory

Who is the “we” that should move on? Python core dev? Or the Python ecosystem? Because if it’s the latter, then I’m going to tell you right now that the ecosystem did not get the memo. If you check the pip download numbers for Requests in the last month you’ll see that 80% of our downloads (9.4 million) come from Python 2. That is an enormous proportion: far too many to consider not supporting that user base. So Requests is basically bound to support that user base.

Requests is stuck in a place from which it cannot move. We feel we cannot drop 2.7 support. We want to support as many TLS backends as possible. We want to enable the pip developers to focus on their features, rather than worrying about HTTP and TLS. And we want people to adopt the async/await keywords as much as possible. It turns out that we cannot satisfy all of those desires with the status quo, so we proposed an alternative that involves backporting MemoryBIO.

So, to the notion of “we need to move on”, I say this: we’re trying. We really, genuinely, are. I don’t know how much stronger of a signal I can give about how much Requests cares about Python 3 than to signal that we’re trying to adopt async/await and be compatible with asyncio. I believe that Python 3 is the present and future of this language. But right now, we can’t properly adopt it because we have a userbase that you want to leave behind, and we don’t. I want to move on, but I want to bring that 80% of our userbase with us when we do.

My reading of your post is that you would rather Requests not adopt the async/await paradigm than backport MemoryBIO: is my understanding correct? If so, fair enough. If not, I’d like to try to work with you to a place where we can all get what we want.

Cory

Hi Cory,

On Thu, Jun 01, 2017 at 11:22:21AM +0100, Cory Benfield wrote:
We want to support as many TLS backends as possible.
Just a wild idea, but have you investigated a pure-Python fallback for 2.7 such as TLSlite? Of course the fallback need only be used during bootstrapping, and the solution would be compatible with every stable LTS Linux distribution release that was not shipping the latest and greatest 2.7. David

I have, but discarded the idea. There are no pure-Python TLS implementations that are both feature-complete and actively maintained. Additionally, doing crypto operations in pure-Python is a bad idea, so any implementation that did crypto in Python code would be ruled out immediately (which rules out TLSLite), so I’d need what amounts to a custom library: pure-Python TLS with crypto from OpenSSL, which is not currently exposed by any Python module. Ultimately it’s just not a winner. Cory

On Thu, Jun 01, 2017 at 11:47:31AM +0100, Cory Benfield wrote:
I have, but discarded the idea.
I'm glad to hear it was given sufficient thought. :)

I have one final 'crazy' idea, and actually it does not seem too bad at all: can't you just fork a subprocess or spawn threads to handle the blocking SSL APIs? Sure it wouldn't be beautiful, but it is more appealing than forcing an upgrade on all 2.7 users just so they can continue to use pip. (Which, ironically, seems to resonate strongly with the motivation behind all of this work -- allowing users to continue with their old environments without forcing an upgrade to 3.x!)

David

So, this will work, but at a performance and code cleanliness cost. This essentially becomes a Python-2-only code-path, and a very large and complex one at that. This has the combined unfortunate effects of meaning a) a proportionally small fraction of our users get access to the code path we want to take forward into the future, and b) the majority of our users get an inferior experience of having a library either spawn threads or processes under their feet, which in Python has a tendency to get nasty fast (I for one have experienced the joy of having to ctrl+c multiple times to get a program using paramiko to actually die).

Again, it’s worth noting that this change will not just affect pip but also the millions of Python 2 applications using Requests. I am ok with giving those users access to only part of the functionality that the Python 3 users get, but I’m not ok with that smaller part also being objectively worse than what we do today.

Cory

On Thu, Jun 01, 2017 at 12:18:48PM +0100, Cory Benfield wrote:
"Doctor, it hurts when I do this .." Fine, then how about rather than exporting pip's problems on to the rest of the world (which an API change to a security module in a stable branch most certainly is, 18 months later when every other Python library starts depending on it), just drop SSL entirely, it will almost certainly cost less pain in the long run, and you can even arrange for the same code to run in both major versions. Drop SSL? But that's madness! Serve the assets over plain HTTP and tack a signature somewhere alongside it, either side-by-side in a file, embedded in a URL query string, or whatever. Here[0] is 1000 lines of pure Python that can validate a public key signature over a hash of the asset as it's downloaded. Embed the 32 byte public key in the pip source and hey presto. [0] https://github.com/jfindlay/pure_pynacl/blob/master/pure_pynacl/tweetnacl.py Finding someone to audit the signature checking capabilities of [0] will have vastly lower net cost than getting the world into a situation where pip no longer runs on the >1e6 EC2 instances that will be running Ubuntu 14.04/16.04 LTS until the turn of the next decade. Requests can't be installed without a working SSL implementation? Then drop requests, it's not like it does much for pip anyway. Downloads worldwide get a huge speedup due to lack of TLS handshake latency, a million Squid caching reverse proxies worldwide jump into action caching tarballs they previously couldn't see, pip's _vendor directory drops by 4.2MB, and Python package security depends on 1k lines of memory-safe code rather than possibly *the* worst example of security-unconcious C to come into existence since the birth of our industry. Sounds like a win to me. Maybe set a standard rather than blindly follow everyone else, at the cost of.. everyone else. David

On 1 Jun 2017, at 15:10, David Wilson <dw+python-dev@hmmz.org> wrote:
So for the record I’m assuming most of the previous email was a joke: certainly it’s not going to happen. ;)

But this is a real concern that does need to be addressed: Requests can’t meaningfully use this as its only TLS backend until it propagates to the wider 2.7 ecosystem, at least far enough such that pip can drop Python 2.7 releases lower than 2.7.14 (or wherever MemoryBIO ends up, if backported). So a concern emerges: if you grant my other premises about the utility of the backport, is it worth backporting at all?

The answer to that is honestly not clear to me. I chatted with the pip developers, and they have 90%+ of their users currently on Python 2, but more than half of those are on 2.7.9 or later. This shows some interest in upgrading to newer Python 2s. The question, I think, is: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2?

I don’t have an answer to that question. I have a gut instinct that says yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”. But I’d say that from my perspective this is the crux of the problem. We can hedge against this by just choosing to backport and accepting that it may never become useful, but a reasonable person can disagree and say that it’s just not worth the effort.

Frankly, I think that amidst all the other arguments this is the one that most concretely needs answering, because if we don’t think Requests can ever meaningfully rely on the presence of MemoryBIO on 2.7 (where “rely on” can be approximated to 90%+ of 2.7 users having access to it AND 2.7 still having non-trivial usage numbers) then ultimately this PEP doesn’t grant me much benefit. There are others who believe there are a few other benefits we could get from it (helping out Twisted etc.), but I don’t know that I’m well placed to make those arguments. (I also suspect I’d get accused of moving the goalposts.)

Cory

On Fri, Jun 2, 2017 at 1:01 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
The answer to that is honestly not clear to me. I chatted with the pip developers, and they have 90%+ of their users currently on Python 2, but more than half of those are on 2.7.9 or later. This shows some interest in upgrading to newer Python 2s. The question, I think, is: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2?
I don’t have an answer to that question. I have a gut instinct that says yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”.
Let's see. Python 2 users include people on Windows who install it themselves, and then have no mechanism for automatic updates. They'll probably stay on whatever 2.7.x they first got, until something forces them to update. But it also includes people on stable Linux distros, where they have automatic updates provided by Red Hat or Debian or whomever, so a change like this WILL propagate - particularly (a) as the window is three entire years, and (b) if the change is considered important by the distro managers, which is a smaller group of people to convince than the users themselves.

By 2020, Windows 7 will be out of support. By various estimates, Win 7 represents roughly half of all current Windows users. That means that, by 2020, at least half of today's Windows users will either have upgraded to a new OS (likely with a wipe-and-fresh-install, so they'll get a newer Python), or be on an unsupported OS, on par with people still running XP today. The same is true for probably close to 100% of Linux users, since any supported Linux distro will be shipping updates between now and 2020, and I don't know much about Mac OS updates, but I rather suspect that they'll also be updating. (Can anyone confirm?)

So I'd be in the "yes" category. Across the next few years, I strongly suspect that 2.7.14 will propagate reasonably well. And I also strongly suspect that, even once 2020 hits and Python 2 stops getting updates, it will still be important to a lot of people. These numbers aren't backed by much, but it's slightly better than mere gut instinct.

Do you have figures for how many people use pip on Windows vs Linux vs Mac OS?

ChrisA

On 1 Jun 2017, at 17:14, Chris Angelico <rosuav@gmail.com> wrote:
Do you have figures for how many people use pip on Windows vs Linux vs Mac OS?
I have figures for the download numbers, which are an awkward proxy because most people don’t CI on Windows and macOS, but they’re the best we have. Linux has approximately 20x the download numbers of either Windows or macOS, and both Windows and macOS are pretty close together. These numbers are a bit confounded due to the fact that 1/4 of Linux’s downloads are made up of systems that don’t report their platform, so the actual ratio could be anywhere from about 25:1 to 3:1 in favour of Linux for either Windows or macOS. All of this is based on the downloads made in the last month. Again, an enormous number of these downloads are going to be CI downloads which overwhelmingly favour Linux systems. For some extra perspective, the next highest platform by download count is FreeBSD, with 0.04% of the downloads of Linux. HTH, Cory

On Fri, Jun 2, 2017 at 2:35 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
I have figures for the download numbers, which are an awkward proxy because most people don’t CI on Windows and macOS, but they’re the best we have. Linux has approximately 20x the download numbers of either Windows or macOS, and both Windows and macOS are pretty close together. These numbers are a bit confounded due to the fact that 1/4 of Linux’s downloads are made up of systems that don’t report their platform, so the actual ratio could be anywhere from about 25:1 to 3:1 in favour of Linux for either Windows or macOS. All of this is based on the downloads made in the last month.
Again, an enormous number of these downloads are going to be CI downloads which overwhelmingly favour Linux systems.
Hmm. So it's really hard to know. Pity. I suppose it's too much to ask for IP-based stat exclusion for the most commonly-used CI systems (Travis, Circle, etc)?

Still, it does look like most pip usage happens on Linux. Also, it seems likely that the people who use Python and pip heavily are going to be the ones who most care about keeping up-to-date with point releases, so I still stand by my belief that yes, 2.7.14+ could take the bulk of 2.7's market share before 2.7 itself stops being significant.

Thanks for the figures.

ChrisA

2017-06-01 18:51 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
It seems like PyPI statistics are public: https://langui.sh/2016/12/09/data-driven-decisions/

Another article on PyPI stats: https://hynek.me/articles/python3-2016/

2.7: 419 million (89%)
3.3+3.4+3.5+3.6: 51 million (11%)

(I ignored 2.6.)

Victor

On Jun 02, 2017, at 02:14 AM, Chris Angelico wrote:
I'm not so sure about that, given long term support releases. For Ubuntu, LTS releases live for 5 years: https://www.ubuntu.com/info/release-end-of-life

By 2020, only Ubuntu 16.04 and 18.04 will still be maintained, so while 18.04 will likely contain whatever the latest 2.7 release is at that time, 16.04 won't track upstream point releases, but instead will get select cherry picks. For good reason, there's a lot of overhead to backporting fixes into stable releases, and something as big as is being suggested here would, in my best guess, have a very low chance of showing up in stable releases.

-Barry

On Jun 01, 2017, at 07:22 PM, Victor Stinner wrote:
I can help Canonical to backport MemoryBIO *if they want* to cherry-pick this feature ;-)
(Pedantically speaking, this falls under the Ubuntu project's responsibility, not directly Canonical.) Writing the patch is only part of the process: https://wiki.ubuntu.com/StableReleaseUpdates There's also Debian to consider. Cheers, -Barry

Using 2.7.9 as a sort of benchmark here: currently 26% of downloads from PyPI are using a version of Python older than 2.7.9; 2 months ago that number was 31%. (That’s the total across all Python versions.) Python >= 2.7.9, <3 is at 43% (previously 53%). So in ~2.5 years 2.7.9+ has become >50% of all downloads from PyPI, while older versions of Python 2.7 are down to only ~25% of the total number of downloads made by pip.

I was also curious about how this had changed over the past year instead of just the past two months: a year ago, >=2.7,<2.7.9 accounted for almost 50% of all downloads from PyPI (compared to the 25% today). It *looks* like on average we’re dropping somewhere between 1.5% and 2% each month, so as a conservative estimate, if these numbers hold we’d be looking at single digit numbers for >=2.7,<2.7.9 in roughly 11 months, or 3.5 years after the release of 2.7.9.

If we assume that the hypothetical 2.7.14 w/ MemoryBIO support would follow a similar adoption curve, we would expect to be able to mandate it for pip/etc in, at a worst case scenario, 3-4 years after release. In addition to that, pip 9 comes with a new feature that makes it easier to sunset support for versions of Python without breaking the world [1]. The likely scenario is that while pip 9+ is increasing in share, Python <2.7.14 will be decreasing, and that would mean that we could *likely* start mandating it earlier, maybe at the 2 year mark or so.

[1] An astute reader might ask, why could you not use this same mechanism to simply move on to only supporting Python 3? It’s true we could do that; however, as a rule we generally try to keep support for Pythons until the usage drops below some threshold, where that threshold varies based on how hard it is to continue supporting that version of Python and what the “win” is in terms of dropping it. Since we’re still at 90% of downloads from PyPI being done using Python 2, that suggests the threshold for Python 3.x is very far away and will extend beyond 2020 (I mean, we’re just *now* finally able to drop support for Python 2.6).

In case it’s not obvious, I am very much in support of this PEP.

— Donald Stufft
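[Donald's extrapolation, spelled out as a quick back-of-the-envelope check; the shares and monthly rates are his approximate figures, not precise data.]

    # The >=2.7,<2.7.9 share is ~26% today and has been falling by
    # roughly 1.5-2 percentage points per month.
    share, target = 26.0, 10.0   # percent
    for rate in (1.5, 2.0):
        print("at -%.1f pts/month: single digits in ~%.0f months"
              % (rate, (share - target) / rate))
    # -> ~11 months at the conservative rate, ~8 at the faster one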

On Jun 1, 2017 9:20 AM, "Chris Angelico" <rosuav@gmail.com> wrote:

On Fri, Jun 2, 2017 at 1:01 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

The question, I think, is: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2?
I don’t have an answer to that question. I have a gut instinct that says
yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”.
Let's see. Python 2 users include people on Windows who install it themselves, and then have no mechanism for automatic updates. They'll probably stay on whatever 2.7.x they first got, until something forces them to update. But it also includes people on stable Linux distros, where they have automatic updates provided by Red Hat or Debian or whomever, so a change like this WILL propagate - particularly (a) as the window is three entire years, and (b) if the change is considered important by the distro managers, which is a smaller group of people to convince than the users themselves.

I believe that for answering this question about the ssl module, it's really only Linux users that matter, since pip/requests/everyone else pushing for this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS (IIUC) is to stop using the ssl module entirely in favor of new ctypes bindings for their respective native TLS libraries.

(And yes, in principle it might be possible to write new ctypes-based bindings for openssl, but (a) this whole project is already teetering on the verge of being impossible given the resources available, so adding any major extra deliverable is likely to sink the whole thing, and (b) compared to the proprietary libraries, openssl is *much* harder and riskier to wrap at the ctypes level because it has different/incompatible ABIs depending on its micro version and the vendor who distributed it. This is why manylinux packages that need openssl have to ship their own, but pip can't and shouldn't ship its own openssl for many hopefully obvious reasons.)

-n

2017-06-01 19:10 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
The long term plan includes one Windows implementation, one macOS implementation and one implementation using the stdlib ssl module. But it seems like right now, Cory is working alone and has limited time to implement his PEP 543 (new TLS API). The short term plan is to implement the strict minimum implementation, the one relying on the existing stdlib ssl module. Backporting MemoryBIO makes it possible to get the new TLS API "for free" on Python 2.7. IMHO Python 2.7 support is a requirement to make the PEP popular enough to make it successful. The backport is supposed to fix a chicken-and-egg issue :-)
(And yes, in principle it might be possible to write new ctypes-based bindings for openssl, but (...))
A C extension can also be considered, but I trust code in the CPython stdlib more, since it would be well tested by our big farm of buildbots and have more eyes looking at the code.

It seems like the PEP 546 (backport MemoryBIO) should make it more explicit that MemoryBIO support will be "optional": it's ok if Jython or PyPy doesn't implement it. It's ok if old Python 2.7 versions don't implement it. I expect anyway to use a fallback for those. It's just that I would prefer to avoid a fallback (likely a C extension) whenever possible, since it would cause various issues, especially for C code using OpenSSL: the OpenSSL API changed many times :-/

Victor

On 01Jun2017 1010, Nathaniel Smith wrote:
How much of a stop-gap would it be (for Windows at least) to override OpenSSL's certificate validation with a call into the OS? This leaves most of the work with OpenSSL, but lets the OS say yes/no to the certificates based on its own configuration. For Windows, this is under 100 lines of C code in (probably) _ssl, and while I think an SChannel based approach is the better way to go long-term,[1] offering platform-specific certificate validation as the default in 2.7 is far more palatable than backporting new public API. I can't speak to whether there is an equivalent function for Mac (validate a certificate chain given the cert blob). Cheers, Steve [1]: though I've now spent hours looking at it and still have no idea how it's supposed to actually work...

It’s entirely do-able. This is where I reveal just how long I’ve been fretting over this problem: https://pypi.python.org/pypi/certitude. Ignore the description, it’s wildly out-of-date: let me summarise the library instead.

Certitude is a Python library that uses CFFI and Rust to call into the system certificate validation libraries on macOS and Windows using a single unified API. Under the covers it has a whole bunch of Rust code that translates from what OpenSSL can give you (a list of certificates in the peer cert chain in DER format) into what those two operating systems expect. The Rust code for Windows is here[1] and is about as horrifying a chunk of Rust as you can imagine seeing (the Windows API does not translate very neatly into Rust so the word “unsafe” appears a lot), but it does appear to work, at least in the mainline cases and in the few tests I have. The macOS code is here[2] and is moderately less horrifying, containing no instances of the word “unsafe”.

I lifted this approach from Chrome, because at the moment this is what they do: they use their custom fork of OpenSSL (BoringSSL) to do the actual TLS protocol manipulation, but hand the cert chain verification off to platform-native libraries on Windows and macOS.

I have never productised this library because ultimately I never had the time to spend writing a sufficiently broad test-case to confirm to me that it worked in all cases. There are very real risks in calling these APIs directly because if you get it wrong it’s easy to fail open.

It should be noted that right now certitude only works with, surprise, PyOpenSSL. Partly this is because the standard library does not expose SSL_get_peer_cert_chain, but even if it did that wouldn’t be enough, as OpenSSL with VERIFY_NONE does not actually *save* the peer cert chain anywhere. That means even with PyOpenSSL the only way to get the peer cert chain is to hook into the verify callback and save off the certs as they come in, a gloriously absurd solution that is impossible with pure-Python code from the ssl module.

While this approach could work with _ssl.c, it ultimately doesn’t resolve the issue. It involves writing a substantial amount of new code which needs to be maintained by the ssl module maintainers. All of this code needs to be tested *thoroughly*, because python-dev would be accepting responsibility for the incredibly damaging potential CVEs in that code. And it doesn’t get python-dev out of the business of shipping OpenSSL on macOS and Windows, meaning that python-dev continues to bear the burden of OpenSSL CVEs, as well as the brand new CVEs that it is at risk of introducing. Oh, and it can’t be backported to Python 2.7 or any of the bugfix-only Python 3 releases, and as I just noted the ssl module has never made it possible to use this approach from outside CPython. So it’s strictly just as bad as the situation PEP 543 is in, but with more C code. Doesn’t sound like a winning description to me. ;)

Cory

[1]: https://github.com/Lukasa/rust-certitude/blob/master/rust-certitude/src/wind...
[2]: https://github.com/Lukasa/rust-certitude/blob/master/rust-certitude/src/osx....
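[The verify-callback trick Cory describes looks roughly like this with PyOpenSSL - a sketch only, with the handshake plumbing and the actual call into the platform verifier omitted.]

    from OpenSSL import SSL, crypto

    # With VERIFY_NONE, OpenSSL never saves the peer chain, so the only way
    # to get it is to capture each certificate as it is presented.
    captured_chain = []

    def _capture_cert(conn, cert, errno, depth, preverify_ok):
        # Save the DER encoding of every cert OpenSSL shows us. Always
        # return True: validation is deferred to the platform-native
        # verifier, not done by OpenSSL itself.
        captured_chain.append(
            crypto.dump_certificate(crypto.FILETYPE_ASN1, cert))
        return True

    ctx = SSL.Context(SSL.TLSv1_2_METHOD)
    ctx.set_verify(SSL.VERIFY_NONE, _capture_cert)
    # ... perform the handshake with an SSL.Connection built from ctx,
    # then hand captured_chain to the platform chain validator.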

Thanks Cory for the long explanation. Let me try to summarize (tell me if I'm wrong).

We have 3 options:

* Do nothing: reject the PEP 546 and let each project handle security on its own (current status quo)
* Write *new* C code, maybe using certitude as a starting point, to offload certificate validation on Windows and macOS
* Backport existing code from master to 2.7: MemoryBIO and SSLObject

Writing new code seems more risky and error-prone than backporting already "battle-tested" MemoryBIO from master. I also expect that writing code to validate certificates will be longer than the "100 lines of C code in (probably) _ssl" expected by Steve Dower. rust-certitude counts around 700 lines of Rust and 80 lines of Python code. But maybe I misunderstood the purpose of certitude: Steve Dower asked to only validate a certificate, not load or export CAs. I counted 150 Python lines for SSLObject and 230 C lines for MemoryBIO.

Since the long term plan is to not use the stdlib ssl module but a new implementation on Windows and macOS, it may seem worthless to backport MemoryBIO to Python 2.7. But the PEP 546 backport is a practical solution to provide a *smooth* transition from ssl to a new TLS API. Experience has shown that hard changes like "run 2to3 and drop your Python 2 code" don't work in practice. Users want a transition plan with small steps.

Victor

2017-06-02 11:08 GMT+02:00 Cory Benfield <cory@lukasa.co.uk>:

That’s all certitude does. The docs of certitude are from an older version of the project when I was considering just doing a live-export to PEM file, before I realised the security concerns of that approach. We’d also require some other code that lives outside certitude. In particular, code needs to be written to handle the OpenSSL verify callback to save off the cert chain and to translate errors appropriately. There’s also a follow-on problem: the ssl module allows you to call SSLContext.load_default_certs and then SSLContext.load_verify_locations. If you do that, those two behave *additively*: both the default certs and custom verify locations are trusted. Certitude doesn’t allow you to do that: it says that if you choose to use the system certs then darn it that’s all you get. Working out how to do that without just importing random stuff into the user’s keychain would be…tricky. Do-able, for sure, but would require code I haven’t written for Certitude (I may have written it using ctypes elsewhere though). Cory
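[For reference, the additive behaviour Cory points out, in stdlib terms: after the two load calls below, *both* the system's default CAs and the extra CA are trusted. The cafile path is illustrative.]

    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    ctx.load_default_certs(ssl.Purpose.SERVER_AUTH)   # system/default CAs
    ctx.load_verify_locations(cafile="/etc/ssl/corp-ca.pem")  # extra CA, also trusted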

Thank you for all the explanations. So to summarize my opinion, I'm still -0.5 on this PEP. I would also like to see the issues Jython, Ubuntu et al. have mentioned solved before this is accepted.

Regards

Antoine.

The PEP's rationale is now "This PEP will help facilitate the future adoption of :pep:`543` across all supported Python versions, which will improve security for both Python 2 and Python 3 users."

What exactly are these security improvements? My understanding (which may well be incorrect) is that the security improvements come in the future when the PEP 543 APIs are implemented on top of the various platform-native security libraries. These changes will not be present until some future 3.x release, and will not be backported to 2.7 (without another PEP, which I expect would be even more contentious than this one). What then is the security improvement for Python 2 users?

In Tornado, I have not felt any urgency to replace wrap_socket with MemoryBIO. Is there a security-related reason I should do so sooner rather than later? (I'd also urge Cory and any other wrap_socket skeptics on the requests team to reconsider - Tornado's SSLIOStream works well. The asynchronous use of wrap_socket isn't so subtle and risky with buffering)

-Ben

On Jun 2, 2017 7:24 AM, "Ben Darnell" <ben@bendarnell.com> wrote:

The PEP's rationale is now "This PEP will help facilitate the future adoption of :pep:`543` across all supported Python versions, which will improve security for both Python 2 and Python 3 users." What exactly are these security improvements? My understanding (which may well be incorrect) is that the security improvements come in the future when the PEP 543 APIs are implemented on top of the various platform-native security libraries. These changes will not be present until some future 3.x release, and will not be backported to 2.7 (without another PEP, which I expect would be even more contentious than this one). What then is the security improvement for Python 2 users?

My understanding is that PEP 543 would be released as a library on pypi, as well as targeting stdlib inclusion in some future release. The goal of the MemoryBIO PEP is to allow the PEP 543 package to be pure-python, support all the major platforms, and straddle py2/py3.

In Tornado, I have not felt any urgency to replace wrap_socket with MemoryBIO. Is there a security-related reason I should do so sooner rather than later? (I'd also urge Cory and any other wrap_socket skeptics on the requests team to reconsider - Tornado's SSLIOStream works well. The asynchronous use of wrap_socket isn't so subtle and risky with buffering)

I'll leave the discussion of wrap_socket's reliability to others, because I don't have any experience there, but I do want to point out that which primitive you pick has major system design consequences. MemoryBIO is an abstraction that can implement wrap_socket, but not vice-versa; if you use wrap_socket as your fundamental primitive then that leaks all over your abstraction hierarchy.

Twisted has to have a MemoryBIO implementation of their TLS code, because they want to support TLS over arbitrary transports. Carrying a wrap_socket implementation as well would mean twice the tricky and security-sensitive code to maintain, plus breaking their current abstractions to expose whether any particular transport object is actually just a raw socket.

The problem gets worse when you add PEP 543's pluggable TLS backends. If you can require a MemoryBIO-like API, then adding a backend becomes a matter of defining like 4 methods with relatively simple, testable interfaces, and it automatically works everywhere. Implementing a wrap_socket interface on top of this is complicated because the socket API is complicated and full of subtle corners, but it only has to be done once and libraries like tornado can adapt to the quirks of the one implementation.

OTOH if you have to also support backends that only have wrap_socket, then this multiplies the complexity of everything. Now we need a way to negotiate which APIs each backend supports, and we need to somehow document all the weird corners of how wrapped sockets are supposed to behave in edge cases, and when different wrapped sockets inevitably behave differently then libraries like tornado need to discover those differences and support all the options. We need a way to explain to users why some backends work with some libraries but not others. And so on. And we still need to support MemoryBIOs in as many cases as possible.

-n
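[A sketch of the kind of small, testable, transport-agnostic surface Nathaniel describes, built on the MemoryBIO primitives. The class and method names are illustrative, not from any of the libraries discussed.]

    import ssl

    class TLSPump(object):
        """Sans-io TLS engine: it only consumes and produces bytes, so it
        runs over any transport. wrap_socket() can be built on top of
        this; the reverse is impossible."""

        def __init__(self, ctx, server_hostname):
            self._in = ssl.MemoryBIO()
            self._out = ssl.MemoryBIO()
            self._tls = ctx.wrap_bio(self._in, self._out,
                                     server_hostname=server_hostname)

        def bytes_received(self, data):
            """Feed ciphertext arriving from the transport into the engine."""
            self._in.write(data)

        def bytes_to_send(self):
            """Ciphertext the transport must now write on our behalf."""
            return self._out.read()

        def handshake_step(self):
            """Advance the handshake; returns True once it has completed."""
            try:
                self._tls.do_handshake()
                return True
            except ssl.SSLWantReadError:
                return False   # caller must pump more bytes first

        def send(self, plaintext):
            self._tls.write(plaintext)

        def recv(self, n=4096):
            try:
                return self._tls.read(n)
            except ssl.SSLWantReadError:
                return b""     # need more ciphertext via bytes_received()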

On 2 June 2017 at 19:42, Victor Stinner <victor.stinner@gmail.com> wrote:
There's also a 4th option:

* Introduce a dependency from requests onto PyOpenSSL when running in async mode on Python 2.7 in the general case, and figure out some other pip-specific option for ensurepip bootstrapping (like a *private* MemoryBIO implementation, or falling back to synchronous mode in requests)

During the pre-publication PEP discussions, I kinda dismissed the PyOpenSSL dependency option out of hand due to the ensurepip bootstrapping issues it may introduce, but I think we need to discuss it further in the PEP as it would avoid some of the other challenges brought up here (Linux distro update latencies, potential complications for alternate Python 2.7 implementations, etc).

For example:

* if requests retains a synchronous mode fallback implementation, then ensurepip could use that in the absence of PyOpenSSL
* even if requests drops synchronous mode entirely, we could leave the public ssl module API alone, and add an _ensurepip bootstrap module specifically for use in the absence of a full PyOpenSSL module

If we adopted the latter approach, then for almost all intents and purposes, ssl.MemoryBIO and ssl.SSLObject would remain a Python 3.5+ only API, and anyone wanting access to it on 2.7 would still need to depend on PyOpenSSL.

The benefit of making any backport a private API is that it would mean we weren't committing to support that API for general use: it would be supported *solely* for the use case discussed in the PEP (i.e. helping to advance the development of PEP 543 without breaking pip bootstrapping in the process).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Jun 2, 2017 at 1:29 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Whatever the eventual outcome, I don't think there's any danger someone will read this thread and think "wow, it's so easy to get new features into 2.7". -n -- Nathaniel J. Smith -- https://vorpus.org

On 3 June 2017 at 08:21, Nathaniel Smith <njs@pobox.com> wrote:
Aye, when "Be related to the Python ecosystem's collective handling of one of the essential network protocols that underpins the security of the entire internet" is a criterion to even *start* the backport discussion (let alone successfully make the case), I don't think "We might get deluged with further backport proposals" is really something we need to worry about :) Another key factor is being able to convince folks with some level of influence with commercial redistributors that the idea is worth supporting - the upstream backport in CPython is only step one in rolling out this kind of change, and we need to get it all the way through into the redistributor channels for it to really have the desired impact (hence PEP 493 as a follow-up to PEP 476 - at least some redistributors, including Red Hat, said "No" to the original hard compatibility break, but were OK with a smoother transition plan that gave end users more control over the timing of the change in behaviour). For the MemoryBIO and SSLObject backport, I'm still in the "Maybe" state myself. While I'm definitely sympathetic to the proposal (SSL/TLS is incredibly important, but cross-distro and cross-version SSL/TLS is already painful, and it's even worse once you bring cross-platform considerations and the sync/async split into the mix), it's an unfortunate fact of redistributor life that we don't typically have the luxury of pitching backport proposals to product management based on the benefits they offer to the upstream development community - we need to be able to articulate a clear and relatively immediate benefit to customers or commercial partners. For this PEP, "We want to make the new Python TLS abstraction API work" would potentially qualify as such a benefit (although I can't make any promises about that), but we're not even at that stage yet - that's a point we'd only reach if the library was initially built with a dependency on 3.5+ and 2.7.14+, *and* we either saw significant demand for that new API amongst customers, or else we genuinely needed to bring it in as a dependency of something else (and even then, we prefer approaches like a clearly distinct PyOpenSSL dependency that can be handled in a higher level product like OpenStack rather than needing to be baked directly into the operating system itself). As a result, as awkward and annoying as it would be, I'm wondering if the structure of the requests migration may need to be something like: - switch unconditionally to an async backend on Py3 - switch conditionally to an async backend on Py2 when PyOpenSSL is available - retain sufficient sync-only support on Py2 to allow pip to disable the PyOpenSSL dependency As a redistributor, ensuring that sufficiently recent versions of PyOpenSSL are available is likely to be a *far* more tractable problem than getting backports implemented at the standard library level (even as a private API), and even if that doesn't happen, developers that are using 3rd party modules like requests usually have significantly more freedom to install additional Python level dependencies than they do to upgrade their production Python runtime. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Ultimately we don’t have the resources to do this. The requests core development team is 4 people, 3 of whom are part time best-effort maintainers and one of whom is me, who has foolishly decided to also work on PEP 543 as well as take over the lead maintenance role on urllib3. urllib3 has another two additional best-effort maintainers. We simply do not have the development resources to support two parallel branches of code. Even if we did, doing so is a recipe for adding bugs and inconsistencies between the two, leading to increased workload as we break our users and fail to keep the two branches in sync. It also makes it much harder for users to migrate from our synchronous backend to our async one, as those two no longer use identical underlying implementations and so will have subtle inconsistencies. The TL;DR there is: no, we’d rather stay sync only than do that. Cory

On 5 June 2017 at 18:42, Cory Benfield <cory@lukasa.co.uk> wrote:
Would you be OK with the notion of a "just for pip bootstrapping" private backend in _ensurepip_ssl?

That is, the only officially supported async-backed requests configuration on Py2 would be with the PyOpenSSL dependency installed, but in collaboration with the pip devs we'd also plumb in the pieces to let a new async-backed requests work without any extension modules other than those in the standard library. That way, the only thing truly gated on the backport would be *pip* updating its bundled version of requests to the async-backed version - for normal third party use, the change would be "you need PyOpenSSL", rather than "you need a newer version of Python".

We'd still effectively end up with two different code execution paths (one backed by PyOpenSSL, one backed by the new private _ensurepip_ssl extension module), but the level of divergence would be much lower (it's just a question of where MemoryBIO and SSLObject are coming from) and the support scope for the less frequently used path would be much narrower (i.e. if a problem report isn't related to pip bootstrapping, it can be closed as "won't fix").

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

It’s not clear to me what the structure of that looks like, or what work is required to achieve it. Right now Twisted never uses MemoryBIO and SSLObject: it always uses PyOpenSSL. That seems like we’d need to add MemoryBIO and SSLObject support to Twisted which can be enabled in some way other than feature detection (that is, so it can be installed). Without PEP 543 this is pretty gross. With PEP 543 it sucks less, but it also gates this work behind PEP 543 being successfully implemented and landed. I guess we are *open* to that approach? It’s not clear to me how beneficial that is, and it doesn’t gain any of the ecosystem benefits (no-one else on Py 2 can ever use this chunk of tested, understood code), but it’s certainly an option. The indirection gives me pause though. Cory

Is pip allowed to use the hypothetical _ensurepip_ssl outside of ensurepip?

Thinking about this reminded me about the *other* reason pip avoids dependencies - avoiding making assertions about what versions of software can actually be installed. IOW if pip depends on pyOpenSSL >= X.Y, then you essentially can’t install any other version of pyOpenSSL and you’d be forced to upgrade. This isn’t the end of the world, and pyOpenSSL is generally stable enough that we could *probably* get away with an unversioned dependency on a non-Windows platform. On Windows we’d have to, uh, I guess use an SChannel ctypes backend?

Not having our TLS library in the stdlib makes it a bit more difficult, but from pip’s own POV, if there’s a MemoryBIO that we’re allowed to use and requests uses pyOpenSSL normally (we’d apply a small patch to pip’s copy of requests to make it use it, but that’s easy to do with our setup), I think that would be workable.

I think that’s a less optimal solution than just accepting PEP 546, which I think is beneficial to the Python ecosystem as a whole, making it easier to port more code onto 3.x (and allowing unlocking 3.x’s capabilities) and making it easier to maintain the ssl library on 2.7, given it will have less of a divergence from 3.x again.

— Donald Stufft

On 5 June 2017 at 21:44, Donald Stufft <donald@stufft.io> wrote:
Is pip allowed to use the hypothetical _ensurepip_ssl outside of ensurepip?
Yes, so something like _tls_bootstrap would probably be a better name for the helper module (assuming we go down the "private API to bootstrap 3rd party SSL modules" path). The leading underscore would be akin to the one on _thread - it isn't that the API would be undocumented or inaccessible, it's that it wouldn't have any backwards compatibility guarantees for general use, so you shouldn't use it without a really good reason.
Indeed, and I think that's sufficient justification for at least adding the private TLS bootstrapping API.
Right, I'm just trying to make sure we clearly separate the two arguments: pip bootstrapping (which can potentially be addressed through a private alternate SSL/TLS API) and ecosystem advancement (making it easier for SSL/TLS capable applications to retain Python 2.7.x compatibility at least until 2020, while still supporting the improvements in asynchronous IO capabilities in the 3.x series). Separating the two main options like that gives us two future scenarios:

1. Private TLS bootstrapping API

- requests & any other libraries that want this functionality would need a dependency on PyOpenSSL for Python versions prior to 3.5
- the base SSL/TLS support in Python 2.7 remains focused on synchronous IO, with only limited support for sans-io style protocol library implementations
- some time in 2017 or 2018, pip starts requiring that environments provide either an externally managed _tls_bootstrap module, *or* a sufficiently recent PyOpenSSL
- the first CPython maintenance release to include that pip update would also provide the required _tls_bootstrap module by default
- redistributors will be able to backport _tls_bootstrap as an independent module (e.g. as part of their system pip packages) rather than necessarily including it directly in their Python runtime
- this distribution model should be resilient over time, allowing _tls_bootstrap to be rebased freely without risking breaking end user applications that weren't expecting it (this approach would likely even be useful in LTS 3.x environments as they age - e.g. Ubuntu 14.04 and 16.04)
- the private _tls_bootstrap API would either be a straight backport of the 3.6 ssl module, or else a shim around the corresponding standard library's ssl module to add the required features, whichever was judged easiest to implement and maintain

2. Public standard library ssl module API update

- libraries needing the functionality *either* depend on PyOpenSSL *or* set their minimum Python version to 2.7.14+
- if libraries add a Python version check to their installation metadata, LTS distributions are out of luck, since we typically don't rebase Python (e.g. the RHEL 7 system Python still advertises itself as 2.7.5, even though assorted backports have been applied on top of that)
- even if libraries rely on API feature detection rather than explicit version checks, redistributors still need to actually patch the standard library - we can't just add a guaranteed-side-effect-free module as a separate post-installation step
- while some system administrators may be willing and able to deploy their own variant of a _tls_bootstrap module in advance of their redistributors providing one, relatively few are going to be willing to apply custom patches to Python before deploying it

So honestly, I don't think the current "ecosystem advancement" argument in PEP 546 actually works, as the enhancement is *significantly* easier to roll out to older 2.7.x releases if the adoption model is "Supporting SSL/TLS with sans-io style protocol implementations on Python versions prior to Python 3.5 requires the use of PyOpenSSL or a suitable PEP 543 compatible library (unless you're a package management tool, in which case you should check for a separate platform provided Python version independent _tls_bootstrap module before falling back to PyOpenSSL)".
As long as we actually document the _tls_bootstrap API (even if that's just a cross-reference to the Python 3.6 ssl module documentation), then folks wanting to share asynchronous IO code between Python 2 & 3 will have the following options:

- use wrap_socket() (as Tornado does today)
- depend on PyOpenSSL (as Twisted does today)
- use _tls_bootstrap if available, fall back to PyOpenSSL otherwise <-- new piece added by PEP 546 due to the problems around pip attempting to upgrade its own dependencies

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jun 5, 2017 7:01 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote: On 5 June 2017 at 21:44, Donald Stufft <donald@stufft.io> wrote:
Is pip allowed to use the hypothetical _ensurepip_ssl outside of ensurepip?
Yes, so something like _tls_bootstrap would probably be a better name for the helper module (assuming we go down the "private API to bootstrap 3rd party SSL modules" path).

It seems like there's a risk here that we end up with two divergent copies of ssl.py and _ssl.c inside the Python 2 tree, and that this will make it harder to do future bug fixing and backports. Is this going to cause problems? Is there any way to mitigate them? -n

On 6 June 2017 at 10:59, Nathaniel Smith <njs@pobox.com> wrote:
Aye, I spent some time thinking about potentially viable implementation architectures, and realised that any "private API" style solution pretty much *has* to be structured as a monkeypatching API to be effective. Otherwise there is too much risk of divergence in things like exception definitions that end up causing cryptic "Why is my SSL exception handler not catching my SSL connection error?" type bugs.

The gist of the approach would be that for libraries and non-bootstrapped applications, their SSL/TLS feature detection code would ultimately look something like this:

    try:
        # Don't do anything special if we don't need to
        from ssl import MemoryBIO, SSLObject
    except ImportError:
        # Otherwise fall back to using PyOpenSSL
        try:
            from OpenSSL.SSL import Connection
        except ImportError:
            raise ImportError("Failed to import asynchronous SSL/TLS support: <details>")

It's currently more complex than that in practice (since PyOpenSSL doesn't natively offer an emulation of the affected parts of the standard library's ssl module API), but that's what it would effectively boil down to at the lowest level of any compatibility wrapper. Conveniently, this is *already* what libraries and non-bootstrapped modules have to do if they're looking to support older Python versions without requiring PyOpenSSL on newer releases, so there wouldn't actually be any changes on that front.

By contrast, for bootstrapped applications (including pip) the oversimplified compatibility import summary would instead look more like this:

    try:
        # Don't do anything special if we don't need to
        from ssl import MemoryBIO, SSLObject
    except ImportError:
        # See if we can do a runtime in-place upgrade of the standard library
        try:
            import _tls_bootstrap
        except ImportError:
            # Otherwise fall back to using PyOpenSSL
            try:
                from OpenSSL.SSL import Connection
            except ImportError:
                raise ImportError("Failed to bootstrap asynchronous SSL/TLS support: <details>")
        else:
            _tls_bootstrap.monkeypatch_ssl()

The reason this kind of approach is really attractive to redistributors from a customer risk management perspective is that, like gevent's monkeypatching of synchronous networking APIs, it's *opt-in at runtime*, so the risk of our accidentally inflicting it on a customer that doesn't want it and doesn't need it is almost exactly zero - if none of their own code includes the "import _tls_bootstrap; _tls_bootstrap.monkeypatch_ssl()" invocation and none of their dependencies start enabling it as an implicit side effect of some other operation, they'll never even know the enhancement is there. Instead, the compatibility risks get concentrated in the applications relying on the bootstrapping API, since the monkeypatching process is a potentially new source of bugs that don't exist in the more conventional execution models.

The reason that's still preferable, though, is that it means the monkeypatching demonstrably doesn't have to be 100% perfect: it just has to be close enough to the way the Python 3 standard library API works to meet the needs of the applications that actually use it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
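(For concreteness, a minimal sketch of what such a monkeypatch_ssl() helper might look like - everything below is hypothetical, since no _tls_bootstrap module exists yet, and a real patch would need to cover far more than these two names:)

    def monkeypatch_ssl():
        import ssl  # the stdlib module being upgraded in place
        # _backported is a hypothetical submodule holding the 3.6-derived code
        from _tls_bootstrap import _backported
        # Patching the existing module object (rather than replacing it in
        # sys.modules) keeps the exception hierarchy shared, so existing
        # "except ssl.SSLError" handlers keep working after the patch.
        ssl.MemoryBIO = _backported.MemoryBIO
        ssl.SSLObject = _backported.SSLObject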

On Mon, Jun 5, 2017 at 8:49 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
OK. To unpack that, I think it would mean: 2.7's ssl.py and _ssl.c remain exactly as they are. We create a new _tls_bootstrap.py and __tls_bootstrap.c, which start out as ~copies of the current ssl.py and _ssl.c code in master. (Unfortunately the SSLObject code can't just be separated out from _ssl.c into a new file, because the implementations of SSLObject and SSLSocket are heavily intertwined.) Then the copied files are hacked up to work in this monkeypatching context (so that e.g. instead of defining the SSLError hierarchy in C code, it gets imported from the main ssl.py).

So: we end up with two copies of the ssl code in the py27 tree, both diverged from the copy in the py3 tree, in different ways. I know that the ssl maintainers won't be *happy* about this, given that keeping the py2 and py3 code similar is an ongoing issue and was one of the stated advantages of backporting SSLObject, but I don't know whether it's a showstopper or not; it'd be good to hear their perspective. -n -- Nathaniel J. Smith -- https://vorpus.org

With Nick's suggestion of _tls_bootstrap, this has certainly become more complicated. I'm still thinking of the ramifications for future Jython 2.7 support, if Python dev goes down this path. It still seems easier to me to avoid exposing the SSLObject/MemoryBIO model to 2.7 and instead concentrate on just having native TLS support. Yes, this might require more work than a simple backport, but everyone is resource constrained, not just the Requests or Jython teams.

My concrete suggestion is that any work on PEP 543 will benefit from improving the testing currently found in test_ssl as part of its implementation. For a variety of reasons, test functions like ssl_io_loop (https://github.com/python/cpython/blob/master/Lib/test/test_ssl.py#L1691) avoid asserting more than that they can properly manage the framing of wrapped/unwrapped data; for example, one doesn't want to state explicitly when SSL_ERROR_WANT_READ would be raised (too much white box at that point). On the other hand, the lack of unit testing of, for example, SSLObject.read, testing it only at the functional level, means that there's nothing explicitly testing "will raise an SSLWantReadError if it needs more data than the incoming BIO has available" (per the docs), or similar use of signaling (SSLEOFError comes to mind).

The problem we have seen with Jython supporting similar functionality in the past in socket/select/ssl is not the happy-path aspects like what ssl_io_loop tests, but underdocumented/undertested error states. So what benefits SChannel support, for example, should also benefit Jython 3's implementation, which would use Java's counterpart to SSLObject, SSLEngine. That would be a very good outcome of these proposed PEPs. - Jim On Tue, Jun 6, 2017 at 4:08 AM, Nathaniel Smith <njs@pobox.com> wrote:

On 6 Jun 2017, at 18:49, Jim Baker <jim.baker@python.org> wrote:
With Nick's suggestion of _tls_bootstrap, this has certainly become more complicated. I'm still thinking of the ramifications for future Jython 2.7 support, if Python dev goes down this path. It still seems easier to me to avoid exposing the SSLObject/MemoryBIO model to 2.7 and instead concentrate on just having native TLS support. Yes, this might require more work than a simple backport, but everyone is resource constrained, not just the Requests or Jython teams.
Oh. This hadn’t occurred to me until just now, but doesn’t PyOpenSSL (or anything built on CFFI) just basically not run on Jython? Or am I mistaken?
My concrete suggestion is that any work on PEP 543 will benefit from improving the testing currently found in test_ssl as part of its implementation. For a variety of reasons, test functions like ssl_io_loop (https://github.com/python/cpython/blob/master/Lib/test/test_ssl.py#L1691) avoid asserting more than that they can properly manage the framing of wrapped/unwrapped data; for example, one doesn't want to state explicitly when SSL_ERROR_WANT_READ would be raised (too much white box at that point). On the other hand, the lack of unit testing of, for example, SSLObject.read, testing it only at the functional level, means that there's nothing explicitly testing "will raise an SSLWantReadError if it needs more data than the incoming BIO has available" (per the docs), or similar use of signaling (SSLEOFError comes to mind).
Yeah, PEP 543 just basically assumes the stdlib’s testing of TLS doesn’t exist and that it’ll have to manufacture its own. Unfortunately, because it is attempting to abstract across many different implementations, the tests will need to be fairly high-level: for example, there is no consistent way to discuss the actual size of the buffers in the buffer-based TLS implementation, and they are allowed to be unbounded buffers, so tests cannot validate that TLSWantReadError and TLSWantWriteError are ever actually raised: all they can do is run tests that will handle them in the event that they are raised and confirm the data is transferred appropriately. Cory
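(For illustration, the unit-level check Jim describes as missing can be written directly against the Python 3 MemoryBIO API; a rough sketch:)

    import ssl

    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()
    outgoing = ssl.MemoryBIO()
    obj = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")
    try:
        obj.do_handshake()  # the incoming BIO holds no data yet
    except ssl.SSLWantReadError:
        # Documented behaviour: the handshake needs more data than the
        # incoming BIO has available, so SSLWantReadError is raised.
        pass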

On Wed, Jun 7, 2017 at 2:31 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
Sorry, I wish this were true, but PyOpenSSL is not available on Jython, because we do not yet support CFFI for Jython. CFFI support is something we have looked at, but we have not implemented. (There is a related and far more ambitious project to implement support for the C Extension API, http://jyni.org/, which Stefan Richthofer is working on with me under the Google Summer of Code.)

Having said that, bundling PyOpenSSL for use by pip is something that we would *not* want to do for Jython itself. We want to use the native Java ecosystem where possible for built-in functionality, in part because we already have native support for the underlying system trust store on *all platforms*. (Java development did all the heavy lifting for us.) Instead our current implementation of socket/ssl/select is built on Netty, plus some additional work for managing SSLContext in a way that is compatible with Python. There is an out-of-date doc I prepared describing what was done (but broad aspects are still current): https://github.com/jimbaker/socket-reboot
Right, so such improved testing, regardless of level, will still be an improvement, and we look forward to seeing it for use by Jython as well. I should mention that we sometimes see undocumented functionality leak out. For example, https://docs.python.org/3/library/socket.html#socket.socket.listen doesn't say anything about backlog=0, but the requests test suite (last time I looked on Jython) now depends on it. We assumed it was something like http://pubs.opengroup.org/onlinepubs/009695399/functions/listen.html, but as described in http://bugs.python.org/issue8498, it now means "a server application in python that accepts exactly 1 connection", by passing the value through to the underlying C. More tests, more docs, please, even though of course that's a lot of dev effort. - Jim

On 7 Jun 2017, at 20:06, Jim Baker <jim.baker@python.org> wrote:
Sorry, I wish this were true, but PyOpenSSL is not available on Jython, because we do not yet support CFFI for Jython. CFFI support is something we have looked at, but we have not implemented. (There is a related and far more ambitious project to implement support for the C Extension API, http://jyni.org/, which Stefan Richthofer is working on with me under the Google Summer of Code.)
This is what I was worried about. Moving to require PyOpenSSL *also* locks us out of Jython support, at least for the time being. That’s another point in the “con” column for making PyOpenSSL a mandatory dependency.
I should mention that we sometimes see undocumented functionality leak out. For example, https://docs.python.org/3/library/socket.html#socket.socket.listen doesn't say anything about backlog=0, but the requests test suite (last time I looked on Jython) now depends on it. We assumed it was something like http://pubs.opengroup.org/onlinepubs/009695399/functions/listen.html, but as described in http://bugs.python.org/issue8498, it now means "a server application in python that accepts exactly 1 connection", by passing the value through to the underlying C. More tests, more docs, please, even though of course that's a lot of dev effort.
Ah, yes, we do. In our defense, this is the semantic of the listen syscall, and the socket module is generally a pretty thin wrapper around most of the socket syscalls. But in hindsight this is definitely the kind of thing that gets tricky for someone trying to reimplement the socket module in terms of a different abstraction. I don’t want to dive down this rabbit hole because if we do I’ll have to start talking about how the complexity of the socket API is one of the other implementation hurdles for PEP 543, but I think that’s a conversation for another time. Cory

2017-06-08 10:30 GMT+02:00 Cory Benfield <cory@lukasa.co.uk>:
Even if we do backport MemoryBIO to the next Python 2.7.14, I don't think that you can require MemoryBIO. What about all existing operating systems which provide a Python 2.7 without MemoryBIO? You need to have a workaround anyway. For example, make the new asynchronous API optional and use the old blocking mode in the meantime. Victor

I mentioned it earlier, but using the current download numbers from PyPI, <2.7.9 is rapidly dropping and is likely going to be a single-digit percentage within the next 6-8 months IIRC. If 2.7.14 follows a similar trajectory, requests can depend on it within like… 2 years? Maybe 3? Likely it will depend on whether 2.7.14 gets into the next Ubuntu LTS or not. — Donald Stufft

Maybe the intent of my PEP is unclear: the goal is not to allow Requests to require MemoryBIO, but to get wide adoption of a future implementation of the new TLS API (PEP 543). IMHO having an implementation working on the latest Python 2.7 version should make it possible to use it in some kinds of applications. I should take time to read the last messages in this thread and try to summarize them in the PEP ;-) Victor 2017-06-08 13:06 GMT+02:00 Donald Stufft <donald@stufft.io>:

On 8 June 2017 at 21:30, Victor Stinner <victor.stinner@gmail.com> wrote:
I think the short version of the main unanswered questions would be:

1. How hard would it be to define a full wheel-installable "backport.ssl" module that made the latest 3.x ssl module API available to Python 2 applications?
2. Given such a module, how hard would it be to add an "install()" function that added it to sys.modules as the "ssl" module?

While that wouldn't allow pip or requests to depend on it without some solution to the bootstrapping problem, it would clearly separate the two questions, and also give us an architecture closer to the way importlib2 works (i.e. if you want to make that the implementation of import in a Python 2 application, you have to explicitly enable it at runtime, but you can also install it whenever you want, rather than having to rely on your Python implementation to provide it).

We'd also still be left with the problem of figuring out how to handle Jython, but I figure we can consider that part of the bootstrapping problem: how to let applications reliably get hold of "backport.ssl" under a name provided by the host Python 2.7 implementation, rather than the normal pip-installable name. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I'm just going to straight up admit that I've lost track of the point of this thread. It sounds like we don't *need* to backport any of ssl into the Python 2.7 standard library, as long as we can bundle a 3rd-party backport for pip?

I assume that, at a high level, the operation needed is to download content over https using the system trust stores. Is that what we're trying to achieve here? If it is, can we just provide an enhanced urlretrieve()? Certainly on Windows, and presumably on macOS, it's much easier to do the high-level GET operation than to reimplement it using primitives.

As far as I can tell (bearing in mind that my brain can't keep track of this thread anymore), the only argument against this is if someone wants sensible default behaviour with local configuration tweaks: treat one specific certificate as valid while also treating the system (and user) certificate stores as valid, without actually putting your certificate into the system (or user) store. My gut reaction to that is to say "too bad - choices are 100% system configuration or 100% custom configuration". I suspect that's not a suitable reaction, but I can't explain why. (And bear in mind that the current state of the ssl module on Windows will never get 100% system configuration.)

Can someone explain to me why pip can't use a simple "system store urlretrieve()" function without configuration and an "OpenSSL urlretrieve()" function with fully custom configuration? Even if only to bootstrap something that *can* merge two entirely different configuration systems? Cheers, Steve

On 8 June 2017 at 17:40, Steve Dower <steve.dower@python.org> wrote:
I'm just going to straight up admit that I've lost track of the point of this thread.
You have my sympathies - I'm not really following it either :-(
It sounds like we don't *need* to backport any of ssl into the Python 2.7 standard library, as long as we can bundle a 3rd-party backport for pip?
My understanding is that the PEP is asking to backport a new feature. The problem is that as a new feature, this received some (justifiable) pushback. The arguments for why this new feature is needed then got messy - as I understand it, it's something to do with how the requests library moves forward - they are blocked from supporting newer async features from 3.x because they need to support 2.7. This feature would relieve that logjam for them. Obviously, as a 3rd party library, their issues aren't particularly compelling for the stdlib - but because pip uses requests, and pip is shipped with Python, things get complicated.
The problem is that pip uses more features of requests than just issuing GET requests. We aren't going to be in a position to switch to a simple urlretrieve operation, even as some sort of fallback. What I'm personally not at all clear on is why we can't just ship a version of pip that supports 2.7 with 2.7, and a later version with 3.x. That doesn't make the problem for pip and requests any easier, but it does make it not python-dev's problem. The issue is that the gulf between 2.7 and 3.x is getting wider, and it's starting to cause real pain to 3rd party projects like requests. Does that justify backporting this specific feature to 2.7? I don't know. Note that I haven't actually read the original PEP. I don't have a view on networking issues, security, or Python 2.7 support. So I didn't really feel the need to more than skim this thread. My only interest really is where pip gets involved - and that's where I get confused as I don't really see why (ensure)pip makes the problem so much more complicated. Paul PS I'd be amazed if my summary above isn't wrong in at least some key points. Corrections welcome!

The basic yak stack here is:

* PEP 543 should be the future; it is a much, much better way of handling TLS than our current ssl module is.
* Cory can’t spend his work time on PEP 543 unless he can say it is useful for requests.
* In order for PEP 543 to be useful for requests, he needs a way to provide a backport for it for Python 2.7.
* This backport *CAN* be OpenSSL-only, but needs to be able to provide the same API.
* PEP 543 wants to work with MemoryBIOs instead of sockets, because a MemoryBIO is a much, much better way of implementing this for a variety of reasons, and it would be a mistake to use a socket primitive again.
* Independently, requests also wants to be able to provide the ability for people to use it with asyncio; however, it can’t drop support for Python 2.7 in the quest for doing that. Twisted provides a way forward that lets requests work on both 2.x and 3.x and integrate with asyncio, but Twisted requires MemoryBIO to do so.
* pyOpenSSL *could* be used to provide the MemoryBIO needed on 2.7 for both cases from up above; however, pip cannot depend on a C library that isn’t part of the standard library - in addition this would break alternative runtimes like Jython where pyOpenSSL doesn’t work.

Thus, adding MemoryBIO to 2.7’s ssl means that requests can use it instead of depending on a C package (which it can’t because of pip), which means that Cory can then justify working on PEP 543 as part of his requests work, because he can say that on Python 3.7 we use PEP 543 natively, and on Python < 3.7 we either no longer support that Python or we can wrap ssl.MemoryBIO using a pure Python backport shim that provides the same API as PEP 543.

Independently of that, adding MemoryBIO to 2.7’s ssl means that Twisted can use it instead of depending on a C package, which means requests can depend on Twisted (which it otherwise can’t, again because of pip). This then means that requests can refactor itself internally to use Twisted to write asyncio-compatible code that provides a synchronous API and an asynchronous API that works on 2.7 and 3.x with asyncio.

All of the other options require effectively forking the code or the ecosystem, by either having “this library you use for sync” and “the library you use for async” largely duplicating code, OR requiring all of the network libraries to drop support for 2.7 (they can’t, most of their users are on 2.7 still), OR forking the library to have a 2.x and a 3.x version (we tried this before early on in the 3.x split, and we settled on the fact that a single code base is a much better way to handle straddling the 2.x/3.x line).

So basically backporting MemoryBIO unlocks two important things for the health of the Python ecosystem:

* Allows forward progress on PEP 543, which provides a wealth of great benefits like using the platform trust model and removing the need for pip, requests, etc to bundle a CA bundle internally, and removing the need (long term anyway) for Python to ship a copy of OpenSSL on platforms that don’t provide it.
* Allows requests and other libraries to continue to straddle the 2.x/3.x line where they need to, while still providing people who are using Python 3.x a way to use asyncio without having to fork the entire ecosystem into having an aio* copy of every single network library that exists.
Can someone explain to me why pip can't use a simple "system store urlretrieve()" function without configuration and "OpenSSL urlretrieve()" function with fully custom configuration? Even if only to bootstrap something that *can* merge two entirely different configuration systems?
It would require rewriting significant parts of our code that interface with HTTP, including the fact that we rely on some additional requests libraries (like cachecontrol) to implement HTTP caching in pip, so unless urlretrieve supported that as well we’d have to completely rewrite that from scratch. — Donald Stufft

Sorry, I forgot one other important benefit:

* It reduces the delta between the 3.x and the 2.x ssl and _ssl modules, which makes actually maintaining those modules easier, because this code is fiddly and hard to get right, so the more we can just directly backport security fixes from one to the other rather than having to rewrite the patch, the better off we are.

And the downside here is pretty small, honestly? It’s not changing the behavior of anything that currently exists, since it’s adding a new thing inside the ssl module, and Alex has already written the patch, so there’s little extra work to do - and it actually makes maintenance easier, since patches can more readily be applied straight from `master`. The primary argument I can see against it is one of purity: that 2.7 shouldn’t get new features. But as we know, practicality beats purity ;) (and we’ve already accepted that TLS is a special case, special enough to break the rules, so the main question is whether this specific thing is worthwhile - which, given its benefits to the Python ecosystem and to maintaining Python, I think it is). — Donald Stufft

On 9 June 2017 at 05:51, Donald Stufft <donald@stufft.io> wrote:
And the downside here is pretty small honestly?
Unfortunately, one of the downsides is "Doesn't provide any clearly compelling benefits to users of LTS distributions, so even those of us in a position to advocate for such backports probably won't win the related arguments". So the question is whether or not we're OK with saying that users affected by that will need to switch to a different set of Python binaries to get the latest pip & requests (e.g. upgrading their base distro, or else adopting Red Hat SCLs or one of the cross-platform, and hence cross-distro, distributions).
It isn't a purity argument so much as a "Will the change reach a sufficiently large proportion of 2.7 users to actually solve the compatibility problem it is aiming to solve?" argument. (There's a *bit* of a purity argument behind that, in that the PEP 466 backport *did* break customer applications due to the incompatibilities with some of the async frameworks that were using underscore-prefixed APIs that changed behaviour, so our credibility for "trust us, it won't break anything" is low in this context, but it's not the primary objection.)

I'm 100% confident that we'd reach a large enough audience with a compatibility shim that gets installed on disk as something other than "ssl" (e.g. "backport.ssl" or "_ssl_bootstrap"), and that has the virtue of enabling a multi-tiered distribution approach:

- backport.ssl on PyPI as an independently installed module
- _ssl_bootstrap as a runtime or redistributor provided module

This approach also allows the API to be updated as necessary to meet the needs of PEP 543, without needing to go through the full PEP process again. The downside is adding a 3rd stack to maintain (Py3, Py2-compatible Py3 backport, Py2) without making maintenance on the Py2 stack any easier.

I'm markedly less confident of reaching a sufficiently large audience with a "So stop using the system Python, then" approach (don't get me wrong, I'd love for people to stop using the system Python for things that aren't part of the operating system, but people also *really like* the convenience of doing that).

OTOH, if the aim is to make the change now, so it gets into Ubuntu 18.04, with a view to projects only starting to fully rely on it in mid-to-late 2018 or so? That timeline might actually work, and this approach has the benefit of actually making the Py2 SSL stack easier to maintain between now and 2020.

So honestly, I'd be +1 for either approach:

- stdlib backport to make dual-stack maintenance easier for the current volunteers, and we'll see how things work out on the ease-of-adoption front
- PyPI backport to make 2.7 adoption easier, and we'll continue pestering redistributors to actually fund maintenance of Python 2.7's SSL stack properly (and encourage customers of those redistributors to do the same)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
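(A sketch of the multi-tiered lookup a library could perform under this proposal - "backport.ssl" and "_ssl_bootstrap" are names suggested in this thread, not published packages:)

    try:
        from backport import ssl           # pip-installable backport, if present
    except ImportError:
        try:
            import _ssl_bootstrap as ssl   # runtime/redistributor-provided module
        except ImportError:
            import ssl                     # finally, the stdlib module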

Cory can correct me if I’m wrong, but yea that’s the general idea here. We need a tree, and the best time to plant a tree is 20 years ago, but the second best time to plant a tree is today ;) I don’t think anyone is going to immediately start depending on a hypothetical 2.7.14 with MemoryBio support, but getting it out now means that Cory can start working on PEP 543 as part of a longer term effort for requests as well as starting to work on async-ifying requests. Neither of those efforts are going to be quick or simple tasks, so the work to make them possible can happen while we wait for adoption to naturally grow on 2.7.14. — Donald Stufft

On 09Jun2017 0343, Nick Coghlan wrote:
My draft reply to Donald sat overnight, so I abandoned it in favour of agreeing with Nick. I'm in principle in favour of anything that makes 2.7 less of a burden to maintain (up to and including EOL :) ), so if backporting parts of ssl/_ssl makes that easier then I'm +0. However, I do prefer the PyPI backport with some tool bundled in order to obtain it. In fact, given the nature of OpenSSL, I'd be in favour of that approach for all versions of Python (at least on Windows it would likely work well - probably less so on other platforms where we couldn't include a prebuilt fallback easily, though those tend to include compilers...). That hypothetical "_ensuressl" module in my mind really doesn't have to do much other than determine which file to download and then download and extract it, which can be done with OS level tools rather than needing our own stack. It may also be the necessary mechanism to make ssl pip-updateable, since we have the locking problem that prevents it being possible normally. Cheers, Steve

An ensuressl-style module that tries to install an OpenSSL module is actually fairly hard to do securely. The fundamental issue is that fetching a file securely from the network before you have the primary tool for fetching things securely from a network gets to be a bit of a chicken-and-egg problem. There are *possible* ways around that, but they essentially boil down to having to ship something akin to a CA bundle inside of Python, with which we just shift the responsibility from needing an updated OpenSSL in all versions of Python to needing an updated CA bundle.

The good news here is that PEP 543 is the thing that will enable using SChannel on Windows, SecureTransport on macOS, and OpenSSL on Linux, getting python-dev out of the business of shipping an OpenSSL at all [1]. Backporting ssl.MemoryBIO represents the smallest reasonable change to 2.7 we can make [2] that paves the path forward to being able to do that in some number of years, because we can’t do that until PEP 543 is a thing and people are using it.

[1] OK, the ``ssl`` module is still going to be OpenSSL-dependent, and there is a lot of code out there using it, so we would need to keep it for a while to maintain compatibility… but at some point we can stop bundling it in our installers under the guise of “we have a newer, better API for handling this, and if you still need 2.7 compatibility, here are the pip-installable libraries you need to install to get it”.

[2] Where reasonable means it’s not sacrificing good things or making things harder to use for the sake of being smaller. — Donald Stufft

On 09Jun2017 1118, Donald Stufft wrote:
One of the reasons I'm wanting to push this way is that there are other APIs on Windows to do the core client use cases we seem to care about. For example, https://msdn.microsoft.com/en-us/library/windows/desktop/aa384106(v=vs.85).a... could easily let us implement the requests API with a direct call to the operating system, without having to get into the SChannel world of hurt. I assume macOS has similarly high-level APIs. These are totally fine for implementing a requests-like API that relies on system configuration for HTTPS connections. They are not sufficient for building a server, but they should be sufficient for downloading the packages you need to build a server. This is starting to feel to me like we're stuck on "building the thing right" and have skipped over "building the right thing". But maybe it's just me... Cheers, Steve

For the purposes of this PEP I think we should exclude this. The only way this works on a decent number of platforms (hi there BSDs and Linuxes) is if the option on those platforms is libcurl. Linux does not ship a high level HTTP client library: libcurl is basically the option. That would require pip binding libcurl, NSURLSession, and the Windows API. It’s not clear to me that the solution to “we don’t want to backport some SSL code” is “write a bunch of bindings to other third-party libraries”. Cory

I’m personally not super interested in having platform-specific HTTP handling. I honestly want as little platform-specific code as I can get away with, and the primary reason why I’m okay with drawing the line at the TLS stack (instead of the HTTP stack) is because being able to get operating system updates for the TLS stack is of immense benefit, in addition to the fact that it’s really the *only* reasonable way to get access to the platform-specific trust store.

Relying on the OS to handle HTTP means that we cannot depend on anything that isn’t the lowest common denominator across all of the variety of platforms we support, which (as I see Cory has mentioned) isn’t really a thing outside of macOS/Windows. For those platforms we’d just be adding yet another C dependency to Python (or pip, or requests, or whatever). There’s a lot of HTTP handling code in pip that would need to be rewritten to support this, and I don’t think it would be of particular benefit, much less be doable in any reasonable timeframe. This isn’t really being particularly innovative in this area; it’s essentially what the browsers are doing. It also matches what curl is doing.

All that being said, if someone *does* want pip to use WinHTTP, requests provides a mechanism where you can plug in your own network handling code, so someone could write a requests-winhttp adapter that did that, and we could possibly even expose the ability to ask pip to use that. However that’s entirely in the domain of pip/requests feature requests and not really related to python-dev/this PEP itself. — Donald Stufft

On 9 Jun 2017, at 20:54, Donald Stufft <donald@stufft.io> wrote:
All that being said, if someone *does* want pip to use WinHTTP, requests provides a mechanism where you can plug in your own network handling code, so someone could write a requests-winhttp adapter that did that, and we could possibly even expose the ability to ask pip to use that. However that’s entirely in the domain of pip/requests feature requests and not really related to python-dev/this PEP itself.
As an FYI I played with this on Mac far enough to prove that it’d work: https://github.com/Lukasa/requests-darwin It’s not anywhere near feature complete, but it basically works pretty ok. Of course, ideally Requests would just get entirely out of the way at this point because it duplicates a lot of NSURLSession’s logic, but it’s an approach that can work. Cory

On Jun 09, 2017, at 08:43 PM, Nick Coghlan wrote:
As I've said, I'm okay with this approach as long as we don't expose new public APIs in 2.7. That won't prevent other folks from using the private APIs, but we have no responsibility to support them.
Of course, lots of distributions are completely voluntary, so that kind of pestering falls on underfunded ears. ;) But a PyPI backport might still be useful since those third party packages can be distro-fied into current and backported channels. -Barry

On 08Jun2017 1237, Donald Stufft wrote:
Awesome, this is exactly what I needed to see. What if Python 2.7 just exposed the OpenSSL primitives necessary so that ctypes could use them? Is that a viable approach here? Presumably then a MemoryBIO backport could be pure-Python. It doesn't help the other *ythons, but unless they have MemoryBIO implementations ready to backport then I can't think of anything that will help them short of a completely new API. Cheers, Steve

I would have to let Cory answer the question about feasibility here, since he’s much more familiar with OpenSSL’s API (and even binding something like this with ctypes) than I am. The first thing that really stands out to me though is that it just feels a bit like shuffling deckchairs? At least I don’t see what adding the new feature of “thing you can use ctypes with to get a MemoryBIO” does that adding the ssl.MemoryBIO feature doesn’t, other than downsides:

* It creates yet another divergence between 3.x and 2.x that makes it harder to maintain ssl and _ssl.
* It’s harder to use for requests/Twisted/etc.
* It’s not already written (OK, this is minor, but still!).

What do you see the benefits of that approach being over just backporting ssl.MemoryBIO? — Donald Stufft

The short answer is that, while it’s doable, we have some problems with ABI compatibility. OpenSSL 1.1 and 1.0 are ABI-incompatible, so I have to write divergent ctypes code to handle each case. It may also be relevant to support OpenSSL 0.9.x, so we run into the same ABI compatibility concern all over again. Doubly annoyingly, a bunch of OpenSSL code in 1.0 is actually macros, which don’t work in ctypes, so there’ll be a lot of futzing about in structures to get what I need to do done. This also doesn’t get into the difficulty of some operating systems shipping a LibreSSL masquerading as an OpenSSL, which is subtly incompatible in ways I don’t fully understand at this time.
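(A sketch of the divergence being described, assuming a Unix-ish libssl that ctypes can find - illustrative only, not production code:)

    import ctypes
    import ctypes.util

    libssl = ctypes.CDLL(ctypes.util.find_library("ssl"))

    if hasattr(libssl, "SSL_library_init"):
        libssl.SSL_library_init()          # needed on 1.0.x; removed in 1.1

    # The ABI break in action: 1.1 renamed the generic method constructor.
    try:
        method = libssl.TLS_method         # OpenSSL 1.1
    except AttributeError:
        method = libssl.SSLv23_method      # OpenSSL 1.0
    method.restype = ctypes.c_void_p
    libssl.SSL_CTX_new.restype = ctypes.c_void_p
    libssl.SSL_CTX_new.argtypes = [ctypes.c_void_p]
    ctx = libssl.SSL_CTX_new(method())

    # The macro problem: SSL_CTX_set_mode() is a C macro invisible to ctypes,
    # so the underlying SSL_CTX_ctrl() entry point must be called directly.
    SSL_CTRL_MODE = 33                     # from ssl.h
    SSL_MODE_AUTO_RETRY = 0x4
    libssl.SSL_CTX_ctrl.restype = ctypes.c_long
    libssl.SSL_CTX_ctrl.argtypes = [ctypes.c_void_p, ctypes.c_int,
                                    ctypes.c_long, ctypes.c_void_p]
    libssl.SSL_CTX_ctrl(ctx, SSL_CTRL_MODE, SSL_MODE_AUTO_RETRY, None)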

On Thu, 8 Jun 2017 09:30:37 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
Ah, yes, we do. In our defense, this is the semantic of the listen syscall,[...]
According to POSIX, the backlog is only a hint, i.e. Jython is probably ok in not observing its value: """The backlog argument provides a hint to the implementation which the implementation shall use to limit the number of outstanding connections in the socket's listen queue.""" http://pubs.opengroup.org/onlinepubs/9699919799/functions/listen.html Regards Antoine.
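(A minimal demonstration of the semantics under discussion - CPython passes the value straight through to listen(2), and per the POSIX text above it is only a hint, so other implementations may treat it differently:)

    import socket

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(0)  # backlog=0: roughly one pending connection on CPython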

On Tue, Jun 6, 2017 at 10:49 AM, Jim Baker <jim.baker@python.org> wrote:
Yeah, presumably the preferred way to do PEP 543 on Jython would be to have a native backend based on javax.net.ssl.SSLEngine. Not sure how that affects the MemoryBIO discussion – maybe it means that Jython just doesn't need a MemoryBIO backport in the stdlib ssl module, because everyone who might use it will find themselves using SSLEngine instead.
Another testing challenge is that the stdlib ssl module has no way to trigger a renegotiation, and therefore there's no way to write tests to check that it properly handles a renegotiation, even though renegotiation is by far the trickiest part of the protocol to get right. (In particular, renegotiation is the only case where attempting to read can give WantWrite and vice-versa.) -n -- Nathaniel J. Smith -- https://vorpus.org

2017-06-07 10:56 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
Renegotiation was the source of a vulnerability in SSL/TLS protocols, so maybe it's a good thing that it's not implemented :-) https://www.rapid7.com/db/vulnerabilities/tls-sess-renegotiation Renegotiation was removed from the new TLS 1.3 protocol: https://tlswg.github.io/tls13-spec/ "TLS 1.3 forbids renegotiation" Victor

On Jun 7, 2017 6:29 AM, "Victor Stinner" <victor.stinner@gmail.com> wrote: 2017-06-07 10:56 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
Renegotiation was the source of a vulnerability in SSL/TLS protocols, so maybe it's a good thing that it's not implemented :-) https://www.rapid7.com/db/vulnerabilities/tls-sess-renegotiation Renegotiation was removed from the new TLS 1.3 protocol: https://tlswg.github.io/tls13-spec/ "TLS 1.3 forbids renegotiation" Oh, sure, renegotiation is awful, no question. The HTTP/2 spec also forbids it. But it does still get used and Python totally implements it :-). If Python is talking to some peer and the peer says "hey, let's renegotiate now" then it will. There just aren't any tests for what happens next. Not that this has much to do with MemoryBIOs. Sorry for the tangent. -n

On 6 June 2017 at 20:08, Nathaniel Smith <njs@pobox.com> wrote:
If we went down this path, I think the only way to keep it maintainable would be to do one of two things:

1. Do the refactoring in Py3 first, but with the namespace pairing being "ssl/_enable_ssl_backport" rather than "_tls_bootstrap/ssl"
2. Don't try to share any implementation details with Py2, and instead provide a full "_ssl_backport" module more along the lines of importlib2

With the second option, we'd just do a wholesale backport of the 3.6 SSL module under a different name, and projects like pip would use it as follows:

    # See if there is an updated SSL module backport available
    try:
        import _ssl_backport
    except ImportError:
        pass
    else:
        _ssl_backport.install()  # Fails if ssl or _ssl were already imported

    # At this point, sys.modules["ssl"] and sys.modules["_ssl"] would match the
    # API of the Python 3.6 versions, *not* the Python 2.7 versions.
    # At the first 2.7.x maintenance release after 3.7, they'd be updated
    # to match 3.7, and then the same again for 3.8.
    # (3.9 is likely to be after the end of 2.7 maintenance)

    # After that, proceed as usual with feature-based checks
    try:
        from ssl import MemoryBIO, SSLObject
    except ImportError:
        # Otherwise fall back to using PyOpenSSL
        try:
            from OpenSSL.SSL import Connection
        except ImportError:
            raise ImportError("Failed to bootstrap asynchronous SSL/TLS support: <details>")

Since there's no chance of breaking applications that don't opt in, while still making the newer network security features available to customers that want them, I think I could make a reasonable case for backporting such an "_ssl_backport.install()" implementation into the RHEL 7 system Python, but even if I wasn't able to do that, this kind of runtime injection into sys.modules could still be shipped as an add-on library in a way that an updated standard library SSL module can't.

Such an approach would make the 2.7 regression tests noticeably slower (since we'd need to re-run a subset of them with the backport being injected by regrtest before anything loads the ssl module), and means backporting SSL fixes to 2.7 would become a two-step process (once to resync the _ssl_backport module, and a second time to update the default implementation), but I think there would be enough benefits to make that worth the pain:

- we'd have a baseline 2.7 SSL/TLS implementation that we can readily keep in line with its 3.x counterparts
- this approach has a reasonable chance of making it through the review processes for long term support distributions
- system and application integrators can more readily do their own backports independently of redistributors

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
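(And a sketch of what the hypothetical _ssl_backport.install() could do internally - the module and its contents are not implemented anywhere:)

    import sys

    def install():
        if "ssl" in sys.modules or "_ssl" in sys.modules:
            raise RuntimeError("ssl already imported; call install() earlier")
        # ssl36/_ssl36 stand in for wholesale backports of the 3.6 modules
        from _ssl_backport import ssl36, _ssl36
        sys.modules["ssl"] = ssl36
        sys.modules["_ssl"] = _ssl36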

It’s not just bootstrapping that pip has a problem with for C extensions: they also prevent upgrading PyOpenSSL on Windows, because having pip import PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the pip process exits and no longer imports PyOpenSSL. This isn’t a problem on Linux or macOS or the other *nix clients though. We patch requests as it is today to prevent it from importing simplejson and cryptography for this reason. — Donald Stufft

On 3 June 2017 at 02:22, Donald Stufft <donald@stufft.io> wrote:
Would requests be loading PyOpenSSL on Windows, though? If the aim is to facilitate PEP 543, then I'd expect it to be using the SChannel backend in that case. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I’m not sure! The exact specifics of how it’d get implemented, and the transition from what we have now to that, could make the answer yes or no (particularly the transition period). I’m just making sure that the constraint we have in pip is clearly defined here, to make sure we don’t accept something that ends up not actually being suitable. I don’t have an opinion on the private bootstrap module (well, I do - I like it less than just backporting MemoryBIO “for real” - but not one on whether it’d work or not). — Donald Stufft

On Fri, 2 Jun 2017 12:22:06 -0400 Donald Stufft <donald@stufft.io> wrote:
It’s not just bootstrapping that pip has a problem with for C extensions: they also prevent upgrading PyOpenSSL on Windows, because having pip import PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the pip process exits and no longer imports PyOpenSSL. This isn’t a problem on Linux or macOS or the other *nix clients though. We patch requests as it is today to prevent it from importing simplejson and cryptography for this reason.
Does pip use any advanced features in Requests, at least when it comes to downloading packages (which is where the bootstrapping issue lies AFAIU)? Because at this point it sounds like you may be better off with a simple pure Python HTTP downloader. Regards Antoine.

It’s hard to fully answer the question because it sort of depends? Could we switch to just, like, urllib2 or something? Yea we could; in fact we used to use that, and switched to using requests because we had to backport security workarounds/fixes ourselves (the big one at the time was host name matching/verification) and we were really bad at keeping up with tracking what patches needed applying and when. Switching to requests let us offload that work to the requests team, who are doing a phenomenal job at it. Beyond that though, getting HTTP right is hard, and pip used to have to implement workarounds for either broken or less optimal urllib2 behavior, whereas requests generally gets it right for us out of the box.

Closer to your specific questions about features: we’re using the requests session support to handle connection pooling to speed up downloading (since we don’t need to open a new connection for every download then), the adapter API to handle transparently allowing file:// URLs, the auth framework to handle holding authentication for multiple domains at once, and the third-party library cachecontrol to handle our HTTP caching using a browser-style cache. I suspect (though I’d let him speak for himself) that Cory would rather continue to be sync-only than require pip to go back to not using requests. — Donald Stufft
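(For concreteness, the feature set described above looks roughly like this in code - CacheControl is the third-party library named above, while LocalFSAdapter is a stand-in name for pip's own file:// adapter:)

    import requests
    from cachecontrol import CacheControl    # browser-style HTTP caching layer

    session = requests.Session()              # connection pooling across downloads
    # session.mount("file://", LocalFSAdapter())  # transparent file:// support
    session.auth = ("user", "apitoken")       # simplified session-level auth
    cached = CacheControl(session)            # wrap the session with HTTP caching
    response = cached.get("https://pypi.org/simple/")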

On 2 Jun 2017, at 17:59, Donald Stufft <donald@stufft.io> wrote:
I suspect (though I’d let him speak for himself) that Cory would rather continue to be sync only than require pip to go back to not using requests.
We are not wedded to supporting pip, but I think the interaction between the two tools is positive. I think pip gains a lot from depending on us, and we get a lot of value from having pip as a dependency. So the cost of supporting pip would have to be pretty darn high for us to want to stop doing that, and so on this issue I think Donald is right. Cory

On Sat, Jun 03, 2017 at 02:10:50AM +1000, Nick Coghlan wrote:
Ignoring Ben's assertion regarding the legitimacy of async wrap_socket() (which seems to render this entire conversation moot), if you still really want to go this route, could ctypes be abused to provide the missing implementation from the underlying libs? It'd be a hack, but it would only be necessary during bootstrapping. David

On 1 June 2017 at 17:14, Chris Angelico <rosuav@gmail.com> wrote:
However, it is trivial for Windows users to upgrade if asked to, as there's no issue around system packages depending on a particular version (or indeed, much of anything depending - 3rd party applications on Windows bundle their own Python, they don't use the globally installed one). So in principle, there should be no problem expecting Windows users to be on the latest version of 2.7.x. In fact, I suspect that the proportion of Windows users on Python 3 is noticeably higher than the proportion of Linux/Mac OS users on Python 3 (for the same reason), so this problem may overall be less pressing for Windows users. I have no evidence that isn't anecdotal to back this last assertion up, though.

Linux users often use the OS-supplied Python, so getting the distributions to upgrade, to backport upgrades to old versions of their OS, and to push those backports as required updates, is the route to get the bulk of the users there. Experience on pip seems to indicate this is unlikely to happen, in practice. Mac OS users who use the system Python are, as I understand it, stuck with a pretty broken version (I don't know if newer versions of the OS change that). But distributions like Macports are more common and more up to date. Apart from the Windows details, these are purely my impressions.
Do you have figures for how many people use pip on Windows vs Linux vs Mac OS?
No. But we do get plenty of bug reports from Windows users, so I don't think there's any reason to assume it's particularly low (given the relative numbers of *python* users - in fact, it may be proportionately higher as Windows users don't have alternative options like yum). Paul

Note that within the next year, macOS users using the system Python are going to be unable to talk to PyPI anyway (unless Apple does something here, which I think they will), but in either case, Apple was pretty good about upgrading to 2.7.9 (I think they had the first OS release that supported 2.7.9?). — Donald Stufft

On Jun 1, 2017, at 3:57 PM, Donald Stufft <donald@stufft.io> wrote:
Note that within the next year, macOS users using the system Python are going to be unable to talk to PyPI anyway (unless Apple does something here, which I think they will), but in either case, Apple was pretty good about upgrading to 2.7.9 (I think they had the first OS release that supported 2.7.9?).
Forgot to mention that pip 10.0 will work around this, thus forcing macOS users to upgrade or be cut off. — Donald Stufft

On Thu, Jun 01, 2017 at 04:01:54PM +0100, Cory Benfield wrote:
So for the record I’m assuming most of the previous email was a joke: certainly it’s not going to happen. ;)
But this is a real concern that does need to be addressed
Unfortunately it wasn't, but at least I'm glad to have accidentally made a valid point amidst the cloud of caffeine-fuelled irritability :/ Apologies for the previous post, it was hardly constructive. David

On Thu, 1 Jun 2017 11:22:21 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
Who is the “we” that should move on? Python core dev? Or the Python ecosystem?
Sorry. Python core dev certainly. As for the rest of the ecosystem, it is moving on as well.
Well, certain features could be 3.x-only, couldn't they?
I don't get what async/await keywords have to do with this. We're talking about backporting the ssl memory BIO object... (also, as much as I think asyncio is a good thing, I'm not sure it will do much for the problem of downloading packages from HTTP, even in parallel)
I want to move on, but I want to bring that 80% of our userbase with us when we do. My reading of your post is that you would rather Requests not adopt the async/await paradigm than backport MemoryBIO: is my understanding correct?
Well you cannot use async/await on 2.7 in any case, and you cannot use asyncio on 2.7 (Trollius, which was maintained by Victor, has been abandoned AFAIK). If you want to use coroutines in 2.7, you need to use Tornado or Twisted. Twisted may not, but Tornado works fine with the stdlib ssl module. Regards Antoine.
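(For reference, a 2.7-compatible coroutine in the style Antoine mentions might look like this under Tornado 4.x, using yield-based generators rather than async/await:)

    from tornado import gen, httpclient, ioloop

    @gen.coroutine
    def fetch_status(url):
        client = httpclient.AsyncHTTPClient()
        response = yield client.fetch(url)  # non-blocking HTTP request
        raise gen.Return(response.code)     # py2-compatible "return a value"

    code = ioloop.IOLoop.current().run_sync(
        lambda: fetch_status("https://example.com/"))
    print(code)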

Moving, sure, but slowly. Again, I point to the 80% download number.
In principle, sure. In practice, that means most of our users don’t use those features and so we don’t get any feedback on whether they’re good solutions to the problem. This is not great. Ideally we want features to be available across as wide a deploy base as possible, otherwise we risk shipping features that don’t solve the actual problem very well. Good software comes, in part, from getting user feedback.
All of this is related. I wrote a very, very long email initially and deleted it all because it was just too long to expect any normal human being to read it, but the TL;DR here is that we also want to support async/await, and doing so requires a memory BIO object.
I can use Twisted on 2.7, and Twisted has great integration with async/await and asyncio when they are available. Great and getting greater, in fact, thanks to the work of the Twisted and asyncio teams. As to Tornado, the biggest concern there is that there is no support for composing TLS over non-TCP sockets, as far as I am aware. The wrapped-socket approach is not suitable for some kinds of stream-based I/O that users really should be able to use with Requests (e.g. UNIX pipes). Not a complete non-starter, but also not something I’d like to forgo. Cory

On Thu, 1 Jun 2017 12:01:41 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
In principle, sure. In practice, that means most of our users don’t use those features and so we don’t get any feedback on whether they’re good solutions to the problem.
On bugs.python.org we get plenty of feedback from people using Python 3's features, and we have been for years. Your concern would have been very valid in the Python 3.2 timeframe, but I don't think it is anymore.
All of this is related. I wrote a very, very long email initially and deleted it all because it was just too long to expect any normal human being to read it, but the TL;DR here is that we also want to support async/await, and doing so requires a memory BIO object.
async/await doesn't require a memory BIO object. For example, Tornado supports async/await (*) even though it doesn't use a memory BIO object for its SSL layer. And asyncio started with a non-memory BIO SSL implementation while still using "yield from". (*) Despite the fact that Tornado's own coroutines are yield-based generators.
As to Tornado, the biggest concern there is that there is no support for composing the TLS over non-TCP sockets as far as I am aware. The wrapped socket approach is not suitable for some kinds of stream-based I/O that users really should be able to use with Requests (e.g. UNIX pipes).
Hmm, why would you use TLS on UNIX pipes except as an academic experiment? Tornado is far from a full-fledged networking package like Twisted, but its HTTP(S) support should be more than sufficient (understandably, since it is the core use case for it). Regards Antoine.

Ok? I guess? I don’t know what to do with that answer, really. I gave you some data (80%+ of requests downloads over the last month were Python 2), and you responded with “it doesn’t cause us problems”. That’s good for you, I suppose, and well done, but it doesn’t seem immediately applicable to the concern I have.
You are right, sorry. I should not have used the word “require”. Allow me to rephrase. MemoryBIO objects are vastly, vastly more predictable and tractable than wrapped sockets when combined with non-blocking I/O. Using wrapped sockets and select/poll/epoll/kqueue, while possible, requires extremely subtle code that is easy to get wrong, and can nonetheless still have awkward bugs in it. I would be extremely loath to use such an implementation, but you are correct, such an implementation can exist.
Let me be clear that there is no intention to use either Tornado or Twisted’s HTTP/1.1 parsers or engines. With all due respect to both projects, I have concerns about both their client implementations. Tornado’s default is definitely not suitable for use in Requests; the curl backend is, but, surprise surprise, it requires a C extension and oh god we’re back here again. I have similar concerns about Twisted’s default HTTP/1.1 client. Tornado’s HTTP/1.1 server is certainly sufficient, but also not of much use to Requests. Requests very much intends to use our own HTTP logic, not least because we’re sick of relying on someone else’s. Literally what we want is to have an event loop backing us that we can integrate with async/await and that requires us to reinvent as few wheels as possible while giving an overall better end-user experience. If I were to use Tornado, because I would want to integrate PEP 543 support into Tornado I’d ultimately have to rewrite Tornado’s TLS implementation *anyway* to replace it with a PEP 543 version. If I’m doing that, I’d much rather do it with MemoryBIO than wrapped sockets, for all of the reasons above. As a final note, because I think we’re getting into the weeds here: this is not *necessary*. None of this is *necessary*. Requests exists, and works today. We’ll get Windows TLS support regardless of anything that’s done here, because I’ll just shim it into urllib3 like we did for macOS. What I am pushing for with PEP 543 is an improvement that would benefit the whole ecosystem: all I want to do is to make it possible for me to actually use it and ship it to users in the tools I maintain. It is reasonable and coherent for python-dev to say “well, good luck, but no backports to help you out”. The result of that is that I put PEP 543 on the backburner (because it doesn’t solve Requests/urllib3’s problems, and ultimately my day job is about resolving those issues), and probably that we shutter the async discussion for Requests until we drop Python 2 support. That’s fine, Python is your project, not mine. But I don’t see that there’s any reason for us not to ask for this backport. After all, the worst you can do is say no, and my problems remain the same. Cory
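For concreteness, a minimal sketch of what the memory-BIO style being discussed here looks like with the Python 3.6 ssl API (the host, buffer size, and loop shape are illustrative only, not code from any of the projects mentioned):

    import socket, ssl

    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # ciphertext from the network, into the engine
    outgoing = ssl.MemoryBIO()   # ciphertext from the engine, to the network
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname='pypi.python.org')

    sock = socket.create_connection(('pypi.python.org', 443))
    # The TLS state machine never touches the socket; the caller pumps bytes
    # between the BIOs and the transport, which is exactly what an event loop
    # needs to be able to do.
    while True:
        try:
            tls.do_handshake()
            break
        except (ssl.SSLWantReadError, ssl.SSLWantWriteError):
            sock.sendall(outgoing.read())    # flush what the engine produced
            incoming.write(sock.recv(4096))  # feed it the peer's response
    sock.sendall(outgoing.read())            # flush the final handshake flight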

Le 01/06/2017 à 15:12, Cory Benfield a écrit :
I don’t know what to do with that answer, really. I gave you some data (80%+ of requests downloads over the last month were Python 2), and you responded with “it doesn’t cause us problems”.
And indeed it doesn't. Unless the target user base for pip is widely different than Python's, it shouldn't cause you any problems either.
As a final note, because I think we’re getting into the weeds here: this is not *necessary*. None of this is *necessary*. Requests exists, and works today.
And pip could even bundle a frozen 2.7-compatible version of Requests if it wanted/needed to...
Then the PEP is really wrong or misleading in the way it states its own motivations. Regards Antoine.

Maybe not now, but I think it’s fair to say that it did, right? As I recall, Python spent a long time with two fully supported Python versions, and then an even longer time with a version that was getting bugfixes. Tell me, which did you get more feedback on during that time? Generally speaking it is fair to say that at this point *every line of code in Requests* is exercised or depended on by one of our users. If we write new code available to a small fraction of them, and it is in any way sizeable, then that stops being true. Again, we should look at the fact that most libraries that successfully support Python 2 and Python 3 do so through having codebases that share as much code as possible between the two implementations. Each line of code that is exercised in only one implementation becomes a vector for a long, lingering bug. Anyway, all I know is that the last big project to do this kind of hard cut was Python, and while many of us are glad that Python 3 is real and glad that we pushed through the pain, I don’t think anyone would argue that the move was painless. A lesson can be learned there, especially for Requests which is not currently nursing a problem as fundamental to it as Python was.
Sure, if pip wants to internalise supporting and maintaining that version. One of the advantages of the pip/Requests relationship is that pip gets to stop worrying about HTTP: if there’s a HTTP problem, that’s on someone else to fix. Bundling that would remove that advantage.
How so? TLS is not a part of the HTTP parser. It’s an intermediary layer between the transport (resolutely owned by the network layer in Twisted/Tornado) and the parsing layer (resolutely owned by Requests). Ideally we would not roll our own. Cory

On Thu, 1 Jun 2017 14:37:55 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
Until Python 3.2 and perhaps 3.3, yes. Since 3.4, definitely not. For example asyncio quickly grew a sizable community around it, even though it had established Python 2-compatible competitors.
In the sentence "There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO", and also in the general feeling it gives that the backport is motivated by security reasons primarily. I understand that some users would like more features in Python 2.7. That has been the case since it was decided that feature development in the 2.x line would end in favour of Python 3 development. But our maintenance policy has been and is to develop new features on Python 3 (which some people have described as a "carrot" for migrating, which is certainly true). Regards Antoine.

Sure, but “until 3.2” covers a long enough time to take us from now to “deprecation of Python 2”. Given that the Requests team is 4 people, unlike python-dev’s much larger number, I suspect we’d have at least as much pain proportionally as Python did. I’m not wild about signing up for that.
Ok, let’s address those together. There are security reasons to do the backport, but they are “it helps us build a pathway to PEP 543”. Right now there are a lot of people interested in seeing PEP 543 happen, but vastly fewer in a position to do the work. I am, but only if I can actually use it for the things that are in my job. If I can’t, then PEP 543 becomes an “evenings and weekends” activity for me *at best*, and something I have to drop entirely at worst. Adopting PEP 543 *would* be a security benefit, so while this PEP is not in and of itself a security benefit, it builds a pathway to something that is. As to the plans to move Requests to a more event-loop-y model, I think that it does stand in the way of this, but only insomuch as, again, we want our event-loop-y model to be as bug-free as possible. But I can concede that rewording on that point would be valuable. *However*, it’s my understanding that even if I did that rewording, you’d still be against it. Is that correct? Cory

Jython 2.7.1 is about to be released, with full support of upstream pip (9.0.1), and corresponding vendored libraries, including requests. However, this proposed new feature for CPython 2.7, and its usage, will likely break pip on Jython 2.7.x going forward, given that future versions of pip will depend on requests requiring MemoryBIO. Or am I wrong in this analysis? This means we have to get back on the 2.7 development treadmill just as we were about to focus on finally working on Jython 3 ( https://github.com/jython/jython3 previews this work). Given that this proposed new feature is for 2.7 to support event loop usage and not a security fix, I'm -1 on this change. In particular, it runs counter to the justification policy stated in PEP 466. - Jim On Wed, May 31, 2017 at 7:25 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

2017-05-31 17:45 GMT+02:00 Jim Baker <jim.baker@python.org>:
Hum, it seems like the PEP 546 abstract is incomplete. The final goal of the PEP is to make Python 3 more secure thanks to all the goodness of PEP 543. PEP 546 tries to explain why Python 2.7 is blocking the adoption of PEP 543 in practice. Victor

So we have two distinct changes that are proposed here: 1. Support alternative implementations of TLS instead of OpenSSL. In particular this will enable the use of system trust stores for certificates. 2. Implement ABCs and concrete classes to support MemoryBIO, etc., from 3.7. Supporting system trust stores is a valid security fix for 2.7, and I have no such problem with such changes as long as they are narrowed to this specific change. But I object to a completely new feature being added to 2.7 to support the implementation of event loop SSL usage. This feature cannot be construed as a security fix, and therefore does not qualify as a feature that can be added to CPython 2.7 at this point in its lifecycle. The discussion that implementing such new features for 2.7 will improve their adoption for Python 3 is a red herring. We could enumerate many such features, but https://www.python.org/dev/peps/pep-0404/#upgrade-path is rather clear here. - Jim On Wed, May 31, 2017 at 10:40 AM, Victor Stinner <victor.stinner@gmail.com> wrote:

On May 31, 2017, at 02:09 PM, Jim Baker wrote:
The other problem with this is that there isn't just one CPython 2.7, there are many. It's likely that most people get their Python from whatever distribution/package manager they use. Looking at active Ubuntu/Debian releases you can see Python 2.7's from 2.7.3 to 2.7.13. Few if any of those will get the new feature backported, so that makes it difficult to rely on them being present. I agree with Jim that it makes sense to backport security fixes. Usually those are more well contained, and thus easier to cherry pick into stable releases. New features are much tougher to justify, given the potential for regressions. Cheers, -Barry

On Wed, 31 May 2017 14:09:20 -0600 Jim Baker <jim.baker@python.org> wrote:
I agree with this sentiment. Also see comments by Ben Darnell and others here: https://github.com/python/peps/pull/272#pullrequestreview-41388700 Moreover I think that a 2.7 policy decision shouldn't depend on whatever future plans there are for Requests. The slippery slope of relaxing maintenance policy on 2.7 has come to absurd extremities. If Requests is to remain 2.7-compatible, it's up to Requests to do the necessary work to do so. Regards Antoine.

2017-06-01 10:57 GMT+02:00 Antoine Pitrou <solipsis@pitrou.net>:
If Requests is to remain 2.7-compatible, it's up to Requests to do the necessary work to do so.
In practice, CPython does include Requests in ensurepip. Because of that, it means that Requests cannot use any C extension. CPython 2.7's ensurepip thus prevents Requests from evolving, even on Python 3.7. Is my rationale broken somehow? The root issue is to get a very secure TLS connection in pip to download packages from pypi.python.org. On CPython 3.6, we made multiple small steps to include more and more features in the stdlib ssl module, but I understand that the lack of root certificate authorities (CA) on Windows and macOS is still a major blocker issue for pip. That's why pip uses Requests which uses certifi (Mozilla's bundled root certificate authorities). pip and so Requests are part of the current success of the Python community. I disagree that Requests' practical issues are not our problems. -- Moreover, the PEP 546 Rationale not only includes Requests, but also the important PEP 543 to make CPython 3.7 more secure in the long term. Do you also disagree on the need for the PEP 546 backport to make the PEP 543 (new TLS API) feasible in practice? Victor

Le 01/06/2017 à 11:13, Victor Stinner a écrit :
That's why pip uses Requests which uses certifi (Mozilla's bundled root certificate authorities).
pip could use certifi without using Requests. My guess is that Requests is used mostly because it eases coding.
pip and so Requests are part of the current success of the Python community.
pip is, but I'm not convinced about Requests. If Requests didn't exist, people (including pip's developers) would use another HTTP-fetching library, they wouldn't switch to Go or Ruby.
Do you also disagree on the need for the PEP 546 backport to make the PEP 543 (new TLS API) feasible in practice?
Yes, I disagree. We needn't backport that new API to Python 2.7. Perhaps it's time to be reasonable: Python 2.7 has been in bugfix-only mode for a very long time. Python 3.6 is out. We should move on. Regards Antoine.

On Thu, Jun 1, 2017 at 7:23 PM, Antoine Pitrou <antoine@python.org> wrote:
But it is in *security fix* mode for at least another three years (ish). Proper use of TLS certificates is a security question. How hard would it be for the primary codebase of Requests to be written to use a C extension, but with a fallback *for pip's own bootstrapping only* that provides one single certificate - the authority that signs pypi.python.org? The point of the new system is that back-ends can be switched out; a stub back-end that authorizes only one certificate would theoretically be possible, right? Or am I completely misreading which part needs C? ChrisA

On Thu, 1 Jun 2017 19:50:22 +1000 Chris Angelico <rosuav@gmail.com> wrote:
Why are you bringing up "proper use of TLS certificates"? Python 2.7 doesn't need another backport for that. The certifi package is available for Python 2.7 and can be integrated simply with the existing ssl module. Regards Antoine.

On Thu, Jun 1, 2017 at 8:01 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
As stated in this thread, OS-provided certificates are not handled by that. For instance, if a local administrator distributes a self-signed cert for the intranet server, web browsers will use it, but pip will not. ChrisA

On Thu, 1 Jun 2017 20:05:48 +1000 Chris Angelico <rosuav@gmail.com> wrote:
That's true. But: 1) pip could grow a config entry to set an alternative or additional CA path; 2) it is not a "security fix", as not being able to recognize privately-signed certificates is not a security breach. It's a new feature. Regards Antoine.

No it can’t. Exporting the Windows or macOS security store to a big file of PEM is a security vulnerability because the macOS and Windows security stores expect to work with their own certificate chain building algorithms. OpenSSL builds chains differently, and disregards some metadata that Windows and macOS store, which means that cert validation will work differently than in the system store. This can lead to pip accepting a cert marked as “untrusted for SSL”, for example, which would be pretty bad. Cory

Le 01/06/2017 à 12:23, Cory Benfield a écrit :
No it can’t.
OpenSSL builds chains differently, and disregards some metadata that Windows and macOS store, which means that cert validation will work differently than in the system store. This can lead to pip accepting a cert marked as “untrusted for SSL”, for example, which would be pretty bad.
Are you claiming that OpenSSL certificate validation is insecure and shouldn't be used at all? I have never heard that claim before. Regards Antoine.

Of course I’m not. I am claiming that using OpenSSL certificate validation with root stores that are not intended for OpenSSL can be. This is because trust of a certificate is non-binary. For example, consider WoSign. The Windows TLS implementation will distrust certificates that chain up to WoSign as a root certificate that were issued after October 21 2016. This is not something that can currently be represented as a PEM file. Therefore, the person exporting the certs needs to choose: should that be exported or not? If it is, then OpenSSL will happily trust it even in situations where the system trust store would not. More generally, macOS allows the administrator to configure graduated trust: that is, to override whether or not a root should be trusted for certificate validation in some circumstances. Again, exporting this to a PEM does not persist this information. Cory

On Thu, 1 Jun 2017 11:45:14 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
I am claiming that using OpenSSL certificate validation with root stores that are not intended for OpenSSL can be. This is because trust of a certificate is non-binary. For example, consider WoSign. The Windows TLS implementation will distrust certificates that chain up to WoSign as a root certificate that were issued after October 21 2016. This is not something that can currently be represented as a PEM file. Therefore, the person exporting the certs needs to choose: should that be exported or not? If it is, then OpenSSL will happily trust it even in situations where the system trust store would not.
I was not talking about exporting the whole system CA as a PEM file, I was talking about adding an option for system administrators to configure an extra CA certificate to be recognized by pip.
More generally, macOS allows the administrator to configure graduated trust: that is, to override whether or not a root should be trusted for certificate validation in some circumstances. Again, exporting this to a PEM does not persist this information.
How much of this is relevant to pip? Regards Antoine.

Generally speaking system administrators aren’t wild about this option, as it means that they can only add to the trust store, not remove from it. So, while possible, it’s not a complete solution to this issue. I say this because the option *already* exists, at least in part, via the REQUESTS_CA_BUNDLE environment variable, and we nonetheless still get many complaints from system administrators.
More generally, macOS allows the administrator to configure graduated trust: that is, to override whether or not a root should be trusted for certificate validation in some circumstances. Again, exporting this to a PEM does not persist this information.
How much of this is relevant to pip?
Depends. If the design goal is “pip respects the system administrator”, then the answer is “all of it”. An administrator wants to be able to configure their system trust settings. Ideally they want to do this once, and once only, such that all applications on their system respect it. Cory
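For reference, the REQUESTS_CA_BUNDLE override mentioned above is used roughly like this (the bundle path and URL are hypothetical):

    import os
    import requests

    # REQUESTS_CA_BUNDLE points Requests at an alternative PEM bundle. The
    # administrator has to curate the whole file themselves, and there is
    # still no way to express "distrust this root in these circumstances".
    os.environ['REQUESTS_CA_BUNDLE'] = '/etc/ssl/corporate-bundle.pem'
    requests.get('https://intranet.example.com/')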

Who is the “we” that should move on? Python core dev? Or the Python ecosystem? Because if it’s the latter, then I’m going to tell you right now that the ecosystem did not get the memo. If you check the pip download numbers for Requests in the last month you’ll see that 80% of our downloads (9.4 million) come from Python 2. That is an enormous proportion: far too many to consider not supporting that user-base. So Requests is basically bound to support that userbase. Requests is stuck in a place from which it cannot move. We feel we cannot drop 2.7 support. We want to support as many TLS backends as possible. We want to enable the pip developers to focus on their features, rather than worrying about HTTP and TLS. And we want people to adopt the async/await keywords as much as possible. It turns out that we cannot satisfy all of those desires with the status quo, so we proposed an alternative that involves backporting MemoryBIO. So, to the notion of “we need to move on”, I say this: we’re trying. We really, genuinely, are. I don’t know how much stronger of a signal I can give about how much Requests cares about Python 3 than to signal that we’re trying to adopt async/await and be compatible with asyncio. I believe that Python 3 is the present and future of this language. But right now, we can’t properly adopt it because we have a userbase that you want to leave behind, and we don’t. I want to move on, but I want to bring that 80% of our userbase with us when we do. My reading of your post is that you would rather Requests not adopt the async/await paradigm than backport MemoryBIO: is my understanding correct? If so, fair enough. If not, I’d like to try to work with you to a place where we can all get what we want. Cory

Hi Cory, On Thu, Jun 01, 2017 at 11:22:21AM +0100, Cory Benfield wrote:
We want to support as many TLS backends as possible.
Just a wild idea, but have you investigated a pure-Python fallback for 2.7 such as TLSlite? Of course the fallback need only be used during bootstrapping, and the solution would be compatible with every stable LTS Linux distribution release that was not shipping the latest and greatest 2.7. David

I have, but discarded the idea. There are no pure-Python TLS implementations that are both feature-complete and actively maintained. Additionally, doing crypto operations in pure-Python is a bad idea, so any implementation that did crypto in Python code would be ruled out immediately (which rules out TLSLite), so I’d need what amounts to a custom library: pure-Python TLS with crypto from OpenSSL, which is not currently exposed by any Python module. Ultimately it’s just not a winner. Cory

On Thu, Jun 01, 2017 at 11:47:31AM +0100, Cory Benfield wrote:
I have, but discarded the idea.
I'm glad to hear it was given sufficient thought. :) I have one final 'crazy' idea, and actually it does not seem too bad at all: can't you just fork a subprocess or spawn threads to handle the blocking SSL APIs? Sure it wouldn't be beautiful, but it is more appealing than forcing an upgrade on all 2.7 users just so they can continue to use pip. (Which, ironically, seems to resonate strongly with the motivation behind all of this work -- allowing users to continue with their old environments without forcing an upgrade to 3.x!) David

So, this will work, but at a performance and code cleanliness cost. This essentially becomes a Python-2-only code-path, and a very large and complex one at that. This has the combined unfortunate effects of meaning a) a proportionally small fraction of our users get access to the code path we want to take forward into the future, and b) the majority of our users get an inferior experience of having a library either spawn threads or processes under their feet, which in Python has a tendency to get nasty fast (I for one have experienced the joy of having to ctrl+c multiple times to get a program using paramiko to actually die). Again, it’s worth noting that this change will not just affect pip but also the millions of Python 2 applications using Requests. I am ok with giving those users access to only part of the functionality that the Python 3 users get, but I’m not ok with that smaller part also being objectively worse than what we do today. Cory

On Thu, Jun 01, 2017 at 12:18:48PM +0100, Cory Benfield wrote:
"Doctor, it hurts when I do this .." Fine, then how about rather than exporting pip's problems on to the rest of the world (which an API change to a security module in a stable branch most certainly is, 18 months later when every other Python library starts depending on it), just drop SSL entirely, it will almost certainly cost less pain in the long run, and you can even arrange for the same code to run in both major versions. Drop SSL? But that's madness! Serve the assets over plain HTTP and tack a signature somewhere alongside it, either side-by-side in a file, embedded in a URL query string, or whatever. Here[0] is 1000 lines of pure Python that can validate a public key signature over a hash of the asset as it's downloaded. Embed the 32 byte public key in the pip source and hey presto. [0] https://github.com/jfindlay/pure_pynacl/blob/master/pure_pynacl/tweetnacl.py Finding someone to audit the signature checking capabilities of [0] will have vastly lower net cost than getting the world into a situation where pip no longer runs on the >1e6 EC2 instances that will be running Ubuntu 14.04/16.04 LTS until the turn of the next decade. Requests can't be installed without a working SSL implementation? Then drop requests, it's not like it does much for pip anyway. Downloads worldwide get a huge speedup due to lack of TLS handshake latency, a million Squid caching reverse proxies worldwide jump into action caching tarballs they previously couldn't see, pip's _vendor directory drops by 4.2MB, and Python package security depends on 1k lines of memory-safe code rather than possibly *the* worst example of security-unconcious C to come into existence since the birth of our industry. Sounds like a win to me. Maybe set a standard rather than blindly follow everyone else, at the cost of.. everyone else. David

On 1 Jun 2017, at 15:10, David Wilson <dw+python-dev@hmmz.org> wrote:
So for the record I’m assuming most of the previous email was a joke: certainly it’s not going to happen. ;) But this is a real concern that does need to be addressed: Requests can’t meaningfully use this as its only TLS backend until it propagates to the wider 2.7 ecosystem, at least far enough such that pip can drop Python 2.7 releases lower than 2.7.14 (or wherever MemoryBIO ends up, if backported). So a concern emerges: if you grant my other premises about the utility of the backport, is it worth backporting at all? The answer to that is honestly not clear to me. I chatted with the pip developers, and they have 90%+ of their users currently on Python 2, but more than half of those are on 2.7.9 or later. This shows some interest in upgrading to newer Python 2s. The question, I think, is: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2? I don’t have an answer to that question. I have a gut instinct that says yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”. But I’d say that from my perspective this is the crux of the problem. We can hedge against this by just choosing to backport and accepting that it may never become useful, but a reasonable person can disagree and say that it’s just not worth the effort. Frankly, I think that amidst all the other arguments this is the one that most concretely needs answering, because if we don’t think Requests can ever meaningfully rely on the presence of MemoryBIO on 2.7 (where “rely on” can be approximated to 90%+ of 2.7 users having access to it AND 2.7 still having non-trivial usage numbers) then ultimately this PEP doesn’t grant me much benefit. There are others who believe there are a few other benefits we could get from it (helping out Twisted etc.), but I don’t know that I’m well placed to make those arguments. (I also suspect I’d get accused of moving the goalposts.) Cory

On Fri, Jun 2, 2017 at 1:01 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
The answer to that is honestly not clear to me. I chatted with the pip developers, and they have 90%+ of their users currently on Python 2, but more than half of those are on 2.7.9 or later. This shows some interest in upgrading to newer Python 2s. The question, I think, is: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2?
I don’t have an answer to that question. I have a gut instinct that says yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”.
Let's see. Python 2 users include people on Windows who install it themselves, and then have no mechanism for automatic updates. They'll probably stay on whatever 2.7.x they first got, until something forces them to update. But it also includes people on stable Linux distros, where they have automatic updates provided by Red Hat or Debian or whomever, so a change like this WILL propagate - particularly (a) as the window is three entire years, and (b) if the change is considered important by the distro managers, which is a smaller group of people to convince than the users themselves. By 2020, Windows 7 will be out of support. By various estimates, Win 7 represents roughly half of all current Windows users. That means that, by 2020, at least half of today's Windows users will either have upgraded to a new OS (likely with a wipe-and-fresh-install, so they'll get a newer Python), or be on an unsupported OS, on par with people still running XP today. The same is true for probably close to 100% of Linux users, since any supported Linux distro will be shipping updates between now and 2020, and I don't know much about Mac OS updates, but I rather suspect that they'll also be updating. (Can anyone confirm?) So I'd be in the "yes" category. Across the next few years, I strongly suspect that 2.7.14 will propagate reasonably well. And I also strongly suspect that, even once 2020 hits and Python 2 stops getting updates, it will still be important to a lot of people. These numbers aren't backed by much, but it's slightly better than mere gut instinct. Do you have figures for how many people use pip on Windows vs Linux vs Mac OS? ChrisA

On 1 Jun 2017, at 17:14, Chris Angelico <rosuav@gmail.com> wrote:
Do you have figures for how many people use pip on Windows vs Linux vs Mac OS?
I have figures for the download numbers, which are an awkward proxy because most people don’t CI on Windows and macOS, but they’re the best we have. Linux has approximately 20x the download numbers of either Windows or macOS, and both Windows and macOS are pretty close together. These numbers are a bit confounded due to the fact that 1/4 of Linux’s downloads are made up of systems that don’t report their platform, so the actual ratio could be anywhere from about 25:1 to 3:1 in favour of Linux for either Windows or macOS. All of this is based on the downloads made in the last month. Again, an enormous number of these downloads are going to be CI downloads which overwhelmingly favour Linux systems. For some extra perspective, the next highest platform by download count is FreeBSD, with 0.04% of the downloads of Linux. HTH, Cory

On Fri, Jun 2, 2017 at 2:35 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
I have figures for the download numbers, which are an awkward proxy because most people don’t CI on Windows and macOS, but they’re the best we have. Linux has approximately 20x the download numbers of either Windows or macOS, and both Windows and macOS are pretty close together. These numbers are a bit confounded due to the fact that 1/4 of Linux’s downloads are made up of systems that don’t report their platform, so the actual ratio could be anywhere from about 25:1 to 3:1 in favour of Linux for either Windows or macOS. All of this is based on the downloads made in the last month.
Again, an enormous number of these downloads are going to be CI downloads which overwhelmingly favour Linux systems.
Hmm. So it's really hard to know. Pity. I suppose it's too much to ask for IP-based stat exclusion for the most commonly-used CI systems (Travis, Circle, etc)? Still, it does look like most pip usage happens on Linux. Also, it seems likely that the people who use Python and pip heavily are going to be the ones who most care about keeping up-to-date with point releases, so I still stand by my belief that yes, 2.7.14+ could take the bulk of 2.7's marketshare before 2.7 itself stops being significant. Thanks for the figures. ChrisA

2017-06-01 18:51 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
It seems like PyPI statistics are public: https://langui.sh/2016/12/09/data-driven-decisions/ Another article on PyPI stats: https://hynek.me/articles/python3-2016/ 2.7: 419 millions (89%) 3.3+3.4+3.5+3.6: 51 millions (11%) (I ignored 2.6) Victor

On Jun 02, 2017, at 02:14 AM, Chris Angelico wrote:
I'm not so sure about that, given long term support releases. For Ubuntu, LTS releases live for 5 years: https://www.ubuntu.com/info/release-end-of-life By 2020, only Ubuntu 16.04 and 18.04 will still be maintained, so while 18.04 will likely contain whatever the latest 2.7 release is at that time, 16.04 won't track upstream point releases, but instead will get select cherry picks. For good reason, there's a lot of overhead to backporting fixes into stable releases, and something as big as what's being suggested here would, in my best guess, have a very low chance of showing up in stable releases. -Barry

On Jun 01, 2017, at 07:22 PM, Victor Stinner wrote:
I can help Canonical to backport MemoryBIO *if they want* to cherry-pick this feature ;-)
(Pedantically speaking, this falls under the Ubuntu project's responsibility, not directly Canonical.) Writing the patch is only part of the process: https://wiki.ubuntu.com/StableReleaseUpdates There's also Debian to consider. Cheers, -Barry

Using 2.7.9 as a sort of benchmark here, currently 26% of downloads from PyPI are using a version of Python older than 2.7.9; 2 months ago that number was 31%. (That’s total across all Python versions). Python >= 2.7.9, <3 is at 43% (previously 53%). So in ~2.5 years 2.7.9+ has become > 50% of all downloads from PyPI while older versions of Python 2.7 are down to only ~25% of the total number of downloads made by pip. I was also curious about how this had changed over the past year instead of just the past two months: a year ago >=2.7,<2.7.9 accounted for almost 50% of all downloads from PyPI (compared to the 25% today). It *looks* like on average we’re dropping somewhere between 1.5% and 2% each month, so as a conservative estimate, if these numbers hold, we’d be looking at single digit numbers for >=2.7,<2.7.9 in roughly 11 months, or 3.5 years after the release of 2.7.9. If we assume that the hypothetical 2.7.14 w/ MemoryBIO support would follow a similar adoption curve, we would expect to be able to mandate it for pip/etc in, at a worst case scenario, 3-4 years after release. In addition to that, pip 9 comes with a new feature that makes it easier to sunset support for versions of Python without breaking the world [1]. The likely scenario is that while pip 9+ is increasing in share, Python <2.7.14 will be decreasing, and that would mean that we could *likely* start mandating it earlier, maybe at the 2 year mark or so. [1] An astute reader might ask, why could you not use this same mechanism to simply move on to only supporting Python 3? It’s true we could do that, however as a rule we generally try to keep support for Pythons until the usage drops below some threshold, where that threshold varies based on how hard it is to continue supporting that version of Python and what the “win” is in terms of dropping it. Since we’re still at 90% of downloads from PyPI being done using Python 2, that suggests the threshold for Python 3.x is very far away and will extend beyond 2020 (I mean, we’re just *now* finally able to drop support for Python 2.6). In case it’s not obvious, I am very much in support of this PEP. — Donald Stufft
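(A back-of-envelope check of that estimate, using the conservative end of the observed decline rather than real telemetry:)

    share, months = 26.0, 0          # >=2.7,<2.7.9 share of downloads today
    while share >= 10.0:             # "single digit numbers"
        share -= 1.5                 # low end of the 1.5-2% monthly drop
        months += 1
    print(months)                    # -> 11, matching the "roughly 11 months"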

On Jun 1, 2017 9:20 AM, "Chris Angelico" <rosuav@gmail.com> wrote: On Fri, Jun 2, 2017 at 1:01 AM, Cory Benfield <cory@lukasa.co.uk> wrote: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2?
I don’t have an answer to that question. I have a gut instinct that says
yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”.
Let's see. Python 2 users include people on Windows who install it themselves, and then have no mechanism for automatic updates. They'll probably stay on whatever 2.7.x they first got, until something forces them to update. But it also includes people on stable Linux distros, where they have automatic updates provided by Red Hat or Debian or whomever, so a change like this WILL propagate - particularly (a) as the window is three entire years, and (b) if the change is considered important by the distro managers, which is a smaller group of people to convince than the users themselves. I believe that for answering this question about the ssl module, it's really only Linux users that matter, since pip/requests/everyone else pushing for this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS (IIUC) is to stop using the ssl module entirely in favor of new ctypes bindings for their respective native TLS libraries. (And yes, in principle it might be possible to write new ctypes-based bindings for openssl, but (a) this whole project is already teetering on the verge of being impossible given the resources available, so adding any major extra deliverable is likely to sink the whole thing, and (b) compared to the proprietary libraries, openssl is *much* harder and riskier to wrap at the ctypes level because it has different/incompatible ABIs depending on its micro version and the vendor who distributed it. This is why manylinux packages that need openssl have to ship their own, but pip can't and shouldn't ship its own openssl for many hopefully obvious reasons.) -n

2017-06-01 19:10 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
The long term plan includes one Windows implementation, one macOS implementation and one implementation using the stdlib ssl module. But it seems like right now, Cory is working alone and has limited time to implement his PEP 543 (new TLS API). The short term plan is to implement the strict minimum: the implementation relying on the existing stdlib ssl module. Backporting MemoryBIO makes it possible to get the new TLS API "for free" on Python 2.7. IMHO Python 2.7 support is a requirement to make the PEP popular enough to make it successful. The backport is supposed to fix a chicken-and-egg issue :-)
(And yes, in principle it might be possible to write new ctypes-based bindings for openssl, but (...))
A C extension can also be considered, but I trust code in the CPython stdlib more, since it would be well tested by our big farm of buildbots and have more eyes looking at the code. -- It seems like the PEP 546 (backport MemoryBIO) should make it more explicit that MemoryBIO support will be "optional": it's ok if Jython or PyPy doesn't implement it. It's ok if old Python 2.7 versions don't implement it. I expect to use a fallback for those anyway. It's just that I would prefer to avoid a fallback (likely a C extension) whenever possible, since it would cause various issues, especially for C code using OpenSSL: the OpenSSL API has changed many times :-/ Victor

On 01Jun2017 1010, Nathaniel Smith wrote:
How much of a stop-gap would it be (for Windows at least) to override OpenSSL's certificate validation with a call into the OS? This leaves most of the work with OpenSSL, but lets the OS say yes/no to the certificates based on its own configuration. For Windows, this is under 100 lines of C code in (probably) _ssl, and while I think an SChannel based approach is the better way to go long-term,[1] offering platform-specific certificate validation as the default in 2.7 is far more palatable than backporting new public API. I can't speak to whether there is an equivalent function for Mac (validate a certificate chain given the cert blob). Cheers, Steve [1]: though I've now spent hours looking at it and still have no idea how it's supposed to actually work...

It’s entirely do-able. This is where I reveal just how long I’ve been fretting over this problem: https://pypi.python.org/pypi/certitude. Ignore the description, it’s wildly out-of-date: let me summarise the library instead. Certitude is a Python library that uses CFFI and Rust to call into the system certificate validation libraries on macOS and Windows using a single unified API. Under the covers it has a whole bunch of Rust code that translates from what OpenSSL can give you (a list of certificates in the peer cert chain in DER format) and into what those two operating systems expect. The Rust code for Windows is here[1] and is about as horrifying a chunk of Rust as you can imagine seeing (the Windows API does not translate very neatly into Rust so the word “unsafe” appears a lot), but it does appear to work, at least in the mainline cases and in the few tests I have. The macOS code is here[2] and is moderately less horrifying, containing no instances of the word “unsafe”. I lifted this approach from Chrome, because at the moment this is what they do: they use their custom fork of OpenSSL (BoringSSL) to do the actual TLS protocol manipulation, but hand the cert chain verification off to platform-native libraries on Windows and macOS. I have never productised this library because ultimately I never had the time to spend writing a sufficiently broad test-case to confirm to me that it worked in all cases. There are very real risks in calling these APIs directly because if you get it wrong it’s easy to fail open. It should be noted that right now certitude only works with, surprise, PyOpenSSL. Partly this is because the standard library does not expose SSL_get_peer_cert_chain, but even if it did that wouldn’t be enough as OpenSSL with VERIFY_NONE does not actually *save* the peer cert chain anywhere. That means even with PyOpenSSL the only way to get the peer cert chain is to hook into the verify callback and save off the certs as they come in, a gloriously absurd solution that is impossible with pure-Python code from the ssl module. While this approach could work with _ssl.c, it ultimately doesn’t resolve the issue. It involves writing a substantial amount of new code which needs to be maintained by the ssl module maintainers. All of this code needs to be tested *thoroughly*, because python-dev would be accepting responsibility for the incredibly damaging potential CVEs in that code. And it doesn’t get python-dev out of the business of shipping OpenSSL on macOS and Windows, meaning that python-dev continues to bear the burden of OpenSSL CVEs, as well as the brand new CVEs that it is at risk of introducing. Oh, and it can’t be backported to Python 2.7 or any of the bugfix-only Python 3 releases, and as I just noted the ssl module has never made it possible to use this approach from outside CPython. So it’s strictly just as bad as the situation PEP 543 is in, but with more C code. Doesn’t sound like a winning description to me. ;) Cory [1]: https://github.com/Lukasa/rust-certitude/blob/master/rust-certitude/src/wind... [2]: https://github.com/Lukasa/rust-certitude/blob/master/rust-certitude/src/osx....
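For the curious, the verify-callback trick Cory describes looks roughly like this with pyOpenSSL (a sketch only; the stdlib ssl module exposes no equivalent hook):

    from OpenSSL import SSL

    captured_chain = []

    def capture_cert(conn, cert, errno, depth, preverify_ok):
        # Called once per certificate in the chain: stash each one so a
        # platform verifier (e.g. certitude) can judge the chain afterwards.
        captured_chain.append(cert)
        return True  # accept here; the real decision happens out-of-band

    ctx = SSL.Context(SSL.TLSv1_2_METHOD)
    ctx.set_verify(SSL.VERIFY_PEER, capture_cert)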

Thanks Cory for the long explanation. Let me try to summarize (tell me if I'm wrong). We have 3 options: * Do nothing: reject the PEP 546 and let each project handle security on its own (current status quo) * Write *new* C code, maybe using certitude as a starting point, to offload certificate validation on Windows and macOS * Backport existing code from master to 2.7: MemoryBIO and SSLObject Writing new code seems more risky and error-prone than backporting already "battle-tested" MemoryBIO from master. I also expect that writing code to validate certificates will take longer than the "100 lines of C code in (probably)" expected by Steve Dower. rust-certitude counts around 700 lines of Rust and 80 lines of Python code. But maybe I misunderstood the purpose of certitude: Steve Dower asked to only validate a certificate, not load or export CA. I counted 150 Python lines for SSLObject and 230 C lines for MemoryBIO. Since the long term plan is to not use the stdlib ssl module but a new implementation on Windows and macOS, one could argue that it is worthless to backport MemoryBIO to Python 2.7. But the PEP 546 (backport MemoryBIO) is a practical solution to provide a *smooth* transition from ssl to a new TLS API. The experience showed that hard changes like "run 2to3 and drop your Python 2 code" don't work in practice. Users want a transition plan with small steps. Victor 2017-06-02 11:08 GMT+02:00 Cory Benfield <cory@lukasa.co.uk>:

That’s all certitude does. The docs of certitude are from an older version of the project when I was considering just doing a live-export to PEM file, before I realised the security concerns of that approach. We’d also require some other code that lives outside certitude. In particular, code needs to be written to handle the OpenSSL verify callback to save off the cert chain and to translate errors appropriately. There’s also a follow-on problem: the ssl module allows you to call SSLContext.load_default_certs and then SSLContext.load_verify_locations. If you do that, those two behave *additively*: both the default certs and custom verify locations are trusted. Certitude doesn’t allow you to do that: it says that if you choose to use the system certs then darn it that’s all you get. Working out how to do that without just importing random stuff into the user’s keychain would be…tricky. Do-able, for sure, but would require code I haven’t written for Certitude (I may have written it using ctypes elsewhere though). Cory
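(The additive behaviour in question, as a minimal ssl sketch; the extra bundle path is hypothetical:)

    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.load_default_certs()                                # platform roots
    ctx.load_verify_locations(cafile='/etc/ssl/extra.pem')  # extra roots
    # After both calls, certificates chaining to *either* set are trusted.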

Thanks you for all the explanations. So to summarize my opinion, I'm still -0.5 on this PEP. I would also like to see the issues Jython, Ubuntu et al. have mentioned solved before this is accepted. Regards Antoine. On Fri, 2 Jun 2017 11:42:58 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:

The PEP's rationale is now "This PEP will help facilitate the future adoption of :pep:`543` across all supported Python versions, which will improve security for both Python 2 and Python 3 users." What exactly are these security improvements? My understanding (which may well be incorrect) is that the security improvements come in the future when the PEP 543 APIs are implemented on top of the various platform-native security libraries. These changes will not be present until some future 3.x release, and will not be backported to 2.7 (without another PEP, which I expect would be even more contentious than this one). What then is the security improvement for Python 2 users? In Tornado, I have not felt any urgency to replace wrap_socket with MemoryBIO. Is there a security-related reason I should do so sooner rather than later? (I'd also urge Cory and any other wrap_socket skeptics on the requests team to reconsider - Tornado's SSLIOStream works well. The asynchronous use of wrap_socket isn't so subtle and risky with buffering) -Ben On Fri, Jun 2, 2017 at 8:07 AM Antoine Pitrou <solipsis@pitrou.net> wrote:

On Jun 2, 2017 7:24 AM, "Ben Darnell" <ben@bendarnell.com> wrote: The PEP's rationale is now "This PEP will help facilitate the future adoption of :pep:`543` across all supported Python versions, which will improve security for both Python 2 and Python 3 users." What exactly are these security improvements? My understanding (which may well be incorrect) is that the security improvements come in the future when the PEP 543 APIs are implemented on top of the various platform-native security libraries. These changes will not be present until some future 3.x release, and will not be backported to 2.7 (without another PEP, which I expect would be even more contentious than this one). What then is the security improvement for Python 2 users? My understanding is that PEP 543 would be released as a library on PyPI, as well as targeting stdlib inclusion in some future release. The goal of the MemoryBIO PEP is to allow the PEP 543 package to be pure-Python, support all the major platforms, and straddle py2/py3. In Tornado, I have not felt any urgency to replace wrap_socket with MemoryBIO. Is there a security-related reason I should do so sooner rather than later? (I'd also urge Cory and any other wrap_socket skeptics on the requests team to reconsider - Tornado's SSLIOStream works well. The asynchronous use of wrap_socket isn't so subtle and risky with buffering) I'll leave the discussion of wrap_socket's reliability to others, because I don't have any experience there, but I do want to point out that which primitive you pick has major system design consequences. MemoryBIO is an abstraction that can implement wrap_socket, but not vice-versa; if you use wrap_socket as your fundamental primitive then that leaks all over your abstraction hierarchy. Twisted has to have a MemoryBIO implementation of their TLS code, because they want to support TLS over arbitrary transports. Carrying a wrap_socket implementation as well would mean twice the tricky and security-sensitive code to maintain, plus breaking their current abstractions to expose whether any particular transport object is actually just a raw socket. The problem gets worse when you add PEP 543's pluggable TLS backends. If you can require a MemoryBIO-like API, then adding a backend becomes a matter of defining like 4 methods with relatively simple, testable interfaces, and it automatically works everywhere. Implementing a wrap_socket interface on top of this is complicated because the socket API is complicated and full of subtle corners, but it only has to be done once and libraries like tornado can adapt to the quirks of the one implementation. OTOH if you have to also support backends that only have wrap_socket, then this multiplies the complexity of everything. Now we need a way to negotiate which APIs each backend supports, and we need to somehow document all the weird corners of how wrapped sockets are supposed to behave in edge cases, and when different wrapped sockets inevitably behave differently then libraries like tornado need to discover those differences and support all the options. We need a way to explain to users why some backends work with some libraries but not others. And so on. And we still need to support MemoryBIOs in as many cases as possible. -n
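A hypothetical illustration of Nathaniel's point about backend surface area: a memory-level backend reduces to a handful of byte-pumping methods, and a blocking wrapped socket can be layered on top of them, while the reverse is not possible (all names below are invented):

    class InMemoryTLS(object):
        """Sketch of the small surface a memory-BIO style backend needs."""

        def receive_ciphertext(self, data):
            """Feed bytes that arrived from whatever transport is in use."""

        def ciphertext_to_send(self):
            """Return bytes the TLS engine wants written to the transport."""

        def write_plaintext(self, data):
            """Queue application data for encryption."""

        def read_plaintext(self, amount):
            """Return up to `amount` bytes of decrypted application data."""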

On 2 June 2017 at 19:42, Victor Stinner <victor.stinner@gmail.com> wrote:
There's also a 4th option: * Introduce a dependency from requests onto PyOpenSSL when running in async mode on Python 2.7 in the general case, and figure out some other pip-specific option for ensurepip bootstrapping (like a *private* MemoryBIO implementation, or falling back to synchronous mode in requests) During the pre-publication PEP discussions, I kinda dismissed the PyOpenSSL dependency option out of hand due to the ensurepip bootstrapping issues it may introduce, but I think we need to discuss it further in the PEP as it would avoid some of the other challenges brought up here (Linux distro update latencies, potential complications for alternate Python 2.7 implementations, etc). For example: * if requests retains a synchronous mode fallback implementation, then ensurepip could use that in the absence of PyOpenSSL * even if requests drops synchronous mode entirely, we could leave the public ssl module API alone, and add an _ensurepip bootstrap module specifically for use in the absence of a full PyOpenSSL module If we adopted the latter approach, then for almost all intents and purposes, ssl.MemoryBIO and ssl.SSLObject would remain a Python 3.5+ only API, and anyone wanting access to it on 2.7 would still need to depend on PyOpenSSL. The benefit of making any backport a private API is that it would mean we weren't committing to support that API for general use: it would be supported *solely* for the use case discussed in the PEP (i.e. helping to advance the development of PEP 543 without breaking pip bootstrapping in the process). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Jun 2, 2017 at 1:29 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Whatever the eventual outcome, I don't think there's any danger someone will read this thread and think "wow, it's so easy to get new features into 2.7". -n -- Nathaniel J. Smith -- https://vorpus.org

On 3 June 2017 at 08:21, Nathaniel Smith <njs@pobox.com> wrote:
Aye, when "Be related to the Python ecosystem's collective handling of one of the essential network protocols that underpins the security of the entire internet" is a criterion to even *start* the backport discussion (let alone successfully make the case), I don't think "We might get deluged with further backport proposals" is really something we need to worry about :) Another key factor is being able to convince folks with some level of influence with commercial redistributors that the idea is worth supporting - the upstream backport in CPython is only step one in rolling out this kind of change, and we need to get it all the way through into the redistributor channels for it to really have the desired impact (hence PEP 493 as a follow-up to PEP 476 - at least some redistributors, including Red Hat, said "No" to the original hard compatibility break, but were OK with a smoother transition plan that gave end users more control over the timing of the change in behaviour). For the MemoryBIO and SSLObject backport, I'm still in the "Maybe" state myself. While I'm definitely sympathetic to the proposal (SSL/TLS is incredibly important, but cross-distro and cross-version SSL/TLS is already painful, and it's even worse once you bring cross-platform considerations and the sync/async split into the mix), it's an unfortunate fact of redistributor life that we don't typically have the luxury of pitching backport proposals to product management based on the benefits they offer to the upstream development community - we need to be able to articulate a clear and relatively immediate benefit to customers or commercial partners. For this PEP, "We want to make the new Python TLS abstraction API work" would potentially qualify as such a benefit (although I can't make any promises about that), but we're not even at that stage yet - that's a point we'd only reach if the library was initially built with a dependency on 3.5+ and 2.7.14+, *and* we either saw significant demand for that new API amongst customers, or else we genuinely needed to bring it in as a dependency of something else (and even then, we prefer approaches like a clearly distinct PyOpenSSL dependency that can be handled in a higher level product like OpenStack rather than needing to be baked directly into the operating system itself). As a result, as awkward and annoying as it would be, I'm wondering if the structure of the requests migration may need to be something like: - switch unconditionally to an async backend on Py3 - switch conditionally to an async backend on Py2 when PyOpenSSL is available - retain sufficient sync-only support on Py2 to allow pip to disable the PyOpenSSL dependency As a redistributor, ensuring that sufficiently recent versions of PyOpenSSL are available is likely to be a *far* more tractable problem than getting backports implemented at the standard library level (even as a private API), and even if that doesn't happen, developers that are using 3rd party modules like requests usually have significantly more freedom to install additional Python level dependencies than they do to upgrade their production Python runtime. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Ultimately we don’t have the resources to do this. The requests core development team is 4 people, 3 of whom are part time best-effort maintainers and one of whom is me, who has foolishly decided to also work on PEP 543 as well as take over the lead maintenance role on urllib3. urllib3 has another two additional best-effort maintainers. We simply do not have the development resources to support two parallel branches of code. Even if we did, doing so is a recipe for adding bugs and inconsistencies between the two, leading to increased workload as we break our users and fail to keep the two branches in sync. It also makes it much harder for users to migrate from our synchronous backend to our async one, as those two no longer use identical underlying implementations and so will have subtle inconsistencies. The TL;DR there is: no, we’d rather stay sync only than do that. Cory

On 5 June 2017 at 18:42, Cory Benfield <cory@lukasa.co.uk> wrote:
Would you be OK with the notion of a "just for pip bootstrapping" private backend in _ensurepip_ssl? That is, the only officially supported async-backed requests configuration on Py2 would be with the PyOpenSSL dependency installed, but in collaboration with the pip devs we'd also plumb in the pieces to let a new async-backed requests work without any extension modules other than those in the standard library. That way, the only thing truly gated on the backport would be *pip* updating its bundled version of requests to the async-backed version - for normal third party use, the change would be "you need PyOpenSSL", rather than "you need a newer version of Python". We'd still effectively end up with two different code execution paths (one backed by PyOpenSSL, one backed by the new private _ensurepip_ssl extension module), but the level of divergence would be much lower (it's just a question of where MemoryBIO and SSLObject are coming from) and the support scope for the less frequently used path would be much narrower (i.e. if a problem report isn't related to pip bootstrapping, it can be closed as "won't fix") Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

It’s not clear to me what the structure of that looks like, or what work is required to achieve it. Right now Twisted never uses MemoryBIO and SSLObject: it always uses PyOpenSSL. That seems like we’d need to add MemoryBIO and SSLObject support to Twisted which can be enabled in some way other than feature detection (that is, so it can be installed). Without PEP 543 this is pretty gross. With PEP 543 it sucks less, but it also gates this work behind PEP 543 being successfully implemented and landed. I guess we are *open* to that approach? It’s not clear to me how beneficial that is, and it doesn’t gain any of the ecosystem benefits (no-one else on Py 2 can ever use this chunk of tested, understood code), but it’s certainly an option. The indirection gives me pause though. Cory

Is pip allowed to use the hypothetical _ensurepip_ssl outside of ensurepip? Thinking about this reminded me about the *other* reason pip avoids dependencies— avoiding making assertions about what versions of software can actually be installed. IOW if pip depends on pyOpenSSL >= X.Y, then you essentially can’t install any other version of pyOpenSSL and you’d be forced to upgrade. This isn’t the end of the world, and pyOpenSSL is generally stable enough that we could *probably* get away with an unversioned dependency on a non Windows platform. On Windows we’d have to uh, I guess use an SChannel ctypes backend? Not having our TLS library in the stdlib makes it a bit more difficult, but from pip’s own POV, if there’s a MemoryBIO that we’re allowed to use and then requests uses pyOpenSSL normally (we’d apply a small patch to pip’s copy of requests to make it use it, but that’s easy to do with our setup) I think that would be workable. I think that’s a less optimal solution than just accepting PEP 546, which I think is beneficial to the Python ecosystem as a whole, making it easier to port more code onto 3.x (and allowing unlocking 3.x’s capabilities) and making it easier to maintain the ssl library on 2.7, given it will have less of a divergence from 3.x again. — Donald Stufft

On 5 June 2017 at 21:44, Donald Stufft <donald@stufft.io> wrote:
Is pip allowed to use the hypothetical _ensurepip_ssl outside of ensurepip?
Yes, so something like _tls_bootstrap would probably be a better name for the helper module (assuming we go down the "private API to bootstrap 3rd party SSL modules" path). The leading underscore would be akin to the one on _thread - it isn't that the API would be undocumented or inaccessible, it's that it wouldn't have any backwards compatibility guarantees for general use, so you shouldn't use it without a really good reason.
Indeed, and I think that's sufficient justification for at least adding the private TLS bootstrapping API.
Right, I'm just trying to make sure we clearly separate the two arguments: pip bootstrapping (which can potentially be addressed through a private alternate SSL/TLS API) and ecosystem advancement (making it easier for SSL/TLS capable applications to retain Python 2.7.x compatibility at least until 2020, while still supporting the improvements in asynchronous IO capabilities in the 3.x series). Separating the two main options like that gives us two future scenarios:

1. Private TLS bootstrapping API

- requests & any other libraries that want this functionality would need a dependency on PyOpenSSL for Python versions prior to 3.5
- the base SSL/TLS support in Python 2.7 remains focused on synchronous IO, with only limited support for sans-io style protocol library implementations
- some time in 2017 or 2018, pip starts requiring that environments provide either an externally managed _tls_bootstrap module, *or* a sufficiently recent PyOpenSSL
- the first CPython maintenance release to include that pip update would also provide the required _tls_bootstrap module by default
- redistributors will be able to backport _tls_bootstrap as an independent module (e.g. as part of their system pip packages) rather than necessarily including it directly in their Python runtime
- this distribution model should be resilient over time, allowing _tls_bootstrap to be rebased freely without risking breaking end user applications that weren't expecting it (this approach would likely even be useful in LTS 3.x environments as they age - e.g. Ubuntu 14.04 and 16.04)
- the private _tls_bootstrap API would either be a straight backport of the 3.6 ssl module, or else a shim around the corresponding standard library's ssl module to add the required features, whichever was judged easiest to implement and maintain

2. Public standard library ssl module API update

- libraries needing the functionality *either* depend on PyOpenSSL *or* set their minimum Python version to 2.7.14+
- if libraries add a Python version check to their installation metadata, LTS distributions are out of luck, since we typically don't rebase Python (e.g. the RHEL 7 system Python still advertises itself as 2.7.5, even though assorted backports have been applied on top of that)
- even if libraries rely on API feature detection rather than explicit version checks, redistributors still need to actually patch the standard library - we can't just add a guaranteed-side-effect-free module as a separate post-installation step
- while some system administrators may be willing and able to deploy their own variant of a _tls_bootstrap module in advance of their redistributors providing one, relatively few are going to be willing to apply custom patches to Python before deploying it

So honestly, I don't think the current "ecosystem advancement" argument in PEP 546 actually works, as the enhancement is *significantly* easier to roll out to older 2.7.x releases if the adoption model is "Supporting SSL/TLS with sans-io style protocol implementations on Python versions prior to Python 3.5 requires the use of PyOpenSSL or a suitable PEP 543 compatible library (unless you're a package management tool, in which case you should check for a separate platform provided Python version independent _tls_bootstrap module before falling back to PyOpenSSL)".
As long as we actually document the _tls_bootstrap API (even if that's just a cross-reference to the Python 3.6 ssl module documentation), then folks wanting to share asynchronous IO code between Python 2 & 3 will have the following options:

- use wrap_socket() (as Tornado does today)
- depend on PyOpenSSL (as Twisted does today)
- use _tls_bootstrap if available, fall back to PyOpenSSL otherwise <-- new piece added by PEP 546 due to the problems around pip attempting to upgrade its own dependencies

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jun 5, 2017 7:01 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote: On 5 June 2017 at 21:44, Donald Stufft <donald@stufft.io> wrote:
Is pip allowed to use the hypothetical _ensurepip_ssl outside of ensurepip?
Yes, so something like _tls_bootstrap would probably be a better name for the helper module (assuming we go down the "private API to bootstrap 3rd party SSL modules" path). It seems like there's a risk here that we end up with two divergent copies of ssl.py and _ssl.c inside the python 2 tree, and that this will make it harder to do future bug fixing and backports. Is this going to cause problems? Is there any way to mitigate them? -n

On 6 June 2017 at 10:59, Nathaniel Smith <njs@pobox.com> wrote:
Aye, I spent some time thinking about potentially viable implementation architectures, and realised that any "private API" style solution pretty much *has* to be structured as a monkeypatching API to be effective. Otherwise there is too much risk of divergence in things like exception definitions that end up causing cryptic "Why is my SSL exception handler not catching my SSL connection error?" type bugs.

The gist of the approach would be that for libraries and non-bootstrapped applications, their SSL/TLS feature detection code would ultimately look something like this:

    try:
        # Don't do anything special if we don't need to
        from ssl import MemoryBIO, SSLObject
    except ImportError:
        # Otherwise fall back to using PyOpenSSL
        try:
            from OpenSSL.SSL import Connection
        except ImportError:
            raise ImportError("Failed to import asynchronous SSL/TLS support: <details>")

It's currently more complex than that in practice (since PyOpenSSL doesn't natively offer an emulation of the affected parts of the standard library's ssl module API), but that's what it would effectively boil down to at the lowest level of any compatibility wrapper. Conveniently, this is *already* what libraries and non-bootstrapped modules have to do if they're looking to support older Python versions without requiring PyOpenSSL on newer releases, so there wouldn't actually be any changes on that front.

By contrast, for bootstrapped applications (including pip) the oversimplified compatibility import summary would instead look more like this:

    try:
        # Don't do anything special if we don't need to
        from ssl import MemoryBIO, SSLObject
    except ImportError:
        # See if we can do a runtime in-place upgrade of the standard library
        try:
            import _tls_bootstrap
        except ImportError:
            # Otherwise fall back to using PyOpenSSL
            try:
                from OpenSSL.SSL import Connection
            except ImportError:
                raise ImportError("Failed to bootstrap asynchronous SSL/TLS support: <details>")
        else:
            _tls_bootstrap.monkeypatch_ssl()

The reason this kind of approach is really attractive to redistributors from a customer risk management perspective is that like gevent's monkeypatching of synchronous networking APIs, it's *opt-in at runtime*, so the risk of our accidentally inflicting it on a customer that doesn't want it and doesn't need it is almost exactly zero - if none of their own code includes the "import _tls_bootstrap; _tls_bootstrap.monkeypatch_ssl()" invocation and none of their dependencies start enabling it as an implicit side effect of some other operation, they'll never even know the enhancement is there. Instead, the compatibility risks get concentrated in the applications relying on the bootstrapping API, since the monkeypatching process is a potentially new source of bugs that don't exist in the more conventional execution models.

The reason that's still preferable though, is that it means the monkeypatching demonstrably doesn't have to be 100% perfect: it just has to be close enough to the way the Python 3 standard library API works to meet the needs of the applications that actually use it.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
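
A rough sketch of what that final _tls_bootstrap.monkeypatch_ssl() call might boil down to, assuming the design constraints above; the module names here (_tls_bootstrap, _ssl_backport_impl) are purely hypothetical, and the key point is that the stdlib ssl module's existing exception hierarchy is reused rather than redefined:

    import ssl

    def monkeypatch_ssl():
        # "_ssl_backport_impl" is a hypothetical name for the backported
        # extension module providing the 3.x classes
        import _ssl_backport_impl
        # Graft only the missing classes onto the already-imported ssl
        # module, so existing "except ssl.SSLError" handlers keep working
        for name in ("MemoryBIO", "SSLObject"):
            if not hasattr(ssl, name):
                setattr(ssl, name, getattr(_ssl_backport_impl, name))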

On Mon, Jun 5, 2017 at 8:49 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
OK. To unpack that, I think it would mean: 2.7's ssl.py and _ssl.c remain exactly as they are. We create a new _tls_bootstrap.py and __tls_bootstrap.c, which start out as ~copies of the current ssl.py and _ssl.c code in master. (Unfortunately the SSLObject code can't just be separated out from _ssl.c into a new file, because the implementations of SSLObject and SSLSocket are heavily intertwined.) Then the copied files are hacked up to work in this monkeypatching context (so that e.g. instead of defining the SSLError hierarchy in C code, it gets imported from the main ssl.py). So: we end up with two copies of the ssl code in the py27 tree, both diverged from the copy in the py3 tree, in different ways. I know that the ssl maintainers won't be *happy* about this given that keeping the py2 and py3 code similar is an ongoing issue and was one of the stated advantages of backporting SSLObject, but I don't know whether it's a showstopper or not; it'd be good to hear their perspective. -n -- Nathaniel J. Smith -- https://vorpus.org

With Nick's suggestion of _tls_bootstrap, this has certainly become more complicated. I'm still thinking of the ramifications for future Jython 2.7 support, if Python dev goes down this path. It still seems easier to me to avoid exposing the SSLObject/MemoryBIO model to 2.7 and instead concentrate on just having native TLS support. Yes, this might require more work than a simple backport, but everyone is resource constrained, not just the Requests or Jython teams. My concrete suggestion is that any work on PEP 543 will benefit from improving the testing currently found in test_ssl as part of its implementation. For a variety of reasons, test functions like ssl_io_loop (https://github.com/python/cpython/blob/master/Lib/test/test_ssl.py#L1691) avoid asserting more than that they can properly manage the framing of wrapped/unwrapped data, so for example one doesn't want to state explicitly when SSL_ERROR_WANT_READ would be returned (too much white box at that point). On the other hand, the lack of unit testing of, for example, SSLObject.read - testing it only at the functional level instead - means that there's nothing explicitly testing "will raise an SSLWantReadError if it needs more data than the incoming BIO has available" (per the docs), or similar use of signaling (SSLEOFError comes to mind). The problem we have seen with Jython supporting similar functionality in the past in socket/select/ssl is not the happy path aspects like what ssl_io_loop tests, but underdocumented/undertested error states. So what benefits SChannel support, for example, should benefit Jython 3's implementation that would use Java's counterpart to SSLObject, SSLEngine. That would be a very good outcome of these proposed PEPs. - Jim On Tue, Jun 6, 2017 at 4:08 AM, Nathaniel Smith <njs@pobox.com> wrote:

On 6 Jun 2017, at 18:49, Jim Baker <jim.baker@python.org> wrote:
With Nick's suggestion of _tls_bootstrap, this has certainly become more complicated. I'm still thinking of the ramifications for future Jython 2.7 support, if Python dev goes down this path. It still seems easier to me to avoid exposing the SSLObject/MemoryBIO model to 2.7 and instead concentrate on just having native TLS support. Yes, this might require more work than a simple backport, but everyone is resource constrained, not just the Requests or Jython teams.
Oh. This hadn’t occurred to me until just now, but doesn’t PyOpenSSL (or anything built on CFFI) just basically not run on Jython? Or am I mistaken?
My concrete suggestion is that any work on PEP 543 will benefit from improving the testing currently found in test_ssl as part of its implementation. For a variety of reasons, test functions like ssl_io_loop (https://github.com/python/cpython/blob/master/Lib/test/test_ssl.py#L1691) avoid asserting more than that they can properly manage the framing of wrapped/unwrapped data, so for example one doesn't want to state explicitly when SSL_ERROR_WANT_READ would be returned (too much white box at that point). On the other hand, the lack of unit testing of, for example, SSLObject.read - testing it only at the functional level instead - means that there's nothing explicitly testing "will raise an SSLWantReadError if it needs more data than the incoming BIO has available" (per the docs), or similar use of signaling (SSLEOFError comes to mind).
Yeah, PEP 543 just basically assumes the stdlib’s testing of TLS doesn’t exist and that it’ll have to manufacture its own. Unfortunately, because it is attempting to abstract across many different implementations the tests will need to be fairly high-level: for example, there is no consistent way to discuss the actual size of the buffers in the buffer-based TLS implementation, and they are allowed to be unbounded buffers, so tests cannot validate that TLSWantReadError and TLSWantWriteError are ever actually raised: all they can do is run tests that will handle them in the event that they are raised and confirm the data is transferred appropriately. Cory
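
As a concrete sketch of the unit-level check being discussed (runnable on Python 3.5+, where MemoryBIO and SSLObject exist), the documented "needs more data than the incoming BIO has available" behaviour can be exercised directly, with no sockets or I/O loop involved:

    import ssl

    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()
    outgoing = ssl.MemoryBIO()
    sslobj = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")

    try:
        # The handshake writes a ClientHello into the outgoing BIO, then
        # needs the peer's reply, which the empty incoming BIO can't supply
        sslobj.do_handshake()
    except ssl.SSLWantReadError:
        print("SSLWantReadError raised: more peer data needed, per the docs")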

On Wed, Jun 7, 2017 at 2:31 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
Sorry, I wish this were true, but PyOpenSSL is not available on Jython, because we do not yet support CFFI for Jython. CFFI support is something we have looked at, but we have not implemented. (There is a related and far more ambitious project to implement support for the C Extension API, http://jyni.org/, which Stefan Richthofer is working on with me under the Google Summer of Code.) Having said that, bundling PyOpenSSL for use by pip is something that we would *not* want to do for Jython itself. We want to use the native Java ecosystem where possible for built-in functionality, in part because we already have native support for the underlying system trust store on *all platforms*. (Java development did all the heavy lifting for us.) Instead our current implementation of socket/ssl/select is built on Netty, plus some additional work for managing SSLContext in such a way that is compatible with Python. There is an out-of-date doc I prepared describing what was done (but broad aspects are still current): https://github.com/jimbaker/socket-reboot
Right, so such improved testing, regardless of level, will still be an improvement, and we look forward to seeing it for use by Jython as well. I should mention that we sometimes see undocumented functionality leak out. For example, https://docs.python.org/3/library/socket.html#socket.socket.listen doesn't say anything about backlog=0, but the requests test suite (last time I looked on Jython) now depends on it. We assumed it was something like http://pubs.opengroup.org/onlinepubs/009695399/functions/listen.html, but as described in http://bugs.python.org/issue8498, it now means that "a server application in python that accepts exactly 1 connection", by passing the value straight through to the underlying C call. More tests, more docs, please, even though of course that's a lot of dev effort. - Jim

On 7 Jun 2017, at 20:06, Jim Baker <jim.baker@python.org> wrote:
Sorry, I wish this were true, but PyOpenSSL is not available on Jython, because we do not yet support CFFI for Jython. CFFI support is something we have looked at, but we have not implemented. (There is a related and far more ambitious project to implement support for the C Extension API, http://jyni.org/, which Stefan Richthofer is working on with me under the Google Summer of Code.)
This is what I was worried about. Moving to require PyOpenSSL *also* locks us out of Jython support, at least for the time being. That’s another point in the “con” column for making PyOpenSSL a mandatory dependency.
I should mention that we sometimes see undocumented functionality leak out. For example, https://docs.python.org/3/library/socket.html#socket.socket.listen doesn't say anything about backlog=0, but the requests test suite (last time I looked on Jython) now depends on it. We assumed it was something like http://pubs.opengroup.org/onlinepubs/009695399/functions/listen.html, but as described in http://bugs.python.org/issue8498, it now means that "a server application in python that accepts exactly 1 connection", by passing the value straight through to the underlying C call. More tests, more docs, please, even though of course that's a lot of dev effort.
Ah, yes, we do. In our defense, this is the semantic of the listen syscall, and the socket module is generally a pretty thin wrapper around most of the socket syscalls. But in hindsight this is definitely the kind of thing that gets tricky for someone trying to reimplement the socket module in terms of a different abstraction. I don’t want to dive down this rabbit hole because if we do I’ll have to start talking about how the complexity of the socket API is one of the other implementation hurdles for PEP 543, but I think that’s a conversation for another time. Cory

2017-06-08 10:30 GMT+02:00 Cory Benfield <cory@lukasa.co.uk>:
Even if we do backport MemoryBIO to the next Python 2.7.14, I don't think that you can require MemoryBIO. What about all the existing operating systems which provide a Python 2.7 without MemoryBIO? You need to have a workaround anyway. For example, make the new asynchronous API optional and use the old blocking mode in the meantime. Victor
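
In code, the optional-API fallback Victor describes amounts to a feature check along these lines; the two branches are placeholders for an application's real I/O paths:

    import ssl

    if hasattr(ssl, "MemoryBIO") and hasattr(ssl, "SSLObject"):
        # A hypothetical 2.7.14+, or 3.5+: the sans-io path is available
        use_async_backend = True
    else:
        # Older 2.7.x: fall back to blocking wrap_socket()-style I/O
        use_async_backend = False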

I mentioned it earlier, but using the current download numbers from PyPI, <2.7.9 is rapidly dropping and is likely going to be a single-digit percentage within the next 6-8 months IIRC. If 2.7.14 follows a similar trajectory, requests can depend on it within like… 2 years? Maybe 3? Likely it will depend on whether 2.7.14 gets into the next Ubuntu LTS or not. — Donald Stufft

Maybe the intent of my PEP is unclear: the goal is not to allow Requests to require MemoryBIO, but to get wide adoption of a future implementation of the new TLS API (PEP 543). IMHO having an implementation working on the latest Python 2.7 version should make it possible to use it in some kinds of applications. I should take time to read the last messages in this thread and try to summarize them in the PEP ;-) Victor 2017-06-08 13:06 GMT+02:00 Donald Stufft <donald@stufft.io>:

On 8 June 2017 at 21:30, Victor Stinner <victor.stinner@gmail.com> wrote:
I think the short version of the main unanswered questions would be:

1. How hard would it be to define a full wheel-installable "backport.ssl" module that made the latest 3.x ssl module API available to Python 2 applications?
2. Given such a module, how hard would it be to add an "install()" function that added it to sys.modules as the "ssl" module?

While that wouldn't allow pip or requests to depend on it without some solution to the bootstrapping problem, it would clearly separate the two questions, and also give us an architecture closer to the way importlib2 works (i.e. if you want to make that the implementation of import in a Python 2 application, you have to explicitly enable it at runtime, but you can also install it whenever you want, rather than having to rely on your Python implementation to provide it).

We'd also still be left with the problem of figuring out how to handle Jython, but I figure we can consider that part of the bootstrapping problem: how to let applications reliably get hold of "backport.ssl" under a name provided by the host Python 2.7 implementation, rather than the normal pip-installable name.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
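
To make question 2 concrete, the install() helper could plausibly be as small as the following sketch, where "backport.ssl" is the hypothetical wheel-installable package from question 1:

    import sys

    def install():
        """Alias the backport in as the stdlib "ssl" module (sketch only)."""
        if "ssl" in sys.modules or "_ssl" in sys.modules:
            raise RuntimeError("ssl already imported; call install() earlier")
        from backport import ssl as backported_ssl  # hypothetical package
        sys.modules["ssl"] = backported_ssl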

I'm just going to straight up admit that I've lost track of the point of this thread. It sounds like we don't *need* to backport any of ssl into the Python 2.7 standard library, as long as we can bundle a 3rd-party backport for pip? I assume that, at a high level, the operation needed is to download content over https using the system trust stores. Is that what we're trying to achieve here? If it is, can we just provide an enhanced urlretrieve()? Certainly on Windows, and presumably on macOS, it's much easier to do the high-level GET operation than to reimplement it using primitives. As far as I can tell (bearing in mind that my brain can't keep track of this thread anymore), the only argument against this is if someone wants sensible default behaviour with local configuration tweaks: treat one specific certificate as valid while also treating the system (and user) certificate stores as valid without actually putting your certificate into the system (or user) store. My gut reaction to that is to say "too bad - choices are 100% system configuration or 100% custom configuration". I suspect that's not a suitable reaction, but I can't explain why. (And bear in mind that the current state of the ssl module on Windows will never get 100% system configuration.) Can someone explain to me why pip can't use a simple "system store urlretrieve()" function without configuration and "OpenSSL urlretrieve()" function with fully custom configuration? Even if only to bootstrap something that *can* merge two entirely different configuration systems? Cheers, Steve

On 8 June 2017 at 17:40, Steve Dower <steve.dower@python.org> wrote:
I'm just going to straight up admit that I've lost track of the point of this thread.
You have my sympathies - I'm not really following it either :-(
It sounds like we don't *need* to backport any of ssl into the Python 2.7 standard library, as long as we can bundle a 3rd-party backport for pip?
My understanding is that the PEP is asking to backport a new feature. The problem is that as a new feature, this received some (justifiable) pushback. The arguments for why this new feature is needed then got messy - as I understand it, it's something to do with how the requests library moves forward - they are blocked from supporting newer async features from 3.x because they need to support 2.7. This feature would relieve that logjam for them. Obviously, as a 3rd party library, their issues aren't particularly compelling for the stdlib - but because pip uses requests, and pip is shipped with Python, things get complicated.
The problem is that pip uses more features of requests than just issuing GET requests. We aren't going to be in a position to switch to a simple urlretrieve operation, even as some sort of fallback. What I'm personally not at all clear on is why we can't just ship a version of pip that supports 2.7 with 2.7, and a later version with 3.x. That doesn't make the problem for pip and requests any easier, but it does make it not python-dev's problem. The issue is that the gulf between 2.7 and 3.x is getting wider, and it's starting to cause real pain to 3rd party projects like requests. Does that justify backporting this specific feature to 2.7? I don't know. Note that I haven't actually read the original PEP. I don't have a view on networking issues, security, or Python 2.7 support. So I didn't really feel the need to more than skim this thread. My only interest really is where pip gets involved - and that's where I get confused as I don't really see why (ensure)pip makes the problem so much more complicated. Paul PS I'd be amazed if my summary above isn't wrong in at least some key points. Corrections welcome!

The basic yak stack here is:

* PEP 543 should be the future; it is a much, much better way of handling TLS than our current ssl module is.
* Cory can’t spend his work time on PEP 543 unless he can say it is useful for requests.
* In order for PEP 543 to be useful for requests, he needs a way to provide a backport for it for Python 2.7.
* This backport *CAN* be OpenSSL only, but needs to be able to provide the same API.
* PEP 543 wants to work with MemoryBIOs instead of sockets, because a MemoryBIO is a much, much better way of implementing this problem for a variety of reasons, and it would be a mistake to use a socket primitive again.
* Independently, requests also wants to be able to provide the ability for people to use it with asyncio, however it can’t drop support for Python 2.7 in the quest for doing that. Twisted provides a way forward that lets requests work on both 2.x and 3.x and integrate with asyncio, but Twisted requires MemoryBIO to do so.
* pyOpenSSL *could* be used to provide the MemoryBIO needed on 2.7 for both cases from up above; however, pip cannot depend on a C library that isn’t part of the standard library - in addition this would break alternative runtimes like Jython where pyOpenSSL doesn’t work.

Thus, adding MemoryBIO to 2.7’s ssl means that requests can use it instead of depending on a C package (which it can’t because of pip), which means that Cory can then justify working on PEP 543 as part of his requests work, because he can say "on Python 3.7 we use PEP 543 natively, and on Python < 3.7 we either no longer support that Python or we can wrap ssl.MemoryBIO using a pure Python backport shim that provides the same API as PEP 543".

Independently of that, adding MemoryBIO to 2.7’s ssl means that Twisted can use it instead of depending on a C package, which means requests can depend on Twisted (which it otherwise can’t, again because of pip). This then means that requests can refactor itself internally to use Twisted to write asyncio compatible code that provides a synchronous API and an asynchronous API that works on 2.7 and 3.x with asyncio. All of the other options require effectively forking the code or the ecosystem, by either having “this library you use for sync” and “the library you use for async” largely duplicating code, OR requiring all of the network libraries to drop support for 2.7 (they can’t, most of their users are on 2.7 still), OR requiring forking the library to have a 2.x and a 3.x version (we tried this before early on in the 3.x split and we settled on the fact that a single code base is a much better way to handle straddling the 2.x/3.x line).

So basically backporting MemoryBIO unlocks two important things for the health of the Python ecosystem:

* Allows forward progress on PEP 543, which provides a wealth of great benefits like using the platform trust model and removing the need for pip, requests, etc to bundle a CA bundle internally, and removing the need (long term anyways) for Python to ship a copy of OpenSSL on platforms that don’t provide it.
* Allows requests and other libraries to continue to straddle the 2.x/3.x line where they need to, while still providing people who are using Python 3.x a way to use asyncio without having to fork the entire ecosystem into having an aio* copy of every single network library that exists.
Can someone explain to me why pip can't use a simple "system store urlretrieve()" function without configuration and "OpenSSL urlretrieve()" function with fully custom configuration? Even if only to bootstrap something that *can* merge two entirely different configuration systems?
It would require rewriting significant parts of our code that interfaces with HTTP, including the fact that we rely on some additional requests libraries (like cachecontrol) to implement HTTP caching in pip, so unless urlretrieve supported that as well we’d have to completely rewrite that from scratch. — Donald Stufft

Sorry I forgot one other important benefit:

* It reduces the delta between the 3.x and the 2.x ssl and _ssl modules, which makes actually maintaining those modules easier, because this code is fiddly and hard to get right, so the more we can just directly backport security fixes from one to the other rather than having to rewrite the patch, the better off we are.

And the downside here is pretty small, honestly? It’s not changing the behavior of anything that currently exists, since it’s adding a new thing inside the ssl module, and Alex has already written the patch, so there’s little extra work to do - and it actually makes maintenance easier, since patches can more readily be applied straight from `master`. The primary argument I can see against it is one of purity: that 2.7 shouldn’t get new features. But as we know, practicality beats purity ;) (and we’ve already accepted that TLS is a special case, special enough to break the rules, so the main question is whether this specific thing is worthwhile - which given its benefits to the Python ecosystem and to maintaining Python, I think it is). — Donald Stufft

On 9 June 2017 at 05:51, Donald Stufft <donald@stufft.io> wrote:
And the downside here is pretty small honestly?
Unfortunately, one of the downsides is "Doesn't provide any clearly compelling benefits to users of LTS distributions, so even those of us in a position to advocate for such backports probably won't win the related arguments". So the question is whether or not we're OK with saying that users affected by that will need to switch to a different set of Python binaries to get the latest pip & requests (e.g. upgrading their base distro, or else adopting Red Hat SCLs or one of the cross-platform, and hence cross-distro, distributions).
It isn't a purity argument so much as a "Will the change reach a sufficiently large proportion of 2.7 users to actually solve the compatibility problem it is aiming to solve?" argument. (There's a *bit* of a purity argument behind that, in that the PEP 466 backport *did* break customer applications due to the incompatibilities with some of the async frameworks that were using underscore prefixed APIs that changed behaviour, so our credibility for "trust us, it won't break anything" is low in this context, but it's not the primary objection.)

I'm 100% confident that we'd reach a large enough audience with a compatibility shim that gets installed on disk as something other than "ssl" (e.g. "backport.ssl" or "_ssl_bootstrap"), and that has the virtue of enabling a multi-tiered distribution approach:

- backport.ssl on PyPI as an independently installed module
- _ssl_bootstrap as a runtime or redistributor provided module

This approach also allows the API to be updated as necessary to meet the needs of PEP 543, without needing to go through the full PEP process again. The downside is adding a 3rd stack to maintain (Py3, Py2-compatible Py3 backport, Py2) without making maintenance on the Py2 stack any easier.

I'm markedly less confident of reaching a sufficiently large audience with a "So stop using the system Python, then" approach (Don't get me wrong, I'd love for people to stop using the system Python for things that aren't part of the operating system, but people also *really like* the convenience of doing that).

OTOH, if the aim is to make the change now, so it gets into Ubuntu 18.04, with a view to projects only starting to fully rely on it in mid-to-late 2018 or so? That timeline might actually work, and this approach has the benefit of actually making the Py2 SSL stack easier to maintain between now and 2020.

So honestly, I'd be +1 for either approach:

- stdlib backport to make dual-stack maintenance easier for the current volunteers, and we'll see how things work out on the ease-of-adoption front
- PyPI backport to make 2.7 adoption easier, and we'll continue pestering redistributors to actually fund maintenance of Python 2.7's SSL stack properly (and encourage customers of those redistributors to do the same)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Cory can correct me if I’m wrong, but yea that’s the general idea here. We need a tree, and the best time to plant a tree is 20 years ago, but the second best time to plant a tree is today ;) I don’t think anyone is going to immediately start depending on a hypothetical 2.7.14 with MemoryBIO support, but getting it out now means that Cory can start working on PEP 543 as part of a longer term effort for requests as well as starting to work on async-ifying requests. Neither of those efforts are going to be quick or simple tasks, so the work to make them possible can happen while we wait for adoption to naturally grow on 2.7.14. — Donald Stufft

On 09Jun2017 0343, Nick Coghlan wrote:
My draft reply to Donald sat overnight, so I abandoned it in favour of agreeing with Nick. I'm in principle in favour of anything that makes 2.7 less of a burden to maintain (up to and including EOL :) ), so if backporting parts of ssl/_ssl makes that easier then I'm +0. However, I do prefer the PyPI backport with some tool bundled in order to obtain it. In fact, given the nature of OpenSSL, I'd be in favour of that approach for all versions of Python (at least on Windows it would likely work well - probably less so on other platforms where we couldn't include a prebuilt fallback easily, though those tend to include compilers...). That hypothetical "_ensuressl" module in my mind really doesn't have to do much other than determine which file to download and then download and extract it, which can be done with OS level tools rather than needing our own stack. It may also be the necessary mechanism to make ssl pip-updateable, since we have the locking problem that prevents it being possible normally. Cheers, Steve

An ensuressl style module that tries to install an OpenSSL module is actually fairly hard to do securely. The fundamental issue being that fetching a file securely from the network before you have the primary tool for fetching things securely from a network gets to be a bit of a chicken and egg problem. There are *possible* ways around that, but they essentially boil down to having to ship something akin to a CA bundle inside of Python, which just shifts the responsibility from needing an updated OpenSSL in all versions of Python to needing an updated CA bundle. The good news here is that PEP 543 is the thing that will enable using SChannel on Windows, SecureTransport on macOS, and OpenSSL on Linux, getting python-dev out of the business of shipping an OpenSSL at all [1]. Backporting ssl.MemoryBIO represents the smallest reasonable change to 2.7 we can make [2] that paves the path forward to being able to do that in some number of years, because we can’t do that until PEP 543 is a thing and people are using it.

[1] OK, the ``ssl`` module is still going to be OpenSSL dependent, and there is a lot of code out there using it so we would need to keep it for awhile to maintain compatibility…. but at some point we can stop bundling it in our installers under the guise of “we have a newer, better API for handling this, and if you still need 2.7 compatibility, here’s the pip installable libraries you need to install to get it”.

[2] Where reasonable means, it’s not sacrificing good things or making things harder to use for the sake of being smaller.

— Donald Stufft

On 09Jun2017 1118, Donald Stufft wrote:
One of the reasons I'm wanting to push this way is that there are other APIs on Windows to do the core client use cases we seem to care about. For example, https://msdn.microsoft.com/en-us/library/windows/desktop/aa384106(v=vs.85).a... could easily let us implement the requests API with a direct call to the operating system, without having to get into the SChannel world of hurt. I assume macOS has similarly high-level APIs. These are totally fine for implementing a requests-like API that relies on system configuration for HTTPS connections. They are not sufficient for building a server, but they should be sufficient for downloading the packages you need to build a server. This is starting to feel to me like we're stuck on "building the thing right" and have skipped over "building the right thing". But maybe it's just me... Cheers, Steve

For the purposes of this PEP I think we should exclude this. The only way this works on a decent number of platforms (hi there BSDs and Linuxes) is if the option on those platforms is libcurl. Linux does not ship a high level HTTP client library: libcurl is basically the option. That would require pip binding libcurl, NSURLSession, and the Windows API. It’s not clear to me that the solution to “we don’t want to backport some SSL code” is “write a bunch of bindings to other third-party libraries”. Cory

I’m personally not super interested in having platform specific HTTP handling. I honestly want as little platform specific code as I can get away with, and the primary reason why I’m okay with drawing the line at the TLS stack (instead of the HTTP stack) is because being able to get operating system updates for the TLS stack is of immense benefit, in addition to the fact that it’s really the *only* reasonable way to get access to the platform specific trust store. Relying on the OS to handle HTTP means that we cannot depend on anything that isn’t the lowest common denominator across all of the variety of platforms we support, which (as I see Cory has mentioned) isn’t really a thing outside of macOS/Windows. For those platforms we’d just be adding yet another C dependency to Python (or pip, or requests, or whatever). There’s a lot of HTTP handling code in pip that would need to be rewritten to support this, and I don’t think it would be of particular benefit, much less doable in any reasonable timeframe. This isn’t really being particularly innovative in this area, it’s essentially what the browsers are doing. It also matches what curl is doing. All that being said, if someone *does* want pip to use WinHTTP, requests provides a mechanism where you can plug in your own network handling code, so someone could write a requests-winhttp adapter that did that, and we could possibly even expose the ability to ask pip to use that. However that’s entirely in the domain of pip/requests feature requests and not really related to python-dev/this PEP itself. — Donald Stufft
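
The plug-in point Donald refers to is the requests transport adapter API; a hypothetical platform-specific backend would be mounted roughly like this, with WinHTTPAdapter standing in as an illustrative stub rather than real code:

    import requests
    from requests.adapters import BaseAdapter

    class WinHTTPAdapter(BaseAdapter):  # hypothetical
        def send(self, request, **kwargs):
            raise NotImplementedError("a real adapter would call WinHTTP here")

        def close(self):
            pass

    session = requests.Session()
    # Route all https:// traffic through the custom network layer
    session.mount("https://", WinHTTPAdapter())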

On 9 Jun 2017, at 20:54, Donald Stufft <donald@stufft.io> wrote:
All that being said, if someone *does* want pip to use WinHTTP, requests provides a mechanism where you can plug in your own network handling code, so someone could write a requests-winhttp adapter that did that, and we could possibly even expose the ability to ask pip to use that. However that’s entirely in the domain of pip/requests feature requests and not really related to python-dev/this PEP itself.
As an FYI I played with this on Mac far enough to prove that it’d work: https://github.com/Lukasa/requests-darwin It’s not anywhere near feature complete, but it basically works pretty ok. Of course, ideally Requests would just get entirely out of the way at this point because it duplicates a lot of NSURLSession’s logic, but it’s an approach that can work. Cory

On Jun 09, 2017, at 08:43 PM, Nick Coghlan wrote:
As I've said, I'm okay with this approach as long as we don't expose new public APIs in 2.7. That won't prevent other folks from using the private APIs, but we have no responsibility to support them.
Of course, lots of distributions are completely voluntary, so that kind of pestering falls on underfunded ears. ;) But a PyPI backport might still be useful since those third party packages can be distro-fied into current and backported channels. -Barry

On 08Jun2017 1237, Donald Stufft wrote:
Awesome, this is exactly what I needed to see. What if Python 2.7 just exposed the OpenSSL primitives necessary so that ctypes could use them? Is that a viable approach here? Presumably then a MemoryBIO backport could be pure-Python. It doesn't help the other *ythons, but unless they have MemoryBIO implementations ready to backport then I can't think of anything that will help them short of a completely new API. Cheers, Steve

I would have to let Cory answer the question about feasibility here since he’s much more familiar with OpenSSL’s API (and even binding something like this with ctypes) than I am. The first thing that really stands out to me though is it just feels a bit like shuffling deckchairs? At least I don’t see what adding the new feature of “thing you can use ctypes with to get a MemoryBIO” does that adding the ssl.MemoryBIO feature doesn’t, other than downsides:

* It creates yet another divergence between 3.x and 2.x that makes it harder to maintain ssl and _ssl.
* It’s harder to use for requests/Twisted/etc.
* It’s not already written (OK, this is minor, but still!).

What do you see the benefits of that approach being over just backporting ssl.MemoryBIO? — Donald Stufft

The short answer is that, while it’s doable, we have some problems with ABI compatibility. OpenSSL 1.1 and 1.0 are ABI incompatible, so I have to write divergent ctypes code to handle each case. It may also be relevant to support OpenSSL 0.9.x, so we run into the same ABI compatibility concern all over again. Doubly annoyingly, a bunch of OpenSSL code in 1.0 is actually macros, which don’t work in ctypes, so there’ll be a lot of futzing about in structures to get what I need done. This also doesn’t get into the difficulty of some operating systems shipping a LibreSSL masquerading as an OpenSSL, which is subtly incompatible in ways I don’t fully understand at this time.
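
As a small illustration of the ABI divergence Cory mentions, even asking OpenSSL for its own version number via ctypes requires branching, because the 1.1 entry point doesn't exist in the 1.0 ABI (a sketch only; library discovery is glossed over and may behave differently across platforms):

    import ctypes
    import ctypes.util

    libssl = ctypes.CDLL(ctypes.util.find_library("ssl"))

    try:
        # OpenSSL 1.1.0+ exports this as a real function
        version = libssl.OpenSSL_version_num()
    except AttributeError:
        # The 1.0.x-era name for the same information
        version = libssl.SSLeay()
    print(hex(version & 0xFFFFFFFF))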

On Thu, 8 Jun 2017 09:30:37 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
Ah, yes, we do. In our defense, this is the semantic of the listen syscall,[...]
According to POSIX, the backlog is only a hint, i.e. Jython is probably ok in not observing its value: """The backlog argument provides a hint to the implementation which the implementation shall use to limit the number of outstanding connections in the socket's listen queue.""" http://pubs.opengroup.org/onlinepubs/9699919799/functions/listen.html Regards Antoine.
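
For reference, the behaviour under discussion is easy to observe with the stdlib directly; per the resolution of issue 8498, CPython passes the value straight through, so backlog=0 still permits a single pending connection:

    import socket

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(0)  # a hint to the OS: one connection may still be queued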

On Tue, Jun 6, 2017 at 10:49 AM, Jim Baker <jim.baker@python.org> wrote:
Yeah, presumably the preferred way to do PEP 543 on Jython would be to have a native backend based on javax.net.ssl.SSLEngine. Not sure how that affects the MemoryBIO discussion – maybe it means that Jython just doesn't need a MemoryBIO backport in the stdlib ssl module, because everyone who might use it will find themselves using SSLEngine instead.
Another testing challenge is that the stdlib ssl module has no way to trigger a renegotiation, and therefore there's no way to write tests to check that it properly handles a renegotiation, even though renegotiation is by far the trickiest part of the protocol to get right. (In particular, renegotiation is the only case where attempting to read can give WantWrite and vice-versa.) -n -- Nathaniel J. Smith -- https://vorpus.org

2017-06-07 10:56 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
Renegotiation was the source of a vulnerability in SSL/TLS protocols, so maybe it's a good thing that it's not implemented :-) https://www.rapid7.com/db/vulnerabilities/tls-sess-renegotiation Renegotiation was removed from the new TLS 1.3 protocol: https://tlswg.github.io/tls13-spec/ "TLS 1.3 forbids renegotiation" Victor

On Jun 7, 2017 6:29 AM, "Victor Stinner" <victor.stinner@gmail.com> wrote: 2017-06-07 10:56 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
Renegotiation was the source of a vulnerability in SSL/TLS protocols, so maybe it's a good thing that it's not implemented :-) https://www.rapid7.com/db/vulnerabilities/tls-sess-renegotiation Renegotiation was removed from the new TLS 1.3 protocol: https://tlswg.github.io/tls13-spec/ "TLS 1.3 forbids renegotiation" Oh, sure, renegotiation is awful, no question. The HTTP/2 spec also forbids it. But it does still get used and python totally implements it :-). If python is talking to some peer and the peer says "hey, let's renegotiate now" then it will. There just aren't any tests for what happens next. Not that this has much to do with MemoryBIOs. Sorry for the tangent. -n

On 6 June 2017 at 20:08, Nathaniel Smith <njs@pobox.com> wrote:
If we went down this path, I think the only way to keep it maintainable would be to do one of two things:

1. Do the refactoring in Py3 first, but with the namespace pairing being "ssl/_enable_ssl_backport" rather than "_tls_bootstrap/ssl"
2. Don't try to share any implementation details with Py2, and instead provide a full "_ssl_backport" module more along the lines of importlib2

With the second option, we'd just do a wholesale backport of the 3.6 SSL module under a different name, and projects like pip would use it as follows:

    # See if there is an updated SSL module backport available
    try:
        import _ssl_backport
    except ImportError:
        pass
    else:
        _ssl_backport.install()  # Fails if ssl or _ssl were already imported

    # At this point, sys.modules["ssl"] and sys.modules["_ssl"] would match
    # the API of the Python 3.6 versions, *not* the Python 2.7 versions.
    # At the first 2.7.x maintenance release after 3.7, they'd be updated
    # to match 3.7, and then the same again for 3.8.
    # (3.9 is likely to be after the end of 2.7 maintenance)

    # After that, proceed as usual with feature-based checks
    try:
        from ssl import MemoryBIO, SSLObject
    except ImportError:
        # Otherwise fall back to using PyOpenSSL
        try:
            from OpenSSL.SSL import Connection
        except ImportError:
            raise ImportError("Failed to bootstrap asynchronous SSL/TLS support: <details>")

Since there's no chance of breaking applications that don't opt in, while still making the newer network security features available to customers that want them, I think I could make a reasonable case for backporting such an "_ssl_backport.install()" implementation into the RHEL 7 system Python, but even if I wasn't able to do that, this kind of runtime injection into sys.modules could still be shipped as an add-on library in a way that an updated standard library SSL module can't.

Such an approach would make the 2.7 regression tests noticeably slower (since we'd need to re-run a subset of them with the backport being injected by regrtest before anything loads the ssl module), and means backporting SSL fixes to 2.7 would become a two step process (once to resync the _ssl_backport module, and the second to update the default implementation), but I think there would be enough benefits to make that worth the pain:

- we'd have a baseline 2.7 SSL/TLS implementation that we can readily keep in line with its 3.x counterparts
- this approach has a reasonable chance of making it through the review processes for long term support distributions
- system and application integrators can more readily do their own backports independently of redistributors

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

It’s not just bootstrapping that pip has a problem with for C extensions, it also prevents upgrading PyOpenSSL on Windows because having pip import PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the pip process exits and no longer imports PyOpenSSL. This isn’t a problem on Linux or macOS or the other *nix clients though. We patch requests as it is today to prevent it from importing simplejson and cryptography for this reason. — Donald Stufft

On 3 June 2017 at 02:22, Donald Stufft <donald@stufft.io> wrote:
Would requests be loading PyOpenSSL on Windows, though? If the aim is to facilitate PEP 543, then I'd expect it to be using the SChannel backend in that case. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I’m not sure! The exact specifics of how it’d get implemented and the transition from what we have now to that could be yes or no (particularly the transition period). I’m just making sure that the constraint we have in pip is clearly defined here to make sure we don’t accept something that ends up not actually being suitable. I don’t have an opinion on the private bootstrap module (well I do, I like it less than just backporting MemoryBIO “for real", but not one on whether it’d work or not). — Donald Stufft

On Fri, 2 Jun 2017 12:22:06 -0400 Donald Stufft <donald@stufft.io> wrote:
It’s not just bootstrapping that pip has a problem with for C extensions, it also prevents upgrading PyOpenSSL on Windows because having pip import PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the pip process exits and no longer imports PyOpenSSL. This isn’t a problem on Linux or macOS or the other *nix clients though. We patch requests as it is today to prevent it from importing simplejson and cryptography for this reason.
Does pip use any advanced features in Requests, at least when it comes to downloading packages (which is where the bootstrapping issue lies AFAIU)? Because at this point it sounds like you may be better off with a simple pure Python HTTP downloader. Regards Antoine.

It’s hard to fully answer the question because it sort of depends? Could we switch to just like, urllib2 or something? Yea we could, in fact we used to use that and switched to using requests because we had to backport security workarounds/fixes ourselves (the big one at the time was host name matching/verification) and we were really bad at keeping up with tracking what patches needed applying and when. Switching to requests let us offload that work to the requests team who are doing a phenomenal job at it. Beyond that though, getting HTTP right is hard, and pip used to have to try and implement workarounds to either broken or less optimal urllib2 behavior whereas requests generally gets it right for us out of the box. Closer to your specific questions about features, we’re using the requests session support to handle connection pooling to speed up downloading (since we don’t need to open a new connection for every download then), the adapter API to handle transparently allowing file:// URLs, the auth framework to handle holding authentication for multiple domains at once, and the third party library cachecontrol to handle our HTTP caching using a browser style cache. I suspect (though I’d let him speak for himself) that Cory would rather continue to be sync only than require pip to go back to not using requests. — Donald Stufft
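
Concretely, the requests features Donald lists map onto only a few lines of (heavily load-bearing) client code; a sketch of the session and cachecontrol usage, with pip's internal file:// adapter class elided and the URL purely illustrative:

    import requests
    from cachecontrol import CacheControl  # third-party, as used by pip

    session = requests.Session()      # connection pooling across downloads
    session = CacheControl(session)   # browser-style HTTP caching
    session.auth = ("user", "token")  # session-held credentials

    # pip additionally mounts an adapter so file:// index URLs work
    resp = session.get("https://pypi.org/simple/pip/")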

On 2 Jun 2017, at 17:59, Donald Stufft <donald@stufft.io> wrote:
I suspect (though I’d let him speak for himself) that Cory would rather continue to be sync only than require pip to go back to not using requests.
We are not wedded to supporting pip, but I think the interaction between the two tools is positive. I think pip gains a lot from depending on us, and we get a lot of value from having pip as a dependency. So the cost of supporting pip would have to be pretty darn high for us to want to stop doing that, and so on this issue I think Donald is right. Cory

On Sat, Jun 03, 2017 at 02:10:50AM +1000, Nick Coghlan wrote:
Ignoring Ben's assertion regarding the legitimacy of async wrap_socket() (which seems to render this entire conversation moot), if you still really want to go this route, could ctypes be abused to provide the missing implementation from the underlying libs? It'd be a hack, but it would only be necessary during bootstrapping. David

On 1 June 2017 at 17:14, Chris Angelico <rosuav@gmail.com> wrote:
However, it is trivial for Windows users to upgrade if asked to, as there's no issue around system packages depending on a particular version (or indeed, much of anything depending - 3rd party applications on Windows bundle their own Python, they don't use the globally installed one). So in principle, there should be no problem expecting Windows users to be on the latest version of 2.7.x. In fact, I suspect that the proportion of Windows users on Python 3 is noticeably higher than the proportion of Linux/Mac OS users on Python 3 (for the same reason). So this problem may overall be less pressing for Windows users. I have no evidence that isn't anecdotal to back this last assertion up, though. Linux users often use the OS-supplied Python, so getting the distributions to upgrade, to backport upgrades to old versions of their OS, and to push those backports as required updates, is the route to get the bulk of the users there. Experience on pip seems to indicate this is unlikely to happen, in practice. Mac OS users who use the system Python are, as I understand it, stuck with a pretty broken version (I don't know if newer versions of the OS change that). But distributions like Macports are more common and more up to date. Apart from the Windows details, these are purely my impressions.
Do you have figures for how many people use pip on Windows vs Linux vs Mac OS?
No. But we do get plenty of bug reports from Windows users, so I don't think there's any reason to assume it's particularly low (given the relative numbers of *python* users - in fact, it may be proportionately higher as Windows users don't have alternative options like yum). Paul

Note that on macOS, within the next year macOS users using the system Python are going to be unable to talk to PyPI anyways (unless Apple does something here, which I think they will), but in either case, Apple was pretty good about upgrading to 2.7.9 (I think they had the first OS released that supported 2.7.9?). — Donald Stufft

On Jun 1, 2017, at 3:57 PM, Donald Stufft <donald@stufft.io> wrote:
Note that on macOS, within the next year macOS users using the system Python are going to be unable to talk to PyPI anyways (unless Apple does something here, which I think they will), but in either case, Apple was pretty good about upgrading to 2.7.9 (I think they had the first OS released that supported 2.7.9?).
Forgot to mention that pip 10.0 will work around this, thus forcing macOS users to upgrade or be cut off. — Donald Stufft

On Thu, Jun 01, 2017 at 04:01:54PM +0100, Cory Benfield wrote:
So for the record I’m assuming most of the previous email was a joke: certainly it’s not going to happen. ;)
But this is a real concern that does need to be addressed
Unfortunately it wasn't, but at least I'm glad to have accidentally made a valid point amidst the cloud of caffeine-fuelled irritability :/ Apologies for the previous post, it was hardly constructive. David

On Thu, 1 Jun 2017 11:22:21 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
Who is the “we” that should move on? Python core dev? Or the Python ecosystem?
Sorry. Python core dev certainly. As for the rest of the ecosystem, it is moving on as well.
Well, certain features could be 3.x-only, couldn't they?
I don't get what async/await keywords have to do with this. We're talking about backporting the ssl memory BIO object... (also, as much as I think asyncio is a good thing, I'm not sure it will do much for the problem of downloading packages from HTTP, even in parallel)
I want to move on, but I want to bring that 80% of our userbase with us when we do. My reading of your post is that you would rather Requests not adopt the async/await paradigm than backport MemoryBIO: is my understanding correct?
Well, you cannot use async/await on 2.7 in any case, and you cannot use asyncio on 2.7 (Trollius, which was maintained by Victor, has been abandoned AFAIK). If you want to use coroutines in 2.7, you need to use Tornado or Twisted. Twisted may not work with the stdlib ssl module, but Tornado does. Regards Antoine.

Moving, sure, but slowly. Again, I point to the 80% download number.
In principle, sure. In practice, that means most of our users don’t use those features and so we don’t get any feedback on whether they’re good solutions to the problem. This is not great. Ideally we want features to be available across as wide a deploy base as possible, otherwise we risk shipping features that don’t solve the actual problem very well. Good software comes, in part, from getting user feedback.
All of this is related. I wrote a very, very long email initially and deleted it all because it was just too long to expect any normal human being to read it, but the TL;DR here is that we also want to support async/await, and doing so requires a memory BIO object.
I can use Twisted on 2.7, and Twisted has great integration with async/await and asyncio when they are available. Great and getting greater, in fact, thanks to the work of the Twisted and asyncio teams. As to Tornado, the biggest concern there is that, as far as I am aware, there is no support for composing the TLS layer over non-TCP sockets. The wrapped socket approach is not suitable for some kinds of stream-based I/O that users really should be able to use with Requests (e.g. UNIX pipes). Not a complete non-starter, but also not something I’d like to forgo. Cory

On Thu, 1 Jun 2017 12:01:41 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
In principle, sure. In practice, that means most of our users don’t use those features and so we don’t get any feedback on whether they’re good solutions to the problem.
On bugs.python.org we get plenty of feedback from people using Python 3's features, and we have been for years. Your concern would have been very valid in the Python 3.2 timeframe, but I don't think it is anymore.
All of this is related. I wrote a very, very long email initially and deleted it all because it was just too long to expect any normal human being to read it, but the TL;DR here is that we also want to support async/await, and doing so requires a memory BIO object.
async/await doesn't require a memory BIO object. For example, Tornado supports async/await (*) even though it doesn't use a memory BIO object for its SSL layer. And asyncio started with a non-memory BIO SSL implementation while still using "yield from". (*) Despite the fact that Tornado's own coroutines are yield-based generators.
As to Tornado, the biggest concern there is that there is no support for composing the TLS over non-TCP sockets as far as I am aware. The wrapped socket approach is not suitable for some kinds of stream-based I/O that users really should be able to use with Requests (e.g. UNIX pipes).
Hmm, why would you use TLS on UNIX pipes except as an academic experiment? Tornado is far from a full-fledged networking package like Twisted, but its HTTP(S) support should be quite sufficient (understandably, since that is its core use case). Regards Antoine.
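[Editor's sketch] For reference, the wrapped-socket approach Antoine describes looks roughly like the following minimal sketch, using only the stdlib ssl module (this pattern also works on 2.7.9+ thanks to PEP 466; the host name is a placeholder). The WantRead/WantWrite dance at the end is the subtlety Cory objects to below::

    import select
    import socket
    import ssl

    ctx = ssl.create_default_context()
    sock = socket.create_connection(("example.com", 443))
    tls = ctx.wrap_socket(sock, server_hostname="example.com",
                          do_handshake_on_connect=False)
    tls.setblocking(False)

    # Non-blocking handshake: a TLS "read" may need the socket writable
    # and vice versa, which is what makes select-based TLS code subtle.
    while True:
        try:
            tls.do_handshake()
            break
        except ssl.SSLWantReadError:
            select.select([tls], [], [])
        except ssl.SSLWantWriteError:
            select.select([], [tls], [])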

Ok? I guess? I don’t know what to do with that answer, really. I gave you some data (80%+ of requests downloads over the last month were Python 2), and you responded with “it doesn’t cause us problems”. That’s good for you, I suppose, and well done, but it doesn’t seem immediately applicable to the concern I have.
You are right, sorry. I should not have used the word “require”. Allow me to rephrase. MemoryBIO objects are vastly, vastly more predictable and tractable than wrapped sockets when combined with non-blocking I/O. Using wrapped sockets with select/poll/epoll/kqueue, while possible, requires extremely subtle code that is easy to get wrong and can nonetheless still harbour awkward bugs. I would be extremely loath to use such an implementation, but you are correct that such an implementation can exist.
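[Editor's sketch] The pattern Cory is describing looks roughly like this minimal sketch against the Python 3.5+ ssl API; the transport_send and transport_recv callables are hypothetical stand-ins for whatever byte transport you compose with, such as a UNIX pipe::

    import ssl

    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # bytes received from the transport go here
    outgoing = ssl.MemoryBIO()   # bytes TLS wants sent appear here
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")

    def handshake(transport_send, transport_recv):
        # Drive the handshake explicitly: TLS never touches a socket, so
        # the same loop works over TCP, UNIX pipes, or anything else.
        while True:
            try:
                tls.do_handshake()
                transport_send(outgoing.read())   # flush final handshake bytes
                return
            except ssl.SSLWantReadError:
                transport_send(outgoing.read())   # send what TLS produced so far
                incoming.write(transport_recv())  # feed bytes the peer replied with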
Let me be clear that there is no intention to use either Tornado's or Twisted's HTTP/1.1 parsers or engines. With all due respect to both projects, I have concerns about both their client implementations. Tornado's default is definitely not suitable for use in Requests; the curl backend is, but, surprise surprise, it requires a C extension and oh god we're back here again. I have similar concerns about Twisted's default HTTP/1.1 client. Tornado's HTTP/1.1 server is certainly sufficient, but also not of much use to Requests. Requests very much intends to use our own HTTP logic, not least because we're sick of relying on someone else's.

Literally what we want is to have an event loop backing us that we can integrate with async/await and that requires us to reinvent as few wheels as possible while giving an overall better end-user experience. If I were to use Tornado, then because I would want to integrate PEP 543 support into Tornado I'd ultimately have to rewrite Tornado's TLS implementation *anyway* to replace it with a PEP 543 version. If I'm doing that, I'd much rather do it with MemoryBIO than wrapped sockets, for all of the reasons above.

As a final note, because I think we're getting into the weeds here: this is not *necessary*. None of this is *necessary*. Requests exists, and works today. We'll get Windows TLS support regardless of anything that's done here, because I'll just shim it into urllib3 like we did for macOS. What I am pushing for with PEP 543 is an improvement that would benefit the whole ecosystem: all I want is to make it possible for me to actually use it and ship it to users in the tools I maintain. It is reasonable and coherent for python-dev to say "well, good luck, but no backports to help you out". The result of that is that I put PEP 543 on the back burner (because it doesn't solve Requests/urllib3's problems, and ultimately my day job is about resolving those issues), and probably that we shutter the async discussion for Requests until we drop Python 2 support. That's fine: Python is your project, not mine. But I don't see any reason for us not to ask for this backport. After all, the worst you can do is say no, and my problems remain the same. Cory

Le 01/06/2017 à 15:12, Cory Benfield a écrit :
I don’t know what to do with that answer, really. I gave you some data (80%+ of requests downloads over the last month were Python 2), and you responded with “it doesn’t cause us problems”.
And indeed it doesn't. Unless the target user base for pip is widely different than Python's, it shouldn't cause you any problems either.
As a final note, because I think we’re getting into the weeds here: this is not *necessary*. None of this is *necessary*. Requests exists, and works today.
And pip could even bundle a frozen 2.7-compatible version of Requests if it wanted/needed to...
Then the PEP is really wrong or misleading in the way it states its own motivations. Regards Antoine.

Maybe not now, but I think it's fair to say that it did, right? As I recall, Python spent a long time with two fully supported versions, and then an even longer time with a version that was still getting bugfixes. Tell me, which did you get more feedback on during that time?

Generally speaking, it is fair to say that at this point *every line of code in Requests* is exercised or depended on by one of our users. If we write new code that is available to only a small fraction of them, and it is in any way sizeable, then that stops being true. Again, we should look at the fact that most libraries that successfully support Python 2 and Python 3 do so through codebases that share as much code as possible between the two implementations. Each line of code that is exercised in only one implementation becomes a vector for a long, lingering bug.

Anyway, all I know is that the last big project to do this kind of hard cut was Python itself, and while many of us are glad that Python 3 is real and glad that we pushed through the pain, I don't think anyone would argue that the move was painless. A lesson can be learned there, especially for Requests, which is not currently nursing a problem as fundamental as the one Python was.
Sure, if pip wants to internalise supporting and maintaining that version. One of the advantages of the pip/Requests relationship is that pip gets to stop worrying about HTTP: if there’s a HTTP problem, that’s on someone else to fix. Bundling that would remove that advantage.
How so? TLS is not a part of the HTTP parser. It’s an intermediary layer between the transport (resolutely owned by the network layer in Twisted/Tornado) and the parsing layer (resolutely owned by Requests). Ideally we would not roll our own. Cory

On Thu, 1 Jun 2017 14:37:55 +0100 Cory Benfield <cory@lukasa.co.uk> wrote:
Until Python 3.2 and perhaps 3.3, yes. Since 3.4, definitely not. For example asyncio quickly grew a sizable community around it, even though it had established Python 2-compatible competitors.
In the sentence "There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO", and also in the general feeling it gives that the backport is motivated by security reasons primarily. I understand that some users would like more features in Python 2.7. That has been the case since it was decided that feature development in the 2.x line would end in favour of Python 3 development. But our maintenance policy has been and is to develop new features on Python 3 (which some people have described as a "carrot" for migrating, which is certainly true). Regards Antoine.

Sure, but “until 3.2” covers a long enough time to take us from now to “deprecation of Python 2”. Given that the Requests team is 4 people, unlike python-dev’s much larger number, I suspect we’d have at least as much pain proportionally as Python did. I’m not wild about signing up for that.
Ok, let’s address those together. There are security reasons to do the backport, but they are “it helps us build a pathway to PEP 543”. Right now there are a lot of people interested in seeing PEP 543 happen, but vastly fewer in a position to do the work. I am, but only if I can actually use it for the things that are in my job. If I can’t, then PEP 543 becomes an “evenings and weekends” activity for me *at best*, and something I have to drop entirely at worst. Adopting PEP 543 *would* be a security benefit, so while this PEP is not directly a security benefit in and of itself, it builds a pathway to something that is.

As to the plans to move Requests to a more event-loop-y model, I think that it does stand in the way of this, but only insomuch as, again, we want our event-loopy model to be as bug-free as possible. I can concede that rewording on that point would be valuable. *However*, it’s my understanding that even if I did that rewording, you’d still be against it. Is that correct? Cory
participants (16)
- Antoine Pitrou
- Antoine Pitrou
- Barry Warsaw
- Ben Darnell
- Chris Angelico
- Cory Benfield
- David Wilson
- Donald Stufft
- Jim Baker
- Nathaniel Smith
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Steve Dower
- Terry Reedy
- Victor Stinner