[Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

Donald Stufft donald at stufft.io
Thu Jun 8 15:37:53 EDT 2017


> On Jun 8, 2017, at 12:40 PM, Steve Dower <steve.dower at python.org> wrote:
> 
> I'm just going to straight up admit that I've lost track of the point of this thread.
> 
> It sounds like we don't *need* to backport any of ssl into the Python 2.7 standard library, as long as we can bundle a 3rd-party backport for pip?
> 
> I assume that, at a high level, the operation needed is to download content over https using the system trust stores. Is that what we're trying to achieve here?


The basic yak stack here is:

* PEP 543 should be the future; it is a much, much better way of handling TLS than our current ssl module.
* Cory can’t spend his work time on PEP 543 unless he can say it is useful for requests.
* In order for PEP 543 to be useful for requests, he needs a way to provide a backport for it for Python 2.7.
   * This backport *CAN* be OpenSSL only, but needs to be able to provide the same API.
* PEP 543 wants to work with MemoryBIOs instead of sockets, because a MemoryBIO is a much, much better way of solving this problem for a variety of reasons, and it would be a mistake to use a socket primitive again.
* Independently, requests also wants to be able to provide the ability for people to use it with asyncio, however it can’t drop support for Python 2.7 in the quest for doing that. Twisted provides a way forward that lets requests work on both 2.x and 3.x and integrate with asyncio, but Twisted requires MemoryBIO to do so.
* pyOpenSSL *could* be used to provide the MemoryBIO needed on 2.7 for both of the cases above, however, pip cannot depend on a C library that isn’t part of the standard library. In addition, this would break alternative runtimes like Jython where pyOpenSSL doesn’t work.

Thus, adding MemoryBIO to 2.7’s ssl means that requests can use it instead of depending on a C package (which it can’t because of pip). That in turn means Cory can justify working on PEP 543 as part of his requests work, because he can say: on Python 3.7 we use PEP 543 natively, and on Python < 3.7 we either no longer support that Python or we wrap ssl.MemoryBIO using a pure Python backport shim that provides the same API as PEP 543.

Independently of that, adding MemoryBIO to 2.7’s ssl means that Twisted can use it instead of depending on a C package, which means requests can depend on Twisted (which it otherwise can’t, again because of pip). This then means that requests can refactor itself internally to use Twisted to write asyncio-compatible code that provides both a synchronous API and an asynchronous API, working on 2.7 and on 3.x with asyncio. All of the other options effectively require forking the code or the ecosystem: either having “this library you use for sync” and “the library you use for async” largely duplicating code, OR requiring all of the network libraries to drop support for 2.7 (can’t, most of their users are still on 2.7), OR forking each library into a 2.x and a 3.x version (we tried this early on in the 3.x split and settled on a single code base being a much better way to handle straddling the 2.x/3.x line).

So basically backporting MemoryBIO unlocks two important things for the health of the Python ecosystem:

* Allows forward progress on PEP 543, which provides a wealth of great benefits: using the platform trust model, removing the need for pip, requests, etc. to ship a CA bundle internally, and removing the need (long term anyways) for Python to ship a copy of OpenSSL on platforms that don’t provide it.
* Allows requests and other libraries to continue to straddle the 2.x/3.x line where they need to, while still providing people who are using Python 3.x a way to use asyncio without having to fork the entire ecosystem into having an aio* copy of every single network library that exists.
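
For anyone who hasn’t used the primitive in question, here is a minimal sketch of what a MemoryBIO buys you. It is written against the Python 3 ssl module, since that is the API being proposed for backport; the function name and the hostname are made up for the example. The point it tries to show is that the TLS engine only reads and writes in-memory buffers, so whoever calls it decides how bytes actually move (blocking socket, asyncio transport, Twisted reactor, and so on):

```python
import ssl


def start_client_handshake(server_hostname="example.org"):
    """Begin a TLS handshake without touching a socket.

    Returns the bytes the TLS engine wants sent to the peer first.
    The helper name and default hostname are illustrative only.
    """
    ctx = ssl.create_default_context()

    # Two in-memory buffers stand in for the network connection.
    incoming = ssl.MemoryBIO()  # bytes received from the peer go in here
    outgoing = ssl.MemoryBIO()  # bytes to send to the peer come out of here

    # wrap_bio() returns an ssl.SSLObject, which does no I/O of its own.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)

    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the engine has queued its first flight and now needs
        # the peer's reply, which the caller must fetch and feed in via
        # incoming.write(...).
        pass

    return outgoing.read()


client_hello = start_client_handshake()
print(len(client_hello), "bytes queued for the peer")
```

A real caller would loop: send whatever is in the outgoing buffer over its own transport, write whatever it receives into the incoming buffer, and call do_handshake() again until it stops raising SSLWantReadError.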


> 
> Can someone explain to me why pip can't use a simple "system store urlretrieve()" function without configuration and "OpenSSL urlretrieve()" function with fully custom configuration? Even if only to bootstrap something that *can* merge two entirely different configuration systems?

It would require rewriting significant parts of our code that interface with HTTP. We also rely on some additional requests libraries (like cache control) to implement HTTP caching in pip, so unless urlretrieve() supported that as well we’d have to rewrite that from scratch too.


—
Donald Stufft


