Warts both in asyncio way and greenlet-based approach
Long story short: in async world every IO library should use asyncio. Unfortunately for the most popular libraries (requests, django, flask, sqlalchemy etc.) it's impossible to rewrite the code keeping backward compatibility. Much easier to create a new library than rewrite existing one.
Regarding gevent: please read Glyph's excellent article 'Unyielding': https://glyph.twistedmatrix.com/2014/02/unyielding.html
It covers pretty well why explicit yield points are better than implicit ones. In short, explicit `await`s give you 'atomic consistency' between them, whereas with gevent any call may hide a context switch, so you have to expect one almost anywhere. It's much easier to cover code with tests when it has several (ok, many) known context switch points. Test-covering code which may switch on every line, and several times within the same line, is practically impossible.
Also my 2 cents. We used gevent. Even in a very tiny application (a command line client to upload huge files over HTTP, about 1k lines of code) we got reports about gevent hub crashes under high load. Even the gevent core (the hub is something like the asyncio loop) is not 100% stable. Unfortunately we were not able to reproduce the problem with our test suite: it requires really high load and occurs very rarely.
Well, most gevent users can live with that: gunicorn just restarts the dead worker and that's it. But it is a sign of the problem: gevent crashes are extremely hard to debug.
P.S. You may take my writings about gevent as my private opinion, and that is what they are. But from my perspective asyncio-based solutions have much more predictable behavior and are much friendlier to debugging and fixing problems.
--
Thanks, Andrew Svetlov
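To make the 'atomic consistency' point above concrete, here is a minimal sketch. It is not from the original message, and the names `safe_increment`, `unsafe_increment` and `run_demo` are made up for illustration. Under asyncio a task can only be suspended at an `await`, so a read-modify-write with no `await` in the middle cannot be interleaved with other tasks, while one that awaits between the read and the write can lose updates.

```python
import asyncio

counter = 0


async def safe_increment():
    global counter
    # No await between the read and the write: no other task can run here.
    current = counter
    counter = current + 1
    # The only point where this task may be suspended.
    await asyncio.sleep(0)


async def unsafe_increment():
    global counter
    current = counter
    # Explicit yield point between the read and the write: other tasks run
    # here and may read the same stale value.
    await asyncio.sleep(0)
    counter = current + 1


def run_demo(worker, n=100):
    """Run n copies of `worker` concurrently and return the final counter."""
    global counter
    counter = 0

    async def main():
        await asyncio.gather(*(worker() for _ in range(n)))

    # A private event loop, so the demo does not depend on global loop state.
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()
    return counter
```

Calling `run_demo(safe_increment)` should count every increment, while `run_demo(unsafe_increment)` is expected to lose updates. The difference is visible in the source: the hazard sits exactly where the `await` is, which is the property that implicit switching does not give you.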
On 2 Aug 2016, at 19:39, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Long story short: in async world every IO library should use asyncio. Unfortunately for the most popular libraries (requests, django, flask, sqlalchemy etc.) it's impossible to rewrite the code keeping backward compatibility. Much easier to create a new library than rewrite existing one.
Removing my hyper hat for a moment and replacing it with my Requests hat: we disagree. Some of the discussion has happened on GitHub at [1], but I can summarise it again here.
One of the nice things with an event loop is that you can retrofit synchronous code on top of it. That is, if you have an event loop, you can always turn your evented code (code that returned a Deferred (or Deferred-alike) or a coroutine) into synchronous code by simply running the event loop in the synchronous call. That is, given a method signature like this:
    async def get(url):
        return await do_the_work(url)
It can be turned into this:
    def sync_get(url):
        return some_event_loop.run(get(url))
The reason Requests has not previously done this is that it required us to do one of two things. In the first instance, we could bundle or depend on an event loop (e.g. Twisted’s reactors). That’s annoying: it’s a lot of code that we don’t really care about in order to achieve a task that is fundamentally an implementation detail. The other option is to write backends for all possible event loops *and* a kind of event-loop-alike synchronous approach that can be run as a series of coroutines, but that is actually entirely synchronous under the covers (essentially, ship a coroutine runner that never waits for anything). Both of these were a lot of work for fairly minimal gain.
However, with asyncio’s event loop becoming shipped with the language, and with the other event loop implementations shimming in compatibility layers, libraries like Requests have a perfect escape hatch. Now we only need *two* backends: one for the asyncio event loop, and one synchronous one. And if we’re prepared to depend on Python 3.4+, we only need *one* backend: asyncio, because we can rely on it being present.
This drastic reduction in the space of event loops we need to support suddenly makes it much more viable to consider adjusting the way we do I/O. There’s still a *lot* of work there, and no-one has actually started sitting down to do the hard stuff, but it’s certainly not impossible any longer.
Cory
[1]: https://github.com/kennethreitz/requests/issues/1390#issuecomment-224772923
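A minimal sketch of the two backends described in the message above, assuming Python 3.5+ for `async def`. It is an illustration, not Requests code: `do_the_work`, `do_the_work_blocking`, `get_blocking` and `run_without_loop` are hypothetical names, and `loop.run_until_complete` is the real asyncio call standing in for the placeholder `some_event_loop.run`.

```python
import asyncio


async def do_the_work(url):
    # Hypothetical stand-in for real network I/O; suspends once.
    await asyncio.sleep(0)
    return "response for " + url


async def get(url):
    return await do_the_work(url)


def sync_get(url):
    """asyncio backend: drive the coroutine on a private event loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(get(url))
    finally:
        loop.close()


async def do_the_work_blocking(url):
    # Hypothetical: would perform blocking I/O inline and never suspend.
    return "response for " + url


async def get_blocking(url):
    return await do_the_work_blocking(url)


def run_without_loop(coro):
    """Synchronous backend: a coroutine runner that never waits."""
    try:
        # A coroutine that never suspends finishes on the first send().
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    # The coroutine suspended, which this backend cannot service.
    coro.close()
    raise RuntimeError("coroutine tried to wait, but no event loop is running")
```

Both `sync_get(url)` and `run_without_loop(get_blocking(url))` present an ordinary blocking call to the caller; only the second works without any event loop, at the price of requiring that nothing underneath ever actually suspends.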
@cory I didn't want to say "re-implementing requests in asyncio way is impossible".
I just want to say: it's hard. Requires massive code rewriting.
Moreover, it's a pain to support both Python 2 and 3 with asyncio in the same code base. Really only after dropping python 2 support you are able to adopt requests for asyncio. Hopefully after 2020.
Requests library (as well as sqlalchemy) are very crucial scaffolds for python community, I suspect you will be forced to support Python 2 up to the official death of branch at least.
On Wed, Aug 3, 2016 at 5:27 PM Cory Benfield <cory@lukasa.co.uk> wrote:
--
Thanks, Andrew Svetlov
On Aug 4, 2016, at 4:47 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
@cory I didn't want to say "re-implementing requests in asyncio way is impossible".
I just want to say: it's hard.
I can't comment on the total difficulty as I'm not a requests maintainer, but I think you're overestimating it.
Requires massive code rewriting.
In fact, requests had an async mode in the past, and if you look at the internal factoring, there are a totally manageable number of places where it actually depends on blocking. It would not be a "massive" rewrite, just a refactoring around those integration points between layers.
Moreover, it's a pain to support both Python 2 and 3 with asyncio in the same code base.
Not really. You can't use the asyncio task scheduler, but we've been upgrading to be able to support asyncio from Twisted and it's a lot easier than many other Python 3 porting tasks :).
Really only after dropping python 2 support you are able to adopt requests for asyncio.
Also not correct; we (Twisted) have supported many python3 features (for example; 'return' out of @inlineCallbacks-decorated generators) for several years despite being on python 2; nothing about the _interfaces_ that you have to support in order for asyncio users to use it are in any way version-specific.
Hopefully after 2020.
We should be so lucky :).
Requests library (as well as sqlalchemy) are very crucial scaffolds for python community, I suspect you will be forced to support Python 2 up to the official death of branch at least.
I don't think that's in question. -g
On 5 Aug 2016, at 02:21, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
Not really. You can't use the asyncio task scheduler, but we've been upgrading to be able to support asyncio from Twisted and it's a lot easier than many other Python 3 porting tasks :).
To follow up on Glyph’s point, what’s key is that the asyncio support doesn’t have to be present. It only has to be present if you assume that, in my previous example, I was always doing “import asyncio; asyncio.run_until_complete” (not real, I need an event loop, but I’m not trying to dirty up the example here). However, run() can be *anything*. In particular, if we had “separate but equal” backends, one for sync code and one for asyncio’s event loop, it doesn’t matter if the asyncio event loop isn’t present in the Python runtime: we just see the import error and say “ok, only synchronous mode is available”. The goal there is to provide a transition: synchronous mode becomes the “legacy” mode in Requests, with the expectation that we’ll transition over to a purely evented backend in some future release.
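The fallback described above can be sketched roughly as follows. This is only an illustration of the selection logic, not Requests code: `make_runner`, `_run_without_loop` and `fetch` are hypothetical names, and the `import_module` parameter exists only so the missing-asyncio case can be simulated. The sketch itself assumes Python 3.5+ because it uses `async def`; on a runtime without that syntax the coroutines would have to be generator-based.

```python
import importlib


def _run_without_loop(coro):
    """Synchronous backend: the coroutine must finish without suspending."""
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    coro.close()
    raise RuntimeError("synchronous backend cannot wait")


def make_runner(import_module=importlib.import_module):
    """Return (mode, run), preferring asyncio and falling back to sync."""
    try:
        asyncio = import_module("asyncio")
    except ImportError:
        # "ok, only synchronous mode is available"
        return "sync", _run_without_loop

    def _run_with_asyncio(coro):
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(coro)
        finally:
            loop.close()

    return "asyncio", _run_with_asyncio


async def fetch(url):
    # Hypothetical request function that happens never to suspend.
    return "response for " + url
```

With `mode, run = make_runner()`, calling code uses `run(fetch(url))` and does not need to know which backend it got.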
Requests library (as well as sqlalchemy) are very crucial scaffolds for python community, I suspect you will be forced to support Python 2 up to the official death of branch at least.
I don't think that's in question.
I think we should support Python 2 up to end of support. However, I don’t think we should support Python 2 for *one day longer* than that unless a third-party vendor wants to cover the engineering cost of doing so.
Additionally, I have exactly no problem with calling the synchronous legacy mode in Requests exactly that: it will remain unchanged and so will probably continue to work, but we are no longer developing it. Requests is an extremely important part of the community, which is why I have no problem using it to exert subtle but real pressure to encourage people to move to Python 3.
Not right now, of course: we’ve still got a few years before that’s going to be necessary. =)
Cory
participants (3)
- Andrew Svetlov
- Cory Benfield
- Glyph Lefkowitz