Some thoughts on asynchronous API design in a post-async/await world

I just posted a long blog/essay that's probably of interest to folks here:
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-a...
The short version: I think curio has something important to teach us; I tried to figure out what that is and how we can learn from it.
-n
-- Nathaniel J. Smith -- https://vorpus.org

On 6 Nov 2016, at 00:09, Nathaniel Smith <njs@pobox.com> wrote:
I just posted a long blog/essay that's probably of interest to folks here:
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-a...
The short version: I think curio has something important to teach us; I tried to figure out what that is and how we can learn from it.
This is a great post Nathaniel, and I think there’s a lot of value to extract from this for everyone.

In the short term, I’m going to address Twisted’s issue because it seems like the most glaring “this is a stupid bug” problem, and a quick glance at the relevant interfaces suggests it’s easily resolved. It also undermines my own personal mission to make Twisted a great HTTP/2 server. ;)

Cory

I'll also reiterate Cory's compliments that the post was great! For me there are two questions the post raises.

One is: how do we keep people from tying themselves to any one event loop? I view sans-io as the start of this, but it does require getting people to know about it, implement protocols that way, and then continue to abstract themselves in such a way as to not tie themselves down. But then we start to go up a level to things that work at the HTTP level and it starts to get complicated, and I don't know if we have a solution yet for e.g. an async GitHub SDK library that is event-loop-agnostic when it comes to HTTP requests/responses.

Two: how long do we put off "the future" for some async/await-native event loop to emerge and hit production quality? I totally understand David wanting to keep curio experimental, but at some point something will either need to reach stable for people to seriously use, or "the future" will simply stay in the future and people will just tie themselves to the asyncio abstractions, making it that much harder for people to try something else. As I think all of us realize, async/await is giving us a rare opportunity to set the future tone for networking code in Python, but at some point a tipping point will be reached and whatever common practice is in place at that point might calcify, so figuring out how we want things to be in the general landscape would be good.

I would also like to thank everyone involved with defining async/await. I think the fact that async/await is an API, and not something tightly bound to any specific event loop like in pretty much every other async-supporting language, has been very beneficial to us and something to celebrate. The sheer fact that people who don't like asyncio, Twisted, Tornado, or curio have other options is fantastic.

On Sun, 6 Nov 2016 at 02:55 Cory Benfield <cory@lukasa.co.uk> wrote:
On 6 Nov 2016, at 00:09, Nathaniel Smith <njs@pobox.com> wrote:
I just posted a long blog/essay that's probably of interest to folks here:
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-a...
The short version: I think curio has something important to teach us; I tried to figure out what that is and how we can learn from it.
This is a great post Nathaniel, and I think there’s a lot of value to extract from this for everyone.
In the short term, I’m going to address Twisted’s issue because it seems like the most glaring “this is a stupid bug” problem, and a quick glance at the relevant interfaces suggests it’s easily resolved. It also undermines my own personal mission to make Twisted a great HTTP/2 server. ;)
Cory
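For concreteness, here is a minimal sketch of the sans-io idea Brett mentions above: the protocol object only turns bytes into events and events into bytes, leaving all actual I/O (and the choice of event loop) to the caller. The class and method names are illustrative only, not taken from any existing library.

```
class LineProtocol:
    """Parse a byte stream into complete lines, with no I/O of its own."""

    def __init__(self):
        self._buffer = b""

    def receive_data(self, data):
        # Feed raw bytes read from *any* transport (asyncio, Twisted, curio...).
        self._buffer += data

    def next_event(self):
        # Return one complete line, or None if more data is needed.
        line, sep, rest = self._buffer.partition(b"\n")
        if not sep:
            return None
        self._buffer = rest
        return line

    def send(self, line):
        # Serialize an outgoing event; the caller decides how to write it.
        return line + b"\n"
```

Because the object never touches a socket, the same protocol code can sit behind an asyncio transport, a Twisted protocol, or a curio task.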

On Sun, 6 Nov 2016 at 22:41 Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 6, 2016, at 8:20 PM, Brett Cannon <brett@python.org> wrote:
For me there are two questions the post raises. One is how do we keep people from tying themselves to any one event loop?
Deprecate, then remove, get_event_loop() :-).
Is there a bug filed for that at https://github.com/python/asyncio? -Brett
-glyph

On Nov 7, 2016, at 12:50 PM, Brett Cannon <brett@python.org> wrote:
On Sun, 6 Nov 2016 at 22:41 Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 6, 2016, at 8:20 PM, Brett Cannon <brett@python.org> wrote:
For me there are two questions the post raises. One is how do we keep people from tying themselves to any one event loop?
Deprecate, then remove, get_event_loop() :-).
Is there a bug filed for that at https://github.com/python/asyncio?
I don’t think we need to deprecate get_event_loop(). With https://github.com/python/asyncio/pull/452 merged in 3.6, get_event_loop() becomes more predictable.

Now it’s a documentation issue (I’m trying to work on that) to explain to asyncio users when not to use it (and where they *do* need to use it).

I will also open a PR soon to add an asyncio.main() function (or asyncio.run()) to further simplify working with asyncio & its documentation. That should make get_event_loop() disappear for end users.

Yury

On Nov 7, 2016, at 9:56 AM, Yury Selivanov <yselivanov@gmail.com> wrote:
On Nov 7, 2016, at 12:50 PM, Brett Cannon <brett@python.org> wrote:
On Sun, 6 Nov 2016 at 22:41 Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 6, 2016, at 8:20 PM, Brett Cannon <brett@python.org> wrote:
For me there are two questions the post raises. One is how do we keep people from tying themselves to any one event loop?
Deprecate, then remove, get_event_loop() :-).
Is there a bug filed for that at https://github.com/python/asyncio?
I don’t think we need to deprecate get_event_loop(). With https://github.com/python/asyncio/pull/452 merged in 3.6, get_event_loop becomes more predictable.
Now it’s a documentation issue (I’m trying to work on that) to explain to asyncio users when not to use it (and where they *do* need to use it).
I will also open a PR soon to add an asyncio.main() function (or asyncio.run()) to further simplify working with asyncio & its documentation. That should make get_event_loop() disappear for end users.
Sorry, this was a bit tongue in cheek. This was something I said to Guido at the *very* beginning of Tulip development, when asked about mistakes Twisted has made: "don't have a global event loop, you'll never get away from it".

I still think getting rid of a global loop would always be an improvement, although I suspect it's too late at this point. `await current_event_loop()` might make more sense in asyncio as that's not really "global", similar to Curio's trap of the same design; however, I assume that this was an intentional design disagreement for a reason and I don't see that reason as having changed (as Yury indicates).

-glyph

[..]
Sorry, this was a bit tongue in cheek. This was something I said to Guido at the *very* beginning of Tulip development, when asked about mistakes Twisted has made: "don't have a global event loop, you'll never get away from it".
I still think getting rid of a global loop would always be an improvement, although I suspect it's too late at this point. `await current_event_loop()` might make more sense in Asyncio as that's not really "global", similar to Curio's trap of the same design; however, I assume that this was an intentional design disagreement for a reason and I don't see that reason as having changed (as Yury indicates).
The latest update of get_event_loop() is a step in the right direction. At least now we can document the best practices:

1. Have one “main” coroutine to bootstrap/run your program;
2. Don’t design APIs that accept the loop parameter; instead design coroutine-first APIs and use get_event_loop() in your library if you absolutely need the loop.
3. I want to add an “asyncio.main(coro)” function, which would create the loop, run the “coro” coroutine, and correctly clean everything up.

What you propose, IIUC, is a step further:

* Deprecate get_event_loop();
* Add a “current_event_loop()” coroutine.

This will enforce (1) and (2), making asyncio library devs/users focus more on coroutines and async/await.

Am I understanding this all correctly?

Yury
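As a rough sketch of what point 2 looks like in practice — the function names below are invented for illustration and are not part of asyncio:

```
import asyncio

# Discouraged: an API that takes an explicit ``loop`` parameter forces every
# caller (and every caller's caller) to thread the loop through.
def deadline_with_loop(seconds, *, loop=None):
    loop = loop or asyncio.get_event_loop()
    return loop.time() + seconds

# Preferred: a coroutine-first API.  Callers simply await it; if the library
# really needs the loop it calls get_event_loop() internally, which (after
# PR #452) returns the running loop when called from a coroutine or callback.
async def deadline(seconds):
    loop = asyncio.get_event_loop()
    return loop.time() + seconds
```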

On Nov 7, 2016, at 11:08 AM, Yury Selivanov <yselivanov@gmail.com> wrote:
[..]
Sorry, this was a bit tongue in cheek. This was something I said to Guido at the *very* beginning of Tulip development, when asked about mistakes Twisted has made: "don't have a global event loop, you'll never get away from it".
I still think getting rid of a global loop would always be an improvement, although I suspect it's too late at this point. `await current_event_loop()` might make more sense in Asyncio as that's not really "global", similar to Curio's trap of the same design; however, I assume that this was an intentional design disagreement for a reason and I don't see that reason as having changed (as Yury indicates).
The latest update of get_event_loop is a step in the right direction. At least now we can document the best practices:
1. Have one “main” coroutine to bootstrap/run your program;
2. Don’t design APIs that accept the loop parameter; instead design coroutine-first APIs and use get_event_loop in your library if you absolutely need the loop.
3. I want to add “asyncio.main(coro)” function, which would create the loop, run the “coro” coroutine, and correctly clean everything up.
What you propose, IIUC is a step further:
* Deprecate get_event_loop();
* Add “current_event_loop()” coroutine.
This will enforce (1) and (2), making asyncio library devs/users focus more on coroutines and async/await.
Am I understanding this all correctly?
Yep. It's not so much making users focus more on coroutines, as having a way to pass a loop to a coroutine that is explicit (the coro needs to be scheduled on a loop already, so the binding has been explicitly specified) but unobtrusive. -glyph

I would caution against rushing into anything rash here. Nathaniel's post will stand as one of the most influential posts (about async I/O in Python) of this generation, and curio is a beacon of clarity compared to asyncio. However, asyncio has a much bigger responsibility at this point, as it's in the stdlib, and it must continue to support its existing APIs, on all supported platforms, whether we like them or not.

I would love to see a design for a new API that focuses more on coroutines. But it should be a new PEP aimed at Python 3.7 or 3.8.

I am tempted to start defending asyncio, but I'll resist, because nothing good can come from that.

On Mon, Nov 7, 2016 at 11:41 AM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 7, 2016, at 11:08 AM, Yury Selivanov <yselivanov@gmail.com> wrote:
[..]
Sorry, this was a bit tongue in cheek. This was something I said to Guido at the *very* beginning of Tulip development, when asked about mistakes Twisted has made: "don't have a global event loop, you'll never get away from it".
I still think getting rid of a global loop would always be an improvement, although I suspect it's too late at this point. `await current_event_loop()` might make more sense in Asyncio as that's not really "global", similar to Curio's trap of the same design; however, I assume that this was an intentional design disagreement for a reason and I don't see that reason as having changed (as Yury indicates).
The latest update of get_event_loop is a step in the right direction. At least now we can document the best practices:
1. Have one “main” coroutine to bootstrap/run your program;
2. Don’t design APIs that accept the loop parameter; instead design coroutine-first APIs and use get_event_loop in your library if you absolutely need the loop.
3. I want to add “asyncio.main(coro)” function, which would create the loop, run the “coro” coroutine, and correctly clean everything up.
What you propose, IIUC is a step further:
* Deprecate get_event_loop();
* Add “current_event_loop()” coroutine.
This will enforce (1) and (2), making asyncio library devs/users focus more on coroutines and async/await.
Am I understanding this all correctly?
Yep. It's not so much making users focus *more* on coroutines, as having a way to pass a loop to a coroutine that is explicit (the coro needs to be scheduled on a loop already, so the binding has been explicitly specified) but unobtrusive.
-glyph
-- --Guido van Rossum (python.org/~guido)

Guido, from my perspective asyncio is just great. Because:

1. It inspired striving for better coroutine-based solutions like curio. We could borrow the best ideas and incorporate them into asyncio. I don't want to rush, don't get me wrong please. Maybe some third-party library built on top of asyncio but designed in the curio spirit could prove the concept. BTW, curio was born after the async/await syntax, which was in turn born from real asyncio use cases.

2. asyncio was designed as the grand foundational standard for asynchronous development. It *right now* is supported by Tornado and Twisted. The best practice for Tornado users (if they have ported their codebase to Python 3) is to use asyncio-compatible libraries for communicating with databases etc., AFAIC. I consider this a very big win for both asyncio and Tornado/Twisted. But this way is possible only because asyncio supports futures, callbacks and protocols as a low-level API.

On Mon, Nov 7, 2016 at 9:58 PM Guido van Rossum <guido@python.org> wrote:
I would caution against rushing into anything rash here. Nathaniel's post will stand as one of the most influential posts (about async I/O in Python) of this generation, and curio is a beacon of clarity compared to asyncio. However, asyncio has a much bigger responsibility at this point, as it's in the stdlib, and it must continue to support its existing APIs, on all supported platforms, whether we like them or not.
I would love to see a design for a new API that focuses more on coroutines. But it should be a new PEP aimed at Python 3.7 or 3.8.
I am tempted to start defending asyncio, but I'll resist, because nothing good can come from that.
On Mon, Nov 7, 2016 at 11:41 AM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 7, 2016, at 11:08 AM, Yury Selivanov <yselivanov@gmail.com> wrote:
[..]
Sorry, this was a bit tongue in cheek. This was something I said to Guido at the *very* beginning of Tulip development, when asked about mistakes Twisted has made: "don't have a global event loop, you'll never get away from it".
I still think getting rid of a global loop would always be an improvement, although I suspect it's too late at this point. `await current_event_loop()` might make more sense in Asyncio as that's not really "global", similar to Curio's trap of the same design; however, I assume that this was an intentional design disagreement for a reason and I don't see that reason as having changed (as Yury indicates).
The latest update of get_event_loop is a step in the right direction. At least now we can document the best practices:
1. Have one “main” coroutine to bootstrap/run your program;
2. Don’t design APIs that accept the loop parameter; instead design coroutine-first APIs and use get_event_loop in your library if you absolutely need the loop.
3. I want to add “asyncio.main(coro)” function, which would create the loop, run the “coro” coroutine, and correctly clean everything up.
What you propose, IIUC is a step further:
* Deprecate get_event_loop();
* Add “current_event_loop()” coroutine.
This will enforce (1) and (2), making asyncio library devs/users focus more on coroutines and async/await.
Am I understanding this all correctly?
Yep. It's not so much making users focus *more* on coroutines, as having a way to pass a loop to a coroutine that is explicit (the coro needs to be scheduled on a loop already, so the binding has been explicitly specified) but unobtrusive.
-glyph
-- --Guido van Rossum (python.org/~guido)
-- Thanks, Andrew Svetlov

On Nov 7, 2016, at 11:58 AM, Guido van Rossum <guido@python.org> wrote:
I would caution against rushing into anything rash here. Nathaniel's post will stand as one of the most influential posts (about async I/O in Python) of this generation, and curio is a beacon of clarity compared to asyncio. However, asyncio has a much bigger responsibility at this point, as it's in the stdlib, and it must continue to support its existing APIs, on all supported platforms, whether we like them or not.
My smiley may have been insufficiently forceful. I was not intending to seriously suggest a departure from the current API. A 3.7/3.8 refinement into preferring a 'current_event_loop' coroutine might be a nice future direction but it is not something that should happen lightly.
I would love to see a design for a new API that focuses more on coroutines. But it should be a new PEP aimed at Python 3.7 or 3.8.
I am tempted to start defending asyncio, but I'll resist, because nothing good can come from that.
TBH I think that this discussion stems from a strength of asyncio's design, not a weakness. As David did, let me underscore Brett's comment: the fact that asyncio has multiple, separable layers which each interact via well-defined interfaces has allowed for a tremendous amount of experimentation and refinement. Most languages with async features are locked into a particular substrate, and languages without async features end up being an uncoordinated mess of incompatible APIs. I feel like we're really getting the best of both worlds: language-level support with interoperability and ecosystem considerations baked in right from the start.

The potential for growth and improvement necessarily comes along with disagreement and criticism, but it seems like overall this is a very healthy development.

Right now we're talking about this at the async event layer, but previous work at the loop layer (uvloop) also points in exciting future directions for community improvements that maintain interoperability across the whole ecosystem. All the integration points exposed in asyncio's design already seem to be benefiting from community-wide scrutiny and tinkering.

-glyph

On Mon, Nov 7, 2016 at 4:18 PM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 7, 2016, at 11:58 AM, Guido van Rossum <guido@python.org> wrote:
I would caution against rushing into anything rash here. Nathaniel's post will stand as one of the most influential posts (about async I/O in Python) of this generation, and curio is a beacon of clarity compared to asyncio. However, asyncio has a much bigger responsibility at this point, as it's in the stdlib, and it must continue to support its existing APIs, on all supported platforms, whether we like them or not.
My smiley may have been insufficiently forceful. I was not intending to seriously suggest a departure from the current API. A 3.7/3.8 refinement into preferring a 'current_event_loop' coroutine might be a nice future direction but it is not something that should happen lightly.
It wasn't aimed at you, it was aimed at asyncio's staunchest supporter/maintainers. (Including myself. :-)
I would love to see a design for a new API that focuses more on coroutines. But it should be a new PEP aimed at Python 3.7 or 3.8.
I am tempted to start defending asyncio, but I'll resist, because nothing good can come from that.
TBH I think that this discussion stems from a *strength* of asyncio's design, not a weakness. As David did, let me underscore Brett's comment: the fact that asyncio has multiple, separable layers which each interact via well-defined interfaces has allowed for a tremendous amount of experimentation and refinement. Most languages with async features are locked into a particular substrate, and languages without async features end up being an uncoordinated mess of incompatible APIs. I feel like we're really getting the best of both worlds: language-level support with interoperability and ecosystem considerations baked in right from the start.
The potential for growth and improvement necessarily comes along with disagreement and criticism but it seems like overall this is a very healthy development.
Right now we're talking about this at the async event layer, but previous work at the loop layer (uvloop) also points in exciting future directions for community improvements that maintain interoperability across the whole ecosystem. All the integration points exposed in asyncio's design already seem to be benefiting from community-wide scrutiny and tinkering.
I think Nathaniel expressed his appreciation for this as well (especially in his acknowledgments section).

In terms of Python's historic development, I think it's fascinating that this can still be seen as a development building on Python's approach to iteration: in Python 0.0 all we had was `for x in <list or tuple>` (I believe even then, it was possible to write extension types in C that also supported iteration, but I'm not sure). We then expanded upon this to add `__iter__` and `__next__` (née `next`), then yield, then send() and throw(), then `yield from` (I still regret we didn't manage to get this into 2.7 due to some stupid process issue), then the asyncio library, then async/await, and it seems this (degenerate :-) tree will still bear more fruit in the future. (And I may even have missed a PEP or two.)

Just imagine the alternate universe where Python 0.0 had borrowed its for-loop from C instead of from ABC! Which means the seed of all this goes back over 35 years. I hope Lambert Meertens and Leo Geurts are proud.

-- --Guido van Rossum (python.org/~guido)

Maybe we need a couple of other coroutines:

1. asyncio.run_in_executor()
2. asyncio.create_task(). Yes, ensure_future() does the same job, but the name is confusing. At least it is confusing for my training course attendees.

Also, they unfortunately try to use `asyncio.wait()` because the name is very attractive. But `wait()` doesn't signal raised exceptions, you know. That's why proper `wait()` usage is:

    while tasks:
        done, tasks = await asyncio.wait(tasks, timeout=12)
        for t in done:
            await t

but nobody calls it properly. `asyncio.gather()` is a much better alternative; we should promote it.

On Mon, Nov 7, 2016 at 9:08 PM Yury Selivanov <yselivanov@gmail.com> wrote:
Sorry, this was a bit tongue in cheek. This was something I said to Guido at the *very* beginning of Tulip development, when asked about mistakes Twisted has made: "don't have a global event loop, you'll never get away from it".
I still think getting rid of a global loop would always be an improvement, although I suspect it's too late at this point. `await current_event_loop()` might make more sense in Asyncio as that's not really "global", similar to Curio's trap of the same design; however, I assume that this was an intentional design disagreement for a reason and I don't see that reason as having changed (as Yury indicates).
The latest update of get_event_loop is a step in the right direction. At least now we can document the best practices:
1. Have one “main” coroutine to bootstrap/run your program;
2. Don’t design APIs that accept the loop parameter; instead design coroutine-first APIs and use get_event_loop in your library if you absolutely need the loop.
3. I want to add “asyncio.main(coro)” function, which would create the loop, run the “coro” coroutine, and correctly clean everything up.
What you propose, IIUC is a step further:
* Deprecate get_event_loop();
* Add “current_event_loop()” coroutine.
This will enforce (1) and (2), making asyncio library devs/users focus more on coroutines and async/await.
Am I understanding this all correctly?
Yury
-- Thanks, Andrew Svetlov
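To illustrate Andrew's point about asyncio.gather() above, here is a small, self-contained sketch (the might_fail() coroutine is made up for the example) showing that a failure surfaces immediately at the await, rather than being silently dropped the way a careless asyncio.wait() call allows:

```
import asyncio

async def might_fail(i):
    # The failing task sleeps longest so the others have finished by then.
    await asyncio.sleep(0.1 * (i + 1))
    if i == 3:
        raise ValueError("task %d failed" % i)
    return i

async def main():
    coros = [might_fail(i) for i in range(4)]
    try:
        # gather() propagates the first exception to the caller and otherwise
        # returns the results in order -- the failure cannot go unnoticed.
        results = await asyncio.gather(*coros)
        print(results)
    except ValueError as exc:
        print("caught:", exc)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
```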

I don't know if curio (or something like it) is the future or not, but it is something that I'm building for myself so that I can use it. I like it. It fits my brain.

I'd just like to reiterate Brett's comment about async/await being a protocol in Python and something that can be customized. I don't know if that was by design or a happy accident, but so far as I can tell, it might be unique to Python. Every other language that has this seems to have it pretty well locked down to a very specific runtime implementation involving callbacks and futures. Python's approach allowed me to run with it in a completely different direction. I think that's pretty neat.

Cheers, Dave
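A toy illustration of the point David is making: nothing below imports asyncio, Twisted, or curio. `await` is just sugar over the generator send()/StopIteration protocol, so any runner can drive a coroutine. The Trap and tiny_runner names are invented for this sketch.

```
class Trap:
    """A request that the coroutine yields up to whatever runner is in charge."""
    def __init__(self, name):
        self.name = name

    def __await__(self):
        reply = yield self      # hand the request to the runner
        return reply            # the runner's answer becomes the await's value

async def greet():
    who = await Trap("ask_name")
    return "hello, " + who

def tiny_runner(coro):
    """A ten-line 'kernel': service each trap and resume the coroutine."""
    reply = None
    while True:
        try:
            trap = coro.send(reply)
        except StopIteration as stop:
            return stop.value
        # Service the request however this particular runner likes.
        reply = "world" if trap.name == "ask_name" else None

print(tiny_runner(greet()))   # -> hello, world
```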
I would also like to thank everyone involved with defining async/await. I think the fact that async/await is an API and not something tightly bound to any specific event loop like in pretty much every other async-supporting language has been very beneficial to us and something to celebrate. The sheer fact that people who don't like asyncio, Twisted, Tornado, or curio have other options is fantastic.
On Sun, 6 Nov 2016 at 02:55 Cory Benfield <cory@lukasa.co.uk> wrote:
On 6 Nov 2016, at 00:09, Nathaniel Smith <njs@pobox.com> wrote:
I just posted a long blog/essay that's probably of interest to folks here:
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-a...
The short version: I think curio has something important to teach us; I tried to figure out what that is and how we can learn from it.
This is a great post Nathaniel, and I think there’s a lot of value to extract from this for everyone.
In the short term, I’m going to address Twisted’s issue because it seems like the most glaring “this is a stupid bug” problem, and a quick glance at the relevant interfaces suggests it’s easily resolved. It also undermines my own personal mission to make Twisted a great HTTP/2 server. ;)
Cory

Thanks Nathaniel for the great post! It's indeed very impressive to see how curio worked its way around problems that I thought were part of any async library, by simply giving up on callbacks.

One aspect of curio that I find particularly interesting is how it hides the event loop (or kernel) by:

1 - Using "yield (TRAP, *args)" as the main way to communicate with the event loop (no need to have it as a reference).
2 - Exposing a run function that starts the loop and makes sure it safely terminates.

Even though there's a long way to go before we can have point 1 in asyncio (or at least a compatibility layer), I think point 2 is easy to implement and could bring something valuable. So here's a proposal:

Add an asyncio.run function
===========================

... and promote it as the standard way to run asynchronous applications.

Implementation
--------------

It could roughly be implemented as:

```
def run(main_coro, *, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    try:
        return loop.run_until_complete(main_coro)
    finally:
        # More clean-up here?
        loop.close()
```

Example
-------

Instead of writing:

```
loop = asyncio.get_event_loop()
queue = asyncio.Queue(loop=loop)
producer_coro = produce(queue, 10)
consumer_coro = consume(queue)
gather = asyncio.gather(producer_coro, consumer_coro, loop=loop)
loop.run_until_complete(gather)
loop.close()
```

we could promote the following structure:

```
async def main():
    queue = asyncio.Queue()
    producer_coro = produce(queue, 10)
    consumer_coro = consume(queue)
    await asyncio.gather(producer_coro, consumer_coro)

if __name__ == '__main__':
    asyncio.run(main())
```

What do we get from that?
-------------------------

- A clear separation between the synchronous and the asynchronous world. Asynchronous objects should only be created inside an asynchronous context.
- No explicit vs implicit loop issues: PR #452 guarantees that objects created inside coroutines and callbacks will get the right event loop (so loop references can be omitted everywhere).
- The event loop disappears completely from the user code and becomes a low-level detail.
- It provides a proper way to clean things up after running the loop (e.g. make sure all the pending callbacks are executed before the loop is closed, or maybe a curio-like behavior to wait for all "non-daemonic" tasks to complete).

Any limitations?
----------------

One issue I can think of is the handling of KeyboardInterrupt. For instance, how to transform the TCP server example from the docs to fit the new standard? Ideally, we should be able to write something like this:

```
async def main():
    server = await asyncio.start_server(handle_echo, '127.0.0.1', 8888)
    print('Serving on {}'.format(server.sockets[0].getsockname()))
    await asyncio.wait_for_interrupt()
    server.close()
    await server.wait_closed()
```

But the handling of interrupts is not completely settled yet (see PR #305 and issue #341).

Also, some asyncio-based library might already implement a similar but specific run function. Could there be a conflict here?

Related topics
--------------

- [Explicit vs Implicit event loop discussion on python-tulip][1]
- [asyncio PR #452: Make get_event_loop() return the current loop if called from coroutines/callbacks][2]

[1]: https://groups.google.com/forum/#!topic/python-tulip/yF9C-rFpiKk
[2]: https://github.com/python/asyncio/pull/452

I hope it makes sense.

/Vincent
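The example above assumes produce() and consume() coroutines; a hypothetical pair along these lines (not part of the proposal itself) would make it runnable end to end:

```
import asyncio

async def produce(queue, n):
    # Put n items on the queue, then a sentinel to say nothing more is coming.
    for i in range(n):
        await queue.put(i)
        await asyncio.sleep(0.01)
    await queue.put(None)

async def consume(queue):
    # Drain the queue until the sentinel arrives.
    while True:
        item = await queue.get()
        if item is None:
            break
        print('consumed', item)
```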

On Nov 5, 2016, at 5:09 PM, Nathaniel Smith <njs@pobox.com> wrote:
I just posted a long blog/essay that's probably of interest to folks here:
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-a...
The short version: I think curio has something important to teach us; I tried to figure out what that is and how we can learn from it.
I still haven't had time to read the whole thing yet (there's quite a lot to unpack here!) but I think that <https://github.com/twisted/tubes> might be of interest in examining ways to deal with backpressure that are more declarative; flows are set up ahead of time and then manipulated explicitly as flows, rather than relying on the imperative structure of pseudo-blocking in coroutines. I should note that while Tubes's present implementation is Twisted-specific, the Twisted-specific bits are all around the edges of the system. The core has been explicitly factored to be usable on any event-driven architecture, as long as you have a notion of backpressure and a way to ingest and send data. -glyph

On 7 Nov 2016, at 19:54, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
I still haven't had time to read the whole thing yet (there's quite a lot to unpack here!) but I think that <https://github.com/twisted/tubes> might be of interest in examining ways to deal with backpressure that are more declarative; flows are set up ahead of time and then manipulated explicitly as flows, rather than relying on the imperative structure of pseudo-blocking in coroutines.
I’d like to fork the discussion off to talk about backpressure for a while. I think the issue of propagating backpressure is one that is really well suited to this list, because even *knowing* about backpressure is the marker of an engineer who has designed complex asynchronous systems. I suspect this list has a higher proportion of people capable of talking sensibly about backpressure propagation than most.

For my part, I actually think the backpressure discussion in Nathaniel’s post was the most interesting *mechanical* part of the post. Nathaniel has correctly identified that any system that does buffered writes/reads (e.g. asyncio and Twisted) needs powerful systems for propagating backpressure in a sensible way. If we want people to really develop resilient asynchronous systems in Python, it must be possible for those developers to sensibly propagate backpressure through the system, and ideally to fall into a pit of success whereby the things developers naturally do propagate backpressure as a matter of course.

Unfortunately, I’d like to suggest that neither Twisted nor asyncio is in possession of really great APIs for exerting and managing backpressure. Twisted’s IPushProducer/IConsumer interface is present and moderately effective, but it has a few limits. The most frustrating limitation of this design is that it does not allow for easy construction of a pipeline: an object can implement IConsumer/IPushProducer in only one “direction”: that is, if you have a chain of objects A <-> B <-> C, B can propagate backpressure only to C or to A. That’s problematic for protocols like HTTP/2 which require balancing backpressure in multiple directions at once. An additional problem is that Twisted’s APIs here are one-to-one: that is, each B can only have one A and one C associated with it. That makes it very hard to do a fan-in/fan-out design with IPushProducer or IConsumer.

Both of these problems can be worked around, of course: the creation of proxy objects or implicit interfaces between multiple objects can allow this to work, and that’s effectively what Twisted’s HTTP/2 layer does. But these are complex APIs, and they are definitely expert-oriented. On top of this, the unfriendliness of that interface means that developers are likely to consider it an optional part of their protocol implementation and will thus fail to exert backpressure at all. (All of the problems of Twisted’s interface here apply to asyncio, by the by; I just happen to be able to talk in more detail about Twisted’s.)

Tubes is unquestionably a better API for this, but suffers from a lack of accessibility. It would definitely be interesting to see if tubes can be easily replicated on top of asyncio (and indeed curio), because at least for me something like tubes is what I’ve wanted for a long time in the Python world. If the design of tubes is interesting to the asyncio team, it would justify spending more time trying to integrate it.

While we’re on the topic, we should also discuss forms of backpressure response that even curio doesn’t neatly handle. For example, in many systems that would like to be highly responsive, it is common to want to propagate backpressure to the edge of the system and then to use that information to rapidly provide error messages to inbound work items (in HTTP speak, if your WSGI server is overloaded it would often be better to rapidly provide a 503 Service Unavailable response with a Retry-After header than to sit on the request for a potentially unbounded amount of time).

The curio backpressure propagation mechanisms that Nathaniel outlined do not tolerate this situation well at all (not a slight on curio, just a natural extension of the behaviour of the socket-like APIs).

I’m interested to see what other thoughts people in this space have though. Do alternative API designs seem sensible to people?

Cory
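For readers who haven't used the Twisted interface Cory is describing, here is a compressed sketch of the one-to-one handshake: the consumer (typically a transport) registers a producer and calls pauseProducing()/resumeProducing() on it as its own write buffer fills and drains. The FilePusher name is invented for the sketch, and error handling is omitted.

```
from zope.interface import implementer
from twisted.internet.interfaces import IPushProducer

@implementer(IPushProducer)
class FilePusher:
    """Push chunks of a file into an IConsumer, obeying pause/resume."""

    def __init__(self, fobj, consumer):
        self._fobj = fobj
        self._consumer = consumer
        self._paused = True
        consumer.registerProducer(self, True)  # True: we are a push producer

    def _pump(self):
        while not self._paused:
            chunk = self._fobj.read(16384)
            if not chunk:
                self._consumer.unregisterProducer()
                return
            # write() may synchronously call pauseProducing() on us if the
            # consumer's buffer is full -- that's the backpressure signal.
            self._consumer.write(chunk)

    # IPushProducer: called by the consumer to exert backpressure.
    def pauseProducing(self):
        self._paused = True

    def resumeProducing(self):
        self._paused = False
        self._pump()

    def stopProducing(self):
        self._paused = True

# Usage: FilePusher(open('big.bin', 'rb'), transport).resumeProducing()
```

Note how a B-in-the-middle object would have to implement both halves of this, in each direction, which is exactly the pipeline awkwardness Cory points out.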

On Nov 8, 2016, at 2:17 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
Tubes is unquestionably a better API for this, but suffers from a lack of accessibility.
What is "accessibility" in this context?
It would definitely be interesting to see if tubes can be easily replicated on top of asyncio (and indeed curio), because at least for me something like tubes is what I’ve wanted for a long time in the Python world. If the design of tubes is interesting to the asyncio team, it would justify spending more time trying to integrate it
I obviously can't speak to the interest of the asyncio team, but binding it to asyncio (you shouldn't need to "replicate" it, the library is designed to be portable to multiple backends) should be pretty easy. All you need to do is to write asyncio versions of these two modules:

https://github.com/twisted/tubes/blob/2089781479a8f4a2d3027c88560bb5f39cfd90...
https://github.com/twisted/tubes/blob/2089781479a8f4a2d3027c88560bb5f39cfd90...

That's <500 lines of heavily documented, generously spaced code. Nothing else in the package ought to import Twisted.

Given the similarity of asyncio's and Twisted's low-level callback interfaces, I imagine the translation would be fairly literal; the only real stumbling block is the lack of interfaces like IStreamServerEndpoint and IStreamClientEndpoint in asyncio. However, given that these interfaces are just one method each, adding a literal equivalent that just calls event_loop.create_connection and event_loop.create_server just for Tubes would give you _exactly_ the same interface.

-glyph
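As a rough sketch of the "literal equivalent" Glyph mentions, here are one-method asyncio analogues of the Twisted endpoint interfaces; the class names and the (transport, protocol) return shape are assumptions for illustration, not part of asyncio or tubes:

```
import asyncio

class TCPClientEndpoint:
    """Asyncio analogue of IStreamClientEndpoint: one connect() method."""

    def __init__(self, loop, host, port):
        self._loop, self._host, self._port = loop, host, port

    async def connect(self, protocol_factory):
        # Returns (transport, protocol), exactly what create_connection gives us.
        return await self._loop.create_connection(
            protocol_factory, self._host, self._port)


class TCPServerEndpoint:
    """Asyncio analogue of IStreamServerEndpoint: one listen() method."""

    def __init__(self, loop, host, port):
        self._loop, self._host, self._port = loop, host, port

    async def listen(self, protocol_factory):
        # Returns the asyncio Server object, the closest thing to a listening port.
        return await self._loop.create_server(
            protocol_factory, self._host, self._port)
```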

On Nov 8, 2016, at 1:07 PM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
That's <500 lines of heavily documented, generously spaced code. Nothing else in the package ought to import Twisted.
Oh, um. There's some protocol parsing code that incidentally depends on Twisted right now too, but that dependency could easily be eliminated (and it doesn't touch any reactor APIs, so you can ignore it for the purposes of porting as long as Twisted is installed).

On 8 Nov 2016, at 21:07, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 8, 2016, at 2:17 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
Tubes is unquestionably a better API for this, but suffers from a lack of accessibility.
What is "accessibility" in this context?
“accessibility” in this context is essentially the collection of things that make it easy for users to a) identify a need for tubes, b) work out how to plug tubes into their application, and c) have a sensible path for evolving their backpressure handling into tubes.

Mostly this is a documentation thing, but there’s also a chicken-and-egg problem here, specifically: tubes provides a high-level API for flow control but requires that pre-existing code use a low-level one. How do we get from there to somewhere we can actually tell people “yeah, go use tubes”?

On top of that we have: how do we justify using tubes when so much of, for example, Twisted’s codebase does not implement IPushProducer/IConsumer? How do people migrate a pre-existing codebase to something like tubes? How do people extend tubes to do something other than *propagate* backpressure (e.g. to implement a fast-fail path to error out rather than stop reading from a socket)? All of these questions *have* answers, but those answers aren’t easily accessible.

Part of this is an ongoing cultural problem, which is that people who build small or non-distributed applications often don’t have to think about backpressure. So there’s another problem that also needs addressing: it needs to be so easy for people to extend their async producers and consumers of data to propagate and respond to backpressure appropriately that there’s no good reason *not* to do it.

All of this complex mess of things is what I mean by “accessibility”. It needs to be easier to do the right thing than the wrong thing.

Cory

On Nov 9, 2016, at 7:59 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
On 8 Nov 2016, at 21:07, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
On Nov 8, 2016, at 2:17 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
Tubes is unquestionably a better API for this, but suffers from a lack of accessibility.
What is "accessibility" in this context?
“accessibility” in this context is essentially the collection of things that make it easy for users to a) identify a need for tubes, b) work out how to plug tubes into their application, and c) have a sensible evolution to handle evolving backpressure needs into tubes.
I see what you mean. I've struggled with this problem a lot myself.
Mostly this is a documentation thing, but there’s also a chicken-and-egg problem here, specifically: tubes provides a high-level API for flow control but requires that pre-existing code use a low-level one. How do we get from there to somewhere we can actually tell people “yeah, go use tubes”?
Aah, yes. This is exactly where the project is stuck. A large amount of infrastructure has to be retrofitted to be Tubes all the way down before you can make them useful. And the problem isn't just with Tubes at the infrastructure level; most applications are fundamentally not stream processing, and it will be idiomatically challenging to express them as such. HTTP connections are short-lived and interfaces to comprehending them (i.e. 'json.loads') are not themselves stream oriented. Even if you did have a stream-oriented JSON parser, expressing what you want from it is hard; you want a data structure that you can simultaneously inspect multiple elements from, not a stream of "object began" / "string began" / "string ended" / "list began" / "list ended" events.
On top of that we have: how do we justify using tubes when so much of for example Twisted’s codebase does not implement IPushProducer/IConsumer?
I think this may be slightly misleading. You're making it sound like there is a proliferation of transports or stream interfaces that don't provide these interfaces, but should. At a low level, almost everything in the Twisted codebase which is actually a _stream_ of data (rather than a request/response) does implement these interfaces, or has a public `transport` attribute which does so. For example, the HTTP response object that comes back from Agent does have a transport. The issue is not that the interfaces aren't provided on specific objects where they should be, but that the entire shape of the arbitrarily-large-request/arbitrarily-large-response pattern expects to be able to store whole responses. By the time you get to the layer which "doesn't implement" IPushProducer/IConsumer, you're at a level where such an implementation would be meaningless, or unhelpful.
How do people migrate a pre-existing codebase to something like tubes? How do people extend tubes to do something other than *propagate* backpressure (e.g. to implement a fast-fail path to error out rather than stop reading from a socket). All of these questions *have* answers, but those answers aren’t easily accessible.
I think one way to start to drill into this would be (sorry for the over-specificty to Twisted here) to address something like <https://twistedmatrix.com/trac/ticket/288> with Tubes. There are at least a few clear-cut cases where we _do_ have a large stream of data which needs to be directed to an appropriate location, and the sooner we can make the standard interface for that into "return a Fount", the sooner we can start to make Tubes just as much of the idiomatic lexicon of async I/O as awaitables, Futures or Deferreds. Pulling a dependency like this into asyncio would obviously be challenging, but 3rd-party packages like Twisted - or, for that matter, aiohttp! - could start to depend on tubes as-is.
Part of this is an ongoing cultural problem which is that people who build small or non-distributed applications often don’t have to think about backpressure, so there’s another problem that also needs addressing: it needs to be so easy for people to extend their async producers and consumers of data to propagate and respond to backpressure appropriately that there’s no good reason *not* to do it.
The extension of this cultural problem is that most of the tools used in large distributed systems are built as hobby projects for small, non-distributed systems, so even at scale and in these environments we still find ourselves fighting with layers that don't want to deal with backpressure properly. More importantly, backpressure at scale in distributed systems often means really weird stuff, like, traffic shaping on a front-end tier by coordinating with a data store or back-end tier to identify problem networks or network ranges. Tubes operates at a simpler level: connections are individual entities, and backpressure is applied uniformly across all of them. Granted, this is the basic layer you need in place to make addressing backpressure throughout a system work properly, but it's also not an exciting product that solves a super hard or complex problem.
All of this complex mess of things is what I mean by “accessibility”. It needs to be easier to do the right thing than the wrong thing.
I'm definitely open to more ideas on this topic. Retrofitting backpressure into existing systems is hard, and harder still when you're trying to expose an idiomatic, high-level API.

On 10 Nov 2016, at 02:43, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
And the problem isn't just with Tubes at the infrastructure level; most applications are fundamentally not stream processing, and it will be idiomatically challenging to express them as such. HTTP connections are short-lived and interfaces to comprehending them (i.e. 'json.loads') are not themselves stream oriented. Even if you did have a stream-oriented JSON parser, expressing what you want from it is hard; you want a data structure that you can simultaneously inspect multiple elements from, not a stream of "object began" / "string began" / "string ended" / "list began" / "list ended" events.
[snip]
More importantly, backpressure at scale in distributed systems often means really weird stuff, like, traffic shaping on a front-end tier by coordinating with a data store or back-end tier to identify problem networks or network ranges. Tubes operates at a simpler level: connections are individual entities, and backpressure is applied uniformly across all of them. Granted, this is the basic layer you need in place to make addressing backpressure throughout a system work properly, but it's also not an exciting product that solves a super hard or complex problem.
So these two things here are the bits I’m most interested in focusing on. You’re totally right: backpressure in stream-oriented systems is most effectively managed in the form that tubes allows. However, many systems are not stream-oriented but instead focus on quanta of work. Backpressure is still a real part of system design for systems like that, and it would be good to have a higher-level API for designing backpressure propagation for quantised work.

The biggest problem there, I think, is that there’s not just one way to do that. For example, a common mechanism is to provide something like a token bucket at the edges of your system whereby you allow only N work items to be outstanding at any one time. The obvious problem with this is that there is no one true value for N: it depends on how intensive your work items are on the system and what their latency/throughput characteristics look like. That means that, for example, Twisted cannot simply choose a value for this for its users.

At this point we’re probably off into the weeds though. More important, I think, is to make a system that is amenable to having something like a token bucket attached to it and integrated into the stream-based flow control mechanisms.

Cory
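A rough sketch of the edge-of-system policy Cory describes: cap the number of in-flight work items and fail fast instead of queueing when the budget is exhausted. The AdmissionControl and OverloadedError names and the handler hook are invented for the example.

```
import asyncio

class OverloadedError(Exception):
    """Raised instead of queueing when the work budget is exhausted."""

class AdmissionControl:
    def __init__(self, max_outstanding):
        self._slots = asyncio.BoundedSemaphore(max_outstanding)

    async def submit(self, handler, *args):
        if self._slots.locked():
            # No tokens left: shed load immediately rather than buffering.
            # An HTTP front end would turn this into a 503 with Retry-After.
            raise OverloadedError("try again later")
        await self._slots.acquire()
        try:
            return await handler(*args)
        finally:
            self._slots.release()
```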
participants (9)

- Andrew Svetlov
- Brett Cannon
- Cory Benfield
- David Beazley
- Glyph Lefkowitz
- Guido van Rossum
- Nathaniel Smith
- Vincent Michel
- Yury Selivanov