Auto-wrapping coroutines into Tasks

Hi all,

This is a bit of a wacky idea, but I think it might be doable and have significant benefits, so throwing it out there to see what people think.

In asyncio, there are currently three kinds of calling conventions for asynchronous functions:

1) Ones which return a Future
2) Ones which return a raw coroutine object
3) Ones which return a Future, but are documented to return a coroutine object, because we want to possibly switch to doing that in the future and are hoping people won't depend on them returning a Future

In practice these have slightly different semantics. For example, types (1) and (3) start executing immediately, while type (2) doesn't start executing until passed to 'await' or some function like asyncio.gather. For type (1), you can immediately call .add_done_callback:

    func_returning_future().add_done_callback(...)

while for type (2) and (3), you have to explicitly call ensure_future first:

    asyncio.ensure_future(func_returning_coro()).add_done_callback(...)

In practice, these distinctions are mostly irrelevant and annoying to users; the only thing you can do with a raw coroutine is pass it to ensure_future() or equivalent, and the existence of type (3) functions means that you can't even assume that functions documented as returning raw coroutines actually return raw coroutines, or that these will stay the same across versions. But it is a source of confusion, see e.g. this thread on async-sig [1], or this one [2]. It also makes it harder to evolve asyncio, since any function documented as returning a Future cannot take advantage of async/await syntax. And it's forced the creation of awkward APIs like the "coroutine hook" used in asyncio's debug mode.

Other languages with async/await, like C# and Javascript, don't have these problems, because they don't have raw coroutine objects at all: when you mark a function as async, that directly converts it into a function that returns a Future (or local equivalent).
So the difference between async functions and Future-returning functions is only relevant to the person writing the function; callers don't have to care, and can assume that the full Future interface is always available.

I think Python did a very smart thing in *not* hard-coding Futures into the language, like C#/JS do. But, I also think it would be nice if we didn't force regular asyncio users to be aware of all these details.

So here's an idea: we add a new kind of hook that coroutine runners can set. In async_function.__call__, it creates a coroutine object, and then invokes this hook, which then can wrap the coroutine into a Task (or Deferred or whatever is appropriate for the current coroutine runner). This way, from the point of view of regular asyncio users, *all* async functions become functions-returning-Futures (type 1 above):

    async def foo():
        pass

    # This returns a Task running on the current loop
    foo()

Of course, async loops need a way to get at the actual coroutine objects, so we should also provide some method on async functions to do that:

    foo.__corocall__() -> returns a raw coroutine object

And as an optimization, we can make 'await <funcall>' invoke this, so that in regular async function -> async function calls, we don't pay the cost of setting up an unnecessary Task object:

    # This
    await foo(*args, **kwargs)

    # Becomes sugar for:
    try:
        _callable = foo.__corocall__
    except AttributeError:
        # Fallback, so 'await function_returning_promise()' still works:
        _callable = foo
    _awaitable = _callable(*args, **kwargs)
    await _awaitable

(So this hook is actually quite similar to the existing coroutine hook, except that it's specifically only invoked on bare calls, not on await-calls.)

Of course, if no coroutine runner hook is registered, then the default should remain the same as now.
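Python has no such hook or `__corocall__` protocol today, so the following is only a pure-Python emulation of the semantics described above, under some loudly stated assumptions: `AutoWrapped`, `set_call_hook`, `asyncio_hook` and `await_call` are hypothetical names invented for this sketch, the hook lives in a plain module-level global rather than anywhere a real runner would register it, and `await_call()` stands in for the proposed `await <funcall>` desugaring, because that cannot be expressed as syntax.

```python
import asyncio
import functools

# Hypothetical hook slot; None means "no runner hook registered", in which
# case bare calls behave exactly as they do today.
_call_hook = None


def set_call_hook(hook):
    """Hypothetical registration point for a coroutine runner's hook."""
    global _call_hook
    _call_hook = hook


class AutoWrapped:
    """Emulates an async function under the proposal (hypothetical)."""

    def __init__(self, fn):
        self._fn = fn
        functools.update_wrapper(self, fn)

    def __corocall__(self, *args, **kwargs):
        # Always returns the raw coroutine object.
        return self._fn(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        # Bare call: create the coroutine, then let the hook (if any) wrap it.
        coro = self.__corocall__(*args, **kwargs)
        return coro if _call_hook is None else _call_hook(coro)


def asyncio_hook(coro):
    """An asyncio-flavoured hook: wrap in a Task only if a loop is running."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        return coro  # no loop yet: keep today's behaviour
    return loop.create_task(coro)


async def await_call(fn, *args, **kwargs):
    """Stand-in for the proposed meaning of 'await fn(*args, **kwargs)'."""
    try:
        callable_ = fn.__corocall__
    except AttributeError:
        # Fallback, so awaiting a Future-returning function still works.
        callable_ = fn
    return await callable_(*args, **kwargs)


set_call_hook(asyncio_hook)


@AutoWrapped
async def foo(x):
    await asyncio.sleep(0)
    return x * 2


async def main():
    task = foo(21)                     # bare call inside the loop -> a Task
    direct = await await_call(foo, 5)  # await-call path -> no extra Task
    return isinstance(task, asyncio.Task), await task, direct
```

Returning the raw coroutine when no loop is running is a choice made in this sketch to honour the "default should remain the same" requirement; it is what lets `asyncio.run(foo(1))` keep working.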
This also means that common idioms like:

    loop.run_until_complete(asyncfn())

still work, because at the time asyncfn() is called, no loop is running, asyncfn() silently returns a regular coroutine object, and then run_until_complete knows how to handle that.

This would also help libraries like Trio that remove Futures altogether; in Trio, the convention is that 'await asyncfn()' is simply the only way to call asyncfn, and writing a bare 'asyncfn()' is always a mistake – but one that is currently confusing and difficult to detect because all it does is produce a warning ("coroutine was never awaited") at some potentially-distant location that depends on what the GC does. In this proposal, Trio could register a hook that raises an immediate error on bare 'asyncfn()' calls.

This would also allow libraries built on Trio-or-similar to migrate a function from sync->async or async->sync with a deprecation period. Since in Trio sync functions would always use __call__, and async functions would always use __corocall__, then during a transition period one could use a custom object that defines both, and has one of them emit a DeprecationWarning. This is a problem that comes up a lot in new libraries, and currently doesn't have any decent solution. (It's actually happened to Trio itself, and created the only case where Trio has been forced to break API without a deprecation period.)

The main problem I can see here is cases where you have multiple incompatible coroutine runners in the same program. In this case, problems could arise when you call asyncfn() under runner A and pass it to runner B for execution, so you end up with a B-flavored coroutine getting passed to A's wrapping hook. The kinds of cases where this might happen are:

- Using the trio-asyncio library to run asyncio libraries under trio
- Using Twisted's Future<->Deferred conversion layer
- Using async/await to implement an ad hoc coroutine runner (e.g. a state machine) inside a program that also uses an async library

I'm not sure if this is OK or not, but I think it might be? trio-asyncio's API is already designed to avoid passing coroutine objects across the boundary – to call an asyncio function from trio you write

    await run_asyncio(async_fn, *args)

and then run_asyncio switches into asyncio context before actually calling async_fn, so that's actually totally fine. Twisted's API does not currently work this way, but I think it's still in enough of an early provisional state that it could be fixed. And for users implementing ad hoc coroutine runners, I think this is (a) rare, (b) using generators is probably better style, since the only difference between async/await and generators is that async/await is explicitly supposed to be opaque to users and signal "this is your async library", (c) if they're writing a coroutine runner then they can set up their coroutine runner hook appropriately anyway.

But there would be some costs here; the trade-off would be a significant simplification and increase in usability, because regular users could simply stop having to know about 'coroutine objects' and 'awaitables' and all that entirely, and we'd be able to take more advantage of async/await in existing libraries.

What do you think?

-n

[1] https://mail.python.org/pipermail/async-sig/2018-May/000484.html
[2] https://mail.python.org/pipermail/async-sig/2018-April/000470.html

--
Nathaniel J. Smith -- https://vorpus.org
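To make the two Trio-related ideas concrete, here is a rough sketch that again emulates the proposal in plain Python, because `__corocall__` does not exist. `StrictAsyncFunction` shows a bare call failing immediately (roughly what a Trio-registered hook could do), and `SyncToAsyncTransition` shows one object exposing both calling conventions during a deprecation period. Both class names and the `fetch` helpers are hypothetical; calling `.__corocall__()` by hand stands in for what `await f()` would do under the proposal, and asyncio drives the coroutines only so the sketch runs without Trio installed.

```python
import asyncio
import warnings


class StrictAsyncFunction:
    """Hypothetical: bare calls fail loudly instead of leaking a coroutine."""

    def __init__(self, fn):
        self._fn = fn

    def __call__(self, *args, **kwargs):
        # Raised at the call site, not later from the garbage collector.
        name = self._fn.__name__
        raise TypeError(f"{name}() is async; call it as 'await {name}(...)'")

    def __corocall__(self, *args, **kwargs):
        return self._fn(*args, **kwargs)


class SyncToAsyncTransition:
    """Hypothetical shim for migrating a function from sync to async."""

    def __init__(self, sync_fn, async_fn):
        self._sync_fn = sync_fn
        self._async_fn = async_fn

    def __call__(self, *args, **kwargs):
        # Old-style synchronous call: still works, but is deprecated.
        warnings.warn(
            "this function is becoming async; use 'await' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._sync_fn(*args, **kwargs)

    def __corocall__(self, *args, **kwargs):
        # New-style call, which 'await f(...)' would use under the proposal.
        return self._async_fn(*args, **kwargs)


async def _fetch_async(key):
    await asyncio.sleep(0)
    return f"value for {key}"


def _fetch_sync(key):
    return f"value for {key}"


fetch = SyncToAsyncTransition(_fetch_sync, _fetch_async)
strict_fetch = StrictAsyncFunction(_fetch_async)
```

Because `StrictAsyncFunction.__call__` never creates the coroutine, there is nothing left for the GC to warn about later; the mistake is reported exactly where it was made.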

First, "start executing immediately" is an overstatement, right? They won't run until the caller executes a (possibly unrelated) `await`. And I'm still unclear why anyone would care, *except* in the case where they've somehow learned by observation that "real" coroutines don't start immediately and build a dependency on this in their code. (The happy eyeballs use case that was brought up here earlier today seems like it would be better off not depending on this either way, and it wouldn't be hard to do this either.)

Second, when adding callbacks (if you *have to* -- if you're not a framework author you're likely doing something wrong if you find yourself adding callbacks), the right thing to do is obviously to *always* call ensure_future() first.

Third, hooks like this feel like a great way to create an even bigger mess -- it implicitly teaches users that all coroutines are Futures, which will then cause disappointments when they find themselves in an environment where the hook is not enabled.

Perhaps we should go the other way and wrap most ways of creating Futures in coroutines? (Though there would have to be a way for ensure_future() to *unwrap* it instead of wrapping it in a second Future.)

On Fri, May 4, 2018 at 2:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
--
--Guido van Rossum (python.org/~guido)
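Guido's advice to always go through ensure_future() before adding callbacks works today because `asyncio.ensure_future()` returns a Future (or Task) argument unchanged and wraps a coroutine in a new Task, so the same two lines handle both calling conventions. A small sketch, where `returns_future` and `compute` are made-up stand-ins for a type (1) and a type (2) function:

```python
import asyncio


def returns_future(loop):
    # A "type (1)" function: hands back a Future that other code resolves.
    fut = loop.create_future()
    loop.call_soon(fut.set_result, 42)
    return fut


async def compute():
    # A "type (2)" function: calling it just creates a coroutine object.
    await asyncio.sleep(0)
    return 42


async def main():
    loop = asyncio.get_running_loop()
    results = []
    for awaitable in (returns_future(loop), compute()):
        # Passes Futures through unchanged, wraps coroutines in a Task.
        fut = asyncio.ensure_future(awaitable)
        fut.add_done_callback(lambda f: results.append(f.result()))
        await fut
    await asyncio.sleep(0)  # let any pending done-callbacks run
    return results
```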

On Fri, May 4, 2018 at 2:58 PM, Guido van Rossum <guido@python.org> wrote:
> First, "start executing immediately" is an overstatement, right? They won't run until the caller executes a (possibly unrelated) `await`.
Well, traditional Future-returning functions often do execute some logic immediately, but right, what I meant was something like "starts executing without further intervention". I'm sure you know what I mean, but here's a concrete example to make sure it's clear to everyone else. Say we write this code:

    async_log("it happened!")

If async_log is a traditional Future-returning function, then this line is sufficient to cause the message to be logged (eventually). If async_log is an async coroutine-returning function, then it's a no-op (except for generating a "coroutine was never awaited" warning). With this proposal, it would always work.
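The `async_log` difference is easy to reproduce with today's asyncio. In this sketch `async_log_future`, `async_log_coro`, `_write_log` and the in-memory `LOG` list are all made up for illustration, and the "I/O" is just `asyncio.sleep(0)`.

```python
import asyncio

LOG = []


async def _write_log(msg):
    await asyncio.sleep(0)  # stand-in for real I/O
    LOG.append(msg)


def async_log_future(msg):
    # Traditional Future-returning style: the work is scheduled at call time.
    return asyncio.ensure_future(_write_log(msg))


async def async_log_coro(msg):
    # 'async def' style: calling it only creates a coroutine object.
    await _write_log(msg)


async def main():
    # Keep a reference to the Task so it cannot be garbage-collected early.
    pending = async_log_future("it happened!")
    # This bare call runs nothing; closing the coroutine here only keeps the
    # demo free of the "coroutine was never awaited" RuntimeWarning.
    async_log_coro("this is never logged").close()
    await pending
    return list(LOG)
```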
Async/await often lets you avoid working with the Future API directly, but Futures are still a major part of asyncio's public API, and so are synchronous-flavored systems like protocols/transports, where you can't use 'await'. I've been told that we need to keep that in mind when thinking about asyncio extensions ;-).

And if the right thing to do is to *always* call a function, that's a good argument that the library should call it for you, right? :-)

In practice I think cases like my 'async_log' example are the main place where people are likely to run into this – there are a lot of functions out there where a bare call works to run something in the background, and a lot where it doesn't. (In particular, all existing Tornado and Twisted APIs are Future-returning, not async.)
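As an illustration of the protocol/transport point: a `Protocol` callback such as `data_received()` is an ordinary function, so it cannot use `await` and has to schedule async work explicitly. The `DeferredUpper` class and its `handle()` coroutine are hypothetical, and `data_received()` is called by hand here instead of by a real transport.

```python
import asyncio


class DeferredUpper(asyncio.Protocol):
    def __init__(self):
        self.handled = []
        self.pending = set()

    def data_received(self, data):
        # Synchronous callback: 'await' is not allowed here, so the
        # coroutine must be wrapped in a Task explicitly.
        task = asyncio.ensure_future(self.handle(data))
        self.pending.add(task)  # keep a strong reference until it finishes
        task.add_done_callback(self.pending.discard)

    async def handle(self, data):
        await asyncio.sleep(0)  # pretend to do some async processing
        self.handled.append(data.upper())


async def main():
    proto = DeferredUpper()
    proto.data_received(b"hello")  # simulate the transport delivering data
    await asyncio.gather(*proto.pending)
    return proto.handled
```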
Switching between async libraries is always going to be pretty messy. So I guess the only case where people are likely to actually encounter an unexpected hook configuration is in the period before they enter asyncio (or whatever library they're using). Like, if you've learned that async functions always return Futures, you might expect this to work:

    fut = some_async_fun()
    # Error, 'fut' is actually a coroutine b/c the hook isn't set up yet
    fut.add_done_callback(...)
    asyncio.run(fut)

That's a bit of a wart. But this is something that basically never worked and can't work, and that very few people are likely to run into, so while it's sad that it's a wart I don't think it's an argument against fixing the other 99% of cases? (And of course this doesn't arise for libraries like Trio, where you just never call async functions outside of async context.)
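The wart can be observed with today's asyncio, whose behaviour matches the proposal before any hook is registered: calling an async function outside a running loop yields a bare coroutine with no `add_done_callback`. `some_async_fun` is a placeholder, and the working alternative simply creates the Task once the loop is running.

```python
import asyncio


async def some_async_fun():
    await asyncio.sleep(0)
    return "done"


# Outside the loop: a plain coroutine object, not a Future.
early = some_async_fun()
early_has_callbacks = hasattr(early, "add_done_callback")  # False
early_result = asyncio.run(early)  # running it directly still works


async def main():
    # Inside the loop: wrap first, then attach the callback.
    task = asyncio.ensure_future(some_async_fun())
    seen = []
    task.add_done_callback(lambda t: seen.append(t.result()))
    await task
    await asyncio.sleep(0)  # let the done-callback run
    return seen
```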
So there are a few reasons I didn't suggest going this direction:

- Just in practical terms, I don't know how we could make this change. There's one place where all coroutines are created, so we at least have the technical ability to change their behavior all at once. OTOH Future-returning functions are just regular functions that happen to return a Future, so we'd have to go fix them one at a time, right?

- For regular asyncio users, the Future API is pretty much a superset of the coroutine API. (The only thing you can do with a coroutine is await it or call ensure_future, and Futures allow both of those.) That means that turning coroutines into Futures is mostly backwards compatible, but turning Futures into coroutines isn't.

- Similarly, having coroutine-returning functions start running without further intervention is *mostly* backwards compatible, because it's very unusual to intentionally create a coroutine object and then never actually run it (via await or ensure_future or whatever). But I suspect it is fairly common to call Future-returning functions and never await them, like in the async_log example above. This is why we have the weird "category 3" in the first place: people would like to refactor Future-returning APIs to take advantage of async/await, but right now that's a compatibility-breaking change.

- Exposing raw coroutine objects to users has led to various gross-ish hacks, like the hoops that asyncio debug mode has to jump through to try to give better warnings about missing 'await'. Eliminating raw coroutine objects from public APIs would remove the need for these hacks. Making coroutine objects more prominent would have the opposite effect :-).

- And also of course it wouldn't have the benefits for Trio (better error messages for forgetting an 'await', ability to transition a function between sync and async with a deprecation period), for whatever that's worth.

-n

--
Nathaniel J. Smith -- https://vorpus.org
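The compatibility asymmetry in the second and third points can be sketched as follows: code that only awaits works whether a function returns a Future or a coroutine, but code that relies on the Future interface breaks once a Future-returning function is rewritten with `async def`. The `get_value_v1`/`get_value_v2` pair and both callers are hypothetical.

```python
import asyncio


def get_value_v1():
    # Version 1: returns a Future (here a Task), so the work starts on its own.
    return asyncio.ensure_future(asyncio.sleep(0, result="value"))


async def get_value_v2():
    # Version 2: the same API rewritten with 'async def'.
    await asyncio.sleep(0)
    return "value"


async def awaiting_caller(fn):
    # Only uses 'await', which Futures and coroutines both support.
    return await fn()


async def callback_caller(fn):
    # Relies on the Future interface directly.
    aw = fn()
    seen = []
    try:
        aw.add_done_callback(lambda f: seen.append(f.result()))
    except AttributeError:
        aw.close()  # v2 returned a bare coroutine; clean it up
        return "AttributeError"
    await aw
    await asyncio.sleep(0)  # let the done-callback run
    return seen[0]


async def main():
    return (
        await awaiting_caller(get_value_v1),
        await awaiting_caller(get_value_v2),
        await callback_caller(get_value_v1),
        await callback_caller(get_value_v2),
    )
```

Only the last call fails, which is the "category 3" problem in miniature: the refactor is invisible to callers that await, but breaks callers that treat the result as a Future.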

Participants (2): Guido van Rossum, Nathaniel Smith