[Python-ideas] Auto-wrapping coroutines into Tasks

Fri May 4 17:41:25 EDT 2018

Hi all,

This is a bit of a wacky idea, but I think it might be doable and have
significant benefits, so throwing it out there to see what people
think.

In asyncio, there are currently three kinds of calling conventions for
asynchronous functions:

1) Ones which return a Future
2) Ones which return a raw coroutine object
3) Ones which return a Future, but are documented to return a
coroutine object, because we want to possibly switch to doing that in
the future and are hoping people won't depend on them returning a
Future

In practice these have slightly different semantics. For example,
types (1) and (3) start executing immediately, while type (2) doesn't
start executing until passed to 'await' or some function like
asyncio.gather. For type (1), you can immediately call
.add_done_callback:

  func_returning_future().add_done_callback(...)

while for type (2) and (3), you have to explicitly call ensure_future first:

  asyncio.ensure_future(func_returning_coro()).add_done_callback(...)
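
To make the difference concrete, here is a small sketch. The names work and log are made up for illustration, and it assumes Python 3.7+ for asyncio.run. It tries to show that a Future-returning call is scheduled as soon as it is made, while a raw coroutine does nothing until it is wrapped or awaited, and has no add_done_callback of its own:

```python
import asyncio

log = []

async def work(tag):
    log.append(f"{tag} running")
    return tag

def func_returning_future(tag):
    # Type (1): wraps the coroutine in a Task, which schedules it right away.
    return asyncio.ensure_future(work(tag))

def func_returning_coro(tag):
    # Type (2): hands back a raw coroutine object; nothing is scheduled.
    return work(tag)

async def main():
    coro = func_returning_coro("coro")
    fut = func_returning_future("fut")
    fut.add_done_callback(lambda f: log.append(f"{f.result()} done"))

    await asyncio.sleep(0)        # give the loop one chance to run scheduled work
    log.append("checkpoint")      # "fut" has run by now, "coro" has not

    coro_has_callbacks = hasattr(coro, "add_done_callback")
    task = asyncio.ensure_future(coro)   # only now is the coroutine scheduled
    task.add_done_callback(lambda f: log.append(f"{f.result()} done"))
    await task
    await asyncio.sleep(0)        # let any pending done-callbacks run
    return coro_has_callbacks

coro_has_callbacks = asyncio.run(main())
```

The raw coroutine sits idle across the first trip through the loop, which is exactly the kind of detail the rest of this message argues users should not need to think about.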

In practice, these distinctions are mostly irrelevant and annoying to
users: the only thing you can do with a raw coroutine is pass it to
ensure_future() or equivalent, and the existence of type (3) functions
means that you can't even assume that functions documented as
returning raw coroutines actually return raw coroutines, or that their
return type will stay the same across versions. The split is still a
source of confusion; see e.g. this thread on async-sig [1], or this
one [2]. It also makes it harder to evolve asyncio, since any function
documented as returning a Future cannot simply be rewritten with
async/await syntax. And it has forced the creation of awkward APIs
like the "coroutine hook" (sys.set_coroutine_wrapper) used in
asyncio's debug mode.

Other languages with async/await, like C# and Javascript, don't have
these problems, because they don't have raw coroutine objects at all:
when you mark a function as async, that directly converts it into a
function that returns a Future (or local equivalent). So the
difference between async functions and Future-returning functions is
only relevant to the person writing the function; callers don't have
to care, and can assume that the full Future interface is always
available.

I think Python did a very smart thing in *not* hard-coding Futures
into the language, like C#/JS do. But, I also think it would be nice
if we didn't force regular asyncio users to be aware of all these
details.

So here's an idea: we add a new kind of hook that coroutine runners
can set. In async_function.__call__, it creates a coroutine object,
and then invokes this hook, which then can wrap the coroutine into a
Task (or Deferred or whatever is appropriate for the current coroutine
runner). This way, from the point of view of regular asyncio users,
*all* async functions become functions-returning-Futures (type 1
above):

  async def foo():
      pass

  # This returns a Task running on the current loop
  foo()

Of course, coroutine runners need a way to get at the actual coroutine
objects, so we should also provide some method on async functions to
do that:

  foo.__corocall__() -> returns a raw coroutine object

And as an optimization, we can make 'await <funcall>' invoke this, so
that in regular async function -> async function calls, we don't pay
the cost of setting up an unnecessary Task object:

  # This
  await foo(*args, **kwargs)
  # Becomes sugar for:
  try:
      _callable = foo.__corocall__
  except AttributeError:
      # Fallback, so 'await function_returning_promise()' still works:
      _callable = foo
  _awaitable = _callable(*args, **kwargs)
  await _awaitable

(So this hook is actually quite similar to the existing coroutine
hook, except that it's specifically only invoked on bare calls, not on
await-calls.)
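
Since the compiler change can't be tried out directly, here is the same lookup written as an ordinary helper. await_call and the AsyncFn stand-in class are invented for this sketch (assuming Python 3.7+ for asyncio.run and get_running_loop); the point is only to show the two paths, __corocall__ when present and plain __call__ otherwise:

```python
import asyncio

async def await_call(fn, *args, **kwargs):
    """Hypothetical helper spelling out the proposed 'await fn(...)' sugar."""
    try:
        target = fn.__corocall__
    except AttributeError:
        # Fallback for plain Future-returning callables.
        target = fn
    return await target(*args, **kwargs)

class AsyncFn:
    """Stand-in for an async function under the proposal."""
    def __init__(self, async_fn):
        self._async_fn = async_fn
        self.bare_calls = 0

    def __call__(self, *args, **kwargs):
        # Bare call: pay for a Task.
        self.bare_calls += 1
        return asyncio.ensure_future(self._async_fn(*args, **kwargs))

    def __corocall__(self, *args, **kwargs):
        # Await-call: hand back the raw coroutine, no Task needed.
        return self._async_fn(*args, **kwargs)

async def _double(x):
    return 2 * x

double = AsyncFn(_double)

def legacy_double(x):
    # A type (1) function with no __corocall__ at all.
    fut = asyncio.get_running_loop().create_future()
    fut.set_result(2 * x)
    return fut

async def main():
    a = await await_call(double, 21)          # goes through __corocall__
    b = await await_call(legacy_double, 21)   # falls back to __call__
    c = await double(21)                      # today's semantics: __call__
    return a, b, c

results = asyncio.run(main())
```

Only the last call in main() creates a Task, which is the saving the optimization is after.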

Of course, if no coroutine runner hook is registered, then the default
should remain the same as now. This also means that common idioms
like:

  loop.run_until_complete(asyncfn())

still work, because at the time asyncfn() is called, no loop is
running, asyncfn() silently returns a regular coroutine object, and
then run_until_complete knows how to handle that.

This would also help libraries like Trio that remove Futures
altogether; in Trio, the convention is that 'await asyncfn()' is
simply the only way to call asyncfn, and writing a bare 'asyncfn()' is
always a mistake – but one that is currently confusing and difficult
to detect because all it does is produce a warning ("coroutine was
never awaited") at some potentially-distant location that depends on
what the GC does. In this proposal, Trio could register a hook that
raises an immediate error on bare 'asyncfn()' calls.
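
As a rough illustration of what a caller would see, the sketch below hard-wires the "reject bare calls" behaviour into a made-up decorator, await_only. It does not use Trio or any real hook, and it borrows asyncio.run (Python 3.7+) only as a convenient way to drive the coroutine:

```python
import asyncio
import functools

def await_only(async_fn):
    """Hypothetical decorator: bare calls fail, __corocall__ still works."""
    @functools.wraps(async_fn)
    def bare_call(*args, **kwargs):
        # Fail at the call site instead of warning later from the GC.
        raise TypeError(f"{async_fn.__name__}() was called without 'await'")
    bare_call.__corocall__ = async_fn
    return bare_call

@await_only
async def fetch():
    return "data"

try:
    fetch()                       # the mistake: a bare call
    error = None
except TypeError as exc:
    error = str(exc)

# Under the proposal 'await fetch()' would use __corocall__; here the raw
# coroutine is passed to asyncio.run just to keep the demo self-contained.
result = asyncio.run(fetch.__corocall__())
```

The error shows up with a traceback pointing at the offending line, rather than as a warning whose timing depends on garbage collection.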

This would also allow libraries built on Trio-or-similar to migrate a
function from sync->async or async->sync with a deprecation period.
Since in Trio sync functions would always use __call__, and async
functions would always use __corocall__, then during a transition
period one could use a custom object that defines both, and has one of
them emit a DeprecationWarning. This is a problem that comes up a lot
in new libraries, and currently doesn't have any decent solution.
(It's actually happened to Trio itself, and created the only case
where Trio has been forced to break API without a deprecation period.)
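
Here is a sketch of what such a transitional object might look like for a sync->async migration. SyncToAsyncTransition and the two implementation functions are invented names, and since nothing calls __corocall__ automatically today, the example invokes it by hand:

```python
import asyncio
import warnings

class SyncToAsyncTransition:
    """Hypothetical shim for a function that is moving from sync to async."""
    def __init__(self, name, sync_impl, async_impl):
        self._name = name
        self._sync_impl = sync_impl
        self._async_impl = async_impl

    def __call__(self, *args, **kwargs):
        # Old calling convention: still works, but warns.
        warnings.warn(
            f"calling {self._name}() without 'await' is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._sync_impl(*args, **kwargs)

    def __corocall__(self, *args, **kwargs):
        # New calling convention: what 'await name()' would reach.
        return self._async_impl(*args, **kwargs)

def _flush_sync():
    return "flushed (sync)"

async def _flush_async():
    return "flushed (async)"

flush = SyncToAsyncTransition("flush", _flush_sync, _flush_async)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = flush()                         # deprecated path

new_style = asyncio.run(flush.__corocall__())   # future 'await flush()' path
```

For the async->sync direction the warning would simply move from __call__ to __corocall__.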

The main problem I can see here is cases where you have multiple
incompatible coroutine runners in the same program. In this case,
problems could arise when you call asyncfn() under runner A and pass
the result to runner B for execution, so a coroutine meant for B ends
up getting passed to A's wrapping hook. The kinds of cases where this
might happen are:

- Using the trio-asyncio library to run asyncio libraries under trio
- Using Twisted's Future<->Deferred conversion layer
- Using async/await to implement an ad hoc coroutine runner (e.g. a
state machine) inside a program that also uses an async library

I'm not sure if this is OK or not, but I think it might be?
trio-asyncio's API is already designed to avoid passing coroutine
objects across the boundary – to call an asyncio function from trio
you write

  await run_asyncio(async_fn, *args)

and then run_asyncio switches into asyncio context before actually
calling async_fn, so that's actually totally fine. Twisted's API does
not currently work this way, but I think it's still in enough of an
early provisional state that it could be fixed. And for users
implementing ad hoc coroutine runners, I think that (a) this is rare,
(b) using generators is probably better style anyway, since the only
difference between async/await and generators is that async/await is
explicitly supposed to be opaque to users and signal "this is your
async library", and (c) if they're writing a coroutine runner then
they can set up their coroutine runner hook appropriately anyway. So
there would be some costs here, but in exchange we'd get a significant
simplification and increase in usability, because regular users could
simply stop having to know about 'coroutine objects' and 'awaitables'
and all that entirely, and we'd be able to take more advantage of
async/await in existing libraries.

What do you think?

-n

[1] https://mail.python.org/pipermail/async-sig/2018-May/000484.html
[2] https://mail.python.org/pipermail/async-sig/2018-April/000470.html

-- 
Nathaniel J. Smith -- https://vorpus.org