[Python-ideas] Auto-wrapping coroutines into Tasks

Fri May 4 17:58:59 EDT 2018

First, "start executing immediately" is an overstatement, right? They won't
run until the caller executes a (possibly unrelated) `await`. And I'm still
unclear why anyone would care, *except* in the case where they've somehow
learned by observation that "real" coroutines don't start immediately and
build a dependency on this in their code. (The happy eyeballs use case that
was brought up here earlier today seems like it would be better off not
depending on this either way, and avoiding that dependency wouldn't be hard.)
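
A minimal sketch of the first point, assuming Python 3.7+ (for
asyncio.run()); the names child, main and events are made up for
illustration. The Task created by ensure_future() does not run until the
caller yields to the event loop at an `await`:

```python
import asyncio

events = []  # records the order in which things happen


async def child():
    events.append("child started")


async def main():
    # Wrapping the coroutine in a Task only schedules it on the loop.
    task = asyncio.ensure_future(child())
    events.append("after ensure_future")

    # An unrelated await gives the loop a chance to run the Task.
    await asyncio.sleep(0)
    events.append("after await")

    await task


asyncio.run(main())
```

Here "child started" is recorded only after the caller reaches its own
`await`, not at the point where the Task is created.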

Second, when adding callbacks (if you *have to* -- if you're not a
framework author you're likely doing something wrong if you find yourself
adding callbacks), the right thing to do is obviously to *always* call
ensure_future() first.
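
As a hedged illustration (assuming Python 3.7+; compute, main and results
are hypothetical names), calling ensure_future() unconditionally works for
both coroutine objects and Futures, because an object that is already a
Future is returned unchanged:

```python
import asyncio

results = []  # filled in by the done-callback


async def compute():
    return 42


async def main():
    coro = compute()
    # A raw coroutine object has no Future interface.
    has_callback_api = hasattr(coro, "add_done_callback")

    # ensure_future() wraps the coroutine in a Task, which is a Future.
    fut = asyncio.ensure_future(coro)
    fut.add_done_callback(lambda f: results.append(f.result()))

    # Something that is already a Future is returned as-is.
    same_object = asyncio.ensure_future(fut) is fut

    await fut
    await asyncio.sleep(0)  # give the loop a chance to run the callback
    return has_callback_api, same_object


has_callback_api, same_object = asyncio.run(main())
```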

Third, hooks like this feel like a great way to create an even bigger mess
-- it implicitly teaches users that all coroutines are Futures, which will
then cause disappointments when they find themselves in an environment
where the hook is not enabled.

Perhaps we should go the other way and wrap most ways of creating Futures
in coroutines? (Though there would have to be a way for ensure_future() to
*unwrap* it instead of wrapping it in a second Future.)
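
A rough sketch of why unwrapping would be needed, assuming Python 3.7+;
as_coroutine is a hypothetical helper, not an asyncio API. With today's
ensure_future(), a Future wrapped in a coroutine ends up inside a second
Future (a Task) rather than being returned directly:

```python
import asyncio


async def as_coroutine(fut):
    # Hypothetical wrapper: exposes a Future only as a coroutine object.
    return await fut


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    # The wrapped Future is not recognised; a new Task is created around it.
    wrapped = asyncio.ensure_future(as_coroutine(fut))
    # The bare Future is passed through unchanged.
    direct = asyncio.ensure_future(fut)

    fut.set_result("done")
    return wrapped is fut, direct is fut, await wrapped


wrapped_is_fut, direct_is_fut, value = asyncio.run(main())
```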

On Fri, May 4, 2018 at 2:41 PM, Nathaniel Smith <njs at pobox.com> wrote:

> Hi all,
>
> This is a bit of a wacky idea, but I think it might be doable and have
> significant benefits, so throwing it out there to see what people
> think.
>
> In asyncio, there are currently three kinds of calling conventions for
> asynchronous functions:
>
> 1) Ones which return a Future
> 2) Ones which return a raw coroutine object
> 3) Ones which return a Future, but are documented to return a
> coroutine object, because we want to possibly switch to doing that in
> the future and are hoping people won't depend on them returning a
> Future
>
> In practice these have slightly different semantics. For example,
> types (1) and (3) start executing immediately, while type (2) doesn't
> start executing until passed to 'await' or some function like
> asyncio.gather. For type (1), you can immediately call
> .add_done_callback:
>
>   func_returning_future().add_done_callback(...)
>
> while for type (2) and (3), you have to explicitly call ensure_future
> first:
>
>   asyncio.ensure_future(func_returning_coro()).add_done_callback(...)
>
> In practice, these distinctions are mostly irrelevant and annoying to
> users; the only thing you can do with a raw coroutine is pass it to
> ensure_future() or equivalent, and the existence of type (3) functions
> means that you can't even assume that functions documented as
> returning raw coroutines actually return raw coroutines, or that these
> will stay the same across versions. But it is a source of confusion,
> see e.g. this thread on async-sig [1], or this one [2]. It also makes
> it harder to evolve asyncio, since any function documented as
> returning a Future cannot take advantage of async/await syntax. And
> it's forced the creation of awkward APIs like the "coroutine hook"
> used in asyncio's debug mode.
>
> Other languages with async/await, like C# and JavaScript, don't have
> these problems, because they don't have raw coroutine objects at all:
> when you mark a function as async, that directly converts it into a
> function that returns a Future (or local equivalent). So the
> difference between async functions and Future-returning functions is
> only relevant to the person writing the function; callers don't have
> to care, and can assume that the full Future interface is always
> available.
>
> I think Python did a very smart thing in *not* hard-coding Futures
> into the language, like C#/JS do. But, I also think it would be nice
> if we didn't force regular asyncio users to be aware of all these
> details.
>
> So here's an idea: we add a new kind of hook that coroutine runners
> can set. In async_function.__call__, it creates a coroutine object,
> and then invokes this hook, which then can wrap the coroutine into a
> Task (or Deferred or whatever is appropriate for the current coroutine
> runner). This way, from the point of view of regular asyncio users,
> *all* async functions become functions-returning-Futures (type 1
> above):
>
> async def foo():
>     pass
>
> # This returns a Task running on the current loop
> foo()
>
> Of course, async loops need a way to get at the actual coroutine
> objects, so we should also provide some method on async functions to
> do that:
>
> foo.__corocall__() -> returns a raw coroutine object
>
> And as an optimization, we can make 'await <funcall>' invoke this, so
> that in regular async function -> async function calls, we don't pay
> the cost of setting up an unnecessary Task object:
>
> # This
> await foo(*args, **kwargs)
> # Becomes sugar for:
> try:
>     _callable = foo.__corocall__
> except AttributeError:
>     # Fallback, so 'await function_returning_promise()' still works:
>     _callable = foo
> _awaitable = _callable(*args, **kwargs)
> await _awaitable
>
> (So this hook is actually quite similar to the existing coroutine
> hook, except that it's specifically only invoked on bare calls, not on
> await-calls.)
>
> Of course, if no coroutine runner hook is registered, then the default
> should remain the same as now. This also means that common idioms
> like:
>
> loop.run_until_complete(asyncfn())
>
> still work, because at the time asyncfn() is called, no loop is
> running, asyncfn() silently returns a regular coroutine object, and
> then run_until_complete knows how to handle that.
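
The behaviour described above can be roughly emulated today with a
decorator. This is only a sketch under assumptions: auto_task is a
hypothetical name, the proposal itself is an interpreter-level hook rather
than a decorator, and __corocall__ here is just an attribute set by hand.
It assumes Python 3.7+ for asyncio.run() and asyncio.get_running_loop().

```python
import asyncio
import functools


def auto_task(async_fn):
    """Hypothetical stand-in for the proposed hook (not a real API)."""

    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        coro = async_fn(*args, **kwargs)
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running: fall back to a plain coroutine object.
            return coro
        # A loop is running: a bare call yields a Task on that loop.
        return loop.create_task(coro)

    # Escape hatch that always returns a raw coroutine object.
    wrapper.__corocall__ = async_fn
    return wrapper


@auto_task
async def foo():
    return "result"


# Outside a running loop, foo() is still a coroutine, so this idiom works.
run_value = asyncio.run(foo())


async def main():
    task = foo()  # bare call inside a running loop
    return isinstance(task, asyncio.Task), await task, await foo.__corocall__()


is_task, task_value, raw_value = asyncio.run(main())
```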
>
> This would also help libraries like Trio that remove Futures
> altogether; in Trio, the convention is that 'await asyncfn()' is
> simply the only way to call asyncfn, and writing a bare 'asyncfn()' is
> always a mistake – but one that is currently confusing and difficult
> to detect because all it does is produce a warning ("coroutine was
> never awaited") at some potentially-distant location that depends on
> what the GC does. In this proposal, Trio could register a hook that
> raises an immediate error on bare 'asyncfn()' calls.
>
> This would also allow libraries built on Trio-or-similar to migrate a
> function from sync->async or async->sync with a deprecation period.
> Since in Trio sync functions would always use __call__, and async
> functions would always use __corocall__, then during a transition
> period one could use a custom object that defines both, and has one of
> them emit a DeprecationWarning. This is a problem that comes up a lot
> in new libraries, and currently doesn't have any decent solution.
> (It's actually happened to Trio itself, and created the only case
> where Trio has been forced to break API without a deprecation period.)
>
> The main problem I can see here is cases where you have multiple
> incompatible coroutine runners in the same program. In this case,
> problems could arise when you call asyncfn() under runner A and pass
> it to runner B for execution, so you end up with a B-flavored
> coroutine getting passed to A's wrapping hook. The kinds of cases
> where this might happen are:
>
> - Using the trio-asyncio library to run asyncio libraries under trio
> - Using Twisted's Future<->Deferred conversion layer
> - Using async/await to implement an ad hoc coroutine runner (e.g. a
> state machine) inside a program that also uses an async library
>
> I'm not sure if this is OK or not, but I think it might be?
> trio-asyncio's API is already designed to avoid passing coroutine
> objects across the boundary – to call an asyncio function from trio
> you write
>
> await run_asyncio(async_fn, *args)
>
> and then run_asyncio switches into asyncio context before actually
> calling async_fn, so that's actually totally fine. Twisted's API does
> not currently work this way, but I think it's still in enough of an
> early provisional state that it could be fixed. And for users
> implementing ad hoc coroutine runners, I think this is (a) rare, (b)
> using generators is probably better style, since the only difference
> between async/await and generators is that async/await is explicitly
> supposed to be opaque to users and signal "this is your async
> library", (c) if they're writing a coroutine runner then they can set
> up their coroutine runner hook appropriately anyway. But there would
> be some costs here; the trade-off would be a significant
> simplification and increase in usability, because regular users could
> simply stop having to know about 'coroutine objects' and 'awaitables'
> and all that entirely, and we'd be able to take more advantage of
> async/await in existing libraries.
>
> What do you think?
>
> -n
>
> [1] https://mail.python.org/pipermail/async-sig/2018-May/000484.html
> [2] https://mail.python.org/pipermail/async-sig/2018-April/000470.html
>
> --
> Nathaniel J. Smith -- https://vorpus.org

-- 
--Guido van Rossum (python.org/~guido)