[Python-ideas] The async API of the future: Twisted and Deferreds

Sun Oct 14 01:42:09 CEST 2012

There has to be some way to contract emails sent in discussions rather
than exploding them. I swear I'm trying to be concise, yet readable.
It's not working.

On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido at python.org> wrote:
> I also don't doubt that using classic Futures you can't do this -- the
> chaining really matters for this style, and I presume this (modulo
> unimportant API differences) is what typical Twisted code looks like.

My experience has been unfortunately rather devoid of deferreds in
Twisted. I always feel like the odd one out when people discuss this
confusion. For me, it was all Protocol this and Protocol that, and
deferreds only came up when I used Twisted's great AMP (Asynchronous
Messaging Protocol) library.

> However, Python has yield, and you can do much better (I'll write
> plain yield for now, but it works the same with yield-from):
>
> try:
>   value1 = yield step1(<args>)
>   value2 = yield step2(value1)
>   value3 = yield step3(value2)
>   # Do something with value4
> except Exception:
>   # Handle any error from step1 through step4
>
--snip--
>
> This form is more flexible, since it is easier to catch different
> exceptions at different points. It is also much easier to pass extra
> information around. E.g. what if your flow ends up having to pass both
> value1 and value2 into step3()? Sure, you can do that by making value2
> a tuple (or a dict, or an object) incorporating value1 and the
> original value2, but that's exactly where this style becomes
> cumbersome, whereas in the yield-based form, such things can remain
> simple local variables. All in all I find it more readable.

Well, first of all, deferreds have ways of joining values together. For example:

    from __future__ import print_function
    from twisted.internet import defer

    def example_joined():
        d1 = defer.Deferred()
        d2 = defer.Deferred()
        # consumeErrors looks scary, but it only means that
        # d1 and d2's errbacks aren't called. Instead, the error is sent to d's
        # errback.
        d = defer.gatherResults([d1, d2], consumeErrors=True)

        d.addCallback(print)
        d.addErrback(lambda v: print("ERROR!"))

        d1.callback("The first deferred has succeeded")
        # now we're waiting on the second deferred to succeed,
        # which we'll let the caller handle
        return d2

    example_joined().callback("The second deferred has succeeded too!")
    print("==============")
    example_joined().errback(Exception("The second deferred has failed..."))

I agree it's easier to use the generator style in many complicated
cases. That doesn't preclude manual deferreds from also being useful.

> So, in the end, for Python 3.4 and beyond, I want to promote a style
> that mixes simple callbacks (perhaps augmented with simple Futures)
> and generator-based coroutines (either PEP 342, yield/send-based, or
> PEP 380 yield-from-based). I'm looking to Twisted for the best
> reactors (see other thread). But for transport/protocol
> implementations I think that generator/coroutines offers a cleaner,
> better interface than incorporating Deferred.

Egh. I mean, sure, suppose we have those things. But what if you want
to send the result of a callback to a generator-coroutine? Presumably
generator coroutines work by yielding deferreds and being called back
when the future resolves (deferred fires). But if those
futures/deferreds aren't exposed, and instead only the generator
stuff is exposed, then bridging the gap between callbacks and
generator-coroutines is impossible. So every callback function has to
also be defined to use something else. And worse, other APIs using
callbacks are left in the dust.

Suppose, OTOH, futures/deferreds are exposed. Then we can easily
bridge between callbacks and generators, by returning a future whose
`set_result` is the callback to our callback function (deferred whose
`callback` is the callback).
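
A minimal sketch of that bridge, using the standard library's
concurrent.futures.Future as a stand-in for a deferred. The names
fetch_with_callback and fetch_as_future are hypothetical, made up for
illustration, and the callback fires immediately here rather than
later from a reactor:

```python
from concurrent.futures import Future

def fetch_with_callback(key, callback):
    # Hypothetical callback-style API: "fetches" a value and hands it
    # to the callback. A real one would call back later from a reactor.
    callback("value for " + key)

def fetch_as_future(key):
    # The bridge: the future's set_result *is* the callback.
    future = Future()
    fetch_with_callback(key, future.set_result)
    return future

f = fetch_as_future("spam")
print(f.result())
```

Anything that can accept a future (a generator-coroutine driver, for
instance) can now consume the callback-style API without knowing it
exists.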

But if we're exposing futures/deferreds, why have callbacks in the
first place? The difference between these two functions is that the
second can be used in generator-coroutines trivially and the first
cannot:

    # callbacks:
    reactor.timer(10, print, "hello world")

    # deferreds
    reactor.timer(10).addCallback(print, "hello world")

Now here's another thing: suppose we have a list of 10 "deferred
events", but instead of handling all 10 at once, we want to handle
them "as they arrive", and then synthesize a result at the bottom. How
do you do this with pure generator coroutines?

For example, perhaps I am implementing a game server, where all the
players choose their characters and then the game begins. Whenever a
character is chosen, everyone else has to know about it so that they
can plan their strategy based on who has chosen a character. Character
selections are final, just so that I can use deferreds (hee hee).

I am imagining something like the following:

    # WRONG: handles players in a certain order, rather than as they come in
    def player_lobby(reactor, players):
        for player in players:
            player_character = yield player.wait_for_confirm(reactor)
            player.set_character(player_character)
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)

        start_game(players)

This is wrong, because it waits on the players in a fixed order: the
coroutine is "blocked" on one player even if another has already
chosen. Players will not learn who has chosen which character as the
choices actually come in.

But hypothetically, maybe we could do the following:

    # Hypothetical magical code?
    def player_lobby(reactor, players):
        confirmation_events = UnorderedEventList(
            [player.wait_for_confirm(reactor) for player in players])
        while confirmation_events:
            player_character = yield confirmation_events.get_next()
            player.set_character(player_character)
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)

        start_game(players)

But then, how do we write UnorderedEventList? I don't really know. I
suspect I've made the problem harder, not easier! eek. Plus, it
doesn't even read very well. Especially not compared to the deferred
version:

This is how I would personally do it in Twisted, without using
UnorderedEventList (no magic!):

    from twisted.internet.defer import gatherResults, inlineCallbacks

    @inlineCallbacks
    def player_lobby(reactor, players):
        events = []
        for player in players:
            confirm_event = player.wait_for_confirm(reactor)
            @confirm_event.addCallback
            def on_confirmation(player_character, player=player):
                player.set_character(player_character)
                # tell all the other players what character the player
                # has chosen
                notify_choice((player, player_character), players)
            # remember the deferred so we can wait for all of them below
            events.append(confirm_event)

        yield gatherResults(events)
        start_game(players)

Notice how I dropped down into the level of manipulating deferreds so
that I could add this "as they come in" functionality, and then went
back. Actually it wouldn't've hurt much to just not bother with
inlineCallbacks at all.

I don't think this is particularly unreadable. More importantly, I
actually know how to do it. I have no idea how I would do this without
using addCallback, or without reimplementing addCallback using
inlineCallbacks.

And then, supposing we don't have these deferreds/futures exposed...
how do we implement delayed computation stuff from extension modules?
What if we want to do these kinds of compositions within said
extension modules? What if we want to write our own version of @tasks
or @inlineCallbacks with extra features, or generate callback chains
from XML files, and so on?

I don't really like the prospect of having just the "sugary syntax"
available, without a flexible underlying representation also exposed.
I don't know if you've ever shared that worry -- sometimes the pretty
syntax gets in the way of getting stuff done.

> I hope that the path forward for Twisted will be simple enough: it
> should be possible to hook Deferred into the simpler callback APIs
> (perhaps a new implementation using some form of adaptation, but
> keeping the interface the same). In a sense, the greenlet/gevent crowd
> will be the biggest losers, since they currently write async code
> without either callbacks or yield, using microthreads instead. I
> wouldn't want to have to start putting yield back everywhere into that
> code. But the stdlib will still support yield-free blocking calls
> (even if under the hood some of these use yield/send-based or
> yield-from-based coroutines) so the monkey-patchey tradition can
> continue.

Surely it's no harder to make yourself into a generator than to make
yourself into a low-level thread-like context switching function with
a saved callstack implemented by hand in assembler, and so on?

I'm sure they'll be fine.

>> 1. Explicit callbacks:
>>
>>     For example, reactor.callLater(t, lambda: print("woo hoo"))
>
> I actually like this, as it's a lowest-common-denominator approach
> which everyone can easily adapt to their purposes. See the thread I
> started about reactors.

Will do (but also see my response above about why not "everyone" can).

>> 2. Method dispatch callbacks:
>>
>>     Similar to the above, the reactor or somebody has a handle on your
>> object, and calls methods that you've defined when events happen
>>     e.g. IProtocol's dataReceived method
>
> While I'm sure it's expedient and captures certain common patterns
> well, I like this the least of all -- calling fixed methods on an
> object sounds like a step back; it smells of the old Java way (before
> it had some equivalent of anonymous functions), and of asyncore, which
> (nearly) everybody agrees is kind of bad due to its insistence that
> you subclass its classes. (Notice how subclassing as the prevalent
> approach to structuring your code has gotten into a lot of discredit
> since 1996.)

I only used asyncore once, indirectly, so I don't know anything about
it. I'm willing to dismiss it (and, in fact, various parts of twisted
(I'm looking at you twisted.words)) as not good examples of the
pattern.

First of all, I'd like to separate the notion of subclassing and
method dispatch. They're entirely unrelated. If I pass my object to
you, and you call different methods depending on what happens
elsewhere, that's method dispatch. And my object doesn't have to be
subclassed or anything for it to happen.

Now here's the thing. Suppose we're writing, for example, an IRC bot.
(Everyone loves IRC bots.)  My IRC bot needs to handle several
different possible events, such as:

    private messages
    channel join event
    CTCP event

and so on. My event handlers for each of these events probably
manipulate some internal state (such as a log file, or a GUI). We'd
probably organize this as a class, or else as a bunch of functions
accessing global state. Or, perhaps a collection of closures. This
last one is pretty unlikely.

For the most part, these functions are all intrinsically related and
can't be sensibly treated separately. You can't take the private
message callback of Bot A, and the channel join callback of bot B, and
register these and expect a result that makes sense.

If we look at this, we're expecting to deal with a set of functions
that manage shared data. The abstraction for this is usually an
object, and we'd really probably write the callbacks in a class unless
we were being contrarian. And it's not too crazy for the dispatcher to
know this and expect you to write it as a class that supports a
certain interface (certain methods correspond to certain events).
Missing methods can be assumed to have the empty implementation (no
subclassing, just catching AttributeError).
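
A rough sketch of what such a dispatcher could look like. The dispatch
function, the LoggingBot class, and the event names are all
hypothetical, chosen only to illustrate the idea; no real library's
API is implied:

```python
def dispatch(handler, event_name, *args):
    # Look up the method for this event. A missing method means
    # "not interested", i.e. the empty implementation.
    try:
        method = getattr(handler, event_name)
    except AttributeError:
        return None
    # Call outside the try block, so an AttributeError raised *inside*
    # the handler is not silently swallowed.
    return method(*args)

class LoggingBot(object):
    # No base class required: just define the events you care about.
    def __init__(self):
        self.log = []

    def private_message(self, sender, text):
        self.log.append((sender, text))

bot = LoggingBot()
dispatch(bot, "private_message", "alice", "hi")
dispatch(bot, "channel_join", "#python", "bob")  # no such method: ignored
print(bot.log)
```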

This isn't too much of an imposition on the user -- any collection of
functions (with shared state via globals or closure variables) can be
converted to an object with callable attributes very simply (thanks to
types.SimpleNamespace, especially). And I only really think this is OK
when writing it as an object -- as a collection of functions with
shared state -- is the eminently obvious primary use case, so that
the need for such a conversion wouldn't come up very often.
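
For illustration, here is roughly what that conversion might look like
for a pair of closures sharing state. The make_bot function and the
event names are hypothetical; types.SimpleNamespace needs Python 3.3
or later:

```python
import types

def make_bot():
    # Shared state lives in a closure variable.
    seen = []

    def private_message(sender, text):
        seen.append(sender)

    def channel_join(channel, nick):
        seen.append(nick)

    # Bundle the closures into one object with callable attributes.
    return types.SimpleNamespace(
        private_message=private_message,
        channel_join=channel_join,
        seen=seen,
    )

bot = make_bot()
bot.private_message("alice", "hi")
bot.channel_join("#python", "bob")
print(bot.seen)
```

The resulting object can be handed to anything that expects methods
named after events, with no class statement involved.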

So, as an example, a protocol that passes data on further down the
line needs to be notified when data is received, but also when the
connection begins and ends. So the twisted protocol interface has
"dataReceived", "connectionMade", and "connectionLost" callbacks.
These really do belong together: they manage a single connection
between computers and how it gets mapped to events usable by a twisted
application. So I like the convenience and suggestiveness of them all
being methods on an object.

>> 4. Generator coroutines
>>
>>     These are a syntactic wrapper around deferreds. If you yield a
>> deferred, you will be sent the result if the deferred succeeds, or an
>> exception if the deferred fails.
>>     e.g. examples from previous message
>
> Seeing them as syntactic sugar for Deferreds is one way of looking at
> it; no doubt this is how they're seen in the Twisted community because
> Deferreds are older and more entrenched. But there's no requirement
> that an architecture has to have Deferreds in order to use generator
> coroutines -- simple Futures will do just fine, and Greg Ewing has
> shown that using yield-from you can even do without those. (But he
> does use simple, explicit callbacks at the lowest level of his
> system.)

I meant it as a factual explanation of what generator coroutines are
in Twisted, not what they are in general. Sorry for the confusion. We
are probably agreed here.
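
To make "you will be sent the result, or an exception" concrete, here
is a toy driver in that spirit. It is not Twisted's inlineCallbacks,
just a sketch under the assumption that the generator yields
concurrent.futures.Future objects from the standard library; run and
example are made-up names, and "return" with a value inside a
generator needs Python 3.3 or later:

```python
from concurrent.futures import Future

def run(gen):
    # Drive a generator that yields futures. Returns a future that
    # resolves with the generator's return value.
    done = Future()

    def step(value=None, error=None):
        try:
            if error is not None:
                yielded = gen.throw(error)   # failure: raise at the yield
            else:
                yielded = gen.send(value)    # success: yield evaluates to it
        except StopIteration as stop:
            done.set_result(stop.value)
            return
        except Exception as exc:
            done.set_exception(exc)
            return
        yielded.add_done_callback(resume)

    def resume(future):
        error = future.exception()
        if error is not None:
            step(error=error)
        else:
            step(future.result())

    step()
    return done

def example(f1, f2):
    a = yield f1
    try:
        yield f2
    except ValueError as e:
        b = "recovered from %s" % e
    return a + ", " + b

f1, f2 = Future(), Future()
result = run(example(f1, f2))
f1.set_result("first ok")
f2.set_exception(ValueError("boom"))
print(result.result())
```

The generator never touches add_done_callback itself; the driver does
all the callback plumbing on its behalf.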

After a cursory examination, I don't really understand Greg Ewing's
thing. I'd have to dig deeper into the logs for when he first
introduced it.

> I'd like to come back to that Django example though. You are implying
> that there are some opportunities for concurrency here, and I agree,
> assuming we believe disk I/O is slow enough to bother making it
> asynchronously. (In App Engine it's not, and we can't anyways, but in
> other contexts I agree that it would be bad if a slow disk seek were
> to hold up all processing -- not to mention that it might really be
> NFS...)
>
--snip--
> How would you code that using Twisted Deferreds?

Well. I'd replace the @task in your NDB thing with @inlineCallbacks
and call it a day. ;)

(I think there's enough deferred examples above, and I'm getting tired
and it's been a day since I started writing this damned email.)

>> For that stuff, you'd have to speak to the main authors of Twisted.
>> I'm just a twisted user. :(
>
> They seem to be mostly ignoring this conversation, so your standing in
> as a proxy for them is much appreciated!

Well. We are on Python-Ideas... :(

>> In the end it really doesn't matter what API you go with. The Twisted
>> people will wrap it up so that they are compatible, as far as that is
>> possible.
>
> And I want to ensure that that is possible and preferably easy, if I
> can do it without introducing too many warts in the API that
> non-Twisted users see and use.

I probably lack the expertise to help too much with this. I can point
out anything that sticks out, if/when an extended futures proposal is
made.

-- Devin