[Python-ideas] The async API of the future: Twisted and Deferreds

Sun Oct 14 17:53:15 CEST 2012

On Sat, Oct 13, 2012 at 4:42 PM, Devin Jeanpierre
<jeanpierreda at gmail.com> wrote:
> There has to be some way to contract emails sent in discussions rather
> than exploding them. I swear I'm trying to be concise, yet readable.
> It's not working.

Don't worry too much. I took essentially all Friday starting those
four new threads. I am up at night thinking about the issues. I can't
expect everyone else to have this much time to devote to Python!

> On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido at python.org> wrote:
>> I also don't doubt that using classic Futures you can't do this -- the
>> chaining really matters for this style, and I presume this (modulo
>> unimportant API differences) is what typical Twisted code looks like.
>
> My experience has been unfortunately rather devoid of deferreds in
> Twisted. I always feel like the odd one out when people discuss this
> confusion. For me, it was all Protocol this and Protocol that, and
> deferreds only came up when I used Twisted's great AMP (Asynchronous
> Messaging Protocol) library.

Especially odd since you jumped into the discussion when I called
Deferreds a bad name. :-)

>> However, Python has yield, and you can do much better (I'll write
>> plain yield for now, but it works the same with yield-from):
>>
>> try:
>>   value1 = yield step1(<args>)
>>   value2 = yield step2(value1)
>>   value3 = yield step3(value2)
>>   # Do something with value3
>> except Exception:
>>   # Handle any error from step1 through step3
>>
> --snip--
>>
>> This form is more flexible, since it is easier to catch different
>> exceptions at different points. It is also much easier to pass extra
>> information around. E.g. what if your flow ends up having to pass both
>> value1 and value2 into step3()? Sure, you can do that by making value2
>> a tuple (or a dict, or an object) incorporating value1 and the
>> original value2, but that's exactly where this style becomes
>> cumbersome, whereas in the yield-based form, such things can remain
>> simple local variables. All in all I find it more readable.
>
> Well, first of all, deferreds have ways of joining values together. For example:
>
>     from __future__ import print_function
>     from twisted.internet import defer
>
>     def example_joined():
>         d1 = defer.Deferred()
>         d2 = defer.Deferred()
>         # consumeErrors looks scary, but it only means that
>         # d1 and d2's errbacks aren't called. Instead, the error is sent to d's
>         # errback.
>         d = defer.gatherResults([d1, d2], consumeErrors=True)
>
>         d.addCallback(print)
>         d.addErrback(lambda v: print("ERROR!"))
>
>         d1.callback("The first deferred has succeeded")
>         # now we're waiting on the second deferred to succeed,
>         # which we'll let the caller handle
>         return d2
>
>     example_joined().callback("The second deferred has succeeded too!")
>     print("==============")
>     example_joined().errback("The second deferred has failed...")

I'm sorry, but that's not very readable at all. You needed a lambda
(which if there was anything more would have to be expanded using
'def') and you're cheating by passing print as a callable (which saves
you a second lambda, but only in this simple case).

A readable version of this code should not have to use lambdas.

> I agree it's easier to use the generator style in many complicated
> cases. That doesn't preclude manual deferreds from also being useful.

Yeah, but things should be as simple as they can. If you can do
everything using plain callbacks, Futures and coroutines, why add
Deferreds even if you can? (Except for backward compatibility of
course. That's a totally different topic. But we're first defining the
API of the future.) If Greg Ewing had his way we'd even do without
Futures -- I'm still considering that bid. (In the yield-from thread
I'm asking for common patterns that the new API should be able to
solve.)

>> So, in the end, for Python 3.4 and beyond, I want to promote a style
>> that mixes simple callbacks (perhaps augmented with simple Futures)
>> and generator-based coroutines (either PEP 342, yield/send-based, or
>> PEP 380 yield-from-based). I'm looking to Twisted for the best
>> reactors (see other thread). But for transport/protocol
>> implementations I think that generators/coroutines offer a cleaner,
>> better interface than incorporating Deferred.
>
> Egh. I mean, sure, suppose we have those things. But what if you want
> to send the result of a callback to a generator-coroutine? Presumably
> generator coroutines work by yielding deferreds and being called back
> when the future resolves (deferred fires).

No, they don't use deferreds. They use Futures. You've made it quite
clear that they are very different.
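
To make the distinction concrete, here is a minimal sketch of a driver
for generators that yield Futures. It is an illustration under stated
assumptions, not a proposed API: run_task and pipeline are made-up
names, concurrent.futures.Future stands in for whatever simple Future
the new API ends up with, and cancellation is not handled.

```python
from concurrent.futures import Future


def run_task(gen):
    """Drive a generator that yields Futures (hypothetical helper).

    Returns a Future that receives the generator's return value,
    or the exception that escaped from it.
    """
    outcome = Future()

    def step(value=None, exc=None):
        try:
            if exc is not None:
                # Re-raise the failure inside the generator, at the yield.
                yielded = gen.throw(exc)
            else:
                yielded = gen.send(value)
        except StopIteration as stop:
            outcome.set_result(stop.value)
            return
        except Exception as err:
            outcome.set_exception(err)
            return
        # Resume the generator once the yielded Future completes.
        yielded.add_done_callback(resume)

    def resume(fut):
        err = fut.exception()
        if err is not None:
            step(exc=err)
        else:
            step(fut.result())

    step()
    return outcome


def pipeline(f1, f2):
    # Intermediate values stay ordinary local variables.
    try:
        value1 = yield f1
        value2 = yield f2
        return value1 + value2
    except ValueError:
        return "recovered"


f1, f2 = Future(), Future()
result = run_task(pipeline(f1, f2))
f1.set_result(1)
f2.set_result(2)
print(result.result())
```

The only thing the driver needs from a Future is add_done_callback(),
result() and exception(); no chaining is involved.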

> But if those
> futures/deferreds aren't exposed, and instead only the generator
> stuff is exposed, then bridging the gap between callbacks and
> generator-coroutines is impossible. So every callback function has to
> also be defined to use something else. And worse, other APIs using
> callbacks are left in the dust.

My plan is that the Futures *will* be exposed -- this is what
worked well in NDB.

> Suppose, OTOH, futures/deferreds are exposed. Then we can easily
> bridge between callbacks and generators, by returning a future whose
> `set_result` is the callback to our callback function (deferred whose
> `callback` is the callback).

And that's how NDB does it. I've put a question to Greg Ewing about how he does it.

> But if we're exposing futures/deferreds, why have callbacks in the
> first place? The difference between these two functions is that the
> second can be used in generator-coroutines trivially and the first
> cannot:
>
>     # callbacks:
>     reactor.timer(10, print, "hello world")
>
>     # deferreds
>     reactor.timer(10).addCallback(print, "hello world")

How about this:

  f = <some future>
  reactor.timer(10, f.set_result, None)

Then whoever waits for f gets woken up in 10 seconds, and the reactor
doesn't have to know what Futures are.
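
As a rough sketch of that idea (ToyReactor and its timer()/run()
methods are invented here purely for illustration, and
concurrent.futures.Future stands in for "some future"), the loop below
only ever sees a plain callable plus its arguments. The delay is
shortened to 0.01 seconds so the example finishes quickly.

```python
import heapq
import itertools
import time
from concurrent.futures import Future


class ToyReactor:
    """Hypothetical event loop that only knows about plain callbacks."""

    def __init__(self):
        self._timers = []
        # Tie-breaker so equal deadlines never compare the callbacks.
        self._counter = itertools.count()

    def timer(self, delay, callback, *args):
        deadline = time.monotonic() + delay
        heapq.heappush(
            self._timers, (deadline, next(self._counter), callback, args))

    def run(self):
        # Fire the timers in deadline order until none are left.
        while self._timers:
            deadline, _, callback, args = heapq.heappop(self._timers)
            time.sleep(max(0.0, deadline - time.monotonic()))
            callback(*args)


def on_done(fut):
    print("woken up with", fut.result())


reactor = ToyReactor()
f = Future()
f.add_done_callback(on_done)
reactor.timer(0.01, f.set_result, None)
reactor.run()
```

Nothing in ToyReactor imports or mentions Future; the bridge is just
the bound method f.set_result being passed as the callback.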

But I believe your whole argument may be based on a misreading of my
proposal. *I* want plain callbacks, Futures, and coroutines, and an
event loop that only knows about plain callbacks and IO objects (e.g.
sockets).

> Now here's another thing: suppose we have a list of "deferred events",
> but instead of handling all 10 at once, we want to handle them "as
> they arrive", and then synthesize a result at the bottom. How do you
> do this with pure generator coroutines?

Let's ask Greg that.

In NDB, I have a wait_any() function that takes a set of Futures and
returns the first one that completes. It would be easy to build an
iterator on top of this that takes a set of Futures and iterates over
them in the order in which they are completed.
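
A minimal sketch of such an iterator, assuming
concurrent.futures.Future objects rather than NDB's own Futures. The
wait_any() below is a stand-in written for this example, not NDB's
implementation, and iter_completed is a made-up name. (The stdlib's
concurrent.futures.as_completed() already offers similar behavior.)

```python
from concurrent.futures import FIRST_COMPLETED, Future, wait


def wait_any(futures):
    """Block until at least one future is done and return one of them."""
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    return next(iter(done))


def iter_completed(futures):
    """Yield the futures one at a time, in the order they complete."""
    pending = set(futures)
    while pending:
        finished = wait_any(pending)
        pending.remove(finished)
        yield finished


a, b, c = Future(), Future(), Future()
completed = iter_completed([a, b, c])

b.set_result("b finished first")
first = next(completed)
print(first.result())
```

A lobby like the one described below could loop over
iter_completed(...) and notify the other players after each item,
without fixing the order in advance.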

> For example, perhaps I am implementing a game server, where all the
> players choose their characters and then the game begins. Whenever a
> character is chosen, everyone else has to know about it so that they
> can plan their strategy based on who has chosen a character. Character
> selections are final, just so that I can use deferreds (hee hee).
>
> I am imagining something like the following:
>
>     # WRONG: handles players in a certain order, rather than as they come in
>     def player_lobby(reactor, players):
>         for player in players:
>             player_character = yield player.wait_for_confirm(reactor)
>             player.set_character(player_character)
>             # tell all the other players what character the player has chosen
>             notify_choice((player, player_character), players)
>
>         start_game(players)
>
> This is wrong, because it goes in a certain order and "blocks" the
> coroutine until every character is chosen. Players will not know who
> has chosen what characters in an appropriate order.
>
> But hypothetically, maybe we could do the following:
>
>     # Hypothetical magical code?
>     def player_lobby(reactor, players):
>         confirmation_events = UnorderedEventList(
>             [player.wait_for_confirm(reactor) for player in players])
>         while confirmation_events:
>             player_character = yield confirmation_events.get_next()
>             player.set_character(player_character)
>             # tell all the other players what character the player has chosen
>             notify_choice((player, player_character), players)
>
>         start_game(players)
>
> But then, how do we write UnorderedEventList? I don't really know. I
> suspect I've made the problem harder, not easier! eek. Plus, it
> doesn't even read very well. Especially not compared to the deferred
> version:
>
> This is how I would personally do it in Twisted, without using
> UnorderedEventList (no magic!):
>
>     @inlineCallbacks
>     def player_lobby(reactor, players):
>         events = []
>         for player in players:
>             confirm_event = player.wait_for_confirm(reactor)
>             # Track the deferred so gatherResults() below waits for it.
>             events.append(confirm_event)
>             @confirm_event.addCallback
>             def on_confirmation(player_character, player=player):
>                 player.set_character(player_character)
>                 # tell all the other players what character the player has chosen
>                 notify_choice((player, player_character), players)
>
>         yield gatherResults(events)
>         start_game(players)
>
> Notice how I dropped down into the level of manipulating deferreds so
> that I could add this "as they come in" functionality, and then went
> back. Actually it wouldn't've hurt much to just not bother with
> inlineCallbacks at all.
>
> I don't think this is particularly unreadable. More importantly, I
> actually know how to do it. I have no idea how I would do this without
> using addCallback, or without reimplementing addCallback using
> inlineCallbacks.

Clearly we have an educational issue on our hands! :-)

> And then, supposing we don't have these deferreds/futures exposed...
> how do we implement delayed computation stuff from extension modules?
> What if we want to do these kinds of compositions within said
> extension modules? What if we want to write our own version of @tasks
> or @inlineCallbacks with extra features, or generate callback chains
> from XML files, and so on?
>
> I don't really like the prospect of having just the "sugary syntax"
> available, without a flexible underlying representation also exposed.
> I don't know if you've ever shared that worry -- sometimes the pretty
> syntax gets in the way of getting stuff done.

You're barking up the wrong tree -- please badger Greg Ewing with use
cases in the yield-from thread. With my approach all of these can be
done. (See the yield-from thread for an example I just posted of a
barrier, where multiple tasks wait for a single event.)

>> I hope that the path forward for Twisted will be simple enough: it
>> should be possible to hook Deferred into the simpler callback APIs
>> (perhaps a new implementation using some form of adaptation, but
>> keeping the interface the same). In a sense, the greenlet/gevent crowd
>> will be the biggest losers, since they currently write async code
>> without either callbacks or yield, using microthreads instead. I
>> wouldn't want to have to start putting yield back everywhere into that
>> code. But the stdlib will still support yield-free blocking calls
>> (even if under the hood some of these use yield/send-based or
>> yield-from-based coroutines) so the monkey-patchey tradition can
>> continue.
>
> Surely it's no harder to make yourself into a generator than to make
> yourself into a low-level thread-like context switching function with
> a saved callstack implemented by hand in assembler, and so on?
>
> I'm sure they'll be fine.

The thing that worries me most is reimplementing httplib, urllib and
so on to use all this new machinery *and* keep the old synchronous
APIs working *even* if some code is written using the old style and
some other code wants to use the new style.

>>> 1. Explicit callbacks:
>>>
>>>     For example, reactor.callLater(t, lambda: print("woo hoo"))
>>
>> I actually like this, as it's a lowest-common-denominator approach
>> which everyone can easily adapt to their purposes. See the thread I
>> started about reactors.
>
> Will do (but also see my response above about why not "everyone" can).
>
>>> 2. Method dispatch callbacks:
>>>
>>>     Similar to the above, the reactor or somebody has a handle on your
>>> object, and calls methods that you've defined when events happen
>>>     e.g. IProtocol's dataReceived method
>>
>> While I'm sure it's expedient and captures certain common patterns
>> well, I like this the least of all -- calling fixed methods on an
>> object sounds like a step back; it smells of the old Java way (before
>> it had some equivalent of anonymous functions), and of asyncore, which
>> (nearly) everybody agrees is kind of bad due to its insistence that
>> you subclass its classes. (Notice how subclassing as the prevalent
>> approach to structuring your code has gotten into a lot of discredit
>> since 1996.)
>
> I only used asyncore once, indirectly, so I don't know anything about
> it. I'm willing to dismiss it (and, in fact, various parts of twisted
> (I'm looking at you twisted.words)) as not good examples of the
> pattern.
>
> First of all, I'd like to separate the notion of subclassing and
> method dispatch. They're entirely unrelated. If I pass my object to
> you, and you call different methods depending on what happens
> elsewhere, that's method dispatch. And my object doesn't have to be
> subclassed or anything for it to happen.

Agreed. Antoine made the same point elsewhere and I half conceded.

> Now here's the thing. Suppose we're writing, for example, an IRC bot.
> (Everyone loves IRC bots.)

(For the record, I hate IRC, the software, the culture, the
interaction style. But maybe I'm unusual that way. :-)

> My IRC bot needs to handle several
> different possible events, such as:
>
>     private messages
>     channel join event
>     CTCP event
>
> and so on. My event handlers for each of these events probably
> manipulate some internal state (such as a log file, or a GUI). We'd
> probably organize this as a class, or else as a bunch of functions
> accessing global state. Or, perhaps a collection of closures. This
> last one is pretty unlikely.

I certainly wouldn't recommend collections of closures for that!

> For the most part, these functions are all intrinsically related and
> can't be sensibly treated separately. You can't take the private
> message callback of Bot A, and the channel join callback of bot B, and
> register these and expect a result that makes sense.
>
> If we look at this, we're expecting to deal with a set of functions
> that manage shared data. The abstraction for this is usually an
> object, and we'd really probably write the callbacks in a class unless
> we were being contrarian. And it's not too crazy for the dispatcher to
> know this and expect you to write it as a class that supports a
> certain interface (certain methods correspond to certain events).
> Missing methods can be assumed to have the empty implementation (no
> subclassing, just catching AttributeError).
>
> This isn't too much of an imposition on the user -- any collection of
> functions (with shared state via globals or closure variables) can be
> converted to an object with callable attributes very simply (thanks to
> types.SimpleNamespace, especially). And I only really think this is OK
> when writing it as an object -- as a collection of functions with
> shared state -- is the eminently obvious primary use case, so that
> that situation wouldn't come up very often.
>
> So, as an example, a protocol that passes data on further down the
> line needs to be notified when data is received, but also when the
> connection begins and ends. So the twisted protocol interface has
> "dataReceived", "connectionMade", and "connectionLost" callbacks.
> These really do belong together, they manage a single connection
> between computers and how it gets mapped to events usable by a twisted
> application. So I like the convenience and suggestiveness of them all
> being methods on an object.

There's also a certain order to them, right? I'd think the state
transition diagram is something like

  connectionMade (1); dataReceived (*); connectionLost (1)

I wonder if there are any guarantees that they will only be called in
this order, and who is supposed to enforce this? It would be awkward
if the user code would have to guard itself against this; also if the
developer made an unwarranted assumption (e.g. dataReceived is called
at least once).
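
One way the framework side could enforce that order, so that user code
does not have to, is a small guard wrapped around the user's object.
This is only a sketch: CallOrderGuard and Recorder are names invented
here, and whether Twisted itself gives such a guarantee is exactly the
open question above.

```python
class CallOrderGuard:
    """Hypothetical wrapper enforcing the call order
    connectionMade (1); dataReceived (*); connectionLost (1)."""

    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"  # "new" -> "connected" -> "closed"

    def _require(self, expected, method_name):
        if self._state != expected:
            raise RuntimeError(
                "%s called in state %r" % (method_name, self._state))

    def connectionMade(self):
        self._require("new", "connectionMade")
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        self._require("connected", "dataReceived")
        self._protocol.dataReceived(data)

    def connectionLost(self, reason=None):
        self._require("connected", "connectionLost")
        self._state = "closed"
        self._protocol.connectionLost(reason)


class Recorder:
    """Toy protocol that records which callbacks it received."""

    def __init__(self):
        self.events = []

    def connectionMade(self):
        self.events.append("made")

    def dataReceived(self, data):
        self.events.append(("data", data))

    def connectionLost(self, reason):
        self.events.append("lost")


recorder = Recorder()
guard = CallOrderGuard(recorder)
guard.connectionMade()
guard.connectionLost()  # zero dataReceived calls is a valid sequence
print(recorder.events)
```

Note that the guard accepts a connection that closes without any
dataReceived call, which is the case a developer might wrongly assume
never happens.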

>>> 4. Generator coroutines
>>>
>>>     These are a syntactic wrapper around deferreds. If you yield a
>>> deferred, you will be sent the result if the deferred succeeds, or an
>>> exception if the deferred fails.
>>>     e.g. examples from previous message
>>
>> Seeing them as syntactic sugar for Deferreds is one way of looking at
>> it; no doubt this is how they're seen in the Twisted community because
>> Deferreds are older and more entrenched. But there's no requirement
>> that an architecture has to have Deferreds in order to use generator
>> coroutines -- simple Futures will do just fine, and Greg Ewing has
>> shown that using yield-from you can even do without those. (But he
>> does use simple, explicit callbacks at the lowest level of his
>> system.)
>
> I meant it as a factual explanation of what generator coroutines are
> in Twisted, not what they are in general. Sorry for the confusion. We
> are probably agreed here.
>
> After a cursory examination, I don't really understand Greg Ewing's
> thing. I'd have to dig deeper into the logs for when he first
> introduced it.

Please press him for explanations. Ask questions. He knows his dream
best of all. We need to learn.

>> I'd like to come back to that Django example though. You are implying
>> that there are some opportunities for concurrency here, and I agree,
>> assuming we believe disk I/O is slow enough to bother making it
>> asynchronous. (In App Engine it's not, and we can't anyways, but in
>> other contexts I agree that it would be bad if a slow disk seek were
>> to hold up all processing -- not to mention that it might really be
>> NFS...)
>>
> --snip--
>> How would you code that using Twisted Deferreds?
>
> Well. I'd replace the @task in your NDB thing with @inlineCallbacks
> and call it a day. ;)
>
> (I think there's enough deferred examples above, and I'm getting tired
> and it's been a day since I started writing this damned email.)

No problem. Same here. :-)

>>> For that stuff, you'd have to speak to the main authors of Twisted.
>>> I'm just a twisted user. :(
>>
>> They seem to be mostly ignoring this conversation, so your standing in
>> as a proxy for them is much appreciated!
>
> Well. We are on Python-Ideas... :(

Somehow we got Itamar and Glyph to join, so I think we're covered!

>>> In the end it really doesn't matter what API you go with. The Twisted
>>> people will wrap it up so that they are compatible, as far as that is
>>> possible.
>>
>> And I want to ensure that that is possible and preferably easy, if I
>> can do it without introducing too many warts in the API that
>> non-Twisted users see and use.
>
> I probably lack the expertise to help too much with this. I can point
> out anything that sticks out, if/when an extended futures proposal is
> made.

You've done great in increasing my understanding of Twisted and
Deferred. Thank you very much!

-- 
--Guido van Rossum (python.org/~guido)