[Python-ideas] The async API of the future: Twisted and Deferreds

Sat Oct 13 00:11:54 CEST 2012

[This is the third spin-off thread from "asyncore: included batteries
don't fit"]

On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre
<jeanpierreda at gmail.com> wrote:
> On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum <guido at python.org> wrote:
>> On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
>> <jeanpierreda at gmail.com> wrote:
>>> Could you be more specific? I've never heard Deferreds in particular
>>> called "arcane". They're very popular in e.g. the JS world,
>>
>> Really? Twisted is used in the JS world? Or do you just mean the
>> pervasiveness of callback style async programming?
>
> Ah, I mean Deferreds. I attended a talk earlier this year all about
> deferreds in JS, and not a single reference to Python or Twisted was
> made!
>
> These are the examples I remember mentioned in the talk:
>
> - http://api.jquery.com/category/deferred-object/ (not very twistedish
> at all, ill-liked by the speaker)
> - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe
> not a good example, mochikit tries to be "python in JS")
> - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
> - https://github.com/kriskowal/q (also includes an explanation of why
> the author likes deferreds)
>
> There were a few more that the speaker mentioned, but didn't cover.
> One of his points was that the various systems of deferreds are subtly
> different, some very badly so, and that it was a mess, but that
> deferreds were still awesome. JS is a language where async programming
> is mainstream, so lots of people try to make it easier, and they all
> do it slightly differently.

Thanks for those links. I followed the kriskowal/q link and was
reminded of why Twisted's Deferreds are considered more awesome than
Futures: it's the chaining.

BUT... That's only important if callbacks are all the language lets
you do! If your baseline is this:

step1(function (value1) {
    step2(value1, function(value2) {
        step3(value2, function(value3) {
            step4(value3, function(value4) {
                // Do something with value4
            });
        });
    });
});

then of course the alternative using Deferred looks better:

Q.fcall(step1)
.then(step2)
.then(step3)
.then(step4)
.then(function (value4) {
    // Do something with value4
}, function (error) {
    // Handle any error from step1 through step4
})
.end();

(Both quoted literally from the kriskowal/q link.)

I also don't doubt that using classic Futures you can't do this -- the
chaining really matters for this style, and I presume this (modulo
unimportant API differences) is what typical Twisted code looks like.

However, Python has yield, and you can do much better (I'll write
plain yield for now, but it works the same with yield-from):

try:
  value1 = yield step1(<args>)
  value2 = yield step2(value1)
  value3 = yield step3(value2)
  value4 = yield step4(value3)
  # Do something with value4
except Exception:
  # Handle any error from step1 through step4

There's an outer function missing here, since you can't have a
toplevel yield; I think that's the same for the JS case, typically.
Also, strictly speaking the "Do something with value4" code should
probably be in an else: clause after the except handler. But that
actually leads nicely to the advantage:

This form is more flexible, since it is easier to catch different
exceptions at different points. It is also much easier to pass extra
information around. E.g. what if your flow ends up having to pass both
value1 and value2 into step3()? Sure, you can do that by making value2
a tuple (or a dict, or an object) incorporating value1 and the
original value2, but that's exactly where this style becomes
cumbersome, whereas in the yield-based form, such things can remain
simple local variables. All in all I find it more readable.
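
To make that concrete, here is a minimal runnable sketch of the
yield-based form, in which step3() needs both value1 and value2 and
only step2's errors are handled specially. Everything in it is an
assumption for illustration: resolved(), failed() and run() are toy
helpers, the step functions are invented stand-ins, and the driver
only works with Futures that are already complete. It is not how any
real framework drives its coroutines.

```python
from concurrent.futures import Future


def resolved(value):
    """Toy helper: a Future that is already complete with a value."""
    fut = Future()
    fut.set_result(value)
    return fut


def failed(exc):
    """Toy helper: a Future that is already complete with an error."""
    fut = Future()
    fut.set_exception(exc)
    return fut


def run(gen):
    """Toy driver: feed each yielded Future's outcome back into gen."""
    try:
        fut = next(gen)
        while True:
            try:
                value = fut.result()
            except Exception as exc:
                fut = gen.throw(exc)   # error surfaces at the yield
            else:
                fut = gen.send(value)  # result becomes the yield's value
    except StopIteration as stop:
        return stop.value


def step1():
    return resolved(2)


def step3(value1, value2):
    return resolved(value1 + value2)


def flow(step2):
    value1 = yield step1()
    try:
        value2 = yield step2(value1)   # only this step's errors land here
    except ValueError:
        value2 = 0
    # value1 is still an ordinary local; no tuple packing is needed.
    value3 = yield step3(value1, value2)
    return value3


ok = run(flow(lambda v: resolved(v * 10)))
fallback = run(flow(lambda v: failed(ValueError("step2 broke"))))
```

The second call shows the per-step handling: the failure in step2 is
caught right where it happens, and the flow continues with a default.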

In the past, when I pointed this out to Twisted aficionados, the
responses usually were a mix of "sure, if you like that style, we got
it covered, Twisted has inlineCallbacks," and "but that only works for
the simple cases, for the real stuff you still need Deferreds." But
that really sounds to me like Twisted people just liking what they've
got and not wanting to change. Which I understand -- I don't want to
change either. But I also observe that a lot of people find bare
Twisted-with-Deferreds too hard to grok, so they use Tornado instead,
or they build a layer on top of either (like Monocle), or they go a
completely different route and use greenlets/gevent instead -- and get
amazing performance and productivity that way too, even though they
know it's monkey-patching their asses off...

So, in the end, for Python 3.4 and beyond, I want to promote a style
that mixes simple callbacks (perhaps augmented with simple Futures)
and generator-based coroutines (either PEP 342, yield/send-based, or
PEP 380 yield-from-based). I'm looking to Twisted for the best
reactors (see other thread). But for transport/protocol
implementations I think that generator coroutines offer a cleaner,
better interface than incorporating Deferred.

I hope that the path forward for Twisted will be simple enough: it
should be possible to hook Deferred into the simpler callback APIs
(perhaps a new implementation using some form of adaptation, but
keeping the interface the same). In a sense, the greenlet/gevent crowd
will be the biggest losers, since they currently write async code
without either callbacks or yield, using microthreads instead. I
wouldn't want to have to start putting yield back everywhere into that
code. But the stdlib will still support yield-free blocking calls
(even if under the hood some of these use yield/send-based or
yield-from-based coroutines) so the monkey-patchey tradition can
continue.

>> That's one of the
>> things I am desperately trying to keep out of Python, I find that
>> style unreadable and unmanageable (whenever I click on a button in a
>> website and nothing happens I know someone has a bug in their
>> callbacks). I understand you feel different; but I feel the general
>> sentiment is that callback-based async programming is even harder than
>> multi-threaded programming (and nobody is claiming that threads are
>> easy :-).
>
> :S
>
> There are (at least?) four different styles of asynchronous
> computation used in Twisted, and you seem to be confused as to which
> ones I'm talking about.
>
> 1. Explicit callbacks:
>
>     For example, reactor.callLater(t, lambda: print("woo hoo"))

I actually like this, as it's a lowest-common-denominator approach
which everyone can easily adapt to their purposes. See the thread I
started about reactors.
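
As a rough illustration of why this style is easy to adapt, here is a
toy scheduler offering nothing but "call this function later". It is
an assumption-laden sketch, not Twisted's reactor: ToyReactor and
call_later are hypothetical names, and it uses a virtual clock so the
example finishes instantly and deterministically.

```python
import heapq
import itertools


class ToyReactor:
    """Hypothetical minimal scheduler: delayed callbacks only."""

    def __init__(self):
        self.now = 0.0                    # virtual time, not wall-clock
        self._queue = []
        self._order = itertools.count()   # tie-breaker for equal times

    def call_later(self, delay, callback, *args):
        entry = (self.now + delay, next(self._order), callback, args)
        heapq.heappush(self._queue, entry)

    def run(self):
        # Pop callbacks in time order until nothing is left.
        while self._queue:
            when, _, callback, args = heapq.heappop(self._queue)
            self.now = when
            callback(*args)


log = []
reactor = ToyReactor()
reactor.call_later(2, log.append, "second")
reactor.call_later(1, log.append, "first")
reactor.run()
```

Futures, Deferreds or coroutine drivers can all be layered on top of
an interface this small, which is the appeal.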

> 2. Method dispatch callbacks:
>
>     Similar to the above, the reactor or somebody has a handle on your
> object, and calls methods that you've defined when events happen
>     e.g. IProtocol's dataReceived method

While I'm sure it's expedient and captures certain common patterns
well, I like this the least of all -- calling fixed methods on an
object sounds like a step back; it smells of the old Java way (before
it had some equivalent of anonymous functions), and of asyncore, which
(nearly) everybody agrees is kind of bad due to its insistence that
you subclass its classes. (Notice how subclassing as the prevalent
approach to structuring your code has gotten into a lot of discredit
since 1996.)

> 3. Deferred callbacks:
>
>     When you ask for something to be done, it's set up, and you get an
> object back, which you can add a pipeline of callbacks to that will be
> called whenever whatever happens
>     e.g. twisted.internet.threads.deferToThread(print,
> "x").addCallback(print, "x was printed in some other thread!")

Discussed above.

> 4. Generator coroutines
>
>     These are a syntactic wrapper around deferreds. If you yield a
> deferred, you will be sent the result if the deferred succeeds, or an
> exception if the deferred fails.
>     e.g. examples from previous message

Seeing them as syntactic sugar for Deferreds is one way of looking at
it; no doubt this is how they're seen in the Twisted community because
Deferreds are older and more entrenched. But there's no requirement
that an architecture has to have Deferreds in order to use generator
coroutines -- simple Futures will do just fine, and Greg Ewing has
shown that using yield-from you can even do without those. (But he
does use simple, explicit callbacks at the lowest level of his
system.)
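
A minimal sketch of that last idea, assuming nothing beyond yield-from
and plain callbacks: coroutines park with a bare yield, and a
callback puts them back on the ready queue. This is loosely in the
spirit of that approach and is not Greg Ewing's actual code;
Scheduler, Mailbox and wait_for are hypothetical names invented here.

```python
from collections import deque


class Mailbox:
    """Lowest level: a queue plus one plain callback, no Future."""

    def __init__(self):
        self.items = deque()
        self.on_put = None

    def put(self, item):
        self.items.append(item)
        if self.on_put is not None:
            callback, self.on_put = self.on_put, None
            callback()


class Scheduler:
    def __init__(self):
        self.ready = deque()
        self.current = None

    def spawn(self, coro):
        self.ready.append(coro)

    def wait_for(self, mailbox):
        """Use as: item = yield from scheduler.wait_for(mailbox)."""
        if not mailbox.items:
            # Register a callback that makes this coroutine ready again.
            mailbox.on_put = lambda coro=self.current: self.ready.append(coro)
            yield                      # park until the callback fires
        return mailbox.items.popleft()

    def run(self):
        # Run ready coroutines; a coroutine that yields is parked, not re-queued.
        while self.ready:
            self.current = self.ready.popleft()
            try:
                next(self.current)
            except StopIteration:
                pass


def consumer(sched, box, log):
    first = yield from sched.wait_for(box)
    second = yield from sched.wait_for(box)
    log.append(first + second)


sched, box, log = Scheduler(), Mailbox(), []
sched.spawn(consumer(sched, box, log))
sched.run()      # consumer parks on the empty mailbox
box.put(40)      # callback marks the consumer ready
sched.run()      # consumer takes 40, parks again
box.put(2)
sched.run()      # consumer takes 2 and finishes
```

The values travel through yield-from's return value, so no Future or
Deferred object is needed to carry them.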

> I don't see a reason for the first to exist at all, the second one is
> kind of nice in some circumstances (see below), but perhaps overused.
>
> I feel like you're railing on the first and second when I'm talking
> about the third and fourth. I could be wrong.

I think you're wrong -- I was (and am) most concerned about the
perceived complexity of the API offered by, and the typical looks of
code using, Deferreds (i.e., #3).

>>> and possibly elsewhere. Moreover, they're extremely similar to futures, so
>>> if one is arcane so is the other.
>>
>> I love Futures, they represent a nice simple programming model. But I
>> especially love that you can write async code using Futures and
>> yield-based coroutines (what you call inlineCallbacks) and never have
>> to write an explicit callback function. Ever.
>
> The reason explicit non-deferred callbacks are involved in Twisted is
> because of situations in which deferreds are not present, because of
> past history in Twisted. It is not at all a limitation of deferreds or
> something futures are better at, best as I'm aware.
>
> (In case that's what you're getting at.)

I don't think I was. It's clear to me (now) that Futures are simpler
than Deferreds -- and I like Futures better because of it, because for
the complex cases I would much rather use generator coroutines than
Deferreds.

> Anyway, one big issue is that generator coroutines can't really
> effectively replace callbacks everywhere. Consider the GUI button
> example you gave. How do you write that as a coroutine?
>
> I can see it being written like this:
>
>     def mycoroutine(gui):
>         while True:
>             clickevent = yield gui.mybutton1.on_click()
>             # handle clickevent
>
> But that's probably worse than using callbacks.

I touched on this briefly in the reactor thread. Basically, GUI
callbacks are often level-triggered rather than edge-triggered, and
IIUC Deferreds are not great for that either; and in a few cases where
edge-triggered coding makes sense I *would* like to use a generator
coroutine.

>>> Neither is clearly better or more obvious than the other. If anything
>>> I generally find deferred composition more useful than deferred
>>> tee-ing, so I feel like composition is the correct base operator, but
>>> you could pick another.
>>
>> If you're writing long complicated chains of callbacks that benefit
>> from these features, IMO you are already doing it wrong. I understand
>> that this is a matter of style where I won't be able to convince you.
>> But style is important to me, so let's agree to disagree.

[In a follow-up to yourself, you quoted starting from this point and
appended "Nevermind that whole segment." I'm keeping it in here just
for context of the thread.]

> This is more than a matter of style, so at least for now I'd like to
> hold off on calling it even.
>
> In my day to day silly, synchronous, python code, I do lots of
> synchronous requests. For example, it's not unreasonable for me to
> want to load two different files from disk, or make several database
> interactions, etc. If I want to make this asynchronous, I have to find
> a way to execute multiple things that could hypothetically block, at
> the same time. If I can't do that easily, then the asynchronous
> solution has failed, because its entire purpose is to do everything
> that I do synchronously, except without blocking the main thread.
>
> Here's an example with lots of synchronous requests in Django:
>
> def view_paste(request, filekey):
>     try:
>         fileinfo= Pastes.objects.get(key=filekey)
>     except DoesNotExist:
>         t = loader.get_template('pastebin/error.html')
>         return HttpResponse(t.render(Context(dict(error='File does not exist'))))
>
>     f = open(fileinfo.filename)
>     fcontents = f.read()
>     t = loader.get_template('pastebin/paste.html')
>     return HttpResponse(t.render(Context(dict(file=fcontents))))
>
> How many blocking requests are there? Lots. This is, in a word, a
> long, complicated chain of synchronous requests. This is also very
> similar to what actual django code might look like in some
> circumstances. Even if we might think this is unreasonable, some
> subset of alteration of this is reasonable. Certainly we should be
> able to, say, load multiple (!) objects from the database, and open
> the template (possibly from disk), all potentially-blocking
> operations.
>
> This is inherently a long, complicated chain of requests, whether we
> implement it asynchronously or synchronously, or use Deferreds or
> Futures, or write it in Java or Python. Some parts can be done at any
> time before the end (loader.get_template(...)), some need to be done
> in a certain order, and there's branching depending on what happens in
> different cases. In order to even write this code _at all_, we need a
> way to chain these IO actions together. If we can't chain them
> together, we can't produce that final synthesis of results at the end.

[This is where you wrote "Ugh, just realized way after the fact that of
course you meant callbacks, not composition. I feel dumb. Nevermind
that whole segment."]

I'd like to come back to that Django example though. You are implying
that there are some opportunities for concurrency here, and I agree,
assuming we believe disk I/O is slow enough to bother making it
asynchronous. (In App Engine it's not, and we can't anyways, but in
other contexts I agree that it would be bad if a slow disk seek were
to hold up all processing -- not to mention that it might really be
NFS...)

The potentially async operations I see are:

(1) fileinfo = Pastes.objects.get(key=filekey)  # I assume this is some kind of database query

(2) loader.get_template('pastebin/error.html')

(3) f = open(fileinfo.filename)  # depends on (1)

(4) fcontents = f.read()  # depends on (3)

(5) loader.get_template('pastebin/paste.html')

How would you code that using Twisted Deferreds?

Using Futures and generator coroutines, I would do it as follows. I'm
hypothesizing that for every blocking API foo() there is a
corresponding non-blocking API foo_async() with the same call
signature, and returning a Future whose result is what the synchronous
API returns (and raises what the synchronous call would raise, if
there's an error). These are the conventions I use in NDB. I'm also
inventing a @task decorator.

@task
def view_paste_async(request, filekey):
    # Create Futures -- no yields!
    f1 = Pastes.objects.get_async(key=filekey)  # This won't raise
    f2 = loader.get_template_async('pastebin/error.html')
    f3 = loader.get_template_async('pastebin/paste.html')

    try:
        fileinfo = yield f1
    except DoesNotExist:
        t = yield f2
        return HttpResponse(t.render(Context(dict(error='File does not exist'))))

    f = yield open_async(fileinfo.filename)
    fcontents = yield f.read_async()
    t = yield f3
    return HttpResponse(t.render(Context(dict(file=fcontents))))

You could easily decide not to bother loading the error template
asynchronously (assuming most requests don't fail), and you could move
the creation of f3 below the try/except. But you get the idea. Even if
you do everything serially, inserting the yields and _async calls
would make this more parallelizable without the use of threads. (If
you were using threads, all this would be moot of course -- but then
your limit on requests being handled concurrently probably goes way
down.)
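
For what it's worth, a decorator like the invented @task can be
sketched in a few lines on top of plain Futures. The version below is
only an illustration under stated assumptions: it uses
concurrent.futures.Future as the "simple Future", and
get_paste_async() and get_template_async() are made-up stand-ins that
complete immediately, where a real system would complete them from an
event loop.

```python
import functools
from concurrent.futures import Future


def task(func):
    """Hypothetical @task: run a generator function, return a Future."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        outcome = Future()

        def step(method, arg):
            try:
                yielded = method(arg)          # gen.send or gen.throw
            except StopIteration as stop:
                outcome.set_result(stop.value)
            except Exception as exc:
                outcome.set_exception(exc)
            else:
                yielded.add_done_callback(resume)

        def resume(fut):
            exc = fut.exception()
            if exc is not None:
                step(gen.throw, exc)           # raise at the yield
            else:
                step(gen.send, fut.result())   # deliver the result

        step(gen.send, None)
        return outcome
    return wrapper


class DoesNotExist(Exception):
    pass


def get_paste_async(store, key):
    """Stand-in for a database lookup; the error lives in the Future."""
    fut = Future()
    if key in store:
        fut.set_result(store[key])
    else:
        fut.set_exception(DoesNotExist(key))
    return fut


def get_template_async(name):
    """Stand-in for a template load."""
    fut = Future()
    fut.set_result("<" + name + ">")
    return fut


@task
def view_paste_async(store, key):
    f1 = get_paste_async(store, key)       # does not raise here
    f2 = get_template_async("error.html")
    f3 = get_template_async("paste.html")
    try:
        contents = yield f1                # DoesNotExist surfaces here
    except DoesNotExist:
        template = yield f2
        return template + " File does not exist"
    template = yield f3
    return template + " " + contents


store = {"abc": "hello"}
found = view_paste_async(store, "abc").result()
missing = view_paste_async(store, "nope").result()
```

Note that the only explicit callback anywhere is the one inside the
decorator; the view function itself contains none.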

> We _need_ a pipeline or something computationally equivalent or more
> powerful. Results from past "deferred computations" need to be passed
> forward into future "deferred computations", in order to implement
> this at all.

Yeah, and I think that a single generator using multiple yields is the
ideal pipeline to me (see my example near the top based on
kriskowal/q).

> This is not a style issue, this is an issue of needing to be able to
> solve problems that involve more than one computation where the
> results of every computation matters somewhere. It's just that in this
> case, some of the computations are computed asynchronously.

And I think generators do this very well.

>> I am totally open to learning from Twisted's experience. I hope that
>> you are willing to share even if the end result might not look like
>> Twisted at all -- after all in Python 3.3 we have "yield from" and
>> return from a generator and many years of experience with different
>> styles of async APIs. In addition to Twisted, there's Tornado and
>> Monocle, and then there's the whole greenlets/gevent and
>> Stackless/microthreads community that we can't completely ignore. I
>> believe somewhere is an ideal async architecture, and I hope you can
>> help us discover it.
>>
>> (For example, I am very interested in Twisted's experiences writing
>> real-world performant, robust reactors.)
>
> For that stuff, you'd have to speak to the main authors of Twisted.
> I'm just a twisted user. :(

They seem to be mostly ignoring this conversation, so your standing in
as a proxy for them is much appreciated!

> In the end it really doesn't matter what API you go with. The Twisted
> people will wrap it up so that they are compatible, as far as that is
> possible.

And I want to ensure that that is possible and preferably easy, if I
can do it without introducing too many warts in the API that
non-Twisted users see and use.

> I hope I haven't detracted too much from the main thrust of the
> surrounding discussion. Futures/deferreds are a pretty big tangent, so
> sorry. I justified it to myself by figuring that it'd probably come up
> anyway, somehow, since these are useful abstractions for asynchronous
> programming.

Not at all. This has been a valuable refresher for me!

-- 
--Guido van Rossum (python.org/~guido)