The async API of the future: Twisted and Deferreds

[This is the third spin-off thread from "asyncore: included batteries don't fit"] On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Thanks for those links. I followed the kriskowal/q link and was reminded of why Twisted's Deferreds are considered more awesome than Futures: it's the chaining.

BUT... That's only important if callbacks are all the language lets you do! If your baseline is this:

    step1(function (value1) {
        step2(value1, function(value2) {
            step3(value2, function(value3) {
                step4(value3, function(value4) {
                    // Do something with value4
                });
            });
        });
    });

then of course the alternative using Deferred looks better:

    Q.fcall(step1)
    .then(step2)
    .then(step3)
    .then(step4)
    .then(function (value4) {
        // Do something with value4
    }, function (error) {
        // Handle any error from step1 through step4
    })
    .end();

(Both quoted literally from the kriskowal/q link.)

I also don't doubt that using classic Futures you can't do this -- the chaining really matters for this style, and I presume this (modulo unimportant API differences) is what typical Twisted code looks like.

However, Python has yield, and you can do much better (I'll write plain yield for now, but it works the same with yield-from):

    try:
        value1 = yield step1(<args>)
        value2 = yield step2(value1)
        value3 = yield step3(value2)
        value4 = yield step4(value3)
        # Do something with value4
    except Exception:
        # Handle any error from step1 through step4

There's an outer function missing here, since you can't have a toplevel yield; I think that's the same for the JS case, typically. Also, strictly speaking the "Do something with value4" code should probably be in an else: clause after the except handler. But that actually leads nicely to the advantage:

This form is more flexible, since it is easier to catch different exceptions at different points. It is also much easier to pass extra information around. E.g. what if your flow ends up having to pass both value1 and value2 into step3()?
Sure, you can do that by making value2 a tuple (or a dict, or an object) incorporating value1 and the original value2, but that's exactly where this style becomes cumbersome, whereas in the yield-based form, such things can remain simple local variables. All in all I find it more readable.

In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change. Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle), or they go a completely different route and use greenlets/gevent instead -- and get amazing performance and productivity that way too, even though they know it's monkey-patching their asses off...

So, in the end, for Python 3.4 and beyond, I want to promote a style that mixes simple callbacks (perhaps augmented with simple Futures) and generator-based coroutines (either PEP 342, yield/send-based, or PEP 380 yield-from-based). I'm looking to Twisted for the best reactors (see other thread). But for transport/protocol implementations I think that generators/coroutines offer a cleaner, better interface than incorporating Deferreds.

I hope that the path forward for Twisted will be simple enough: it should be possible to hook Deferred into the simpler callback APIs (perhaps a new implementation using some form of adaptation, but keeping the interface the same).

In a sense, the greenlet/gevent crowd will be the biggest losers, since they currently write async code without either callbacks or yield, using microthreads instead.
I wouldn't want to have to start putting yield back everywhere into that code. But the stdlib will still support yield-free blocking calls (even if under the hood some of these use yield/send-based or yield-from-based coroutines) so the monkey-patchy tradition can continue.
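For the record, the yield-based form sketched above can be driven by a very small scheduler. The following is a minimal, blocking sketch using only the stdlib's concurrent.futures; the step functions, the executor, and run() are all invented here for illustration (a real event loop would resume the generator from callbacks instead of blocking on .result()):

```python
import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def step1():
    return executor.submit(lambda: 10)           # pretend async work

def step2(value1):
    return executor.submit(lambda: value1 + 32)  # pretend async work

def run(gen):
    # Resume the generator with each yielded Future's result.
    # A real event loop would use callbacks instead of blocking on .result().
    result = None
    try:
        while True:
            future = gen.send(result)
            result = future.result()
    except StopIteration:
        return result

def pipeline():
    value1 = yield step1()
    value2 = yield step2(value1)
    yield executor.submit(lambda: value2)        # final result

print(run(pipeline()))  # -> 42
```

The point of the sketch is only that "yield a Future, get resumed with its result" needs nothing more exotic than gen.send() in a loop.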
I actually like this, as it's a lowest-common-denominator approach which everyone can easily adapt to their purposes. See the thread I started about reactors.
While I'm sure it's expedient and captures certain common patterns well, I like this the least of all -- calling fixed methods on an object sounds like a step back; it smells of the old Java way (before it had some equivalent of anonymous functions), and of asyncore, which (nearly) everybody agrees is kind of bad due to its insistence that you subclass its classes. (Notice how subclassing as the prevalent approach to structuring your code has gotten into a lot of discredit since 1996.)
Discussed above.
Seeing them as syntactic sugar for Deferreds is one way of looking at it; no doubt this is how they're seen in the Twisted community because Deferreds are older and more entrenched. But there's no requirement that an architecture has to have Deferreds in order to use generator coroutines -- simple Futures will do just fine, and Greg Ewing has shown that using yield-from you can even do without those. (But he does use simple, explicit callbacks at the lowest level of his system.)
I think you're wrong -- I was (and am) most concerned about the perceived complexity of the API offered by, and the typical looks of code using, Deferreds (i.e., #3).
I don't think I was. It's clear to me (now) that Futures are simpler than Deferreds -- and I like Futures better because of it, because for the complex cases I would much rather use generator coroutines than Deferreds.
I touched on this briefly in the reactor thread. Basically, GUI callbacks are often level-triggered rather than edge-triggered, and IIUC Deferreds are not great for that either; and in a few cases where edge-triggered coding makes sense I *would* like to use a generator coroutine.
[In a follow-up to yourself, you quoted starting from this point and appended "Nevermind that whole segment." I'm keeping it in here just for context of the thread.]
[This is where you write "Ugh, just realized way after the fact that of course you meant callbacks, not composition. I feel dumb. Nevermind that whole segment."]

I'd like to come back to that Django example though. You are implying that there are some opportunities for concurrency here, and I agree, assuming we believe disk I/O is slow enough to bother making it asynchronous. (In App Engine it's not, and we can't anyways, but in other contexts I agree that it would be bad if a slow disk seek were to hold up all processing -- not to mention that it might really be NFS...)

The potentially async operations I see are:

    (1) fileinfo = Pastes.objects.get(key=filekey)  # I assume this is some kind of database query
    (2) loader.get_template('pastebin/error.html')
    (3) f = open(fileinfo.filename)  # depends on (1)
    (4) fcontents = f.read()  # depends on (3)
    (5) loader.get_template('pastebin/paste.html')

How would you code that using Twisted Deferreds?

Using Futures and generator coroutines, I would do it as follows. I'm hypothesizing that for every blocking API foo() there is a corresponding non-blocking API foo_async() with the same call signature, returning a Future whose result is what the synchronous API returns (and which raises what the synchronous call would raise, if there's an error). These are the conventions I use in NDB. I'm also inventing a @task decorator.

    @task
    def view_paste_async(request, filekey):
        # Create Futures -- no yields!
        f1 = Pastes.objects.get_async(key=filekey)  # This won't raise
        f2 = loader.get_template_async('pastebin/error.html')
        f3 = loader.get_template_async('pastebin/paste.html')
        try:
            fileinfo = yield f1
        except DoesNotExist:
            t = yield f2
            return HttpResponse(t.render(Context(dict(error='File does not exist'))))
        f = yield open_async(fileinfo.filename)
        fcontents = yield f.read_async()
        t = yield f3
        return HttpResponse(t.render(Context(dict(file=fcontents))))

You could easily decide not to bother loading the error template asynchronously (assuming most requests don't fail), and you could move the creation of f3 below the try/except. But you get the idea. Even if you do everything serially, inserting the yields and _async calls would make this more parallelizable without the use of threads. (If you were using threads, all this would be moot of course -- but then your limit on requests being handled concurrently probably goes way down.)
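A @task decorator of the hypothesized kind can be sketched with the stdlib's concurrent.futures alone. This is a toy, callback-driven driver, not NDB's actual implementation; @task, fetch, and pool are all invented names:

```python
import concurrent.futures

def task(func):
    # Hypothetical @task: drive the generator from done-callbacks, so the
    # caller gets back a Future for the task's eventual result.
    def wrapper(*args, **kwds):
        gen = func(*args, **kwds)
        done = concurrent.futures.Future()

        def step(value=None, exc=None):
            try:
                if exc is not None:
                    future = gen.throw(exc)   # surface the error in the generator
                else:
                    future = gen.send(value)  # resume with the last result
            except StopIteration as stop:
                done.set_result(stop.value)   # the generator's return value
                return
            future.add_done_callback(
                lambda f: step(exc=f.exception()) if f.exception()
                else step(f.result()))

        step()
        return done
    return wrapper

pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

@task
def fetch():
    a = yield pool.submit(lambda: 2)
    b = yield pool.submit(lambda: a * 21)
    return b

print(fetch().result())  # -> 42
```

Note the driver never blocks: each resumption happens from the completed Future's done-callback.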
Yeah, and I think that a single generator using multiple yields is the ideal pipeline to me (see my example near the top based on kriskowal/q).
And I think generators do this very well.
They seem to be mostly ignoring this conversation, so your standing in as a proxy for them is much appreciated!
And I want to ensure that that is possible and preferably easy, if I can do it without introducing too many warts in the API that non-Twisted users see and use.
Not at all. This has been a valuable refresher for me! -- --Guido van Rossum (python.org/~guido)

On 12/10/2012 11:11pm, Guido van Rossum wrote:
So would the futures be registered with the reactor as soon as they are created, or only when they are yielded? I can't see how there can be any "concurrency" if they don't start till they are yielded. It would be like doing

    t1 = Thread(target=f1)
    t2 = Thread(target=f2)
    t3 = Thread(target=f3)
    t1.start()
    t1.join()
    t2.start()
    t2.join()
    t3.start()
    t3.join()

But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible.

Richard.

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
I don't think it follows that there can only be one reactor if they are registered immediately. There could be a notion of "current reactor" maintained in thread-local context; moreover it could depend on the reactor that made the callback that caused the current task to run. The reactor could also be chosen by the code that made the Future. (Though I'm not immediately sure how that would work in the yield-from scenario -- but I'm sure there's a way.) FWIW, in NDB there is one event loop per thread; separate threads are handling separate requests and are completely independent. Also, in NDB there's some code that turns Futures into actual RPCs that runs only once there are no more immediately runnable tasks. I think that in general such behaviors are up to the reactor implementation for the platform though, and should not directly be reflected in the reactor API. -- --Guido van Rossum (python.org/~guido)
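The "current reactor maintained in thread-local context" idea can be sketched in a few lines; get_event_loop is an invented name and the loop object is a placeholder:

```python
import threading

_state = threading.local()

def get_event_loop():
    # One loop per thread, created lazily on first use in that thread.
    if not hasattr(_state, 'loop'):
        _state.loop = object()   # stand-in for a real reactor/loop object
    return _state.loop
```

Within one thread every call returns the same loop; a different thread lazily gets its own, so independent requests stay independent as in NDB.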

On 13/10/2012 1:22am, Guido van Rossum wrote:
Alternatively, yielding a future (or whatever one calls the objects returned by *_async()) could register *and* wait for the result. To register without waiting one would yield a wrapper for the future. So one could write

    result = yield foo_async(...)

or

    f = yield Register(foo_async())
    # do some other work
    result = yield f

Richard

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
The Futures are not what is doing the work here, they just hold the result. In this example the get_async() functions register something with the reactor when they are called. When that "something" is done (or perhaps after several "somethings" chained together), get_async will set a result on its Future.
But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible.
In most event-driven systems there is a global (or thread-local) event loop, but it's also possible to pass one in explicitly to get_async(). -Ben

On Fri, 12 Oct 2012 15:11:54 -0700 Guido van Rossum <guido@python.org> wrote:
But how would you write a dataReceived equivalent then? Would you have a "task" looping on a read() call, e.g.

    @task
    def my_protocol_main_loop(conn):
        while <some_condition>:
            try:
                data = yield conn.read(1024)
            except ConnectionError:
                conn.close()
                break

I'm not sure I understand the problem with subclassing. It works fine in Twisted. Even in Python 3 we don't shy away from subclassing, for example the IO stack is based on subclassing RawIOBase, BufferedIOBase, etc.

Regards

Antoine.

-- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Subclassing per se isn't a problem, but requiring a single dataReceived method per class can be awkward. Many protocols are effectively state machines, and modeling each state as a function can be cleaner than a big if/switch block in dataReceived. For example, here's a simplistic HTTP client using tornado's IOStream:

    from tornado import ioloop
    from tornado import iostream
    import socket

    def send_request():
        stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com\r\n\r\n")
        stream.read_until("\r\n\r\n", on_headers)

    def on_headers(data):
        headers = {}
        for line in data.split("\r\n"):
            parts = line.split(":")
            if len(parts) == 2:
                headers[parts[0].strip()] = parts[1].strip()
        stream.read_bytes(int(headers["Content-Length"]), on_body)

    def on_body(data):
        print data
        stream.close()
        ioloop.IOLoop.instance().stop()

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    stream = iostream.IOStream(s)
    stream.connect(("friendfeed.com", 80), send_request)
    ioloop.IOLoop.instance().start()

Classes allow and encourage broader interfaces, which are sometimes a good thing, but interact poorly with coroutines. Both twisted and tornado use separate callbacks for incoming data and for the connection being closed, but for coroutines it's probably better to just treat a closed connection as an error on the read. Futures (and yield from) give us a nice way to do that.

-Ben

What calls on_headers in this example? Coming from twisted, that seems like dataReceived's responsibility, but given your introductory paragraph that's not actually what goes on here? On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
The IOStream does, after send_request calls stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is a _handle_read method that is registered with the IOLoop and fills up a buffer. When the read condition is satisfied the IOStream calls back into application code. -Ben

Interesting. That's certainly a nice API, but that then again (read_until) sounds like something I'd implement using dataReceived... You know, read_until clears the buffer, logs the requested callback. data_received adds something to the buffer, and checks if it triggered the (one of the?) registered callbacks. Of course, I may just be rusted in my ways and trying to implement everything in terms of things I know (then again, that might be just what's needed when you're trying to make a useful general API). I guess it's time for me to go deep-diving into Tornado :) On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh
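The buffering scheme described in words here -- read_until records a callback, data_received appends to a buffer and checks whether any registered callback has been triggered -- can be sketched directly. This is a toy illustration, not Tornado's actual implementation, and all names are invented:

```python
class BufferedStream:
    # Toy read_until layered on a dataReceived-style feed.
    def __init__(self):
        self._buffer = b''
        self._delimiter = None
        self._callback = None

    def read_until(self, delimiter, callback):
        # Record the pending read; deliver immediately if already buffered.
        self._delimiter = delimiter
        self._callback = callback
        self._try_deliver()

    def data_received(self, chunk):
        # Add to the buffer and check whether a registered read triggered.
        self._buffer += chunk
        self._try_deliver()

    def _try_deliver(self):
        if self._callback is None:
            return
        idx = self._buffer.find(self._delimiter)
        if idx == -1:
            return
        end = idx + len(self._delimiter)
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        callback, self._callback = self._callback, None
        callback(data)

received = []
stream = BufferedStream()
stream.read_until(b'\r\n\r\n', received.append)
stream.data_received(b'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n')
stream.data_received(b'\r\nhello')   # completes the pending read_until
```

After the second chunk arrives, the callback fires with everything up to and including the delimiter, and the leftover b'hello' stays buffered for the next read.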

On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_@lvh.cc> wrote:
Right, that's how IOStream is implemented internally. The transport/protocol split works a little differently in Tornado: IOStream is implemented something like a Protocol subclass, but we consider it a part of the transport layer. The "protocols" are arbitrary classes that don't share any particular interface, but instead just call methods on the IOStream. -Ben

I quite like IOStream's interface, actually. If that's part of the transport layer, how do you prevent having to duplicate its behavior (read_until etc.)? If there's just another separate object that would be the ITransport in twisted, I think the difference is purely one of labeling. On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 12:13 PM, Laurens Van Houtven <_@lvh.cc> wrote:
So far we haven't actually needed much flexibility in the transport layer - most of the functionality is in the BaseIOStream class, and then there are subclasses IOStream (regular sockets), SSLIOStream, and PipeIOStream that actually call recv(), read(), connect(), etc. We might need a little refactoring if we introduce dramatically different types of transports, but the plan is that we'd represent transports as classes in the IOStream hierarchy. -Ben

[Quick, I know I'm way behind, especially on this thread; more tomorrow.] On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
No, I would use plain callbacks. There would be some kind of IOObject class defined by the stdlib that wraps a socket (it would make it non-blocking, and possibly do other things), and the user would make a registration call to the event loop giving it the IOObject and the user's callback function plus *args and **kwds; the event loop would call callback(*args, **kwds) each time the IOObject became readable. (Oh, and there would be separate registration (and unregistration) functions for reading and writing.)

Apparently my rants about callbacks have made people assume that I don't want to see them anywhere. In fact I am comfortable with callbacks for a number of situations -- I just think we have several other tools in our toolbox that are way underused, whereas callbacks are way overused, in part because the alternative tools are relatively new.

This way the user could switch to a different callback when a different phase of the protocol is reached. I realize there are other shapes this API could take. But I really don't want the user to have to subclass IOObject.
I'm fine with using subclassing for the internal structure of a library. (The IOObject I am postulating would almost certainly have a bunch of subclasses used for different types of sockets, IOCP, SSL, etc.) The thing that I've soured upon (and many others too) is to tell users "and to use this fine feature, just subclass this handy base class and override or extend the following three methods". Because in practice (certainly in Python, where the compiler doesn't enforce privacy) users always start overriding other methods, or using internal state, or add state that clashes with the base class's state, or forget to call mandatory super calls, or make incorrect assumptions about thread-safety, or whatever else they can do to screw things up. And duck typing isn't ideal either for this situation. -- --Guido van Rossum (python.org/~guido)
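The registration-style API described here -- a plain callable plus *args and **kwds handed to the event loop, with no subclassing -- might look like this minimal sketch; all names (EventLoopSketch, add_reader, _dispatch_readable) are invented for illustration:

```python
class EventLoopSketch:
    # Hypothetical registration-style event loop: the user registers a plain
    # callable plus *args/**kwds per fd instead of subclassing anything.
    def __init__(self):
        self._readers = {}

    def add_reader(self, fd, callback, *args, **kwds):
        self._readers[fd] = (callback, args, kwds)

    def remove_reader(self, fd):
        self._readers.pop(fd, None)

    def _dispatch_readable(self, fd):
        # A real loop would call this when select()/poll() reports fd ready.
        callback, args, kwds = self._readers[fd]
        callback(*args, **kwds)

loop = EventLoopSketch()
seen = []
loop.add_reader(4, seen.append, 'readable')
loop._dispatch_readable(4)   # simulate fd 4 becoming readable
```

Switching protocol phases is then just another add_reader call with a different callback, which is the flexibility argued for above.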

On Sat, 13 Oct 2012 22:03:17 -0700 Guido van Rossum <guido@python.org> wrote:
Subclassing IOObject would be wrong, since the user isn't writing an IO object in the first place. But subclassing a separate class, like Twisted's Protocol (which is mostly an empty shell, really), would sound reasonable to me. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 14, 2012 at 3:43 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It's a possible style. I'm inclined not to follow this example but I could go either way. One thing that somewhat worries me is that the names of these methods will be baked forever into all user code. As a user I prefer to have control over the names of my methods; first, there's the style issue (e.g. I'm always conflicted over what style to use in unittest.TestCase subclasses, since its own style is setUp, tearDown); second, in my app there may be a much better name for what the method does than e.g. data_received(). (Not to mention that that's another adjective used as a verb. ;-) -- --Guido van Rossum (python.org/~guido)

There has to be some way to contract emails sent in discussions rather than exploding them. I swear I'm trying to be concise, yet readable. It's not working. On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido@python.org> wrote:
My experience has been unfortunately rather devoid of deferreds in Twisted. I always feel like the odd one out when people discuss this confusion. For me, it was all Protocol this and Protocol that, and deferreds only came up when I used Twisted's great AMP (Asynchronous Messaging Protocol) library.
--snip--
Well, first of all, deferreds have ways of joining values together. For example:

    from __future__ import print_function

    from twisted.internet import defer

    def example_joined():
        d1 = defer.Deferred()
        d2 = defer.Deferred()
        # consumeErrors looks scary, but it only means that
        # d1 and d2's errbacks aren't called. Instead, the error
        # is sent to d's errback.
        d = defer.gatherResults([d1, d2], consumeErrors=True)
        d.addCallback(print)
        d.addErrback(lambda v: print("ERROR!"))
        d1.callback("The first deferred has succeeded")
        # now we're waiting on the second deferred to succeed,
        # which we'll let the caller handle
        return d2

    example_joined().callback("The second deferred has succeeded too!")
    print("==============")
    example_joined().errback("The second deferred has failed...")

I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Egh. I mean, sure, suppose we have those things. But what if you want to send the result of a callback to a generator-coroutine? Presumably generator coroutines work by yielding deferreds and being called back when the future resolves (deferred fires). But if those futures/deferreds aren't exposed, and instead only the generator stuff is exposed, then bridging the gap between callbacks and generator-coroutines is impossible. So every callback function has to also be defined to use something else. And worse, other APIs using callbacks are left in the dust.

Suppose, OTOH, futures/deferreds are exposed. Then we can easily bridge between callbacks and generators, by returning a future whose `set_result` is the callback to our callback function (deferred whose `callback` is the callback). But if we're exposing futures/deferreds, why have callbacks in the first place? The difference between these two functions is that the second can be used in generator-coroutines trivially and the first cannot:

    # callbacks:
    reactor.timer(10, print, "hello world")

    # deferreds:
    reactor.timer(10).addCallback(print, "hello world")

Now here's another thing: suppose we have a list of "deferred events", but instead of handling all 10 at once, we want to handle them "as they arrive", and then synthesize a result at the bottom. How do you do this with pure generator coroutines? For example, perhaps I am implementing a game server, where all the players choose their characters and then the game begins. Whenever a character is chosen, everyone else has to know about it so that they can plan their strategy based on who has chosen a character. Character selections are final, just so that I can use deferreds (hee hee).
I am imagining something like the following:

    # WRONG: handles players in a certain order, rather than as they come in
    def player_lobby(reactor, players):
        for player in players:
            player_character = yield player.wait_for_confirm(reactor)
            player.set_character(player_character)
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

This is wrong, because it goes in a certain order and "blocks" the coroutine until every character is chosen. Players will not know who has chosen what characters in an appropriate order. But hypothetically, maybe we could do the following:

    # Hypothetical magical code?
    def player_lobby(reactor, players):
        confirmation_events = UnorderedEventList(
            [player.wait_for_confirm(reactor) for player in players])
        while confirmation_events:
            player_character = yield confirmation_events.get_next()
            player.set_character(player_character)
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

But then, how do we write UnorderedEventList? I don't really know. I suspect I've made the problem harder, not easier! eek. Plus, it doesn't even read very well. Especially not compared to the deferred version:

This is how I would personally do it in Twisted, without using UnorderedEventList (no magic!):

    @inlineCallbacks
    def player_lobby(reactor, players):
        events = []
        for player in players:
            confirm_event = player.wait_for_confirm(reactor)
            @confirm_event.addCallback
            def on_confirmation(player_character, player=player):
                player.set_character(player_character)
                # tell all the other players what character the player has chosen
                notify_choice((player, player_character), players)
            events.append(confirm_event)
        yield gatherResults(events)
        start_game(players)

Notice how I dropped down into the level of manipulating deferreds so that I could add this "as they come in" functionality, and then went back.
Actually it wouldn't've hurt much to just not bother with inlineCallbacks at all. I don't think this is particularly unreadable. More importantly, I actually know how to do it. I have no idea how I would do this without using addCallback, or without reimplementing addCallback using inlineCallbacks. And then, supposing we don't have these deferreds/futures exposed... how do we implement delayed computation stuff from extension modules? What if we want to do these kinds of compositions within said extension modules? What if we want to write our own version of @tasks or @inlineCallbacks with extra features, or generate callback chains from XML files, and so on? I don't really like the prospect of having just the "sugary syntax" available, without a flexible underlying representation also exposed. I don't know if you've ever shared that worry -- sometimes the pretty syntax gets in the way of getting stuff done.
Surely it's no harder to make yourself into a generator than to make yourself into a low-level thread-like context switching function with a saved callstack implemented by hand in assembler, and so on? I'm sure they'll be fine.
Will do (but also see my response above about why not "everyone" can).
I only used asyncore once, indirectly, so I don't know anything about it. I'm willing to dismiss it (and, in fact, various parts of twisted (I'm looking at you twisted.words)) as not good examples of the pattern.

First of all, I'd like to separate the notion of subclassing and method dispatch. They're entirely unrelated. If I pass my object to you, and you call different methods depending on what happens elsewhere, that's method dispatch. And my object doesn't have to be subclassed or anything for it to happen.

Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.) My IRC bot needs to handle several different possible events, such as:

  - private messages
  - channel join event
  - CTCP event

and so on. My event handlers for each of these events probably manipulate some internal state (such as a log file, or a GUI). We'd probably organize this as a class, or else as a bunch of functions accessing global state. Or, perhaps a collection of closures. This last one is pretty unlikely.

For the most part, these functions are all intrinsically related and can't be sensibly treated separately. You can't take the private message callback of Bot A, and the channel join callback of Bot B, and register these and expect a result that makes sense. If we look at this, we're expecting to deal with a set of functions that manage shared data. The abstraction for this is usually an object, and we'd really probably write the callbacks in a class unless we were being contrarian. And it's not too crazy for the dispatcher to know this and expect you to write it as a class that supports a certain interface (certain methods correspond to certain events). Missing methods can be assumed to have the empty implementation (no subclassing, just catching AttributeError).
This isn't too much of an imposition on the user -- any collection of functions (with shared state via globals or closure variables) can be converted to an object with callable attributes very simply (thanks to types.SimpleNamespace, especially). And I only really think this is OK when writing it as an object -- as a collection of functions with shared state -- is the eminently obvious primary use case, so that that situation wouldn't come up very often. So, as an example, a protocol that passes data on further down the line needs to be notified when data is received, but also when the connection begins and ends. So the twisted protocol interface has "dataReceived", "connectionMade", and "connectionLost" callbacks. These really do belong together, they manage a single connection between computers and how it gets mapped to events usable by a twisted application. So I like the convenience and suggestiveness of them all being methods on an object.
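A dispatcher along those lines -- duck-typed, with missing handlers treated as empty implementations, no base class required -- could be sketched as follows; all names here are invented:

```python
class Dispatcher:
    # Duck-typed event dispatch: the handler is any user-supplied object;
    # missing handler methods are simply skipped (no subclassing needed).
    def __init__(self, handler):
        self._handler = handler

    def fire(self, event, *args):
        method = getattr(self._handler, 'on_' + event, None)
        if method is not None:
            method(*args)

class LoggingBot:
    # Shared state (the log) plus related callbacks, bundled as an object.
    def __init__(self):
        self.log = []
    def on_private_message(self, sender, text):
        self.log.append((sender, text))
    # no on_channel_join: that event is silently ignored

bot = LoggingBot()
dispatch = Dispatcher(bot)
dispatch.fire('private_message', 'alice', 'hi')
dispatch.fire('channel_join', '#python')   # no handler; no error
```

The handler object is entirely the user's; the dispatcher only looks methods up by convention, which is the distinction Greg draws below between user-supplied bundles of callbacks and subclassing a library's I/O object.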
I meant it as a factual explanation of what generator coroutines are in Twisted, not what they are in general. Sorry for the confusion. We are probably agreed here. After a cursory examination, I don't really understand Greg Ewing's thing. I'd have to dig deeper into the logs for when he first introduced it.
--snip--
How would you code that using Twisted Deferreds?
Well. I'd replace the @task in your NDB thing with @inlineCallbacks and call it a day. ;) (I think there's enough deferred examples above, and I'm getting tired and it's been a day since I started writing this damned email.)
Well. We are on Python-Ideas... :(
I probably lack the expertise to help too much with this. I can point out anything that sticks out, if/when an extended futures proposal is made. -- Devin

Devin Jeanpierre wrote:
That's one way to go about it, but it's not the only way. See here for my take on how it might work: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Exa... -- Greg

Devin Jeanpierre wrote (concerning callbacks):
IIUC, what Guido objects to is callbacks that are methods *of the I/O object*, so that you have to subclass the library-supplied object and override them. You seem to be talking about something slightly different -- an object that's entirely supplied by the user, and simply bundles a set of callbacks together. That doesn't seem so bad. -- Greg

On Sat, Oct 13, 2012 at 4:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Don't worry too much. I took essentially all Friday starting those four new threads. I am up at night thinking about the issues. I can't expect everyone else to have this much time to devote to Python!
Especially odd since you jumped into the discussion when I called Deferreds a bad name. :-)
I'm sorry, but that's not very readable at all. You needed a lambda (which if there was anything more would have to be expanded using 'def') and you're cheating by passing print as a callable (which saves you a second lambda, but only in this simple case). A readable version of this code should not have to use lambdas.
I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Yeah, but things should be as simple as they can. If you can do everything using plain callbacks, Futures and coroutines, why add Deferreds even if you can? (Except for backward compatibility of course. That's a totally different topic. But we're first defining the API of the future.) If Greg Ewing had his way we'd even do without Futures -- I'm still considering that bid. (In the yield-from thread I'm asking for common patterns that the new API should be able to solve.)
No, they don't use deferreds. They use Futures. You've made it quite clear that they are very different.
My plan is that the Futures *will* be exposed -- this is what worked well in NDB.
And that's how NDB does it. I've got a question to Greg Ewing on how he does it.
How about this:

    f = <some future>
    reactor.timer(10, f.set_result, None)

Then whoever waits for f gets woken up in 10 seconds, and the reactor doesn't have to know what Futures are. But I believe your whole argument may be based on a misreading of my proposal. *I* want plain callbacks, Futures, and coroutines, and an event loop that only knows about plain callbacks and IO objects (e.g. sockets).
Let's ask Greg that. In NDB, I have a wait_any() function that you give a set of Futures and returns the first one that completes. It would be easy to build an iterator on top of this that takes a set of Futures and iterates over them in the order in which they are completed.
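A completion-order iterator of the sort described could be sketched on top of concurrent.futures (the names here are illustrative; this is not NDB's actual API):

```python
import concurrent.futures

def wait_any(fs):
    # Block until at least one of the given futures completes,
    # and return one that did.
    done, _ = concurrent.futures.wait(
        fs, return_when=concurrent.futures.FIRST_COMPLETED)
    return done.pop()

def iter_completed(fs):
    # Yield futures in the order in which they complete,
    # built directly on wait_any() as suggested above.
    pending = set(fs)
    while pending:
        f = wait_any(pending)
        pending.discard(f)
        yield f
```
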
Clearly we have an educational issue on our hands! :-)
You're barking up the wrong tree -- please badger Greg Ewing with use cases in the yield-from thread. With my approach all of these can be done. (See the yield-from thread for an example I just posted of a barrier, where multiple tasks wait for a single event.)
The thing that worries me most is reimplementing httplib, urllib and so on to use all this new machinery *and* keep the old synchronous APIs working *even* if some code is written using the old style and some other code wants to use the new style.
Agreed. Antoine made the same point elsewhere and I half conceded.
Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.)
(For the record, I hate IRC, the software, the culture, the interaction style. But maybe I'm unusual that way. :-)
I certainly wouldn't recommend collections of closures for that!
There's also a certain order to them, right? I'd think the state transition diagram is something like: connectionMade (1); dataReceived (*); connectionLost (1). I wonder if there are any guarantees that they will only be called in this order, and who is supposed to enforce this? It would be awkward if the user code had to guard itself against this; also if the developer made an unwarranted assumption (e.g. that dataReceived is called at least once).
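One way user code could guard against an ill-behaved transport is with a small wrapper enforcing that order (this wrapper is hypothetical, not part of Twisted):

```python
class OrderedProtocol:
    # Hypothetical guard enforcing the state transitions
    # connectionMade (1); dataReceived (*); connectionLost (1).
    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"

    def connectionMade(self):
        if self._state != "new":
            raise AssertionError("connectionMade called twice")
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        if self._state != "connected":
            raise AssertionError("dataReceived outside a connection")
        self._protocol.dataReceived(data)

    def connectionLost(self, reason):
        if self._state != "connected":
            raise AssertionError("connectionLost before connectionMade")
        self._state = "closed"
        self._protocol.connectionLost(reason)
```
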
Please press him for explanations. Ask questions. He knows his dream best of all. We need to learn.
No problem. Same here. :-)
Somehow we got Itamar and Glyph to join, so I think we're covered!
You've done great in increasing my understanding of Twisted and Deferred. Thank you very much! -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum <guido@python.org> wrote:
A readable version of this code should not have to use lambdas.
In a lot of Twisted code, it happens with methods as callback methods, something like:

    d = self._doRPC(....)
    d.addCallbacks(self._formatResponse, self._formatException)
    d.addCallback(self._finish)

That doesn't talk about gatherResults, but hopefully it makes the idea clear. A lot of the legibility is dependent on making those method names sensible, though. Our in-house style guide asks for limiting functions to about ten lines, preferably half that. Works for us. Another pattern that's frowned upon since it's a bit of an abuse of decorator syntax, but I still like because it tends to make things easier to read for inline callback definitions where you do need more than a lambda:

    d = somethingThatHappensLater()

    @d.addCallback
    def whenItsDone(result):
        doSomethingWith(result)
cheers lvh

On Sun, Oct 14, 2012 at 9:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
I quite understand that in your ecosystem you've found best practices for every imaginable use case. And I understand that once you're part of the community and have internalized the idioms and style, it's quite readable. But you haven't shaken my belief that we can do better with the current version of the language (3.3). (FWIW, I think it would be a good idea to develop a "reference implementation" of many of these ideas outside the standard library. Depending on whether we end up adopting yield <future> or yield from <generator> it might even support versions of Python 3 before 3.3. I certainly don't want to have to wait for 3.4 -- although that's the first opportunity for incorporating it into the stdlib.) -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 11:53 AM, Guido van Rossum <guido@python.org> wrote:
Did I mention how great AMP was? ;)
Sure. I probably erred in not using inlineCallbacks form, what I wanted to do was highlight the gatherResults function (which, as it happens, does something generators can't without invoking an external function.) My worry here was that generators are being praised for being more readable, which is true and reasonable, but I don't know that they're flexible enough to be the only way to do things. But you've stated now that you'd want futures to be there too, so... those are probably mostly flexible enough.
Haha, different in API and what they can do, but they are meant to do the same thing (represent delayed results). I meant to talk about futures and deferreds equally, and ask the same questions of both of them.
OK. I was confused when you said there would only be generators and simple callbacks (and so I posed questions about what happens when you have just generators, which you took to be questions aimed at Greg Ewing's thing.)
I know that Twisted has historically agreed with the idea that the reactor shouldn't know about futures/deferreds. I'm not sure I agree it's so important. If the universal way of writing asynchronous code is generator-coroutines, then the reactor should work well with this and not require extra effort.
You're correct.
I meant to be asking about the situation you were proposing. I thought it was just callbacks and generators, now we've added futures. Futures sans chaining can definitely implement this, just maybe not as nicely as how I'd do it. The issue is that it's a reasonable thing to want to escape the generator system in order to implement things that aren't "linear" the way generator coroutines are. And if we escape the system, it should be possible and easy to do a large variety of things. But, on the plus side, I'm convinced that it's possible, and that the necessary things will be exposed (even if it's very unpleasant, there's always helper functions...). Unless you do Greg's thing, then I'm worried again. I will read his stuff later today or tomorrow. (Unrelated: I'm not sure why I was so sure UnorderedEventList had to be that ugly. It can use a for loop... oops.)
(We're now deviating from futures and deferreds, but I think the part I was taking was drawing to a close anyway.) Code that wants to use the old style can be integrated by calling it in a separate thread, and that's fine. If the results should be used in the asynchronous code, then have a thing that integrates with threading so that when the thread returns (or fails with an exception) it can notify a future/deferred of the outcome. Twisted has deferToThread for this. It also has blockingCallFromThread if the synchronous code wants to talk back to the asynchronous code. And that leads me to this: imagine if, instead of having two implementations (one synchronous, one not), we had only one (asynchronous), and then had some wrappers to make it work as a synchronous implementation as well? Here is an example of a synchronous program written in Python+Twisted, where I wrap deferLater to be a blocking function (so that it is similar to a time.sleep() followed by a function call). The reactor is started in a separate thread, and is left to die whenever the main thread dies (because thread daemons yay.)

    from __future__ import print_function

    import threading

    from twisted.internet import task, reactor
    from twisted.internet.threads import blockingCallFromThread

    def my_deferlater(reactor, time, callback, *args, **kwargs):
        return blockingCallFromThread(
            reactor, task.deferLater, reactor, time,
            callback, *args, **kwargs)

    # In reality, a global reactor for all threads is a terrible idea.
    # We'd want to instantiate a new reactor for the reactor thread,
    # and have a global REACTOR as well. We'll just use this reactor.
    # This code will not work with any other twisted code because of
    # the global reactor shenanigans. (But it'd work if we were able
    # to have a reactor per thread.)
    REACTOR_THREAD = None

    def start_reactor():
        global REACTOR_THREAD
        if REACTOR_THREAD is not None:
            # could be an error, or not, depending on how you
            # feel this should be.
            return
        REACTOR_THREAD = threading.Thread(target=reactor.run, kwargs=dict(
            # signal handlers don't work if not in main thread.
            installSignalHandlers=0))
        REACTOR_THREAD.daemon = True  # Probably really evil.
        REACTOR_THREAD.start()

    start_reactor()
    my_deferlater(reactor, 1, print, "This will print after 1 second!")
    my_deferlater(reactor, 1, print, "This will print after 2 seconds!")
    my_deferlater(reactor, 1, print, "This will print after 3 seconds!")

So maybe this is an option? It's really important that there not be just one global reactor, and that multiple reactors can run at the same time, for this to really work. But if that were done, then you could have a single global reactor responsible for being the back end of the new implementations of old synchronous APIs. Maybe it'd be started whenever the first call is made to a synchronous function. And maybe, to interoperate with some actual asynchronous code, you could have a way to change which reactor acts as the global reactor for synchronous APIs? I did this once, because I needed to rewrite a blocking API and wanted to use Twisted, except that I made the mistake of starting the thread when the module was created instead of on first call. This led to a deadlock because of the global import lock... :( In principle I don't know why this would be a terrible awful idea, if it was done right, but maybe people with more experience with threaded code can correct me. (The whole thread daemon thing necessary to make it act like a synchronous program might be terribly insane and therefore an idea killer. I'm not sure.) I'm under the understanding that the global import lock won't cause this particular issue anymore as of Python 3.3, so perhaps starting a reactor on import is reasonable.
The docs in Twisted don't spell it out, but they do say that connectionMade should be considered to be the initializer for the connection, and that upon connectionLost one should let the protocol be garbage collected. So, that seems like a guarantee that they are called in that order. I don't think it can really be enforced in Python (unless you want to do some jiggery pokery into model checking at runtime), but the responsibility for this failing in Twisted would be on the transport, as far as I understand it. If the transport calls back to the protocol in some invalid combination, it's the transport's fault for being broken. This is something that should be clearly documented. (It's an issue, also, regardless of whether or not a class is used to encapsulate the callbacks, or whether they are registered individually.) -- Devin

On Oct 14, 2012 11:27 AM, "Devin Jeanpierre" <jeanpierreda@gmail.com> wrote:
Yeah, while a global import lock still exists, it's used just long enough to get a per-module lock. On top of that, the import system now uses importlib (read: pure Python) for most functionality, which has bearing on threading and ease of better accommodating async if needed. -import

Guido van Rossum wrote:
I think this could be handled the same way you alluded to before when talking about the App Engine. The base implementation is asynchronous, and you provide a synchronous API that sets up an async operation and then runs a nested event loop until it completes. -- Greg
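A toy sketch of that arrangement, with a made-up event loop and a callback-based async operation standing in for the real machinery (every name here is illustrative):

```python
import collections

class Loop:
    # Minimal event loop: just a queue of callbacks.
    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, cb, *args):
        self._ready.append((cb, args))

    def run_until(self, done):
        # Nested-loop trick: keep running callbacks until the
        # operation we're waiting for reports completion.
        while not done():
            cb, args = self._ready.popleft()
            cb(*args)

def fetch_async(loop, callback):
    # Stand-in async operation: delivers its result via the loop.
    loop.call_soon(callback, "payload")

def fetch_sync(loop):
    # Synchronous facade: set up the async operation, then run a
    # nested event loop until it completes.
    result = []
    fetch_async(loop, result.append)
    loop.run_until(lambda: bool(result))
    return result[0]
```
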

Okay, I hate to do this, but is there any chance someone can provide a quick summary of the solution we're referring to here? I just started watching python-ideas today, and have a lot of things going on, plus real bad ADD, so I'm having a hard time reassembling the solutions being referred to… (maybe there's a web presentation that gives a better threaded presentation than my mail program? Or maybe I'm daft. Either way, this sounded interesting!) In summary, then, the Q/A below is referring to which approach? Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 5:23 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

On 12/10/2012 11:11pm, Guido van Rossum wrote:
So would the futures be registered with the reactor as soon as they are created, or only when they are yielded? I can't see how there can be any "concurrency" if they don't start till they are yielded. It would be like doing

    t1 = Thread(target=f1)
    t2 = Thread(target=f2)
    t3 = Thread(target=f3)
    t1.start()
    t1.join()
    t2.start()
    t2.join()
    t3.start()
    t3.join()

But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible. Richard.

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
I don't think it follows that there can only be one reactor if they are registered immediately. There could be a notion of "current reactor" maintained in thread-local context; moreover it could depend on the reactor that made the callback that caused the current task to run. The reactor could also be chosen by the code that made the Future. (Though I'm not immediately sure how that would work in the yield-from scenario -- but I'm sure there's a way.) FWIW, in NDB there is one event loop per thread; separate threads are handling separate requests and are completely independent. Also, in NDB there's some code that turns Futures into actual RPCs that runs only once there are no more immediately runnable tasks. I think that in general such behaviors are up to the reactor implementation for the platform though, and should not directly be reflected in the reactor API. -- --Guido van Rossum (python.org/~guido)
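The "one event loop per thread" arrangement can be sketched with threading.local (illustrative, not NDB's actual code; EventLoop is a placeholder class):

```python
import threading

class EventLoop:
    # Placeholder for a real reactor/event loop implementation.
    pass

_local = threading.local()

def get_event_loop():
    # Return this thread's loop, creating it on first use; separate
    # threads get completely independent loops.
    loop = getattr(_local, "loop", None)
    if loop is None:
        loop = _local.loop = EventLoop()
    return loop
```
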

On 13/10/2012 1:22am, Guido van Rossum wrote:
Alternatively, yielding a future (or whatever one calls the objects returned by *_async()) could register *and* wait for the result. To register without waiting one would yield a wrapper for the future. So one could write

    result = yield foo_async(...)

or

    f = yield Register(foo_async())
    # do some other work
    result = yield f

Richard

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
The Futures are not what is doing the work here, they just hold the result. In this example the get_async() functions register something with the reactor when they are called. When that "something" is done (or perhaps after several "somethings" chained together), get_async will set a result on its Future.
But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible.
In most event-driven systems there is a global (or thread-local) event loop, but it's also possible to pass one in explicitly to get_async(). -Ben

On Fri, 12 Oct 2012 15:11:54 -0700 Guido van Rossum <guido@python.org> wrote:
But how would you write a dataReceived equivalent then? Would you have a "task" looping on a read() call, e.g.

    @task
    def my_protocol_main_loop(conn):
        while <some_condition>:
            try:
                data = yield conn.read(1024)
            except ConnectionError:
                conn.close()
                break

I'm not sure I understand the problem with subclassing. It works fine in Twisted. Even in Python 3 we don't shy away from subclassing, for example the IO stack is based on subclassing RawIOBase, BufferedIOBase, etc. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Subclassing per se isn't a problem, but requiring a single dataReceived method per class can be awkward. Many protocols are effectively state machines, and modeling each state as a function can be cleaner than a big if/switch block in dataReceived. For example, here's a simplistic HTTP client using tornado's IOStream:

    from tornado import ioloop
    from tornado import iostream
    import socket

    def send_request():
        stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com\r\n\r\n")
        stream.read_until("\r\n\r\n", on_headers)

    def on_headers(data):
        headers = {}
        for line in data.split("\r\n"):
            parts = line.split(":")
            if len(parts) == 2:
                headers[parts[0].strip()] = parts[1].strip()
        stream.read_bytes(int(headers["Content-Length"]), on_body)

    def on_body(data):
        print data
        stream.close()
        ioloop.IOLoop.instance().stop()

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    stream = iostream.IOStream(s)
    stream.connect(("friendfeed.com", 80), send_request)
    ioloop.IOLoop.instance().start()

Classes allow and encourage broader interfaces, which are sometimes a good thing, but interact poorly with coroutines. Both twisted and tornado use separate callbacks for incoming data and for the connection being closed, but for coroutines it's probably better to just treat a closed connection as an error on the read. Futures (and yield from) give us a nice way to do that. -Ben

What calls on_headers in this example? Coming from twisted, that seems like dataReceived's responsibility, but given your introductory paragraph that's not actually what goes on here? On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
The IOStream does, after send_request calls stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is a _handle_read method that is registered with the IOLoop and fills up a buffer. When the read condition is satisfied the IOStream calls back into application code. -Ben

Interesting. That's certainly a nice API, but that then again (read_until) sounds like something I'd implement using dataReceived... You know, read_until clears the buffer, logs the requested callback. data_received adds something to the buffer, and checks if it triggered the (one of the?) registered callbacks. Of course, I may just be rusted in my ways and trying to implement everything in terms of things I know (then again, that might be just what's needed when you're trying to make a useful general API). I guess it's time for me to go deep-diving into Tornado :) On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell <ben@bendarnell.com> wrote:
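The read_until-on-top-of-dataReceived idea sketched above might look like this (hypothetical code, not Tornado's actual implementation):

```python
class BufferedStream:
    # read_until records a delimiter and a callback; data_received
    # appends to the buffer and fires the callback once the
    # delimiter shows up.
    def __init__(self):
        self._buffer = b""
        self._delimiter = None
        self._callback = None

    def read_until(self, delimiter, callback):
        self._delimiter = delimiter
        self._callback = callback
        self._check()  # data may already be buffered

    def data_received(self, chunk):
        self._buffer += chunk
        self._check()

    def _check(self):
        if self._callback is None or self._delimiter not in self._buffer:
            return
        head, _, rest = self._buffer.partition(self._delimiter)
        self._buffer = rest
        callback, self._callback = self._callback, None
        callback(head + self._delimiter)
```
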
-- cheers lvh

On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_@lvh.cc> wrote:
Right, that's how IOStream is implemented internally. The transport/protocol split works a little differently in Tornado: IOStream is implemented something like a Protocol subclass, but we consider it a part of the transport layer. The "protocols" are arbitrary classes that don't share any particular interface, but instead just call methods on the IOStream. -Ben

I quite like IOStream's interface, actually. If that's part of the transport layer, how do you prevent from having duplicating its behavior (read_until etc)? If there's just another separate object that would be the ITransport in twisted, I think the difference is purely one of labeling. On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 12:13 PM, Laurens Van Houtven <_@lvh.cc> wrote:
So far we haven't actually needed much flexibility in the transport layer - most of the functionality is in the BaseIOStream class, and then there are subclasses IOStream (regular sockets), SSLIOStream, and PipeIOStream that actually call recv(), read(), connect(), etc. We might need a little refactoring if we introduce dramatically different types of transports, but the plan is that we'd represent transports as classes in the IOStream hierarchy. -Ben

[Quick, I know I'm way behind, especially on this thread; more tomorrow.] On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
No, I would use plain callbacks. There would be some kind of IOObject class defined by the stdlib that wraps a socket (it would make it non-blocking, and possibly do other things), and the user would make a registration call to the event loop giving it the IOObject and the user's callback function plus *args and **kwds; the event loop would call callback(*args, **kwds) each time the IOObject became readable. (Oh, and there would be separate registration (and unregistration) functions for reading and writing.) Apparently my rants about callbacks have made people assume that I don't want to see them anywhere. In fact I am comfortable with callbacks for a number of situations -- I just think we have several other tools in our toolbox that are way underused, whereas callbacks are way overused, in part because the alternative tools are relatively new. This way the user could switch to a different callback when a different phase of the protocol is reached. I realize there are other shapes this API could take. But I really don't want the user to have to subclass IOObject.
I'm fine with using subclassing for the internal structure of a library. (The IOObject I am postulating would almost certainly have a bunch of subclasses used for different types of sockets, IOCP, SSL, etc.) The thing that I've soured upon (and many others too) is to tell users "and to use this fine feature, just subclass this handy base class and override or extend the following three methods". Because in practice (certainly in Python, where the compiler doesn't enforce privacy) users always start overriding other methods, or using internal state, or add state that clashes with the base class's state, or forget to call mandatory super calls, or make incorrect assumptions about thread-safety, or whatever else they can do to screw things up. And duck typing isn't ideal either for this situation. -- --Guido van Rossum (python.org/~guido)
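The registration-style API described above, where the loop knows only file objects and plain callbacks, might be sketched like this (names such as add_reader are illustrative, not a settled API):

```python
import select

class EventLoop:
    # Minimal loop that only knows file descriptors and callbacks.
    def __init__(self):
        self._readers = {}

    def add_reader(self, fd, callback, *args):
        # Call callback(*args) whenever fd becomes readable.
        self._readers[fd] = (callback, args)

    def remove_reader(self, fd):
        self._readers.pop(fd, None)

    def run_once(self, timeout=0):
        # One iteration: poll and dispatch ready callbacks.
        if not self._readers:
            return
        ready, _, _ = select.select(list(self._readers), [], [], timeout)
        for fd in ready:
            callback, args = self._readers[fd]
            callback(*args)
```
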

On Sat, 13 Oct 2012 22:03:17 -0700 Guido van Rossum <guido@python.org> wrote:
Subclassing IOObject would be wrong, since the user isn't writing an IO object in the first place. But subclassing a separate class, like Twisted's Protocol (which is mostly an empty shell, really), would sound reasonable to me. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 14, 2012 at 3:43 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It's a possible style. I'm inclined not to follow this example but I could go either way. One thing that somewhat worries me is that the names of these methods will be baked forever into all user code. As a user I prefer to have control over the names of my methods; first, there's the style issue (e.g. I'm always conflicted over what style to use in unittest.TestCase subclasses, since its own style is setUp, tearDown); second, in my app there may be a much better name for what the method does than e.g. data_received(). (Not to mention that that's another adjective used as a verb. ;-) -- --Guido van Rossum (python.org/~guido)

There has to be some way to contract emails sent in discussions rather than exploding them. I swear I'm trying to be concise, yet readable. It's not working. On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido@python.org> wrote:
My experience has been unfortunately rather devoid of deferreds in Twisted. I always feel like the odd one out when people discuss this confusion. For me, it was all Protocol this and Protocol that, and deferreds only came up when I used Twisted's great AMP (Asynchronous Messaging Protocol) library.
--snip--
Well, first of all, deferreds have ways of joining values together. For example:

    from __future__ import print_function
    from twisted.internet import defer

    def example_joined():
        d1 = defer.Deferred()
        d2 = defer.Deferred()
        # consumeErrors looks scary, but it only means that
        # d1 and d2's errbacks aren't called. Instead, the
        # error is sent to d's errback.
        d = defer.gatherResults([d1, d2], consumeErrors=True)
        d.addCallback(print)
        d.addErrback(lambda v: print("ERROR!"))
        d1.callback("The first deferred has succeeded")
        # now we're waiting on the second deferred to succeed,
        # which we'll let the caller handle
        return d2

    example_joined().callback("The second deferred has succeeded too!")
    print("==============")
    example_joined().errback("The second deferred has failed...")

I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Egh. I mean, sure, suppose we have those things. But what if you want to send the result of a callback to a generator-coroutine? Presumably generator coroutines work by yielding deferreds and being called back when the future resolves (deferred fires). But if those futures/deferreds aren't exposed, and instead only the generator stuff is exposed, then bridging the gap between callbacks and generator-coroutines is impossible. So every callback function has to also be defined to use something else. And worse, other APIs using callbacks are left in the dust. Suppose, OTOH, futures/deferreds are exposed. Then we can easily bridge between callbacks and generators, by returning a future whose `set_result` is the callback to our callback function (a deferred whose `callback` is the callback). But if we're exposing futures/deferreds, why have callbacks in the first place? The difference between these two functions is that the second can be used in generator-coroutines trivially and the first cannot:

    # callbacks:
    reactor.timer(10, print, "hello world")

    # deferreds:
    reactor.timer(10).addCallback(print, "hello world")

Now here's another thing: suppose we have a list of "deferred events", but instead of handling all 10 at once, we want to handle them "as they arrive", and then synthesize a result at the bottom. How do you do this with pure generator coroutines? For example, perhaps I am implementing a game server, where all the players choose their characters and then the game begins. Whenever a character is chosen, everyone else has to know about it so that they can plan their strategy based on who has chosen a character. Character selections are final, just so that I can use deferreds (hee hee).
I am imagining something like the following:

    # WRONG: handles players in a certain order,
    # rather than as they come in
    def player_lobby(reactor, players):
        for player in players:
            player_character = yield player.wait_for_confirm(reactor)
            player.set_character(player_character)
            # tell all the other players what character
            # the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

This is wrong, because it goes in a certain order and "blocks" the coroutine until every character is chosen. Players will not know who has chosen what characters in an appropriate order. But hypothetically, maybe we could do the following:

    # Hypothetical magical code?
    def player_lobby(reactor, players):
        confirmation_events = UnorderedEventList(
            [player.wait_for_confirm(reactor) for player in players])
        while confirmation_events:
            player_character = yield confirmation_events.get_next()
            player.set_character(player_character)
            # tell all the other players what character
            # the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

But then, how do we write UnorderedEventList? I don't really know. I suspect I've made the problem harder, not easier! eek. Plus, it doesn't even read very well. Especially not compared to the deferred version. This is how I would personally do it in Twisted, without using UnorderedEventList (no magic!):

    @inlineCallbacks
    def player_lobby(reactor, players):
        events = []
        for player in players:
            confirm_event = player.wait_for_confirm(reactor)
            events.append(confirm_event)
            @confirm_event.addCallback
            def on_confirmation(player_character, player=player):
                player.set_character(player_character)
                # tell all the other players what character
                # the player has chosen
                notify_choice((player, player_character), players)
        yield gatherResults(events)
        start_game(players)

Notice how I dropped down into the level of manipulating deferreds so that I could add this "as they come in" functionality, and then went back.
Actually it wouldn't've hurt much to just not bother with inlineCallbacks at all. I don't think this is particularly unreadable. More importantly, I actually know how to do it. I have no idea how I would do this without using addCallback, or without reimplementing addCallback using inlineCallbacks. And then, supposing we don't have these deferreds/futures exposed... how do we implement delayed computation stuff from extension modules? What if we want to do these kinds of compositions within said extension modules? What if we want to write our own version of @tasks or @inlineCallbacks with extra features, or generate callback chains from XML files, and so on? I don't really like the prospect of having just the "sugary syntax" available, without a flexible underlying representation also exposed. I don't know if you've ever shared that worry -- sometimes the pretty syntax gets in the way of getting stuff done.
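The bridging trick described above, where the future's set_result is handed to a callback-based API as the callback, can be sketched as follows (FakeReactor is a stand-in for a real timer API; a real reactor would delay the call):

```python
import concurrent.futures

class FakeReactor:
    # Stand-in for a callback-based timer API; a real reactor would
    # invoke the callback after `delay` seconds rather than at once.
    def timer(self, delay, callback, *args):
        callback(*args)

def timer_future(reactor, delay):
    # The Future's set_result *is* the callback, so a callback-style
    # API composes with future-yielding coroutines.
    f = concurrent.futures.Future()
    reactor.timer(delay, f.set_result, None)
    return f
```
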
Surely it's no harder to make yourself into a generator than to make yourself into a low-level thread-like context switching function with a saved callstack implemented by hand in assembler, and so on? I'm sure they'll be fine.
Will do (but also see my response above about why not "everyone" can).
I only used asyncore once, indirectly, so I don't know anything about it. I'm willing to dismiss it (and, in fact, various parts of twisted (I'm looking at you twisted.words)) as not good examples of the pattern. First of all, I'd like to separate the notion of subclassing and method dispatch. They're entirely unrelated. If I pass my object to you, and you call different methods depending on what happens elsewhere, that's method dispatch. And my object doesn't have to be subclassed or anything for it to happen. Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.) My IRC bot needs to handle several different possible events, such as:

- private messages
- channel join event
- CTCP event

and so on. My event handlers for each of these events probably manipulate some internal state (such as a log file, or a GUI). We'd probably organize this as a class, or else as a bunch of functions accessing global state. Or, perhaps a collection of closures. This last one is pretty unlikely. For the most part, these functions are all intrinsically related and can't be sensibly treated separately. You can't take the private message callback of Bot A, and the channel join callback of Bot B, and register these and expect a result that makes sense. If we look at this, we're expecting to deal with a set of functions that manage shared data. The abstraction for this is usually an object, and we'd probably write the callbacks in a class unless we were being contrarian. And it's not too crazy for the dispatcher to know this and expect you to write it as a class that supports a certain interface (certain methods correspond to certain events). Missing methods can be assumed to have the empty implementation (no subclassing, just catching AttributeError).
This isn't too much of an imposition on the user -- any collection of functions (with shared state via globals or closure variables) can be converted to an object with callable attributes very simply (thanks to types.SimpleNamespace, especially). And I only really think this is OK when writing it as an object -- as a collection of functions with shared state -- is the eminently obvious primary use case, so that that situation wouldn't come up very often. So, as an example, a protocol that passes data on further down the line needs to be notified when data is received, but also when the connection begins and ends. So the twisted protocol interface has "dataReceived", "connectionMade", and "connectionLost" callbacks. These really do belong together, they manage a single connection between computers and how it gets mapped to events usable by a twisted application. So I like the convenience and suggestiveness of them all being methods on an object.
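The "missing methods are the empty implementation" dispatch described above might be sketched as (all names here are illustrative):

```python
def dispatch(handler, event_name, *args):
    # Look the callback up on a user-supplied object; a missing
    # method is treated as the empty implementation (no subclassing,
    # just a default when the attribute isn't there).
    method = getattr(handler, "on_" + event_name, None)
    if method is None:
        return None
    return method(*args)

class IRCBot:
    # Plain user object bundling related callbacks and shared state.
    def __init__(self):
        self.log = []

    def on_private_message(self, sender, text):
        self.log.append((sender, text))

    # No on_channel_join: the dispatcher treats it as a no-op.
```
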
I meant it as a factual explanation of what generator coroutines are in Twisted, not what they are in general. Sorry for the confusion. We are probably agreed here. After a cursory examination, I don't really understand Greg Ewing's thing. I'd have to dig deeper into the logs for when he first introduced it.
--snip--
How would you code that using Twisted Deferreds?
Well. I'd replace the @task in your NDB thing with @inlineCallbacks and call it a day. ;) (I think there's enough deferred examples above, and I'm getting tired and it's been a day since I started writing this damned email.)
Well. We are on Python-Ideas... :(
I probably lack the expertise to help too much with this. I can point out anything that sticks out, if/when an extended futures proposal is made. -- Devin

Devin Jeanpierre wrote:
That's one way to go about it, but it's not the only way. See here for my take on how it might work: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Exa... -- Greg

Devin Jeanpierre wrote (concerning callbacks):
IIUC, what Guido objects to is callbacks that are methods *of the I/O object*, so that you have to subclass the library-supplied object and override them. You seem to be talking about something slightly different -- an object that's entirely supplied by the user, and simply bundles a set of callbacks together. That doesn't seem so bad. -- Greg

On Sat, Oct 13, 2012 at 4:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Don't worry too much. I took essentially all Friday starting those four new threads. I am up at night thinking about the issues. I can't expect everyone else to have this much time to devote to Python!
Especially odd since you jumped into the discussion when I called Deferreds a bad name. :-)
I'm sorry, but that's not very readable at all. You needed a lambda (which, if there were anything more, would have to be expanded using 'def') and you're cheating by passing print as a callable (which saves you a second lambda, but only in this simple case). A readable version of this code should not have to use lambdas.
I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Yeah, but things should be as simple as they can. If you can do everything using plain callbacks, Futures and coroutines, why add Deferreds even if you can? (Except for backward compatibility of course. That's a totally different topic. But we're first defining the API of the future.) If Greg Ewing had his way we'd even do without Futures -- I'm still considering that bid. (In the yield-from thread I'm asking for common patterns that the new API should be able to solve.)
No, they don't use deferreds. They use Futures. You've made it quite clear that they are very different.
My plan is that the Futures *will* be exposed -- this is what worked well in NDB.
And that's how NDB does it. I've got a question to Greg Ewing on how he does it.
How about this:

    f = <some future>
    reactor.timer(10, f.set_result, None)

Then whoever waits for f gets woken up in 10 seconds, and the reactor doesn't have to know what Futures are. But I believe your whole argument may be based on a misreading of my proposal. *I* want plain callbacks, Futures, and coroutines, and an event loop that only knows about plain callbacks and I/O objects (e.g. sockets).
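As a hedged illustration of that division of labor (ToyReactor and ToyFuture are toy stand-ins invented here, not any real reactor or Future API): the reactor schedules only plain callbacks, and a Future merely happens to sit on the receiving end of one of them.

```python
import heapq
import itertools
import time

class ToyReactor:
    """Toy event loop that knows only about plain callbacks."""
    def __init__(self):
        self._timers = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def timer(self, delay, fn, *args):
        heapq.heappush(self._timers,
                       (time.monotonic() + delay, next(self._seq), fn, args))

    def run(self):
        # Fire timers in deadline order until none remain.
        while self._timers:
            when, _, fn, args = heapq.heappop(self._timers)
            time.sleep(max(0.0, when - time.monotonic()))
            fn(*args)

class ToyFuture:
    """Toy Future: just enough state to receive a result."""
    def __init__(self):
        self.done = False
        self.result = None

    def set_result(self, value):
        self.done = True
        self.result = value

reactor = ToyReactor()
f = ToyFuture()
# The reactor never sees the Future; it just calls f.set_result later.
reactor.timer(0.01, f.set_result, "woken up")
reactor.run()
```

The point is that f.set_result is an ordinary callable to the reactor; the "Futures layer" lives entirely above the event loop.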
Let's ask Greg that. In NDB, I have a wait_any() function that you give a set of Futures and returns the first one that completes. It would be easy to build an iterator on top of this that takes a set of Futures and iterates over them in the order in which they are completed.
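Such an iterator can be sketched on top of a wait_any()-style primitive. The version below uses concurrent.futures as a stand-in, not NDB's actual wait_any (and note the stdlib already ships concurrent.futures.as_completed, so this is purely illustrative):

```python
import concurrent.futures as cf
import time

def wait_any(futures):
    """Block until at least one future completes; return one completed
    future (a stand-in for NDB's wait_any, built on cf.wait)."""
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    return next(iter(done))

def iter_completed(futures):
    """Yield futures in completion order, using only wait_any."""
    pending = set(futures)
    while pending:
        f = wait_any(pending)
        pending.discard(f)
        yield f

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    slow = pool.submit(lambda: (time.sleep(0.2), "slow")[1])
    fast = pool.submit(lambda: "fast")
    order = [f.result() for f in iter_completed([slow, fast])]
# order reflects completion order, not submission order
```

The helper needs nothing from the futures beyond what wait_any already requires, which is the point: completion-order iteration is easy to build once a first-completed primitive exists.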
Clearly we have an educational issue on our hands! :-)
You're barking up the wrong tree -- please badger Greg Ewing with use cases in the yield-from thread. With my approach all of these can be done. (See the yield-from thread for an example I just posted of a barrier, where multiple tasks wait for a single event.)
The thing that worries me most is reimplementing httplib, urllib and so on to use all this new machinery *and* keep the old synchronous APIs working *even* if some code is written using the old style and some other code wants to use the new style.
Agreed. Antoine made the same point elsewhere and I half conceded.
Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.)
(For the record, I hate IRC, the software, the culture, the interaction style. But maybe I'm unusual that way. :-)
I certainly wouldn't recommend collections of closures for that!
There's also a certain order to them, right? I'd think the state transition diagram is something like:

    connectionMade (1); dataReceived (*); connectionLost (1)

I wonder if there are any guarantees that they will only be called in this order, and who is supposed to enforce this? It would be awkward if the user code had to guard itself against this; also if the developer made an unwarranted assumption (e.g. that dataReceived is called at least once).
Please press him for explanations. Ask questions. He knows his dream best of all. We need to learn.
No problem. Same here. :-)
Somehow we got Itamar and Glyph to join, so I think we're covered!
You've done great in increasing my understanding of Twisted and Deferred. Thank you very much! -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum <guido@python.org> wrote:
A readable version of this code should not have to use lambdas.
In a lot of Twisted code, it happens with methods as callback methods, something like:

    d = self._doRPC(....)
    d.addCallbacks(self._formatResponse, self._formatException)
    d.addCallback(self._finish)

That doesn't talk about gatherResults, but hopefully it makes the idea clear. A lot of the legibility is dependent on making those method names sensible, though. Our in-house style guide asks for limiting functions to about ten lines, preferably half that. Works for us.

Another pattern that's frowned upon since it's a bit of an abuse of decorator syntax, but I still like because it tends to make things easier to read for inline callback definitions where you do need more than a lambda:

    d = somethingThatHappensLater()

    @d.addCallback
    def whenItsDone(result):
        doSomethingWith(result)
cheers lvh
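For readers outside Twisted, here is a toy sketch of why that decorator trick works. MiniDeferred is an illustrative stand-in, not Twisted's class; one known difference is that Twisted's real addCallback returns the deferred itself (so the decorated name ends up bound to the deferred, part of why the pattern is frowned upon), while this sketch returns the function.

```python
class MiniDeferred:
    """Toy stand-in for a Deferred, just to show the decorator trick."""
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, fn):
        # Decorator use works because addCallback takes one callable.
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return fn  # Twisted returns the deferred here instead.

    def callback(self, result):
        # Fire the chain: each callback's return feeds the next.
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)

d = MiniDeferred()

@d.addCallback            # registers whenItsDone as a callback
def whenItsDone(result):
    return result * 2

d.callback(21)            # fires the chain
```

The @d.addCallback line is just ordinary decorator syntax: it calls d.addCallback(whenItsDone) and binds the name to whatever comes back.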

On Sun, Oct 14, 2012 at 9:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
I quite understand that in your ecosystem you've found best practices for every imaginable use case. And I understand that once you're part of the community and have internalized the idioms and style, it's quite readable. But you haven't shaken my belief that we can do better with the current version of the language (3.3). (FWIW, I think it would be a good idea to develop a "reference implementation" of many of these ideas outside the standard library. Depending on whether we end up adopting yield <future> or yield from <generator> it might even support versions of Python 3 before 3.3. I certainly don't want to have to wait for 3.4 -- although that's the first opportunity for incorporating it into the stdlib.) -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 11:53 AM, Guido van Rossum <guido@python.org> wrote:
Did I mention how great AMP was? ;)
Sure. I probably erred in not using inlineCallbacks form, what I wanted to do was highlight the gatherResults function (which, as it happens, does something generators can't without invoking an external function.) My worry here was that generators are being praised for being more readable, which is true and reasonable, but I don't know that they're flexible enough to be the only way to do things. But you've stated now that you'd want futures to be there too, so... those are probably mostly flexible enough.
Haha, different in API and what they can do, but they are meant to do the same thing (represent delayed results). I meant to talk about futures and deferreds equally, and ask the same questions of both of them.
OK. I was confused when you said there would only be generators and simple callbacks (and so I posed questions about what happens when you have just generators, which you took to be questions aimed at Greg Ewing's thing.)
I know that Twisted has historically agreed with the idea that the reactor shouldn't know about futures/deferreds. I'm not sure I agree it's so important. If the universal way of writing asynchronous code is generator-coroutines, then the reactor should work well with this and not require extra effort.
You're correct.
I meant to be asking about the situation you were proposing. I thought it was just callbacks and generators, now we've added futures. Futures sans chaining can definitely implement this, just maybe not as nicely as how I'd do it. The issue is that it's a reasonable thing to want to escape the generator system in order to implement things that aren't "linear" the way generator coroutines are. And if we escape the system, it should be possible and easy to do a large variety of things. But, on the plus side, I'm convinced that it's possible, and that the necessary things will be exposed (even if it's very unpleasant, there's always helper functions...). Unless you do Greg's thing, then I'm worried again. I will read his stuff later today or tomorrow. (Unrelated: I'm not sure why I was so sure UnorderedEventList had to be that ugly. It can use a for loop... oops.)
(We're now deviating from futures and deferreds, but I think the part I was taking was drawing to a close anyway.)

Code that wants to use the old style can be integrated by calling it in a separate thread, and that's fine. If the results should be used in the asynchronous code, then have a thing that integrates with threading so that when the thread returns (or fails with an exception) it can notify a future/deferred of the outcome. Twisted has deferToThread for this. It also has blockingCallFromThread if the synchronous code wants to talk back to the asynchronous code.

And that leads me to this: imagine if, instead of having two implementations (one synchronous, one not), we had only one (asynchronous), and then had some wrappers to make it work as a synchronous implementation as well?

Here is an example of a synchronous program written in Python+Twisted, where I wrap deferLater to be a blocking function (so that it is similar to a time.sleep() followed by a function call). The reactor is started in a separate thread, and is left to die whenever the main thread dies (because thread daemons yay.)

    from __future__ import print_function

    import threading

    from twisted.internet import task, reactor
    from twisted.internet.threads import blockingCallFromThread

    def my_deferlater(reactor, time, callback, *args, **kwargs):
        return blockingCallFromThread(
            reactor, task.deferLater, reactor, time, callback,
            *args, **kwargs)

    # In reality, a global reactor for all threads is a terrible idea.
    # We'd want to instantiate a new reactor for the reactor thread,
    # and have a global REACTOR as well. We'll just use this reactor.
    # This code will not work with any other Twisted code because of
    # the global reactor shenanigans. (But it'd work if we were able
    # to have a reactor per thread.)
    REACTOR_THREAD = None

    def start_reactor():
        global REACTOR_THREAD
        if REACTOR_THREAD is not None:
            # Could be an error, or not, depending on how you feel
            # this should be.
            return
        REACTOR_THREAD = threading.Thread(target=reactor.run, kwargs=dict(
            # Signal handlers don't work if not in the main thread.
            installSignalHandlers=0))
        REACTOR_THREAD.daemon = True  # Probably really evil.
        REACTOR_THREAD.start()

    start_reactor()
    my_deferlater(reactor, 1, print, "This will print after 1 second!")
    my_deferlater(reactor, 1, print, "This will print after 2 seconds!")
    my_deferlater(reactor, 1, print, "This will print after 3 seconds!")

So maybe this is an option? It's really important that there not be just one global reactor, and that multiple reactors can run at the same time, for this to really work. But if that were done, then you could have a single global reactor responsible for being the back end of the new implementations of old synchronous APIs. Maybe it'd be started whenever the first call is made to a synchronous function. And maybe, to interoperate with some actual asynchronous code, you could have a way to change which reactor acts as the global reactor for synchronous APIs?

I did this once, because I needed to rewrite a blocking API and wanted to use Twisted, except that I made the mistake of starting the thread when the module was created instead of on first call. This led to a deadlock because of the global import lock... :( In principle I don't know why this would be a terrible awful idea, if it was done right, but maybe people with more experience with threaded code can correct me. (The whole thread-daemon thing necessary to make it act like a synchronous program might be terribly insane and therefore an idea killer. I'm not sure.)

I'm under the understanding that the global import lock won't cause this particular issue anymore as of Python 3.3, so perhaps starting a reactor on import is reasonable.
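The blockingCallFromThread idea can be mimicked without Twisted in a few lines (a toy sketch under my own names, not Twisted's implementation): the caller posts work to the loop thread and blocks on a per-call reply queue until the result or exception comes back.

```python
import queue
import threading

_work = queue.Queue()

def _loop_thread():
    """Stand-in for the reactor thread: runs posted callables forever."""
    while True:
        fn, args, reply = _work.get()
        if fn is None:            # shutdown sentinel
            break
        try:
            reply.put(("ok", fn(*args)))
        except Exception as exc:
            reply.put(("err", exc))

def blocking_call(fn, *args):
    """Run fn(*args) on the loop thread; block the caller for the result."""
    reply = queue.Queue()
    _work.put((fn, args, reply))
    status, value = reply.get()   # this is the "blocking" part
    if status == "err":
        raise value               # re-raise in the calling thread
    return value

t = threading.Thread(target=_loop_thread, daemon=True)
t.start()
result = blocking_call(pow, 2, 10)
_work.put((None, None, None))     # stop the loop thread
```

Exceptions raised on the loop thread are shuttled back and re-raised in the caller, which is the behavior a synchronous facade needs.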
The docs in Twisted don't spell it out, but they do say that connectionMade should be considered the initializer for the connection, and that upon connectionLost one should let the protocol be garbage collected. So that seems like a guarantee that they are called in that order. I don't think it can really be enforced in Python (unless you want to do some jiggery-pokery with model checking at runtime), but the responsibility for this failing in Twisted would be on the transport, as far as I understand it. If the transport calls back to the protocol in some invalid combination, it's the transport's fault for being broken. This is something that should be clearly documented. (It's an issue, also, regardless of whether a class is used to encapsulate the callbacks, or whether they are registered individually.) -- Devin
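A simple runtime guard is one shape such enforcement could take on the transport side (hypothetical code, not Twisted's): a wrapper that rejects any callback sequence other than connectionMade (1); dataReceived (*); connectionLost (1).

```python
class OrderingGuard:
    """Wraps a protocol-like object and enforces the call order
    connectionMade (1); dataReceived (*); connectionLost (1)."""
    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"

    def connectionMade(self):
        if self._state != "new":
            raise AssertionError("connectionMade out of order")
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        if self._state != "connected":
            raise AssertionError("dataReceived outside a connection")
        self._protocol.dataReceived(data)

    def connectionLost(self, reason=None):
        if self._state != "connected":
            raise AssertionError("connectionLost out of order")
        self._state = "closed"
        self._protocol.connectionLost(reason)

class Recorder:
    """Trivial protocol that records the events it sees."""
    def __init__(self):
        self.events = []
    def connectionMade(self):
        self.events.append("made")
    def dataReceived(self, data):
        self.events.append(("data", data))
    def connectionLost(self, reason):
        self.events.append("lost")

rec = Recorder()
p = OrderingGuard(rec)
p.connectionMade()
p.dataReceived(b"hello")
p.connectionLost()
```

A broken transport driving p out of order would fail loudly at the guard instead of silently corrupting the protocol's state, which is the documentation-vs-enforcement trade-off discussed above.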

On Oct 14, 2012 11:27 AM, "Devin Jeanpierre" <jeanpierreda@gmail.com> wrote:
Yeah, while a global import lock still exists, it's used just long enough to get a per-module lock. On top of that, the import system now uses importlib (read: pure Python) for most functionality, which has bearing on threading and ease of better accommodating async if needed. -import

Guido van Rossum wrote:
I think this could be handled the same way you alluded to before when talking about the App Engine. The base implementation is asynchronous, and you provide a synchronous API that sets up an async operation and then runs a nested event loop until it completes. -- Greg
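A toy sketch of that shape (ToyLoop and ToyFuture are illustrative stand-ins, not App Engine or any real API): the synchronous wrapper drives a nested event loop until the async result arrives, then returns it to the blocking caller.

```python
import collections

class ToyLoop:
    """Minimal callback-only event loop (illustrative)."""
    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, fn, *args):
        self._ready.append((fn, args))

    def run_once(self):
        # Run one batch of ready callbacks.
        for _ in range(len(self._ready)):
            fn, args = self._ready.popleft()
            fn(*args)

class ToyFuture:
    def __init__(self):
        self._done = False
        self._result = None

    def done(self):
        return self._done

    def set_result(self, value):
        self._done = True
        self._result = value

    def result(self):
        return self._result

def run_sync(loop, future):
    """Synchronous facade: spin the loop until the async op completes."""
    while not future.done():
        loop.run_once()
    return future.result()

loop = ToyLoop()
f = ToyFuture()
loop.call_soon(f.set_result, "done asynchronously")
answer = run_sync(loop, f)
```

The asynchronous implementation is the only one; run_sync is a thin adapter, which is the "one implementation, two APIs" idea.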

Okay, I hate to do this, but is there any chance someone can provide a quick summary of the solution we're referring to here? I just started watching python-ideas today, and have a lot of things going on, plus real bad ADD, so I'm having a hard time reassembling the solutions being referred to... (Maybe there's a web presentation that gives a better threaded presentation than my mail program? Or maybe I'm daft. Either way, this sounded interesting!) In summary, then, the Q/A below is referring to which approach? Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 5:23 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
participants (9)
- Antoine Pitrou
- Ben Darnell
- Devin Jeanpierre
- Eric Snow
- Greg Ewing
- Guido van Rossum
- Laurens Van Houtven
- Richard Oudkerk
- Shane Green