[Python-ideas] The async API of the future: Twisted and Deferreds

Sun Oct 14 19:26:16 CEST 2012

On Sun, Oct 14, 2012 at 11:53 AM, Guido van Rossum <guido at python.org> wrote:
>> My experience has been unfortunately rather devoid of deferreds in
>> Twisted. I always feel like the odd one out when people discuss this
>> confusion. For me, it was all Protocol this and Protocol that, and
>> deferreds only came up when I used Twisted's great AMP (Asynchronous
>> Messaging Protocol) library.
>
> Especially odd since you jumped into the discussion when I called
> Deferreds a bad name. :-)

Did I mention how great AMP was? ;)

> I'm sorry, but that's not very readable at all. You needed a lambda
> (which if there was anything more would have to be expanded using
> 'def') and you're cheating by passing print as a callable (which saves
> you a second lambda, but only in this simple case).
>
> A readable version of this code should not have to use lambdas.

Sure. I probably erred in not using the inlineCallbacks form; what I
wanted to do was highlight the gatherResults function (which, as it
happens, does something generators can't without invoking an external
function).

My worry here was that generators are being praised for being more
readable, which is true and reasonable, but I don't know that they're
flexible enough to be the only way to do things. But you've stated now
that you'd want futures to be there too, so... those are probably
mostly flexible enough.

>> Egh. I mean, sure, suppose we have those things. But what if you want
>> to send the result of a callback to a generator-coroutine? Presumably
>> generator coroutines work by yielding deferreds and being called back
>> when the future resolves (deferred fires).
>
> No, they don't use deferreds. They use Futures. You've made it quite
> clear that they are very different.

Haha, different in API and what they can do, but they are meant to do
the same thing (represent delayed results). I meant to talk about
futures and deferreds equally, and ask the same questions of both of
them.

>> But if those
>> futures/deferreds aren't exposed, and instead only the generator
>> stuff is exposed, then bridging the gap between callbacks and
>> generator-coroutines is impossible. So every callback function has to
>> also be defined to use something else. And worse, other APIs using
>> callbacks are left in the dust.
>
> My plan is that the Futures *will* be exposed -- this is what
> worked well in NDB.

OK. I was confused when you said there would only be generators and
simple callbacks (and so I posed questions about what happens when you
have just generators, which you took to be questions aimed at Greg
Ewing's thing.)

> How about this:
>
>   f = <some future>
>   reactor.timer(10, f.set_result, None)
>
> Then whoever waits for f gets woken up in 10 seconds, and the reactor
> doesn't have to know what Futures are.

I know that Twisted has historically agreed with the idea that the
reactor shouldn't know about futures/deferreds. I'm not sure I agree
it's so important. If the universal way of writing asynchronous code
is generator-coroutines, then the reactor should work well with this
and not require extra effort.
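
For concreteness, here is a minimal sketch of the quoted timer idea
using only the standard library. It assumes threading.Timer as a
stand-in for the hypothetical reactor.timer, and
concurrent.futures.Future as a stand-in for the future; neither is
Twisted code or the proposed API. The point it illustrates is that
the timer only ever sees a plain callable.

```python
import threading
import time
from concurrent.futures import Future


def call_later(delay, callback, *args):
    """Hypothetical stand-in for reactor.timer: it knows only plain callables."""
    timer = threading.Timer(delay, callback, args)
    timer.daemon = True
    timer.start()
    return timer


f = Future()
started = time.monotonic()

# The timer is handed a bound method; it has no idea a Future is involved.
call_later(0.2, f.set_result, None)

# Whoever waits on f is woken up when the timer fires.
result = f.result(timeout=5)
elapsed = time.monotonic() - started
```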

> But I believe your whole argument may be based on a misreading of my
> proposal. *I* want plain callbacks, Futures, and coroutines, and an
> event loop that only knows about plain callbacks and IO objects (e.g.
> sockets).

You're correct.

>> Now here's another thing: suppose we have a list of "deferred events",
>> but instead of handling all 10 at once, we want to handle them "as
>> they arrive", and then synthesize a result at the bottom. How do you
>> do this with pure generator coroutines?
>
> Let's ask Greg that.

I meant to be asking about the situation you were proposing. I thought
it was just callbacks and generators, now we've added futures. Futures
sans chaining can definitely implement this, just maybe not as nicely
as how I'd do it.
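
A rough sketch of "handle them as they arrive, then synthesize a
result", assuming plain futures without chaining. It uses
concurrent.futures from the standard library rather than Twisted or
the proposed API, and slow_square is a made-up event source used only
for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n, delay):
    """Hypothetical event source: produces n*n after `delay` seconds."""
    time.sleep(delay)
    return n * n


def handle_as_they_arrive(jobs):
    """Process each result when its future completes, then combine them."""
    arrival_order = []
    total = 0
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(slow_square, n, delay) for n, delay in jobs]
        # as_completed yields futures in completion order, not submission order.
        for fut in as_completed(futures):
            value = fut.result()
            arrival_order.append(value)
            total += value
    return arrival_order, total


arrival_order, total = handle_as_they_arrive([(1, 0.3), (2, 0.1), (3, 0.2)])
```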

The issue is that it's a reasonable thing to want to escape the
generator system in order to implement things that aren't "linear" the
way generator coroutines are. And if we escape the system, it should
be possible and easy to do a large variety of things.

But, on the plus side, I'm convinced that it's possible, and that the
necessary things will be exposed (even if it's very unpleasant,
there's always helper functions...).

Unless you do Greg's thing, then I'm worried again. I will read his
stuff later today or tomorrow.

(Unrelated: I'm not sure why I was so sure UnorderedEventList had to
be that ugly. It can use a for loop... oops.)

> The thing that worries me most is reimplementing httplib, urllib and
> so on to use all this new machinery *and* keep the old synchronous
> APIs working *even* if some code is written using the old style and
> some other code wants to use the new style.

(We're now deviating from futures and deferreds, but I think the part
I was talking about was drawing to a close anyway.)

Code that wants to use the old style can be integrated by calling it
in a separate thread, and that's fine. If the results should be used
in the asynchronous code, then have a thing that integrates with
threading so that when the thread returns (or fails with an exception)
it can notify a future/deferred of the outcome. Twisted has
deferToThread for this. It also has blockingCallFromThread if the
synchronous code wants to talk back to the asynchronous code. And that
leads me to this:

Imagine if, instead of having two implementations (one synchronous,
one not), we had only one (asynchronous), and then had some wrappers
to make it work as a synchronous implementation as well?

Here is an example of a synchronous program written in Python+Twisted,
where I wrap deferLater to be a blocking function (so that it is
similar to a time.sleep() followed by a function call).

The reactor is started in a separate thread, and is left to die
whenever the main thread dies (because thread daemons yay.)

    from __future__ import print_function
    import threading
    from twisted.internet import task, reactor
    from twisted.internet.threads import blockingCallFromThread

    def my_deferlater(reactor, time, callback, *args, **kwargs):
        return blockingCallFromThread(reactor,
            task.deferLater, reactor, time, callback, *args, **kwargs)

    # In reality, a global reactor for all threads is a terrible idea.
    # We'd want to instantiate a new reactor for
    # the reactor thread, and have a global REACTOR as well.
    # We'll just use this reactor.
    # This code will not work with any other twisted
    # code because of the global reactor shenanigans.

    # (But it'd work if we were able to have a reactor per thread.)

    REACTOR_THREAD = None

    def start_reactor():
        global REACTOR_THREAD
        if REACTOR_THREAD is not None:
            # could be an error, or not, depending on how you feel
            # this should be.
            return

        REACTOR_THREAD = threading.Thread(target=reactor.run,
            kwargs=dict(
                # signal handlers don't work if not in main thread.
                installSignalHandlers=0))
        REACTOR_THREAD.daemon = True # Probably really evil.
        REACTOR_THREAD.start()

    start_reactor()

    my_deferlater(reactor, 1, print, "This will print after 1 second!")
    my_deferlater(reactor, 1, print, "This will print after 2 seconds!")
    my_deferlater(reactor, 1, print, "This will print after 3 seconds!")

So maybe this is an option? It's really important that there not be
just one global reactor, and that multiple reactors can run at the
same time, for this to really work. But if that were done, then you
could have a single global reactor responsible for being the back end
of the new implementations of old synchronous APIs. Maybe it'd be
started whenever the first call is made to a synchronous function. And
maybe, to interoperate with some actual asynchronous code, you could
have a way to change which reactor acts as the global reactor for
synchronous APIs?

I did this once, because I needed to rewrite a blocking API and wanted
to use Twisted, except that I made the mistake of starting the thread
when the module was created instead of on first call. This led to a
deadlock because of the global import lock... :(  In principle I don't
know why this would be a terrible awful idea, if it was done right,
but maybe people with more experience with threaded code can correct
me.

(The whole thread daemon thing, which is necessary to make it act like
a synchronous program, might be terribly insane and therefore an idea
killer. I'm not sure.)

My understanding is that the global import lock won't cause this
particular issue anymore as of Python 3.3, so perhaps starting a
reactor on import is reasonable.

> There's also a certain order to them, right? I'd think the state
> transition diagram is something like
>
>   connectionMade (1); dataReceived (*); connectionLost (1)
>
> I wonder if there are any guarantees that they will only be called in
> this order, and who is supposed to enforce this? It would be awkward
> if the user code would have to guard itself against this; also if the
> developer made an unwarranted assumption (e.g. dataReceived is called
> at least once).

The docs in Twisted don't spell it out, but they do say that
connectionMade should be considered to be the initializer for the
connection, and that upon connectionLost one should let the
protocol be garbage collected. So, that seems like a guarantee that
they are called in that order.

I don't think it can really be enforced in Python (unless you want to
do some jiggery pokery into model checking at runtime), but the
responsibility for this failing in Twisted would be on the transport,
as far as I understand it. If the transport calls back to the protocol
in some invalid combination, it's the transport's fault for being
broken.

This is something that should be clearly documented. (It's an issue,
also, regardless of whether or not a class is used to encapsulate the
callbacks, or whether they are registered individually.)
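
As a rough illustration of what a runtime check could look like, here
is a sketch of a wrapper that rejects callbacks arriving outside the
connectionMade (1); dataReceived (*); connectionLost (1) order. It is
plain Python, not part of Twisted; OrderCheckedProtocol and Recorder
are hypothetical names.

```python
class OrderCheckedProtocol:
    """Hypothetical wrapper that rejects out-of-order protocol callbacks."""

    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"  # new -> connected -> closed

    def _require(self, expected, name):
        if self._state != expected:
            raise RuntimeError(
                "%s called in state %r" % (name, self._state))

    def connectionMade(self):
        self._require("new", "connectionMade")
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        self._require("connected", "dataReceived")
        self._protocol.dataReceived(data)

    def connectionLost(self, reason):
        self._require("connected", "connectionLost")
        self._state = "closed"
        self._protocol.connectionLost(reason)


class Recorder:
    """Toy protocol that records which callbacks it received."""

    def __init__(self):
        self.events = []

    def connectionMade(self):
        self.events.append("made")

    def dataReceived(self, data):
        self.events.append(("data", data))

    def connectionLost(self, reason):
        self.events.append("lost")


rec = Recorder()
guarded = OrderCheckedProtocol(rec)
guarded.connectionMade()
guarded.dataReceived(b"hi")
guarded.connectionLost(None)

# A misbehaving transport that delivers data after the connection closed.
try:
    guarded.dataReceived(b"late")
    out_of_order_rejected = False
except RuntimeError:
    out_of_order_rejected = True
```

Note that zero dataReceived calls is still a valid sequence here, so
user code cannot assume it is called at least once.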

-- Devin