[Python-ideas] Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures)

Mon Oct 15 20:08:41 CEST 2012

While working my way through the zillions of messages on this thread, looking for things worth responding to, I found this from Guido:

> [Generators are] more flexible [than Deferreds], since it is easier to catch different exceptions at different points (...) In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change.

If you were actually paying attention, we did explain what "the real stuff" is, and why you can't do it with inlineCallbacks. ;-)

(Or perhaps I should say, why we prefer to do it with Deferreds explicitly.)

Managing parallelism is easy with the when-this-then-that idiom of Deferreds, but challenging with the sequential this-then-this-then-this idiom of generators.  The examples in the quoted message were all sequential workflows, which are roughly equivalent in both styles.  As soon as a for loop gets involved, though, yield-based coroutines have a harder time expressing the kind of parallelism that a lot of applications should use, so it's easy to become accidentally sequential (and therefore less responsive) even when you don't need to be.  For example, using some hypothetical generator-coroutine library, the idiomatic expression of a loop over several request/response exchanges would be something like this:

@yield_coroutine
def something_async():
    values = yield step1()
    results = set()
    for value in values:
        # suspends here until this step2() result arrives, one value at a time
        results.add(step3((yield step2(value))))
    return_(results)
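To make the cost of that loop concrete, here is a small runnable sketch that uses the standard library's asyncio purely as a stand-in for the hypothetical yield_coroutine library (`await` plays the role of `yield`, and a plain `return` replaces `return_`). The step functions, the per-value delays, and the `processed` log are invented for illustration.

```python
import asyncio

# Hypothetical per-value latencies: "b" answers first, "a" answers last.
DELAYS = {"a": 0.03, "b": 0.01, "c": 0.02}
processed = []  # records the order in which step3 sees results


async def step1():
    await asyncio.sleep(0)
    return list(DELAYS)


async def step2(value):
    await asyncio.sleep(DELAYS[value])  # simulate a network round trip
    return value.upper()


def step3(result):
    processed.append(result)
    return result + "!"


async def something_async():
    values = await step1()
    results = set()
    for value in values:
        # Suspends until *this* step2() finishes; the next request is not
        # even started until then, so the fast "b" waits behind the slow "a".
        results.add(step3(await step2(value)))
    return results


results = asyncio.run(something_async())
print(sorted(results))  # ['A!', 'B!', 'C!']
print(processed)        # ['A', 'B', 'C']: input order, not arrival order
```

The total running time here is roughly the sum of all three delays, even though nothing in the loop actually depends on the previous iteration.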

Since the results are collected in a set, their order doesn't actually matter; but this code has to sit and wait for each result to come back in turn, and it can't do any processing on the ones that are already ready while it's waiting.  With Deferreds, you would express this as:

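from twisted.internet.defer import gatherResults

def something_deferred():
    # step3 is attached to each step2() Deferred individually, so it runs
    # as soon as that particular result arrives, in whatever order
    return step1().addCallback(
        lambda values: gatherResults([step2(value).addCallback(step3)
                                      for value in values])).addCallback(set)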

In addition to being a roughly equivalent amount of code (fewer lines, but denser), this calls step2() for every value as soon as step1's results are available, and then runs step3() on each result as soon as that result is ready.  That means your program will automatically spread out its computation, which makes better use of time when results may arrive in any order.
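The when-this-then-that behaviour can be modeled without any event loop at all.  The sketch below uses a deliberately tiny, hypothetical `ToyDeferred` class and `gather_results` helper; they are not Twisted's actual `Deferred` or `gatherResults` (the real ones also handle errbacks, Deferreds returned from callbacks, and much else).  Responses are fired by hand in a scrambled order to stand in for network results arriving out of order.

```python
class ToyDeferred:
    """Hypothetical, heavily simplified Deferred: success path only."""

    def __init__(self):
        self.fired = False
        self.result = None
        self._callbacks = []

    def add_callback(self, fn):
        if self.fired:
            self.result = fn(self.result)  # result already here: run now
        else:
            self._callbacks.append(fn)     # otherwise run when it arrives
        return self

    def callback(self, result):
        self.fired = True
        self.result = result
        for fn in self._callbacks:
            self.result = fn(self.result)  # each callback transforms the result
        self._callbacks.clear()


def gather_results(deferreds):
    """Fire a new ToyDeferred with a list of results once all inputs fire."""
    gathered = ToyDeferred()
    results = [None] * len(deferreds)
    remaining = len(deferreds)

    def collect(index, value):
        nonlocal remaining
        results[index] = value
        remaining -= 1
        if remaining == 0:
            gathered.callback(results)
        return value

    for index, d in enumerate(deferreds):
        d.add_callback(lambda value, index=index: collect(index, value))
    if not deferreds:
        gathered.callback(results)
    return gathered


processed = []


def step3(result):
    processed.append(result)
    return result + "!"


# One pending request per value; step3 is attached before any result exists.
pending = {value: ToyDeferred() for value in "abc"}
final = gather_results(
    [pending[value].add_callback(step3) for value in "abc"]
).add_callback(set)

# Simulate responses arriving out of order: b, then c, then a.
for value in "bca":
    pending[value].callback(value.upper())

print(processed)             # ['B', 'C', 'A']: arrival order
print(sorted(final.result))  # ['A!', 'B!', 'C!']
```

Nothing in the calling code decides the processing order; it falls out of which result happens to arrive first, which is exactly the property the sequential loop gives up.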

The problem is that it is difficult to express laziness with generator coroutines: you've already spent the generator-ness of the function on responding to events, so there's no longer any syntactic support for laziness.

(There's another problem: sometimes you can determine that work needs to be done as it arrives.  That's an even trickier abstraction than Deferreds, though, and I'm still working on it.  I think I've already mentioned <http://tm.tl/1956> in one of my previous posts.)

Also, this is not at all a hypothetical or academic example.  This pattern comes up all the time in e.g. web-spidering and chat applications.

To be fair, you could express this in a generator-coroutine library like this:

@yield_coroutine
def something_async():
    values = yield step1()
    thunks = []
    @yield_coroutine
    def do_steps(value):
        return_(step3((yield step2(value))))
    for value in values:
        thunks.append(do_steps(value))
    return_(set((yield multi_wait(thunks))))
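For comparison, here is roughly the same shape in the standard library's asyncio, again only as a stand-in for the hypothetical library: `asyncio.gather` plays the role of `multi_wait`, and the step functions and delays are invented.  It shows the payoff of the extra ceremony, since step3 now runs in the order the simulated responses arrive.

```python
import asyncio

# Hypothetical per-value latencies: "b" answers first, "a" answers last.
DELAYS = {"a": 0.03, "b": 0.01, "c": 0.02}
processed = []  # records the order in which step3 sees results


async def step1():
    await asyncio.sleep(0)
    return list(DELAYS)


async def step2(value):
    await asyncio.sleep(DELAYS[value])  # simulate a network round trip
    return value.upper()


def step3(result):
    processed.append(result)
    return result + "!"


async def something_async():
    values = await step1()

    async def do_steps(value):
        # The inner coroutine exists only so step3 can run per value.
        return step3(await step2(value))

    # gather() runs every do_steps() concurrently and waits for all of them.
    return set(await asyncio.gather(*(do_steps(v) for v in values)))


results = asyncio.run(something_async())
print(sorted(results))  # ['A!', 'B!', 'C!']
print(processed)        # ['B', 'C', 'A']: arrival order
```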

but that seems bizarre and not very idiomatic; to me, it combines the confusing aspects of both styles.

David Reid also wrote up some examples showing that Deferreds can express sequential workflows more nicely, too (again, indirectly as a response to Guido!), on his blog: <http://dreid.org/2012/03/30/deferreds-are-a-dataflow-abstraction>.

> Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle),

inlineCallbacks (and the even-earlier deferredGenerator) predates Monocle.  That's not to say Monocle has no value; it is a portability layer between Twisted and Tornado that does the same thing inlineCallbacks does but allows you to do it even if you're not using Deferreds, which will surely be useful to some people.

I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X?  I'd use it if it did X, but it's all weird and I don't understand Y that it forces me to do instead, that's why I use Z" when, in fact:

- Twisted does do X
- It's done X for years
- It actually invented X in the first place
- There are legitimate reasons why we (Twisted core developers) suggest and prefer Y for many cases, but you don't need to do it if you don't want to follow our advice
- Thing Z that is being cited as doing X actually explicitly mentions Twisted as an inspiration for its implementation of X

It's fair, of course, to complain that we haven't explained this very well, and I'll cop to that unless I can immediately respond with a pre-existing URL that explains things :).

One other comment that's probably worth responding to:

> I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it.

In my opinion, it is a mistake to try to harmonize or unify all GUI event systems, unless you are also harmonizing the GUI itself (i.e. writing a totally portable GUI toolkit that does everything).  And I think we can all agree that writing a totally portable GUI toolkit is an impossibly huge task that is out of scope for this (or, really, any other) discussion.  GUI systems can already dispatch their events to user code just fine; interposing a Python reactor API between the GUI and the event registration adds unnecessary work, and may not even be possible in some cases.  See, for example, how Xcode (formerly Interface Builder) and the Glade interface designer work: the name of the event handler is registered inside a somewhat opaque blob, which is data and not code, and then hooked up automatically at runtime by reflection.  The code itself never calls any event-registration APIs.
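A toolkit-neutral sketch of that reflection-based wiring, loosely modeled on the way a Glade/GtkBuilder interface file names signal handlers as strings.  Everything here (`UI_DESCRIPTION`, `Widget`, `load_ui`, `Handlers`) is hypothetical; the point is only that application code supplies handler methods by name and never calls an event-registration API itself, while the toolkit does its own dispatching.

```python
# Hypothetical "interface file": data, not code. Handlers are named by string.
UI_DESCRIPTION = [
    {"widget": "ok_button", "signal": "clicked", "handler": "on_ok_clicked"},
    {"widget": "name_entry", "signal": "changed", "handler": "on_name_changed"},
]


class Widget:
    """Hypothetical toolkit widget that dispatches its own events."""

    def __init__(self, name):
        self.name = name
        self._handlers = {}

    def connect(self, signal, fn):
        self._handlers.setdefault(signal, []).append(fn)

    def emit(self, signal, *args):
        for fn in self._handlers.get(signal, []):
            fn(self, *args)


def load_ui(description, handler_obj):
    """Build widgets and wire handlers by name via reflection (getattr)."""
    widgets = {}
    for entry in description:
        name = entry["widget"]
        if name not in widgets:
            widgets[name] = Widget(name)
        handler = getattr(handler_obj, entry["handler"])  # lookup by name
        widgets[name].connect(entry["signal"], handler)
    return widgets


class Handlers:
    """Application code: just methods, no registration calls."""

    def __init__(self):
        self.log = []

    def on_ok_clicked(self, widget):
        self.log.append(f"{widget.name} clicked")

    def on_name_changed(self, widget, text):
        self.log.append(f"{widget.name} -> {text}")


handlers = Handlers()
widgets = load_ui(UI_DESCRIPTION, handlers)
widgets["name_entry"].emit("changed", "glyph")  # the toolkit dispatches
widgets["ok_button"].emit("clicked")
print(handlers.log)  # ['name_entry -> glyph', 'ok_button clicked']
```

A reactor API that insisted on registering these handlers itself would have no natural place to sit in this flow, since the toolkit already owns both the registration data and the dispatch.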

Also, modeling all GUI interaction as a request/response conversation is limiting and leads to bad UI conventions.  Consider: the UI element that most readily corresponds to a request/response is a modal dialog box.  Does anyone out there really like applications that consist mainly of popping up dialog after dialog to prompt you for the answers to questions?

-g
