[Python-ideas] Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures)

Tue Oct 16 03:51:26 CEST 2012

On Mon, Oct 15, 2012 at 11:08 AM, Glyph <glyph at twistedmatrix.com> wrote:

> Still working my way through zillions of messages on this thread, trying
> to find things worth responding to, I found this, from Guido:
>
> [Generators are] more flexible [than Deferreds], since it is easier to
> catch different exceptions at different points (...) In the past, when I
> pointed this out to Twisted aficionados, the responses usually were a mix
> of "sure, if you like that style, we got it covered, Twisted has
> inlineCallbacks," and "but that only works for the simple cases, for the
> real stuff you still need Deferreds." But that really sounds to me like
> Twisted people just liking what they've got and not wanting to change.
>
>
> If you were actually paying attention, we did explain what "the real
> stuff" is, and why you can't do it with inlineCallbacks. ;-)
>

And yet the rest of your email could be paraphrased by those two quoted
phrases. :-) But seriously, thanks for repeating the explanation for my
benefit.

> (Or perhaps I should say, why we prefer to do it with Deferreds
> explicitly.)
>
> Managing parallelism is easy with the when-this-then-that idiom of
> Deferreds, but challenging with the sequential this-then-this-then-this
> idiom of generators.  The examples in the quoted message were all
> sequential workflows, which are roughly equivalent in both styles.  As soon
> as a for loop gets involved though, yield-based coroutines have a harder
> time expressing the kind of parallelism that a lot of applications *should*
> use, so it's easy to become accidentally sequential (and therefore less
> responsive) even if you don't need to be.  For example, using some
> hypothetical generator coroutine library, the idiomatic expression of a
> loop across several request/responses would be something like this:
>
> @yield_coroutine
> def something_async():
>     values = yield step1()
>     results = set()
>     for value in values:
>         results.add(step3((yield step2(value))))
>     return_(results)
>
>
> Since it's in a set, the order of 'results' doesn't actually matter; but
> this code needs to sit and wait for each result to come back in order; it
> can't perform any processing on the ones that are already ready while it's
> waiting.  You express this with Deferreds:
>
> def something_deferred():
>     return step1().addCallback(
>         lambda values: gatherResults([step2(value).addCallback(step3)
>                                       for value in values])).addCallback(set)
>
>
> In addition to being a roughly equivalent amount of code (fewer lines, but
> denser), that will run step2() and step3() on demand, as results are ready
> from the set of Deferreds from step1.  That means that your program will
> automatically spread out its computation, which makes better use of time as
> results may be arriving in any order.
>
> The problem is that it is difficult to express laziness with generator
> coroutines: you've already spent the generator-ness on the function on
> responding to events, so there's no longer any syntactic support for
> laziness.
>

I see your example as a perfect motivation for adding some kind of map()
primitive. In NDB there is one for the specific case of mapping over query
results (common in NDB because it's primarily a database client). That
map() primitive takes a callback that is either a plain function or a
tasklet (i.e. something returning a Future). map() itself is also async
(returning a Future), and all the tasklets' results are waited for and
collected only when you wait for the map(). It also handles the input
arriving in batches (as it does for App Engine Datastore queries). IOW it
exploits all available parallelism. While the public API is tailored for
queries, the underlying mechanism can support a few different ways of
collecting the results, supporting filter() and even reduce() (!) in
addition to map(); and most of the code is reusable for other (non-query)
contexts. I feel it would be possible to extend it to support "stop after
the first N results" and "stop when this predicate says so" too.
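
A minimal sketch of the "stop after the first N results" idea, not NDB's
implementation. It assumes the asyncio module from later versions of the
standard library; first_n, fetch and demo are hypothetical names, and
asyncio.sleep stands in for real I/O.

```python
import asyncio


async def first_n(coros, n):
    """Return the first n results to complete and cancel whatever is left."""
    # Wrapping each coroutine in a task schedules it immediately,
    # so all of the operations overlap.
    tasks = [asyncio.ensure_future(c) for c in coros]
    results = []
    try:
        if n > 0:
            for next_done in asyncio.as_completed(tasks):
                results.append(await next_done)
                if len(results) >= n:
                    break
    finally:
        # Cancelling an already finished task is a no-op.
        for task in tasks:
            task.cancel()
    return results


async def fetch(delay, label):
    # Hypothetical stand-in for a slow request.
    await asyncio.sleep(delay)
    return label


def demo():
    work = [fetch(0.03, "slow"), fetch(0.01, "fast"), fetch(0.02, "mid")]
    return asyncio.run(first_n(work, 2))
```

Results come back in completion order rather than input order, which is
what lets the caller stop early; a "stop when this predicate says so"
variant would replace the length check with a call to the predicate.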

In general, whenever you want parallelism in Python, you have to introduce
a new function, unless you happen to have a suitable function lying around
already; so I don't feel I am contradicting myself by proposing a mechanism
using callbacks here. It's the callbacks for sequencing that I dislike.

> (There's another problem where sometimes you can determine that work needs
> to be done as it arrives; that's an even trickier abstraction than
> Deferreds though and I'm still working on it. I think I've mentioned
> <http://tm.tl/1956> already in one of my previous posts.)
>

NDB's map() does this.

> Also, this is not at all a hypothetical or academic example.  This pattern
> comes up all the time in e.g. web-spidering and chat applications.
>

Of course. In App Engine, fetching multiple URLs in parallel is the
hello-world of async operations.
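
To illustrate the difference between the accidentally sequential loop and
the parallel version, here is a small sketch. It assumes the asyncio
module from later versions of the standard library, and it simulates the
network with asyncio.sleep; fake_fetch, the two driver functions and the
example URLs are all hypothetical.

```python
import asyncio


async def fake_fetch(url, log):
    # Record when each simulated request starts and finishes.
    log.append(("start", url))
    await asyncio.sleep(0.01)  # stand-in for network latency
    log.append(("done", url))
    return len(url)


async def fetch_sequentially(urls):
    log = []
    for url in urls:
        # Each request must finish before the next one starts.
        await fake_fetch(url, log)
    return log


async def fetch_in_parallel(urls):
    log = []
    # gather() schedules every request before waiting on any of them.
    await asyncio.gather(*(fake_fetch(url, log) for url in urls))
    return log


URLS = ["http://a.example/", "http://b.example/"]


def run_both():
    sequential_log = asyncio.run(fetch_sequentially(URLS))
    parallel_log = asyncio.run(fetch_in_parallel(URLS))
    return sequential_log, parallel_log
```

In the sequential log the start and done events alternate; in the
parallel log both requests have started before either one is done.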

> To be fair, you *could* express this in a generator-coroutine library
> like this:
>
> @yield_coroutine
> def something_async():
>     values = yield step1()
>     thunks = []
>     @yield_coroutine
>     def do_steps(value):
>         return_(step3((yield step2(value))))
>     for value in values:
>         thunks.append(do_steps(value))
>     return_(set((yield multi_wait(thunks))))
>
>
> but that seems bizarre and not very idiomatic; to me, it looks like the
> confusing aspects of both styles.
>

Yeah, you need a map() operation:

@yield_coroutine
def something_async():
  values = yield step1()
  @yield_coroutine
  def do_steps(value):
    return step3((yield step2(value)))
  return set(yield map_async(do_steps, values))

Or maybe map_async()'s Future's result should be a set?
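
A runnable sketch of what such a map_async() might look like, under the
assumption that the asyncio module from later versions of the standard
library is available. This is not NDB's map(); the bodies of step1, step2
and step3 are made up for illustration, and like NDB's version the
callback may be a plain function or something returning an awaitable.

```python
import asyncio
import inspect


async def map_async(func, values):
    """Apply func to every value concurrently; results keep input order."""
    async def call(value):
        result = func(value)
        # Accept plain functions as well as coroutine functions.
        if inspect.isawaitable(result):
            result = await result
        return result

    return await asyncio.gather(*(call(v) for v in values))


async def step1():
    return [3, 1, 2]  # hypothetical input values


async def step2(value):
    await asyncio.sleep(value * 0.01)  # stand-in for a slow request
    return value * 10


def step3(response):
    return response + 1  # hypothetical post-processing


async def something_async():
    values = await step1()

    async def do_steps(value):
        return step3(await step2(value))

    return set(await map_async(do_steps, values))
```

Here map_async() returns a list and the caller converts it to a set, which
keeps the primitive usable when order does matter.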

> David Reid also wrote up some examples of how Deferreds can express
> sequential workflows more nicely as well (also indirectly as a response to
> Guido!) on his blog, here:
> <http://dreid.org/2012/03/30/deferreds-are-a-dataflow-abstraction>.
>
> Which I understand -- I don't want to change either. But I also observe
> that a lot of people find bare Twisted-with-Deferreds too hard to grok, so
> they use Tornado instead, or they build a layer on top of either (like
> Monocle),
>
>
> inlineCallbacks (and the even-earlier deferredGenerator) predate Monocle.
> That's not to say Monocle has no value; it is a portability layer between
> Twisted and Tornado that does the same thing inlineCallbacks does but
> allows you to do it even if you're not using Deferreds, which will surely
> be useful to some people.
>
> I don't want to belabor this point, but it bugs me a little bit that we
> get so much feedback from the broader Python community along the lines of
> "Why doesn't Twisted do X?
>

I don't think I quite said that. But I suspect it happens because Twisted
is hard to get into. I suspect anything using higher-order functions this
much has that problem; I feel this way about Haskell's Monads. I wouldn't
be surprised if many Twisted lovers are also closet (or not) Haskell lovers.

> I'd use it if it did X, but it's all weird and I don't understand Y that
> it forces me to do instead, that's why I use Z" when, in fact:
>
>
>    1. Twisted does do X
>    2. It's done X for years
>    3. It actually invented X in the first place
>    4. There are legitimate reasons why we (Twisted core developers)
>    suggest and prefer Y for many cases, but you don't need to do it if you
>    don't want to follow our advice
>    5. Thing Z that is being cited as doing X actually explicitly mentions
>    Twisted as an inspiration for its implementation of X
>
>
> It's fair, of course, to complain that we haven't explained this very
> well, and I'll cop to that unless I can immediately respond with a
> pre-existing URL that explains things :).
>
> One other comment that's probably worth responding to:
>
> I suppose on systems that support both networking and GUI events, in my
> design these would use different I/O objects (created using different
> platform-specific factories) and the shared reactor API would sort things
> out based on the type of I/O object passed in to it.
>
>
> In my opinion, it is a mistake to try to harmonize or unify all GUI event
> systems, unless you are also harmonizing the GUI itself (i.e. writing a
> totally portable GUI toolkit that does everything).  And I think we can all
> agree that writing a totally portable GUI toolkit is an impossibly huge
> task that is out of scope for this (or, really, any other) discussion.  GUI
> systems can already dispatch their events to user code just fine - interposing
> a Python reactor API between the GUI and the event registration adds
> additional unnecessary work, and may not even be possible in some cases.
> See, for example, the approach that Xcode (formerly Interface Builder) and
> the Glade interface designer use: the name of the event handler is registered
> inside a somewhat opaque blob, which is data and not code, and then hooked
> up automatically at runtime based on reflection.  The code itself never
> calls any event-registration APIs.
>
> Also, modeling all GUI interaction as a request/response conversation is
> limiting and leads to bad UI conventions.  Consider: the UI element that
> most readily corresponds to a request/response is a modal dialog box.  Does
> anyone out there really like applications that consist mainly of popping up
> dialog after dialog to prompt you for the answers to questions?
>

I don't feel very strongly about integrating GUI systems. IIRC Twisted has
some way to integrate with certain GUI event loops. I don't think we should
want any more than that (but no less, either).

-- 
--Guido van Rossum (python.org/~guido)