On Mon, Oct 15, 2012 at 11:08 AM, Glyph
<glyph@twistedmatrix.com> wrote:
Still working my way through zillions of messages on this thread, trying to find things worth responding to, I found this, from Guido:
[Generators are] more flexible [than Deferreds], since it is easier to catch different exceptions at different points (...) In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change.
If you were actually paying attention, we did explain what "the real stuff" is, and why you can't do it with inlineCallbacks. ;-)
And yet the rest of your email could be paraphrased by those two quoted phrases. :-) But seriously, thanks for repeating the explanation for my benefit.
(Or perhaps I should say, why we prefer to do it with Deferreds explicitly.)
Managing parallelism is easy with the when-this-then-that idiom of Deferreds, but challenging with the sequential this-then-this-then-this idiom of generators. The examples in the quoted message were all sequential workflows, which are roughly equivalent in both styles. As soon as a for loop gets involved though, yield-based coroutines have a harder time expressing the kind of parallelism that a lot of applications should use, so it's easy to become accidentally sequential (and therefore less responsive) even if you don't need to be. For example, using some hypothetical generator coroutine library, the idiomatic expression of a loop across several request/responses would be something like this:
@yield_coroutine
def something_async():
    values = yield step1()
    results = set()
    for value in values:
        results.add(step3((yield step2(value))))
    return_(results)
Since it's in a set, the order of 'results' doesn't actually matter; but this code needs to sit and wait for each result to come back in order; it can't perform any processing on the ones that are already ready while it's waiting. You can express this with Deferreds:
def something_deferred():
    return step1().addCallback(
        lambda values: gatherResults([step2(value).addCallback(step3)
                                      for value in values])).addCallback(set)
In addition to being a roughly equivalent amount of code (fewer lines, but denser), that will run step2() and step3() on demand, as results are ready from the set of Deferreds from step1. That means that your program will automatically spread out its computation, which makes better use of time as results may be arriving in any order.
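With the benefit of hindsight, the difference between the two shapes can be sketched in modern asyncio syntax, purely for illustration; step1/step2/step3 here are hypothetical stand-ins for real asynchronous work:

```python
import asyncio

# Hypothetical stand-ins for the steps in the examples above.
async def step1():
    return [1, 2, 3]

async def step2(value):
    await asyncio.sleep(0)  # pretend this is a network round-trip
    return value * 10

def step3(result):
    return result + 1

async def something_sequential():
    # Waits for each step2() in turn, like the generator example above.
    values = await step1()
    results = set()
    for value in values:
        results.add(step3(await step2(value)))
    return results

async def something_parallel():
    # Starts every step2() at once, like gatherResults() with Deferreds,
    # then processes each response with step3() as the batch completes.
    values = await step1()
    responses = await asyncio.gather(*(step2(v) for v in values))
    return {step3(r) for r in responses}

assert asyncio.run(something_sequential()) == {11, 21, 31}
assert asyncio.run(something_parallel()) == {11, 21, 31}
```

Both produce the same set; the parallel version simply has all its requests in flight at once rather than one at a time.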
The problem is that it is difficult to express laziness with generator coroutines: you've already spent the function's generator-ness on responding to events, so there's no longer any syntactic support for laziness.
I see your example as a perfect motivation for adding some kind of map() primitive. In NDB there is one for the specific case of mapping over query results (common in NDB because it's primarily a database client). That map() primitive takes a callback that is either a plain function or a tasklet (i.e. something returning a Future). map() itself is also async (returning a Future) and all the tasklets results are waited for and collected only when you wait for the map(). It also handles the input arriving in batches (as they do for App Engine Datastore queries). IOW it exploits all available parallelism. While the public API is tailored for queries, the underlying mechanism can support a few different ways of collecting the results, supporting filter() and even reduce() (!) in addition to map(); and most of the code is reusable for other (non-query) contexts. I feel it would be possible to extend it to support "stop after the first N results" and "stop when this predicate says so" too.
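A minimal sketch of such a map() primitive, using asyncio purely for illustration (the names are hypothetical, not NDB's actual API): the callback may be a plain function or a coroutine function, and the map itself is awaitable, so results are collected only when the caller waits for it.

```python
import asyncio
import inspect

async def map_async(callback, values):
    # Hypothetical map() primitive: accepts a plain function or a
    # tasklet-style (awaitable-returning) callback, runs all calls
    # concurrently, and resolves to the collected results.
    async def apply(value):
        result = callback(value)
        if inspect.isawaitable(result):  # tasklet-style callback
            result = await result
        return result
    return await asyncio.gather(*(apply(v) for v in values))

async def demo():
    # A plain callback and a coroutine callback both work.
    doubled = await map_async(lambda v: v * 2, [1, 2, 3])

    async def add_hundred(v):
        await asyncio.sleep(0)
        return v + 100

    added = await map_async(add_hundred, [1, 2, 3])
    return doubled, added

assert asyncio.run(demo()) == ([2, 4, 6], [101, 102, 103])
```

The batching, filter()/reduce() collection modes, and early-stopping predicates described above would layer on top of the same mechanism.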
In general, whenever you want parallelism in Python, you have to introduce a new function, unless you happen to have a suitable function lying around already; so I don't feel I am contradicting myself by proposing a mechanism using callbacks here. It's the callbacks for sequencing that I dislike.
(There's another problem where sometimes you can determine that work needs to be done as it arrives; that's an even trickier abstraction than Deferreds though and I'm still working on it. I think I've mentioned <http://tm.tl/1956> already in one of my previous posts.)
NDB's map() does this.
Also, this is not at all a hypothetical or academic example. This pattern comes up all the time in e.g. web-spidering and chat applications.
Of course. In App Engine, fetching multiple URLs in parallel is the hello-world of async operations.
To be fair, you could express this in a generator-coroutine library like this:
@yield_coroutine
def something_async():
    values = yield step1()
    thunks = []
    @yield_coroutine
    def do_steps(value):
        return_(step3((yield step2(value))))
    for value in values:
        thunks.append(do_steps(value))
    return_(set((yield multi_wait(thunks))))
but that seems bizarre and not very idiomatic; to me, it looks like the confusing aspects of both styles.
Yeah, you need a map() operation:
@yield_coroutine
def something_async():
    values = yield step1()
    @yield_coroutine
    def do_steps(value):
        return_(step3((yield step2(value))))
    return_(set((yield map_async(do_steps, values))))
Or maybe map_async()'s Future's result should be a set?
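A sketch of that variant, again in asyncio syntax purely for illustration (map_async_set and the step functions are hypothetical): if the map primitive's result is already a set, the caller's extra set(...) wrapper disappears.

```python
import asyncio

async def map_async_set(corofunc, values):
    # Hypothetical map_async() variant whose result is already a set,
    # so callers don't need to wrap the result themselves.
    return set(await asyncio.gather(*(corofunc(v) for v in values)))

async def step2(value):
    await asyncio.sleep(0)  # stand-in for asynchronous work
    return value * 10

def step3(result):
    return result + 1

async def something_async():
    values = [1, 2, 3]  # stand-in for `yield step1()`
    async def do_steps(value):
        return step3(await step2(value))
    return await map_async_set(do_steps, values)

assert asyncio.run(something_async()) == {11, 21, 31}
```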
Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle),
inlineCallbacks (and the even-earlier deferredGenerator) predates Monocle. That's not to say Monocle has no value; it is a portability layer between Twisted and Tornado that does the same thing inlineCallbacks does but allows you to do it even if you're not using Deferreds, which will surely be useful to some people.
I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X?
I don't think I quite said that. But I suspect it happens because Twisted is hard to get into. I suspect anything using higher-order functions this much has that problem; I feel this way about Haskell's Monads. I wouldn't be surprised if many Twisted lovers are also closet (or not) Haskell lovers.
I'd use it if it did X, but it's all weird and I don't understand Y that it forces me to do instead, that's why I use Z" when, in fact:
- Twisted does do X
- It's done X for years
- It actually invented X in the first place
- There are legitimate reasons why we (Twisted core developers) suggest and prefer Y for many cases, but you don't need to do it if you don't want to follow our advice
- Thing Z that is being cited as doing X actually explicitly mentions Twisted as an inspiration for its implementation of X
It's fair, of course, to complain that we haven't explained this very well, and I'll cop to that unless I can immediately respond with a pre-existing URL that explains things :).
One other comment that's probably worth responding to:
I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it.
In my opinion, it is a mistake to try to harmonize or unify all GUI event systems, unless you are also harmonizing the GUI itself (i.e. writing a totally portable GUI toolkit that does everything). And I think we can all agree that writing a totally portable GUI toolkit is an impossibly huge task that is out of scope for this (or, really, any other) discussion. GUI systems can already dispatch their events to user code just fine - interposing a Python reactor API between the GUI and the event registration adds unnecessary work, and may not even be possible in some cases. See, for example, the approach that Xcode (formerly Interface Builder) and the Glade interface designer take: the name of the event handler is registered inside a somewhat opaque blob, which is data and not code, and then hooked up automatically at runtime based on reflection. The code itself never calls any event-registration APIs.
Also, modeling all GUI interaction as a request/response conversation is limiting and leads to bad UI conventions. Consider: the UI element that most readily corresponds to a request/response is a modal dialog box. Does anyone out there really like applications that consist mainly of popping up dialog after dialog to prompt you for the answers to questions?