[This is the third spin-off thread from "asyncore: included batteries
don't fit"]
On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre wrote:
On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum wrote:
On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre wrote:
Could you be more specific? I've never heard Deferreds in particular called "arcane". They're very popular in e.g. the JS world,
Really? Twisted is used in the JS world? Or do you just mean the pervasiveness of callback style async programming?
Ah, I mean Deferreds. I attended a talk earlier this year all about deferreds in JS, and not a single reference to Python or Twisted was made!
These are the examples I remember mentioned in the talk:
- http://api.jquery.com/category/deferred-object/ (not very twistedish at all, ill-liked by the speaker)
- http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe not a good example, mochikit tries to be "python in JS")
- http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
- https://github.com/kriskowal/q (also includes an explanation of why the author likes deferreds)
There were a few more that the speaker mentioned, but didn't cover. One of his points was that the various systems of deferreds are subtly different, some very badly so, and that it was a mess, but that deferreds were still awesome. JS is a language where async programming is mainstream, so lots of people try to make it easier, and they all do it slightly differently.
Thanks for those links. I followed the kriskowal/q link and was reminded of why Twisted's Deferreds are considered more awesome than Futures: it's the chaining.

BUT... That's only important if callbacks are all the language lets you do! If your baseline is this:

    step1(function (value1) {
        step2(value1, function(value2) {
            step3(value2, function(value3) {
                step4(value3, function(value4) {
                    // Do something with value4
                });
            });
        });
    });

then of course the alternative using Deferred looks better:

    Q.fcall(step1)
    .then(step2)
    .then(step3)
    .then(step4)
    .then(function (value4) {
        // Do something with value4
    }, function (error) {
        // Handle any error from step1 through step4
    })
    .end();

(Both quoted literally from the kriskowal/q link.)

I also don't doubt that using classic Futures you can't do this -- the chaining really matters for this style, and I presume this (modulo unimportant API differences) is what typical Twisted code looks like.

However, Python has yield, and you can do much better (I'll write plain yield for now, but it works the same with yield-from):

    try:
        value1 = yield step1(<args>)
        value2 = yield step2(value1)
        value3 = yield step3(value2)
        value4 = yield step4(value3)
        # Do something with value4
    except Exception:
        # Handle any error from step1 through step4

There's an outer function missing here, since you can't have a toplevel yield; I think that's the same for the JS case, typically. Also, strictly speaking the "Do something with value4" code should probably be in an else: clause after the except handler. But that actually leads nicely to the advantage: This form is more flexible, since it is easier to catch different exceptions at different points. It is also much easier to pass extra information around. E.g. what if your flow ends up having to pass both value1 and value2 into step3()?
Sure, you can do that by making value2 a tuple (or a dict, or an object) incorporating value1 and the original value2, but that's exactly where this style becomes cumbersome, whereas in the yield-based form, such things can remain simple local variables. All in all I find it more readable.

In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change. Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle), or they go a completely different route and use greenlets/gevent instead -- and get amazing performance and productivity that way too, even though they know it's monkey-patching their asses off...

So, in the end, for Python 3.4 and beyond, I want to promote a style that mixes simple callbacks (perhaps augmented with simple Futures) and generator-based coroutines (either PEP 342, yield/send-based, or PEP 380 yield-from-based). I'm looking to Twisted for the best reactors (see other thread). But for transport/protocol implementations I think that generator/coroutines offers a cleaner, better interface than incorporating Deferred.

I hope that the path forward for Twisted will be simple enough: it should be possible to hook Deferred into the simpler callback APIs (perhaps a new implementation using some form of adaptation, but keeping the interface the same).

In a sense, the greenlet/gevent crowd will be the biggest losers, since they currently write async code without either callbacks or yield, using microthreads instead.
I wouldn't want to have to start putting yield back everywhere into that code. But the stdlib will still support yield-free blocking calls (even if under the hood some of these use yield/send-based or yield-from-based coroutines) so the monkey-patchey tradition can continue.
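To make the point about local variables concrete, here is a minimal sketch of a yield-based flow in which step3() needs both value1 and value2. Everything in it is an assumption made for illustration: the step functions, the blocking run() driver, and the choice of concurrent.futures.Future as the "simple Future". It is not taken from Twisted or from any proposed stdlib API.

```python
from concurrent.futures import Future


def completed(value):
    """Return a Future that already holds `value` (stand-in for real async I/O)."""
    fut = Future()
    fut.set_result(value)
    return fut


def failed(exc):
    """Return a Future that already holds the exception `exc`."""
    fut = Future()
    fut.set_exception(exc)
    return fut


# Hypothetical steps; each returns a Future instead of a plain value.
def step1(start):
    return completed(start)


def step2(value1):
    return completed(value1 + 10)


def step3(value1, value2):
    if value2 > 100:
        return failed(ValueError("value2 too large"))
    return completed((value1, value2))


def flow(start):
    try:
        value1 = yield step1(start)
        value2 = yield step2(value1)
        # value1 is still an ordinary local; no need to pack it into value2.
        value3 = yield step3(value1, value2)
    except ValueError as exc:
        return "error: %s" % exc
    else:
        return value3


def run(gen):
    """Drive `gen` to completion, blocking on each yielded Future."""
    try:
        fut = next(gen)
        while True:
            try:
                result = fut.result()
            except Exception as exc:
                fut = gen.throw(exc)    # surfaces as a raise at the yield
            else:
                fut = gen.send(result)  # becomes the value of the yield
    except StopIteration as stop:
        return stop.value
```

With this driver, run(flow(1)) returns the pair built by step3(), and run(flow(500)) takes the except branch. A real event loop would resume the generator from a callback rather than block on result(), but the generator body would not change.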
That's one of the things I am desperately trying to keep out of Python; I find that style unreadable and unmanageable (whenever I click on a button in a website and nothing happens I know someone has a bug in their callbacks). I understand you feel differently; but I feel the general sentiment is that callback-based async programming is even harder than multi-threaded programming (and nobody is claiming that threads are easy :-).
:S
There are (at least?) four different styles of asynchronous computation used in Twisted, and you seem to be confused as to which ones I'm talking about.
1. Explicit callbacks:
For example, reactor.callLater(t, lambda: print("woo hoo"))
I actually like this, as it's a lowest-common-denominator approach which everyone can easily adapt to their purposes. See the thread I started about reactors.
2. Method dispatch callbacks:
Similar to the above, the reactor or somebody has a handle on your object, and calls methods that you've defined when events happen, e.g. IProtocol's dataReceived method.
While I'm sure it's expedient and captures certain common patterns well, I like this the least of all -- calling fixed methods on an object sounds like a step back; it smells of the old Java way (before it had some equivalent of anonymous functions), and of asyncore, which (nearly) everybody agrees is kind of bad due to its insistence that you subclass its classes. (Notice how subclassing as the prevalent approach to structuring your code has gotten into a lot of discredit since 1996.)
3. Deferred callbacks:
When you ask for something to be done, it's set up, and you get an object back, which you can add a pipeline of callbacks to that will be called whenever whatever happens, e.g.

    twisted.internet.threads.deferToThread(print, "x").addCallback(print, "x was printed in some other thread!")
Discussed above.
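For readers who have not used style #3, the following toy class sketches the idea of "an object you can add a pipeline of callbacks to". MiniDeferred is a hypothetical stand-in written for this illustration; it only loosely mirrors the shape of Twisted's addCallback()/callback() and deliberately omits errbacks and nested Deferreds.

```python
class MiniDeferred:
    """Toy stand-in for a Deferred: a result plus a pipeline of callbacks."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, func, *args):
        """Queue func(result, *args); its return value feeds the next callback."""
        self._callbacks.append((func, args))
        if self._fired:
            self._run()  # result already available: run right away
        return self      # returning self is what allows chaining

    def callback(self, result):
        """Deliver the initial result and run the queued pipeline."""
        self._fired = True
        self._result = result
        self._run()

    def _run(self):
        while self._callbacks:
            func, args = self._callbacks.pop(0)
            self._result = func(self._result, *args)


seen = []
d = MiniDeferred()
d.addCallback(lambda x: x + 1).addCallback(lambda x, factor: x * factor, 10)
d.addCallback(seen.append)
d.callback(4)  # 4 -> 5 -> 50, then 50 is appended to `seen`
```

Each callback receives the previous callback's return value as its first argument, which is why extra state has to be threaded through the results themselves.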
4. Generator coroutines
These are a syntactic wrapper around deferreds. If you yield a deferred, you will be sent the result if the deferred succeeds, or an exception if the deferred fails. e.g. examples from previous message
Seeing them as syntactic sugar for Deferreds is one way of looking at it; no doubt this is how they're seen in the Twisted community because Deferreds are older and more entrenched. But there's no requirement that an architecture has to have Deferreds in order to use generator coroutines -- simple Futures will do just fine, and Greg Ewing has shown that using yield-from you can even do without those. (But he does use simple, explicit callbacks at the lowest level of his system.)
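As a rough sketch of the "yield-from without Futures" idea, the example below passes results between coroutines purely through return values and `yield from`, and uses a bare `yield` to hand control back to a scheduler. The names (sleep_ticks, fetch, worker, run_all) are invented for this illustration and this is not Greg Ewing's actual code; in particular there is no real I/O here, so the low-level callbacks he relies on do not appear.

```python
from collections import deque


def sleep_ticks(n):
    """Sub-coroutine: give control back to the scheduler n times."""
    for _ in range(n):
        yield  # a bare yield means "let someone else run"


def fetch(name, ticks):
    """Pretend to do slow work, then return a value to the caller."""
    yield from sleep_ticks(ticks)
    return name.upper()


def worker(name, ticks, log):
    # The result arrives as the value of the `yield from` expression;
    # no Future or Deferred object is involved.
    value = yield from fetch(name, ticks)
    log.append(value)


def run_all(coroutines):
    """Round-robin scheduler: resume each coroutine until all have finished."""
    ready = deque(coroutines)
    while ready:
        coro = ready.popleft()
        try:
            next(coro)
        except StopIteration:
            continue  # this coroutine is done; do not reschedule it
        ready.append(coro)


log = []
run_all([worker("slow", 3, log), worker("fast", 1, log)])
```

The two workers interleave, and the one that waits fewer ticks finishes first even though it was started second.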
I don't see a reason for the first to exist at all, the second one is kind of nice in some circumstances (see below), but perhaps overused.
I feel like you're railing on the first and second when I'm talking about the third and fourth. I could be wrong.
I think you're wrong -- I was (and am) most concerned about the perceived complexity of the API offered by, and the typical looks of code using, Deferreds (i.e., #3).
and possibly elsewhere. Moreover, they're extremely similar to futures, so if one is arcane so is the other.
I love Futures, they represent a nice simple programming model. But I especially love that you can write async code using Futures and yield-based coroutines (what you call inlineCallbacks) and never have to write an explicit callback function. Ever.
The reason explicit non-deferred callbacks are involved in Twisted is because of situations in which deferreds are not present, because of past history in Twisted. It is not at all a limitation of deferreds or something futures are better at, best as I'm aware.
(In case that's what you're getting at.)
I don't think I was. It's clear to me (now) that Futures are simpler than Deferreds -- and I like Futures better because of it, because for the complex cases I would much rather use generator coroutines than Deferreds.
Anyway, one big issue is that generator coroutines can't really effectively replace callbacks everywhere. Consider the GUI button example you gave. How do you write that as a coroutine?
I can see it being written like this:
    def mycoroutine(gui):
        while True:
            clickevent = yield gui.mybutton1.on_click()
            # handle clickevent
But that's probably worse than using callbacks.
I touched on this briefly in the reactor thread. Basically, GUI callbacks are often level-triggered rather than edge-triggered, and IIUC Deferreds are not great for that either; and in a few cases where edge-triggered coding makes sense I *would* like to use a generator coroutine.
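One way the two styles can meet is to let a plain callback feed events into a generator. The sketch below assumes a hypothetical Button class with a callback-style on_click(handler) method, which differs from the on_click() call in the coroutine quoted above; it is an illustration, not any real GUI toolkit's API.

```python
class Button:
    """Hypothetical widget that calls registered handlers on each click."""

    def __init__(self):
        self._handlers = []

    def on_click(self, handler):
        self._handlers.append(handler)

    def click(self, event):
        # Simulate the toolkit delivering a click event.
        for handler in self._handlers:
            handler(event)


def click_logger(log):
    """Coroutine: wait for a click, handle it, then wait for the next one."""
    while True:
        clickevent = yield
        log.append("clicked at %s" % (clickevent,))


def attach(button, coroutine):
    next(coroutine)                  # prime: run up to the first yield
    button.on_click(coroutine.send)  # each click resumes the coroutine


log = []
button = Button()
attach(button, click_logger(log))
button.click((1, 2))
button.click((3, 4))
```

State that must survive between clicks can live in the coroutine's local variables instead of on an object.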
Neither is clearly better or more obvious than the other. If anything I generally find deferred composition more useful than deferred tee-ing, so I feel like composition is the correct base operator, but you could pick another.
If you're writing long complicated chains of callbacks that benefit from these features, IMO you are already doing it wrong. I understand that this is a matter of style where I won't be able to convince you. But style is important to me, so let's agree to disagree.
[In a follow-up to yourself, you quoted starting from this point and appended "Nevermind that whole segment." I'm keeping it in here just for context of the thread.]
This is more than a matter of style, so at least for now I'd like to hold off on calling it even.
In my day to day silly, synchronous, python code, I do lots of synchronous requests. For example, it's not unreasonable for me to want to load two different files from disk, or make several database interactions, etc. If I want to make this asynchronous, I have to find a way to execute multiple things that could hypothetically block, at the same time. If I can't do that easily, then the asynchronous solution has failed, because its entire purpose is to do everything that I do synchronously, except without blocking the main thread.
Here's an example with lots of synchronous requests in Django:
    def view_paste(request, filekey):
        try:
            fileinfo = Pastes.objects.get(key=filekey)
        except DoesNotExist:
            t = loader.get_template('pastebin/error.html')
            return HttpResponse(t.render(Context(dict(error='File does not exist'))))

        f = open(fileinfo.filename)
        fcontents = f.read()
        t = loader.get_template('pastebin/paste.html')
        return HttpResponse(t.render(Context(dict(file=fcontents))))
How many blocking requests are there? Lots. This is, in a word, a long, complicated chain of synchronous requests. This is also very similar to what actual Django code might look like in some circumstances. Even if we might think this is unreasonable, some subset or alteration of this is reasonable. Certainly we should be able to, say, load multiple (!) objects from the database, and open the template (possibly from disk), all potentially-blocking operations.
This is inherently a long, complicated chain of requests, whether we implement it asynchronously or synchronously, or use Deferreds or Futures, or write it in Java or Python. Some parts can be done at any time before the end (loader.get_template(...)), some need to be done in a certain order, and there's branching depending on what happens in different cases. In order to even write this code _at all_, we need a way to chain these IO actions together. If we can't chain them together, we can't produce that final synthesis of results at the end.
[This is where you wrote "Ugh, just realized way after the fact that of course you meant callbacks, not composition. I feel dumb. Nevermind that whole segment."]

I'd like to come back to that Django example though. You are implying that there are some opportunities for concurrency here, and I agree, assuming we believe disk I/O is slow enough to bother making it asynchronous. (In App Engine it's not, and we can't anyways, but in other contexts I agree that it would be bad if a slow disk seek were to hold up all processing -- not to mention that it might really be NFS...)

The potentially async operations I see are:

  (1) fileinfo = Pastes.objects.get(key=filekey)  # I assume this is some kind of database query
  (2) loader.get_template('pastebin/error.html')
  (3) f = open(fileinfo.filename)  # depends on (1)
  (4) fcontents = f.read()  # depends on (3)
  (5) loader.get_template('pastebin/paste.html')

How would you code that using Twisted Deferreds?

Using Futures and generator coroutines, I would do it as follows. I'm hypothesizing that for every blocking API foo() there is a corresponding non-blocking API foo_async() with the same call signature, and returning a Future whose result is what the synchronous API returns (and raises what the synchronous call would raise, if there's an error). These are the conventions I use in NDB. I'm also inventing a @task decorator.

    @task
    def view_paste_async(request, filekey):
        # Create Futures -- no yields!
        f1 = Pastes.objects.get_async(key=filekey)  # This won't raise
        f2 = loader.get_template_async('pastebin/error.html')
        f3 = loader.get_template_async('pastebin/paste.html')
        try:
            fileinfo = yield f1
        except DoesNotExist:
            t = yield f2
            return HttpResponse(t.render(Context(dict(error='File does not exist'))))
        f = yield open_async(fileinfo.filename)
        fcontents = yield f.read_async()
        t = yield f3
        return HttpResponse(t.render(Context(dict(file=fcontents))))

You could easily decide not to bother loading the error template asynchronously (assuming most requests don't fail), and you could move the creation of f3 below the try/except. But you get the idea. Even if you do everything serially, inserting the yields and _async calls would make this more parallelizable without the use of threads. (If you were using threads, all this would be moot of course -- but then your limit on requests being handled concurrently probably goes way down.)
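For anyone who wants to run something along these lines, here is one possible sketch of such a @task decorator. It is an assumption-laden illustration, not NDB's implementation: the Futures are concurrent.futures.Future objects, the *_async helpers are hypothetical, dictionary lookups on a small thread pool stand in for the database and disk, and KeyError plays the role of DoesNotExist.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def task(genfunc):
    """Hypothetical @task: drive a generator that yields Futures; return a Future."""
    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        gen = genfunc(*args, **kwargs)
        outcome = Future()

        def step(advance, value):
            # advance is gen.send or gen.throw; run up to the next yield.
            try:
                fut = advance(value)
            except StopIteration as stop:
                outcome.set_result(stop.value)   # generator returned
            except Exception as exc:
                outcome.set_exception(exc)       # generator raised
            else:
                fut.add_done_callback(resume)    # wait without blocking

        def resume(fut):
            exc = fut.exception()
            if exc is not None:
                step(gen.throw, exc)
            else:
                step(gen.send, fut.result())

        step(gen.send, None)
        return outcome
    return wrapper


# Stand-ins for the database, template loader, and disk (all hypothetical).
PASTES = {"abc": "hello.txt"}
TEMPLATES = {"error.html": "Error: {error}", "paste.html": "Paste: {file}"}
FILES = {"hello.txt": "hello world"}


def get_paste_async(key):
    return _pool.submit(PASTES.__getitem__, key)  # KeyError if missing


def get_template_async(name):
    return _pool.submit(TEMPLATES.__getitem__, name)


def read_file_async(filename):
    return _pool.submit(FILES.__getitem__, filename)


@task
def view_paste_async(filekey):
    # Create Futures first so the three lookups can overlap.
    f1 = get_paste_async(filekey)
    f2 = get_template_async("error.html")
    f3 = get_template_async("paste.html")
    try:
        filename = yield f1
    except KeyError:
        t = yield f2
        return t.format(error="File does not exist")
    fcontents = yield read_file_async(filename)
    t = yield f3
    return t.format(file=fcontents)
```

Calling view_paste_async("abc") returns a Future immediately; its result() is the rendered string once all the lookups have completed. Because the generator is resumed from add_done_callback(), no thread ever blocks waiting on another Future.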
We _need_ a pipeline or something computationally equivalent or more powerful. Results from past "deferred computations" need to be passed forward into future "deferred computations", in order to implement this at all.
Yeah, and I think that a single generator using multiple yields is the ideal pipeline to me (see my example near the top based on kriskowal/q).
This is not a style issue, this is an issue of needing to be able to solve problems that involve more than one computation where the results of every computation matters somewhere. It's just that in this case, some of the computations are computed asynchronously.
And I think generators do this very well.
I am totally open to learning from Twisted's experience. I hope that you are willing to share even if the end result might not look like Twisted at all -- after all in Python 3.3 we have "yield from" and return from a generator and many years of experience with different styles of async APIs. In addition to Twisted, there's Tornado and Monocle, and then there's the whole greenlets/gevent and Stackless/microthreads community that we can't completely ignore. I believe somewhere is an ideal async architecture, and I hope you can help us discover it.
(For example, I am very interested in Twisted's experiences writing real-world performant, robust reactors.)
For that stuff, you'd have to speak to the main authors of Twisted. I'm just a twisted user. :(
They seem to be mostly ignoring this conversation, so your standing in as a proxy for them is much appreciated!
In the end it really doesn't matter what API you go with. The Twisted people will wrap it up so that they are compatible, as far as that is possible.
And I want to ensure that that is possible and preferably easy, if I can do it without introducing too many warts in the API that non-Twisted users see and use.
I hope I haven't detracted too much from the main thrust of the surrounding discussion. Futures/deferreds are a pretty big tangent, so sorry. I justified it to myself by figuring that it'd probably come up anyway, somehow, since these are useful abstractions for asynchronous programming.
Not at all. This has been a valuable refresher for me!

--
--Guido van Rossum (python.org/~guido)