The async API of the future: Twisted and Deferreds

[This is the third spin-off thread from "asyncore: included batteries don't fit"] On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Thanks for those links. I followed the kriskowal/q link and was reminded of why Twisted's Deferreds are considered more awesome than Futures: it's the chaining.

BUT... That's only important if callbacks are all the language lets you do! If your baseline is this:

    step1(function (value1) {
        step2(value1, function(value2) {
            step3(value2, function(value3) {
                step4(value3, function(value4) {
                    // Do something with value4
                });
            });
        });
    });

then of course the alternative using Deferred looks better:

    Q.fcall(step1)
    .then(step2)
    .then(step3)
    .then(step4)
    .then(function (value4) {
        // Do something with value4
    }, function (error) {
        // Handle any error from step1 through step4
    })
    .end();

(Both quoted literally from the kriskowal/q link.)

I also don't doubt that using classic Futures you can't do this -- the chaining really matters for this style, and I presume this (modulo unimportant API differences) is what typical Twisted code looks like.

However, Python has yield, and you can do much better (I'll write plain yield for now, but it works the same with yield-from):

    try:
        value1 = yield step1(<args>)
        value2 = yield step2(value1)
        value3 = yield step3(value2)
        value4 = yield step4(value3)
        # Do something with value4
    except Exception:
        # Handle any error from step1 through step4

There's an outer function missing here, since you can't have a toplevel yield; I think that's the same for the JS case, typically. Also, strictly speaking the "Do something with value4" code should probably be in an else: clause after the except handler. But that actually leads nicely to the advantage:

This form is more flexible, since it is easier to catch different exceptions at different points. It is also much easier to pass extra information around. E.g. what if your flow ends up having to pass both value1 and value2 into step3()?
Sure, you can do that by making value2 a tuple (or a dict, or an object) incorporating value1 and the original value2, but that's exactly where this style becomes cumbersome, whereas in the yield-based form, such things can remain simple local variables. All in all I find it more readable.

In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change. Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle), or they go a completely different route and use greenlets/gevent instead -- and get amazing performance and productivity that way too, even though they know it's monkey-patching their asses off...

So, in the end, for Python 3.4 and beyond, I want to promote a style that mixes simple callbacks (perhaps augmented with simple Futures) and generator-based coroutines (either PEP 342, yield/send-based, or PEP 380 yield-from-based). I'm looking to Twisted for the best reactors (see other thread). But for transport/protocol implementations I think that generators/coroutines offer a cleaner, better interface than incorporating Deferreds.

I hope that the path forward for Twisted will be simple enough: it should be possible to hook Deferred into the simpler callback APIs (perhaps a new implementation using some form of adaptation, but keeping the interface the same).

In a sense, the greenlet/gevent crowd will be the biggest losers, since they currently write async code without either callbacks or yield, using microthreads instead.
I wouldn't want to have to start putting yield back everywhere into that code. But the stdlib will still support yield-free blocking calls (even if under the hood some of these use yield/send-based or yield-from-based coroutines) so the monkey-patchy tradition can continue.
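For the record, the yield-based form sketched above can be driven by a very small scheduler. The following is a minimal, blocking sketch using only the stdlib's concurrent.futures; the step functions, the executor, and run() are all invented here for illustration (a real event loop would resume the generator from callbacks instead of blocking on .result()):

```python
import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def step1():
    return executor.submit(lambda: 10)           # pretend async work

def step2(value1):
    return executor.submit(lambda: value1 + 32)  # pretend async work

def run(gen):
    # Resume the generator with each yielded Future's result.
    # A real event loop would use callbacks instead of blocking on .result().
    result = None
    try:
        while True:
            future = gen.send(result)
            result = future.result()
    except StopIteration:
        return result

def pipeline():
    value1 = yield step1()
    value2 = yield step2(value1)
    yield executor.submit(lambda: value2)        # final result

print(run(pipeline()))  # -> 42
```

The point of the sketch is only that "yield a Future, get resumed with its result" needs nothing more exotic than gen.send() in a loop.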
I actually like this, as it's a lowest-common-denominator approach which everyone can easily adapt to their purposes. See the thread I started about reactors.
While I'm sure it's expedient and captures certain common patterns well, I like this the least of all -- calling fixed methods on an object sounds like a step back; it smells of the old Java way (before it had some equivalent of anonymous functions), and of asyncore, which (nearly) everybody agrees is kind of bad due to its insistence that you subclass its classes. (Notice how subclassing as the prevalent approach to structuring your code has gotten into a lot of discredit since 1996.)
Discussed above.
Seeing them as syntactic sugar for Deferreds is one way of looking at it; no doubt this is how they're seen in the Twisted community because Deferreds are older and more entrenched. But there's no requirement that an architecture has to have Deferreds in order to use generator coroutines -- simple Futures will do just fine, and Greg Ewing has shown that using yield-from you can even do without those. (But he does use simple, explicit callbacks at the lowest level of his system.)
I think you're wrong -- I was (and am) most concerned about the perceived complexity of the API offered by, and the typical looks of code using, Deferreds (i.e., #3).
I don't think I was. It's clear to me (now) that Futures are simpler than Deferreds -- and I like Futures better because of it, because for the complex cases I would much rather use generator coroutines than Deferreds.
I touched on this briefly in the reactor thread. Basically, GUI callbacks are often level-triggered rather than edge-triggered, and IIUC Deferreds are not great for that either; and in a few cases where edge-triggered coding makes sense I *would* like to use a generator coroutine.
[In a follow-up to yourself, you quoted starting from this point and appended "Nevermind that whole segment." I'm keeping it in here just for context of the thread.]
[This is where you write "Ugh, just realized way after the fact that of course you meant callbacks, not composition. I feel dumb. Nevermind that whole segment."]

I'd like to come back to that Django example though. You are implying that there are some opportunities for concurrency here, and I agree, assuming we believe disk I/O is slow enough to bother making it asynchronous. (In App Engine it's not, and we can't anyways, but in other contexts I agree that it would be bad if a slow disk seek were to hold up all processing -- not to mention that it might really be NFS...)

The potentially async operations I see are:

    (1) fileinfo = Pastes.objects.get(key=filekey)  # I assume this is some kind of database query
    (2) loader.get_template('pastebin/error.html')
    (3) f = open(fileinfo.filename)  # depends on (1)
    (4) fcontents = f.read()  # depends on (3)
    (5) loader.get_template('pastebin/paste.html')

How would you code that using Twisted Deferreds?

Using Futures and generator coroutines, I would do it as follows. I'm hypothesizing that for every blocking API foo() there is a corresponding non-blocking API foo_async() with the same call signature, returning a Future whose result is what the synchronous API returns (and which raises what the synchronous call would raise, if there's an error). These are the conventions I use in NDB. I'm also inventing a @task decorator.

    @task
    def view_paste_async(request, filekey):
        # Create Futures -- no yields!
        f1 = Pastes.objects.get_async(key=filekey)  # This won't raise
        f2 = loader.get_template_async('pastebin/error.html')
        f3 = loader.get_template_async('pastebin/paste.html')
        try:
            fileinfo = yield f1
        except DoesNotExist:
            t = yield f2
            return HttpResponse(t.render(Context(dict(error='File does not exist'))))
        f = yield open_async(fileinfo.filename)
        fcontents = yield f.read_async()
        t = yield f3
        return HttpResponse(t.render(Context(dict(file=fcontents))))

You could easily decide not to bother loading the error template asynchronously (assuming most requests don't fail), and you could move the creation of f3 below the try/except. But you get the idea. Even if you do everything serially, inserting the yields and _async calls would make this more parallelizable without the use of threads. (If you were using threads, all this would be moot of course -- but then your limit on requests being handled concurrently probably goes way down.)
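A @task decorator of the hypothesized kind can be sketched with the stdlib's concurrent.futures alone. This is a toy, callback-driven driver, not NDB's actual implementation; @task, fetch, and pool are all invented names:

```python
import concurrent.futures

def task(func):
    # Hypothetical @task: drive the generator from done-callbacks, so the
    # caller gets back a Future for the task's eventual result.
    def wrapper(*args, **kwds):
        gen = func(*args, **kwds)
        done = concurrent.futures.Future()

        def step(value=None, exc=None):
            try:
                if exc is not None:
                    future = gen.throw(exc)   # surface the error in the generator
                else:
                    future = gen.send(value)  # resume with the last result
            except StopIteration as stop:
                done.set_result(stop.value)   # the generator's return value
                return
            future.add_done_callback(
                lambda f: step(exc=f.exception()) if f.exception()
                else step(f.result()))

        step()
        return done
    return wrapper

pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

@task
def fetch():
    a = yield pool.submit(lambda: 2)
    b = yield pool.submit(lambda: a * 21)
    return b

print(fetch().result())  # -> 42
```

Note the driver never blocks: each resumption happens from the completed Future's done-callback.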
Yeah, and I think that a single generator using multiple yields is the ideal pipeline to me (see my example near the top based on kriskowal/q).
And I think generators do this very well.
They seem to be mostly ignoring this conversation, so your standing in as a proxy for them is much appreciated!
And I want to ensure that that is possible and preferably easy, if I can do it without introducing too many warts in the API that non-Twisted users see and use.
Not at all. This has been a valuable refresher for me! -- --Guido van Rossum (python.org/~guido)

On 12/10/2012 11:11pm, Guido van Rossum wrote:
So would the futures be registered with the reactor as soon as they are created, or only when they are yielded? I can't see how there can be any "concurrency" if they don't start till they are yielded. It would be like doing

    t1 = Thread(target=f1)
    t2 = Thread(target=f2)
    t3 = Thread(target=f3)
    t1.start()
    t1.join()
    t2.start()
    t2.join()
    t3.start()
    t3.join()

But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible.

Richard.

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
I don't think it follows that there can only be one reactor if they are registered immediately. There could be a notion of "current reactor" maintained in thread-local context; moreover it could depend on the reactor that made the callback that caused the current task to run. The reactor could also be chosen by the code that made the Future. (Though I'm not immediately sure how that would work in the yield-from scenario -- but I'm sure there's a way.) FWIW, in NDB there is one event loop per thread; separate threads are handling separate requests and are completely independent. Also, in NDB there's some code that turns Futures into actual RPCs that runs only once there are no more immediately runnable tasks. I think that in general such behaviors are up to the reactor implementation for the platform though, and should not directly be reflected in the reactor API. -- --Guido van Rossum (python.org/~guido)
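The "current reactor maintained in thread-local context" idea can be sketched in a few lines; get_event_loop is an invented name and the loop object is a placeholder:

```python
import threading

_state = threading.local()

def get_event_loop():
    # One loop per thread, created lazily on first use in that thread.
    if not hasattr(_state, 'loop'):
        _state.loop = object()   # stand-in for a real reactor/loop object
    return _state.loop
```

Within one thread every call returns the same loop; a different thread lazily gets its own, so independent requests stay independent as in NDB.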

On 13/10/2012 1:22am, Guido van Rossum wrote:
Alternatively, yielding a future (or whatever one calls the objects returned by *_async()) could register *and* wait for the result. To register without waiting one would yield a wrapper for the future. So one could write

    result = yield foo_async(...)

or

    f = yield Register(foo_async())
    # do some other work
    result = yield f

Richard

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
The Futures are not what is doing the work here, they just hold the result. In this example the get_async() functions register something with the reactor when they are called. When that "something" is done (or perhaps after several "somethings" chained together), get_async will set a result on its Future.
But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible.
In most event-driven systems there is a global (or thread-local) event loop, but it's also possible to pass one in explicitly to get_async(). -Ben

On Fri, 12 Oct 2012 15:11:54 -0700 Guido van Rossum <guido@python.org> wrote:
But how would you write a dataReceived equivalent then? Would you have a "task" looping on a read() call, e.g.

    @task
    def my_protocol_main_loop(conn):
        while <some_condition>:
            try:
                data = yield conn.read(1024)
            except ConnectionError:
                conn.close()
                break

I'm not sure I understand the problem with subclassing. It works fine in Twisted. Even in Python 3 we don't shy away from subclassing, for example the IO stack is based on subclassing RawIOBase, BufferedIOBase, etc.

Regards

Antoine.

-- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Subclassing per se isn't a problem, but requiring a single dataReceived method per class can be awkward. Many protocols are effectively state machines, and modeling each state as a function can be cleaner than a big if/switch block in dataReceived. For example, here's a simplistic HTTP client using tornado's IOStream:

    from tornado import ioloop
    from tornado import iostream
    import socket

    def send_request():
        stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com\r\n\r\n")
        stream.read_until("\r\n\r\n", on_headers)

    def on_headers(data):
        headers = {}
        for line in data.split("\r\n"):
            parts = line.split(":")
            if len(parts) == 2:
                headers[parts[0].strip()] = parts[1].strip()
        stream.read_bytes(int(headers["Content-Length"]), on_body)

    def on_body(data):
        print data
        stream.close()
        ioloop.IOLoop.instance().stop()

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    stream = iostream.IOStream(s)
    stream.connect(("friendfeed.com", 80), send_request)
    ioloop.IOLoop.instance().start()

Classes allow and encourage broader interfaces, which are sometimes a good thing, but interact poorly with coroutines. Both twisted and tornado use separate callbacks for incoming data and for the connection being closed, but for coroutines it's probably better to just treat a closed connection as an error on the read. Futures (and yield from) give us a nice way to do that.

-Ben

What calls on_headers in this example? Coming from twisted, that seems like dataReceived's responsibility, but given your introductory paragraph that's not actually what goes on here? On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
The IOStream does, after send_request calls stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is a _handle_read method that is registered with the IOLoop and fills up a buffer. When the read condition is satisfied the IOStream calls back into application code. -Ben

Interesting. That's certainly a nice API, but that then again (read_until) sounds like something I'd implement using dataReceived... You know, read_until clears the buffer, logs the requested callback. data_received adds something to the buffer, and checks if it triggered the (one of the?) registered callbacks. Of course, I may just be rusted in my ways and trying to implement everything in terms of things I know (then again, that might be just what's needed when you're trying to make a useful general API). I guess it's time for me to go deep-diving into Tornado :) On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh
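The buffering scheme described in words here -- read_until records a callback, data_received appends to a buffer and checks whether any registered callback has been triggered -- can be sketched directly. This is a toy illustration, not Tornado's actual implementation, and all names are invented:

```python
class BufferedStream:
    # Toy read_until layered on a dataReceived-style feed.
    def __init__(self):
        self._buffer = b''
        self._delimiter = None
        self._callback = None

    def read_until(self, delimiter, callback):
        # Record the pending read; deliver immediately if already buffered.
        self._delimiter = delimiter
        self._callback = callback
        self._try_deliver()

    def data_received(self, chunk):
        # Add to the buffer and check whether a registered read triggered.
        self._buffer += chunk
        self._try_deliver()

    def _try_deliver(self):
        if self._callback is None:
            return
        idx = self._buffer.find(self._delimiter)
        if idx == -1:
            return
        end = idx + len(self._delimiter)
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        callback, self._callback = self._callback, None
        callback(data)

received = []
stream = BufferedStream()
stream.read_until(b'\r\n\r\n', received.append)
stream.data_received(b'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n')
stream.data_received(b'\r\nhello')   # completes the pending read_until
```

After the second chunk arrives, the callback fires with everything up to and including the delimiter, and the leftover b'hello' stays buffered for the next read.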

On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_@lvh.cc> wrote:
Right, that's how IOStream is implemented internally. The transport/protocol split works a little differently in Tornado: IOStream is implemented something like a Protocol subclass, but we consider it a part of the transport layer. The "protocols" are arbitrary classes that don't share any particular interface, but instead just call methods on the IOStream. -Ben

I quite like IOStream's interface, actually. If that's part of the transport layer, how do you prevent having to duplicate its behavior (read_until etc.)? If there's just another separate object that would be the ITransport in twisted, I think the difference is purely one of labeling. On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 12:13 PM, Laurens Van Houtven <_@lvh.cc> wrote:
So far we haven't actually needed much flexibility in the transport layer - most of the functionality is in the BaseIOStream class, and then there are subclasses IOStream (regular sockets), SSLIOStream, and PipeIOStream that actually call recv(), read(), connect(), etc. We might need a little refactoring if we introduce dramatically different types of transports, but the plan is that we'd represent transports as classes in the IOStream hierarchy. -Ben

[Quick, I know I'm way behind, especially on this thread; more tomorrow.] On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
No, I would use plain callbacks. There would be some kind of IOObject class defined by the stdlib that wraps a socket (it would make it non-blocking, and possibly do other things), and the user would make a registration call to the event loop giving it the IOObject and the user's callback function plus *args and **kwds; the event loop would call callback(*args, **kwds) each time the IOObject became readable. (Oh, and there would be separate registration (and unregistration) functions for reading and writing.)

Apparently my rants about callbacks have made people assume that I don't want to see them anywhere. In fact I am comfortable with callbacks for a number of situations -- I just think we have several other tools in our toolbox that are way underused, whereas callbacks are way overused, in part because the alternative tools are relatively new.

This way the user could switch to a different callback when a different phase of the protocol is reached. I realize there are other shapes this API could take. But I really don't want the user to have to subclass IOObject.
I'm fine with using subclassing for the internal structure of a library. (The IOObject I am postulating would almost certainly have a bunch of subclasses used for different types of sockets, IOCP, SSL, etc.) The thing that I've soured upon (and many others too) is to tell users "and to use this fine feature, just subclass this handy base class and override or extend the following three methods". Because in practice (certainly in Python, where the compiler doesn't enforce privacy) users always start overriding other methods, or using internal state, or add state that clashes with the base class's state, or forget to call mandatory super calls, or make incorrect assumptions about thread-safety, or whatever else they can do to screw things up. And duck typing isn't ideal either for this situation. -- --Guido van Rossum (python.org/~guido)
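The registration-style API described here -- a plain callable plus *args and **kwds handed to the event loop, with no subclassing -- might look like this minimal sketch; all names (EventLoopSketch, add_reader, _dispatch_readable) are invented for illustration:

```python
class EventLoopSketch:
    # Hypothetical registration-style event loop: the user registers a plain
    # callable plus *args/**kwds per fd instead of subclassing anything.
    def __init__(self):
        self._readers = {}

    def add_reader(self, fd, callback, *args, **kwds):
        self._readers[fd] = (callback, args, kwds)

    def remove_reader(self, fd):
        self._readers.pop(fd, None)

    def _dispatch_readable(self, fd):
        # A real loop would call this when select()/poll() reports fd ready.
        callback, args, kwds = self._readers[fd]
        callback(*args, **kwds)

loop = EventLoopSketch()
seen = []
loop.add_reader(4, seen.append, 'readable')
loop._dispatch_readable(4)   # simulate fd 4 becoming readable
```

Switching protocol phases is then just another add_reader call with a different callback, which is the flexibility argued for above.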

On Sat, 13 Oct 2012 22:03:17 -0700 Guido van Rossum <guido@python.org> wrote:
Subclassing IOObject would be wrong, since the user isn't writing an IO object in the first place. But subclassing a separate class, like Twisted's Protocol (which is mostly an empty shell, really), would sound reasonable to me. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 14, 2012 at 3:43 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It's a possible style. I'm inclined not to follow this example but I could go either way. One thing that somewhat worries me is that the names of these methods will be baked forever into all user code. As a user I prefer to have control over the names of my methods; first, there's the style issue (e.g. I'm always conflicted over what style to use in unittest.TestCase subclasses, since its own style is setUp, tearDown); second, in my app there may be a much better name for what the method does than e.g. data_received(). (Not to mention that that's another adjective used as a verb. ;-) -- --Guido van Rossum (python.org/~guido)

There has to be some way to contract emails sent in discussions rather than exploding them. I swear I'm trying to be concise, yet readable. It's not working. On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido@python.org> wrote:
My experience has been unfortunately rather devoid of deferreds in Twisted. I always feel like the odd one out when people discuss this confusion. For me, it was all Protocol this and Protocol that, and deferreds only came up when I used Twisted's great AMP (Asynchronous Messaging Protocol) library.
--snip--
Well, first of all, deferreds have ways of joining values together. For example:

    from __future__ import print_function

    from twisted.internet import defer

    def example_joined():
        d1 = defer.Deferred()
        d2 = defer.Deferred()
        # consumeErrors looks scary, but it only means that
        # d1 and d2's errbacks aren't called. Instead, the error
        # is sent to d's errback.
        d = defer.gatherResults([d1, d2], consumeErrors=True)
        d.addCallback(print)
        d.addErrback(lambda v: print("ERROR!"))
        d1.callback("The first deferred has succeeded")
        # now we're waiting on the second deferred to succeed,
        # which we'll let the caller handle
        return d2

    example_joined().callback("The second deferred has succeeded too!")
    print("==============")
    example_joined().errback("The second deferred has failed...")

I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Egh. I mean, sure, suppose we have those things. But what if you want to send the result of a callback to a generator-coroutine? Presumably generator coroutines work by yielding deferreds and being called back when the future resolves (deferred fires). But if those futures/deferreds aren't exposed, and instead only the generator stuff is exposed, then bridging the gap between callbacks and generator-coroutines is impossible. So every callback function has to also be defined to use something else. And worse, other APIs using callbacks are left in the dust.

Suppose, OTOH, futures/deferreds are exposed. Then we can easily bridge between callbacks and generators, by returning a future whose `set_result` is the callback to our callback function (deferred whose `callback` is the callback). But if we're exposing futures/deferreds, why have callbacks in the first place? The difference between these two functions is that the second can be used in generator-coroutines trivially and the first cannot:

    # callbacks:
    reactor.timer(10, print, "hello world")

    # deferreds:
    reactor.timer(10).addCallback(print, "hello world")

Now here's another thing: suppose we have a list of "deferred events", but instead of handling all 10 at once, we want to handle them "as they arrive", and then synthesize a result at the bottom. How do you do this with pure generator coroutines? For example, perhaps I am implementing a game server, where all the players choose their characters and then the game begins. Whenever a character is chosen, everyone else has to know about it so that they can plan their strategy based on who has chosen a character. Character selections are final, just so that I can use deferreds (hee hee).
I am imagining something like the following:

    # WRONG: handles players in a certain order, rather than as they come in
    def player_lobby(reactor, players):
        for player in players:
            player_character = yield player.wait_for_confirm(reactor)
            player.set_character(player_character)
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

This is wrong, because it goes in a certain order and "blocks" the coroutine until every character is chosen. Players will not know who has chosen what characters in an appropriate order. But hypothetically, maybe we could do the following:

    # Hypothetical magical code?
    def player_lobby(reactor, players):
        confirmation_events = UnorderedEventList(
            [player.wait_for_confirm(reactor) for player in players])
        while confirmation_events:
            player_character = yield confirmation_events.get_next()
            player.set_character(player_character)
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

But then, how do we write UnorderedEventList? I don't really know. I suspect I've made the problem harder, not easier! eek. Plus, it doesn't even read very well. Especially not compared to the deferred version:

This is how I would personally do it in Twisted, without using UnorderedEventList (no magic!):

    @inlineCallbacks
    def player_lobby(reactor, players):
        events = []
        for player in players:
            confirm_event = player.wait_for_confirm(reactor)
            @confirm_event.addCallback
            def on_confirmation(player_character, player=player):
                player.set_character(player_character)
                # tell all the other players what character the player has chosen
                notify_choice((player, player_character), players)
            events.append(confirm_event)
        yield gatherResults(events)
        start_game(players)

Notice how I dropped down into the level of manipulating deferreds so that I could add this "as they come in" functionality, and then went back.
Actually it wouldn't've hurt much to just not bother with inlineCallbacks at all. I don't think this is particularly unreadable. More importantly, I actually know how to do it. I have no idea how I would do this without using addCallback, or without reimplementing addCallback using inlineCallbacks. And then, supposing we don't have these deferreds/futures exposed... how do we implement delayed computation stuff from extension modules? What if we want to do these kinds of compositions within said extension modules? What if we want to write our own version of @tasks or @inlineCallbacks with extra features, or generate callback chains from XML files, and so on? I don't really like the prospect of having just the "sugary syntax" available, without a flexible underlying representation also exposed. I don't know if you've ever shared that worry -- sometimes the pretty syntax gets in the way of getting stuff done.
Surely it's no harder to make yourself into a generator than to make yourself into a low-level thread-like context switching function with a saved callstack implemented by hand in assembler, and so on? I'm sure they'll be fine.
Will do (but also see my response above about why not "everyone" can).
I only used asyncore once, indirectly, so I don't know anything about it. I'm willing to dismiss it (and, in fact, various parts of twisted (I'm looking at you twisted.words)) as not good examples of the pattern.

First of all, I'd like to separate the notion of subclassing and method dispatch. They're entirely unrelated. If I pass my object to you, and you call different methods depending on what happens elsewhere, that's method dispatch. And my object doesn't have to be subclassed or anything for it to happen.

Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.) My IRC bot needs to handle several different possible events, such as:

  - private messages
  - channel join event
  - CTCP event

and so on. My event handlers for each of these events probably manipulate some internal state (such as a log file, or a GUI). We'd probably organize this as a class, or else as a bunch of functions accessing global state. Or, perhaps a collection of closures. This last one is pretty unlikely.

For the most part, these functions are all intrinsically related and can't be sensibly treated separately. You can't take the private message callback of Bot A, and the channel join callback of Bot B, and register these and expect a result that makes sense. If we look at this, we're expecting to deal with a set of functions that manage shared data. The abstraction for this is usually an object, and we'd really probably write the callbacks in a class unless we were being contrarian. And it's not too crazy for the dispatcher to know this and expect you to write it as a class that supports a certain interface (certain methods correspond to certain events). Missing methods can be assumed to have the empty implementation (no subclassing, just catching AttributeError).
This isn't too much of an imposition on the user -- any collection of functions (with shared state via globals or closure variables) can be converted to an object with callable attributes very simply (thanks to types.SimpleNamespace, especially). And I only really think this is OK when writing it as an object -- as a collection of functions with shared state -- is the eminently obvious primary use case, so that that situation wouldn't come up very often. So, as an example, a protocol that passes data on further down the line needs to be notified when data is received, but also when the connection begins and ends. So the twisted protocol interface has "dataReceived", "connectionMade", and "connectionLost" callbacks. These really do belong together, they manage a single connection between computers and how it gets mapped to events usable by a twisted application. So I like the convenience and suggestiveness of them all being methods on an object.
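A dispatcher along those lines -- duck-typed, with missing handlers treated as empty implementations, no base class required -- could be sketched as follows; all names here are invented:

```python
class Dispatcher:
    # Duck-typed event dispatch: the handler is any user-supplied object;
    # missing handler methods are simply skipped (no subclassing needed).
    def __init__(self, handler):
        self._handler = handler

    def fire(self, event, *args):
        method = getattr(self._handler, 'on_' + event, None)
        if method is not None:
            method(*args)

class LoggingBot:
    # Shared state (the log) plus related callbacks, bundled as an object.
    def __init__(self):
        self.log = []
    def on_private_message(self, sender, text):
        self.log.append((sender, text))
    # no on_channel_join: that event is silently ignored

bot = LoggingBot()
dispatch = Dispatcher(bot)
dispatch.fire('private_message', 'alice', 'hi')
dispatch.fire('channel_join', '#python')   # no handler; no error
```

The handler object is entirely the user's; the dispatcher only looks methods up by convention, which is the distinction Greg draws below between user-supplied bundles of callbacks and subclassing a library's I/O object.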
I meant it as a factual explanation of what generator coroutines are in Twisted, not what they are in general. Sorry for the confusion. We are probably agreed here. After a cursory examination, I don't really understand Greg Ewing's thing. I'd have to dig deeper into the logs for when he first introduced it.
--snip--
How would you code that using Twisted Deferreds?
Well. I'd replace the @task in your NDB thing with @inlineCallbacks and call it a day. ;) (I think there's enough deferred examples above, and I'm getting tired and it's been a day since I started writing this damned email.)
Well. We are on Python-Ideas... :(
I probably lack the expertise to help too much with this. I can point out anything that sticks out, if/when an extended futures proposal is made. -- Devin

Devin Jeanpierre wrote:
That's one way to go about it, but it's not the only way. See here for my take on how it might work: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Exa... -- Greg

Devin Jeanpierre wrote (concerning callbacks):
IIUC, what Guido objects to is callbacks that are methods *of the I/O object*, so that you have to subclass the library-supplied object and override them. You seem to be talking about something slightly different -- an object that's entirely supplied by the user, and simply bundles a set of callbacks together. That doesn't seem so bad. -- Greg

On Sat, Oct 13, 2012 at 4:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Don't worry too much. I took essentially all Friday starting those four new threads. I am up at night thinking about the issues. I can't expect everyone else to have this much time to devote to Python!
Especially odd since you jumped into the discussion when I called Deferreds a bad name. :-)
I'm sorry, but that's not very readable at all. You needed a lambda (which if there was anything more would have to be expanded using 'def') and you're cheating by passing print as a callable (which saves you a second lambda, but only in this simple case). A readable version of this code should not have to use lambdas.
I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Yeah, but things should be as simple as they can. If you can do everything using plain callbacks, Futures and coroutines, why add Deferreds even if you can? (Except for backward compatibility of course. That's a totally different topic. But we're first defining the API of the future.) If Greg Ewing had his way we'd even do without Futures -- I'm still considering that bid. (In the yield-from thread I'm asking for common patterns that the new API should be able to solve.)
No, they don't use deferreds. They use Futures. You've made it quite clear that they are very different.
My plan is that the Futures *will* be exposed -- this is what worked well in NDB.
And that's how NDB does it. I've got a question to Greg Ewing on how he does it.
How about this:

    f = <some future>
    reactor.timer(10, f.set_result, None)

Then whoever waits for f gets woken up in 10 seconds, and the reactor doesn't have to know what Futures are. But I believe your whole argument may be based on a misreading of my proposal. *I* want plain callbacks, Futures, and coroutines, and an event loop that only knows about plain callbacks and IO objects (e.g. sockets).
Let's ask Greg that. In NDB, I have a wait_any() function that you give a set of Futures and returns the first one that completes. It would be easy to build an iterator on top of this that takes a set of Futures and iterates over them in the order in which they are completed.
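A completion-order iterator of the sort described could be sketched on top of concurrent.futures (the names here are illustrative; this is not NDB's actual API):

```python
import concurrent.futures

def wait_any(fs):
    # Block until at least one of the given futures completes,
    # and return one that did.
    done, _ = concurrent.futures.wait(
        fs, return_when=concurrent.futures.FIRST_COMPLETED)
    return done.pop()

def iter_completed(fs):
    # Yield futures in the order in which they complete,
    # built directly on wait_any() as suggested above.
    pending = set(fs)
    while pending:
        f = wait_any(pending)
        pending.discard(f)
        yield f
```
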
Clearly we have an educational issue on our hands! :-)
You're barking up the wrong tree -- please badger Greg Ewing with use cases in the yield-from thread. With my approach all of these can be done. (See the yield-from thread for an example I just posted of a barrier, where multiple tasks wait for a single event.)
The thing that worries me most is reimplementing httplib, urllib and so on to use all this new machinery *and* keep the old synchronous APIs working *even* if some code is written using the old style and some other code wants to use the new style.
Agreed. Antoine made the same point elsewhere and I half conceded.
Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.)
(For the record, I hate IRC, the software, the culture, the interaction style. But maybe I'm unusual that way. :-)
I certainly wouldn't recommend collections of closures for that!
There's also a certain order to them, right? I'd think the state transition diagram is something like: connectionMade (1); dataReceived (*); connectionLost (1). I wonder if there are any guarantees that they will only be called in this order, and who is supposed to enforce this? It would be awkward if the user code had to guard itself against this; also if the developer made an unwarranted assumption (e.g. that dataReceived is called at least once).
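One way user code could guard against an ill-behaved transport is with a small wrapper enforcing that order (this wrapper is hypothetical, not part of Twisted):

```python
class OrderedProtocol:
    # Hypothetical guard enforcing the state transitions
    # connectionMade (1); dataReceived (*); connectionLost (1).
    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"

    def connectionMade(self):
        if self._state != "new":
            raise AssertionError("connectionMade called twice")
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        if self._state != "connected":
            raise AssertionError("dataReceived outside a connection")
        self._protocol.dataReceived(data)

    def connectionLost(self, reason):
        if self._state != "connected":
            raise AssertionError("connectionLost before connectionMade")
        self._state = "closed"
        self._protocol.connectionLost(reason)
```
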
Please press him for explanations. Ask questions. He knows his dream best of all. We need to learn.
No problem. Same here. :-)
Somehow we got Itamar and Glyph to join, so I think we're covered!
You've done great in increasing my understanding of Twisted and Deferred. Thank you very much! -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum <guido@python.org> wrote:
A readable version of this code should not have to use lambdas.
In a lot of Twisted code, it happens with methods as callback methods, something like:

    d = self._doRPC(....)
    d.addCallbacks(self._formatResponse, self._formatException)
    d.addCallback(self._finish)

That doesn't talk about gatherResults, but hopefully it makes the idea clear. A lot of the legibility is dependent on making those method names sensible, though. Our in-house style guide asks for limiting functions to about ten lines, preferably half that. Works for us. Another pattern that's frowned upon since it's a bit of an abuse of decorator syntax, but I still like because it tends to make things easier to read for inline callback definitions where you do need more than a lambda:

    d = somethingThatHappensLater()

    @d.addCallback
    def whenItsDone(result):
        doSomethingWith(result)
cheers lvh

On Sun, Oct 14, 2012 at 9:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
I quite understand that in your ecosystem you've found best practices for every imaginable use case. And I understand that once you're part of the community and have internalized the idioms and style, it's quite readable. But you haven't shaken my belief that we can do better with the current version of the language (3.3). (FWIW, I think it would be a good idea to develop a "reference implementation" of many of these ideas outside the standard library. Depending on whether we end up adopting yield <future> or yield from <generator> it might even support versions of Python 3 before 3.3. I certainly don't want to have to wait for 3.4 -- although that's the first opportunity for incorporating it into the stdlib.) -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 11:53 AM, Guido van Rossum <guido@python.org> wrote:
Did I mention how great AMP was? ;)
Sure. I probably erred in not using inlineCallbacks form, what I wanted to do was highlight the gatherResults function (which, as it happens, does something generators can't without invoking an external function.) My worry here was that generators are being praised for being more readable, which is true and reasonable, but I don't know that they're flexible enough to be the only way to do things. But you've stated now that you'd want futures to be there too, so... those are probably mostly flexible enough.
Haha, different in API and what they can do, but they are meant to do the same thing (represent delayed results). I meant to talk about futures and deferreds equally, and ask the same questions of both of them.
OK. I was confused when you said there would only be generators and simple callbacks (and so I posed questions about what happens when you have just generators, which you took to be questions aimed at Greg Ewing's thing.)
I know that Twisted has historically agreed with the idea that the reactor shouldn't know about futures/deferreds. I'm not sure I agree it's so important. If the universal way of writing asynchronous code is generator-coroutines, then the reactor should work well with this and not require extra effort.
You're correct.
I meant to be asking about the situation you were proposing. I thought it was just callbacks and generators, now we've added futures. Futures sans chaining can definitely implement this, just maybe not as nicely as how I'd do it. The issue is that it's a reasonable thing to want to escape the generator system in order to implement things that aren't "linear" the way generator coroutines are. And if we escape the system, it should be possible and easy to do a large variety of things. But, on the plus side, I'm convinced that it's possible, and that the necessary things will be exposed (even if it's very unpleasant, there's always helper functions...). Unless you do Greg's thing, then I'm worried again. I will read his stuff later today or tomorrow. (Unrelated: I'm not sure why I was so sure UnorderedEventList had to be that ugly. It can use a for loop... oops.)
(We're now deviating from futures and deferreds, but I think the part I was taking was drawing to a close anyway.) Code that wants to use the old style can be integrated by calling it in a separate thread, and that's fine. If the results should be used in the asynchronous code, then have a thing that integrates with threading so that when the thread returns (or fails with an exception) it can notify a future/deferred of the outcome. Twisted has deferToThread for this. It also has blockingCallFromThread if the synchronous code wants to talk back to the asynchronous code. And that leads me to this: imagine if, instead of having two implementations (one synchronous, one not), we had only one (asynchronous), and then had some wrappers to make it work as a synchronous implementation as well? Here is an example of a synchronous program written in Python+Twisted, where I wrap deferLater to be a blocking function (so that it is similar to a time.sleep() followed by a function call). The reactor is started in a separate thread, and is left to die whenever the main thread dies (because thread daemons yay.)

    from __future__ import print_function

    import threading

    from twisted.internet import task, reactor
    from twisted.internet.threads import blockingCallFromThread

    def my_deferlater(reactor, time, callback, *args, **kwargs):
        return blockingCallFromThread(
            reactor, task.deferLater, reactor, time,
            callback, *args, **kwargs)

    # In reality, a global reactor for all threads is a terrible idea.
    # We'd want to instantiate a new reactor for the reactor thread,
    # and have a global REACTOR as well. We'll just use this reactor.
    # This code will not work with any other twisted code because of
    # the global reactor shenanigans. (But it'd work if we were able
    # to have a reactor per thread.)
    REACTOR_THREAD = None

    def start_reactor():
        global REACTOR_THREAD
        if REACTOR_THREAD is not None:
            # could be an error, or not, depending on how you
            # feel this should be.
            return
        REACTOR_THREAD = threading.Thread(target=reactor.run, kwargs=dict(
            # signal handlers don't work if not in main thread.
            installSignalHandlers=0))
        REACTOR_THREAD.daemon = True  # Probably really evil.
        REACTOR_THREAD.start()

    start_reactor()
    my_deferlater(reactor, 1, print, "This will print after 1 second!")
    my_deferlater(reactor, 1, print, "This will print after 2 seconds!")
    my_deferlater(reactor, 1, print, "This will print after 3 seconds!")

So maybe this is an option? It's really important that there not be just one global reactor, and that multiple reactors can run at the same time, for this to really work. But if that were done, then you could have a single global reactor responsible for being the back end of the new implementations of old synchronous APIs. Maybe it'd be started whenever the first call is made to a synchronous function. And maybe, to interoperate with some actual asynchronous code, you could have a way to change which reactor acts as the global reactor for synchronous APIs? I did this once, because I needed to rewrite a blocking API and wanted to use Twisted, except that I made the mistake of starting the thread when the module was created instead of on first call. This led to a deadlock because of the global import lock... :( In principle I don't know why this would be a terrible awful idea, if it was done right, but maybe people with more experience with threaded code can correct me. (The whole thread daemon thing necessary to make it act like a synchronous program might be terribly insane and therefore an idea killer. I'm not sure.) I'm under the understanding that the global import lock won't cause this particular issue anymore as of Python 3.3, so perhaps starting a reactor on import is reasonable.
The docs in Twisted don't spell it out, but they do say that connectionMade should be considered to be the initializer for the connection, and that upon connectionLost one should let the protocol be garbage collected. So, that seems like a guarantee that they are called in that order. I don't think it can really be enforced in Python (unless you want to do some jiggery pokery into model checking at runtime), but the responsibility for this failing in Twisted would be on the transport, as far as I understand it. If the transport calls back to the protocol in some invalid combination, it's the transport's fault for being broken. This is something that should be clearly documented. (It's an issue, also, regardless of whether or not a class is used to encapsulate the callbacks, or whether they are registered individually.) -- Devin

On Oct 14, 2012 11:27 AM, "Devin Jeanpierre" <jeanpierreda@gmail.com> wrote:
Yeah, while a global import lock still exists, it's used just long enough to get a per-module lock. On top of that, the import system now uses importlib (read: pure Python) for most functionality, which has bearing on threading and ease of better accommodating async if needed. -import

Guido van Rossum wrote:
I think this could be handled the same way you alluded to before when talking about the App Engine. The base implementation is asynchronous, and you provide a synchronous API that sets up an async operation and then runs a nested event loop until it completes. -- Greg
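A toy sketch of that arrangement, with a made-up event loop and a callback-based async operation standing in for the real machinery (every name here is illustrative):

```python
import collections

class Loop:
    # Minimal event loop: just a queue of callbacks.
    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, cb, *args):
        self._ready.append((cb, args))

    def run_until(self, done):
        # Nested-loop trick: keep running callbacks until the
        # operation we're waiting for reports completion.
        while not done():
            cb, args = self._ready.popleft()
            cb(*args)

def fetch_async(loop, callback):
    # Stand-in async operation: delivers its result via the loop.
    loop.call_soon(callback, "payload")

def fetch_sync(loop):
    # Synchronous facade: set up the async operation, then run a
    # nested event loop until it completes.
    result = []
    fetch_async(loop, result.append)
    loop.run_until(lambda: bool(result))
    return result[0]
```
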

Okay, I hate to do this, but is there any chance someone can provide a quick summary of the solution we're referring to here? I just started watching python-ideas today, and have a lot of things going on, plus real bad ADD, so I'm having a hard time reassembling the solutions being referred to… (maybe there's a web presentation that gives a better threaded presentation than my mail program? Or maybe I'm daft. Either way, this sounded interesting!) In summary, then, the Q/A below is referring to which approach? Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 5:23 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

On 12/10/2012 11:11pm, Guido van Rossum wrote:
So would the futures be registered with the reactor as soon as they are created, or only when they are yielded? I can't see how there can be any "concurrency" if they don't start till they are yielded. It would be like doing

    t1 = Thread(target=f1)
    t2 = Thread(target=f2)
    t3 = Thread(target=f3)
    t1.start()
    t1.join()
    t2.start()
    t2.join()
    t3.start()
    t3.join()

But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible. Richard.

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
I don't think it follows that there can only be one reactor if they are registered immediately. There could be a notion of "current reactor" maintained in thread-local context; moreover it could depend on the reactor that made the callback that caused the current task to run. The reactor could also be chosen by the code that made the Future. (Though I'm not immediately sure how that would work in the yield-from scenario -- but I'm sure there's a way.) FWIW, in NDB there is one event loop per thread; separate threads are handling separate requests and are completely independent. Also, in NDB there's some code that turns Futures into actual RPCs that runs only once there are no more immediately runnable tasks. I think that in general such behaviors are up to the reactor implementation for the platform though, and should not directly be reflected in the reactor API. -- --Guido van Rossum (python.org/~guido)
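The "one event loop per thread" arrangement can be sketched with threading.local (illustrative, not NDB's actual code; EventLoop is a placeholder class):

```python
import threading

class EventLoop:
    # Placeholder for a real reactor/event loop implementation.
    pass

_local = threading.local()

def get_event_loop():
    # Return this thread's loop, creating it on first use; separate
    # threads get completely independent loops.
    loop = getattr(_local, "loop", None)
    if loop is None:
        loop = _local.loop = EventLoop()
    return loop
```
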

On 13/10/2012 1:22am, Guido van Rossum wrote:
Alternatively, yielding a future (or whatever one calls the objects returned by *_async()) could register *and* wait for the result. To register without waiting one would yield a wrapper for the future. So one could write

    result = yield foo_async(...)

or

    f = yield Register(foo_async())
    # do some other work
    result = yield f

Richard

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
The Futures are not what is doing the work here, they just hold the result. In this example the get_async() functions register something with the reactor when they are called. When that "something" is done (or perhaps after several "somethings" chained together), get_async will set a result on its Future.
But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible.
In most event-driven systems there is a global (or thread-local) event loop, but it's also possible to pass one in explicitly to get_async(). -Ben

On Fri, 12 Oct 2012 15:11:54 -0700 Guido van Rossum <guido@python.org> wrote:
But how would you write a dataReceived equivalent then? Would you have a "task" looping on a read() call, e.g.

    @task
    def my_protocol_main_loop(conn):
        while <some_condition>:
            try:
                data = yield conn.read(1024)
            except ConnectionError:
                conn.close()
                break

I'm not sure I understand the problem with subclassing. It works fine in Twisted. Even in Python 3 we don't shy away from subclassing, for example the IO stack is based on subclassing RawIOBase, BufferedIOBase, etc. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Subclassing per se isn't a problem, but requiring a single dataReceived method per class can be awkward. Many protocols are effectively state machines, and modeling each state as a function can be cleaner than a big if/switch block in dataReceived. For example, here's a simplistic HTTP client using tornado's IOStream:

    from tornado import ioloop
    from tornado import iostream
    import socket

    def send_request():
        stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com\r\n\r\n")
        stream.read_until("\r\n\r\n", on_headers)

    def on_headers(data):
        headers = {}
        for line in data.split("\r\n"):
            parts = line.split(":")
            if len(parts) == 2:
                headers[parts[0].strip()] = parts[1].strip()
        stream.read_bytes(int(headers["Content-Length"]), on_body)

    def on_body(data):
        print data
        stream.close()
        ioloop.IOLoop.instance().stop()

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    stream = iostream.IOStream(s)
    stream.connect(("friendfeed.com", 80), send_request)
    ioloop.IOLoop.instance().start()

Classes allow and encourage broader interfaces, which are sometimes a good thing, but interact poorly with coroutines. Both twisted and tornado use separate callbacks for incoming data and for the connection being closed, but for coroutines it's probably better to just treat a closed connection as an error on the read. Futures (and yield from) give us a nice way to do that. -Ben

What calls on_headers in this example? Coming from twisted, that seems like dataReceived's responsibility, but given your introductory paragraph that's not actually what goes on here? On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
The IOStream does, after send_request calls stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is a _handle_read method that is registered with the IOLoop and fills up a buffer. When the read condition is satisfied the IOStream calls back into application code. -Ben

Interesting. That's certainly a nice API, but that then again (read_until) sounds like something I'd implement using dataReceived... You know, read_until clears the buffer, logs the requested callback. data_received adds something to the buffer, and checks if it triggered the (one of the?) registered callbacks. Of course, I may just be rusted in my ways and trying to implement everything in terms of things I know (then again, that might be just what's needed when you're trying to make a useful general API). I guess it's time for me to go deep-diving into Tornado :) On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell <ben@bendarnell.com> wrote:
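The read_until-on-top-of-dataReceived idea sketched above might look like this (hypothetical code, not Tornado's actual implementation):

```python
class BufferedStream:
    # read_until records a delimiter and a callback; data_received
    # appends to the buffer and fires the callback once the
    # delimiter shows up.
    def __init__(self):
        self._buffer = b""
        self._delimiter = None
        self._callback = None

    def read_until(self, delimiter, callback):
        self._delimiter = delimiter
        self._callback = callback
        self._check()  # data may already be buffered

    def data_received(self, chunk):
        self._buffer += chunk
        self._check()

    def _check(self):
        if self._callback is None or self._delimiter not in self._buffer:
            return
        head, _, rest = self._buffer.partition(self._delimiter)
        self._buffer = rest
        callback, self._callback = self._callback, None
        callback(head + self._delimiter)
```
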
-- cheers lvh

On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_@lvh.cc> wrote:
Right, that's how IOStream is implemented internally. The transport/protocol split works a little differently in Tornado: IOStream is implemented something like a Protocol subclass, but we consider it a part of the transport layer. The "protocols" are arbitrary classes that don't share any particular interface, but instead just call methods on the IOStream. -Ben

I quite like IOStream's interface, actually. If that's part of the transport layer, how do you prevent from having duplicating its behavior (read_until etc)? If there's just another separate object that would be the ITransport in twisted, I think the difference is purely one of labeling. On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell <ben@bendarnell.com> wrote:
-- cheers lvh

On Sat, Oct 13, 2012 at 12:13 PM, Laurens Van Houtven <_@lvh.cc> wrote:
So far we haven't actually needed much flexibility in the transport layer - most of the functionality is in the BaseIOStream class, and then there are subclasses IOStream (regular sockets), SSLIOStream, and PipeIOStream that actually call recv(), read(), connect(), etc. We might need a little refactoring if we introduce dramatically different types of transports, but the plan is that we'd represent transports as classes in the IOStream hierarchy. -Ben

[Quick, I know I'm way behind, especially on this thread; more tomorrow.] On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
No, I would use plain callbacks. There would be some kind of IOObject class defined by the stdlib that wraps a socket (it would make it non-blocking, and possibly do other things), and the user would make a registration call to the event loop giving it the IOObject and the user's callback function plus *args and **kwds; the event loop would call callback(*args, **kwds) each time the IOObject became readable. (Oh, and there would be separate registration (and unregistration) functions for reading and writing.) Apparently my rants about callbacks have made people assume that I don't want to see them anywhere. In fact I am comfortable with callbacks for a number of situations -- I just think we have several other tools in our toolbox that are way underused, whereas callbacks are way overused, in part because the alternative tools are relatively new. This way the user could switch to a different callback when a different phase of the protocol is reached. I realize there are other shapes this API could take. But I really don't want the user to have to subclass IOObject.
I'm fine with using subclassing for the internal structure of a library. (The IOObject I am postulating would almost certainly have a bunch of subclasses used for different types of sockets, IOCP, SSL, etc.) The thing that I've soured upon (and many others too) is to tell users "and to use this fine feature, just subclass this handy base class and override or extend the following three methods". Because in practice (certainly in Python, where the compiler doesn't enforce privacy) users always start overriding other methods, or using internal state, or add state that clashes with the base class's state, or forget to call mandatory super calls, or make incorrect assumptions about thread-safety, or whatever else they can do to screw things up. And duck typing isn't ideal either for this situation. -- --Guido van Rossum (python.org/~guido)
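The registration-style API described above, where the loop knows only file objects and plain callbacks, might be sketched like this (names such as add_reader are illustrative, not a settled API):

```python
import select

class EventLoop:
    # Minimal loop that only knows file descriptors and callbacks.
    def __init__(self):
        self._readers = {}

    def add_reader(self, fd, callback, *args):
        # Call callback(*args) whenever fd becomes readable.
        self._readers[fd] = (callback, args)

    def remove_reader(self, fd):
        self._readers.pop(fd, None)

    def run_once(self, timeout=0):
        # One iteration: poll and dispatch ready callbacks.
        if not self._readers:
            return
        ready, _, _ = select.select(list(self._readers), [], [], timeout)
        for fd in ready:
            callback, args = self._readers[fd]
            callback(*args)
```
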

On Sat, 13 Oct 2012 22:03:17 -0700 Guido van Rossum <guido@python.org> wrote:
Subclassing IOObject would be wrong, since the user isn't writing an IO object in the first place. But subclassing a separate class, like Twisted's Protocol (which is mostly an empty shell, really), would sound reasonable to me. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 14, 2012 at 3:43 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It's a possible style. I'm inclined not to follow this example but I could go either way. One thing that somewhat worries me is that the names of these methods will be baked forever into all user code. As a user I prefer to have control over the names of my methods; first, there's the style issue (e.g. I'm always conflicted over what style to use in unittest.TestCase subclasses, since its own style is setUp, tearDown); second, in my app there may be a much better name for what the method does than e.g. data_received(). (Not to mention that that's another adjective used as a verb. ;-) -- --Guido van Rossum (python.org/~guido)

There has to be some way to contract emails sent in discussions rather than exploding them. I swear I'm trying to be concise, yet readable. It's not working. On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido@python.org> wrote:
My experience has been unfortunately rather devoid of deferreds in Twisted. I always feel like the odd one out when people discuss this confusion. For me, it was all Protocol this and Protocol that, and deferreds only came up when I used Twisted's great AMP (Asynchronous Messaging Protocol) library.
--snip--
Well, first of all, deferreds have ways of joining values together. For example:

    from __future__ import print_function
    from twisted.internet import defer

    def example_joined():
        d1 = defer.Deferred()
        d2 = defer.Deferred()
        # consumeErrors looks scary, but it only means that
        # d1 and d2's errbacks aren't called. Instead, the
        # error is sent to d's errback.
        d = defer.gatherResults([d1, d2], consumeErrors=True)
        d.addCallback(print)
        d.addErrback(lambda v: print("ERROR!"))
        d1.callback("The first deferred has succeeded")
        # now we're waiting on the second deferred to succeed,
        # which we'll let the caller handle
        return d2

    example_joined().callback("The second deferred has succeeded too!")
    print("==============")
    example_joined().errback("The second deferred has failed...")

I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Egh. I mean, sure, suppose we have those things. But what if you want to send the result of a callback to a generator-coroutine? Presumably generator coroutines work by yielding deferreds and being called back when the future resolves (deferred fires). But if those futures/deferreds aren't exposed, and instead only the generator stuff is exposed, then bridging the gap between callbacks and generator-coroutines is impossible. So every callback function has to also be defined to use something else. And worse, other APIs using callbacks are left in the dust. Suppose, OTOH, futures/deferreds are exposed. Then we can easily bridge between callbacks and generators, by returning a future whose `set_result` is the callback to our callback function (a deferred whose `callback` is the callback). But if we're exposing futures/deferreds, why have callbacks in the first place? The difference between these two functions is that the second can be used in generator-coroutines trivially and the first cannot:

    # callbacks:
    reactor.timer(10, print, "hello world")

    # deferreds:
    reactor.timer(10).addCallback(print, "hello world")

Now here's another thing: suppose we have a list of "deferred events", but instead of handling all 10 at once, we want to handle them "as they arrive", and then synthesize a result at the bottom. How do you do this with pure generator coroutines? For example, perhaps I am implementing a game server, where all the players choose their characters and then the game begins. Whenever a character is chosen, everyone else has to know about it so that they can plan their strategy based on who has chosen a character. Character selections are final, just so that I can use deferreds (hee hee).
I am imagining something like the following:

    # WRONG: handles players in a certain order,
    # rather than as they come in
    def player_lobby(reactor, players):
        for player in players:
            player_character = yield player.wait_for_confirm(reactor)
            player.set_character(player_character)
            # tell all the other players what character
            # the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

This is wrong, because it goes in a certain order and "blocks" the coroutine until every character is chosen. Players will not know who has chosen what characters in an appropriate order. But hypothetically, maybe we could do the following:

    # Hypothetical magical code?
    def player_lobby(reactor, players):
        confirmation_events = UnorderedEventList(
            [player.wait_for_confirm(reactor) for player in players])
        while confirmation_events:
            player_character = yield confirmation_events.get_next()
            player.set_character(player_character)
            # tell all the other players what character
            # the player has chosen
            notify_choice((player, player_character), players)
        start_game(players)

But then, how do we write UnorderedEventList? I don't really know. I suspect I've made the problem harder, not easier! eek. Plus, it doesn't even read very well. Especially not compared to the deferred version. This is how I would personally do it in Twisted, without using UnorderedEventList (no magic!):

    @inlineCallbacks
    def player_lobby(reactor, players):
        events = []
        for player in players:
            confirm_event = player.wait_for_confirm(reactor)
            events.append(confirm_event)
            @confirm_event.addCallback
            def on_confirmation(player_character, player=player):
                player.set_character(player_character)
                # tell all the other players what character
                # the player has chosen
                notify_choice((player, player_character), players)
        yield gatherResults(events)
        start_game(players)

Notice how I dropped down into the level of manipulating deferreds so that I could add this "as they come in" functionality, and then went back.
Actually it wouldn't've hurt much to just not bother with inlineCallbacks at all. I don't think this is particularly unreadable. More importantly, I actually know how to do it. I have no idea how I would do this without using addCallback, or without reimplementing addCallback using inlineCallbacks. And then, supposing we don't have these deferreds/futures exposed... how do we implement delayed computation stuff from extension modules? What if we want to do these kinds of compositions within said extension modules? What if we want to write our own version of @tasks or @inlineCallbacks with extra features, or generate callback chains from XML files, and so on? I don't really like the prospect of having just the "sugary syntax" available, without a flexible underlying representation also exposed. I don't know if you've ever shared that worry -- sometimes the pretty syntax gets in the way of getting stuff done.
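The bridging trick described above, where the future's set_result is handed to a callback-based API as the callback, can be sketched as follows (FakeReactor is a stand-in for a real timer API; a real reactor would delay the call):

```python
import concurrent.futures

class FakeReactor:
    # Stand-in for a callback-based timer API; a real reactor would
    # invoke the callback after `delay` seconds rather than at once.
    def timer(self, delay, callback, *args):
        callback(*args)

def timer_future(reactor, delay):
    # The Future's set_result *is* the callback, so a callback-style
    # API composes with future-yielding coroutines.
    f = concurrent.futures.Future()
    reactor.timer(delay, f.set_result, None)
    return f
```
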
Surely it's no harder to make yourself into a generator than to make yourself into a low-level thread-like context switching function with a saved callstack implemented by hand in assembler, and so on? I'm sure they'll be fine.
Will do (but also see my response above about why not "everyone" can).
I only used asyncore once, indirectly, so I don't know anything about it. I'm willing to dismiss it (and, in fact, various parts of twisted (I'm looking at you twisted.words)) as not good examples of the pattern. First of all, I'd like to separate the notion of subclassing and method dispatch. They're entirely unrelated. If I pass my object to you, and you call different methods depending on what happens elsewhere, that's method dispatch. And my object doesn't have to be subclassed or anything for it to happen. Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.) My IRC bot needs to handle several different possible events, such as:

- private messages
- channel join event
- CTCP event

and so on. My event handlers for each of these events probably manipulate some internal state (such as a log file, or a GUI). We'd probably organize this as a class, or else as a bunch of functions accessing global state. Or, perhaps a collection of closures. This last one is pretty unlikely. For the most part, these functions are all intrinsically related and can't be sensibly treated separately. You can't take the private message callback of Bot A, and the channel join callback of Bot B, and register these and expect a result that makes sense. If we look at this, we're expecting to deal with a set of functions that manage shared data. The abstraction for this is usually an object, and we'd probably write the callbacks in a class unless we were being contrarian. And it's not too crazy for the dispatcher to know this and expect you to write it as a class that supports a certain interface (certain methods correspond to certain events). Missing methods can be assumed to have the empty implementation (no subclassing, just catching AttributeError).
This isn't too much of an imposition on the user -- any collection of functions (with shared state via globals or closure variables) can be converted to an object with callable attributes very simply (thanks to types.SimpleNamespace, especially). And I only really think this is OK when writing it as an object -- as a collection of functions with shared state -- is the eminently obvious primary use case, so that that situation wouldn't come up very often. So, as an example, a protocol that passes data on further down the line needs to be notified when data is received, but also when the connection begins and ends. So the twisted protocol interface has "dataReceived", "connectionMade", and "connectionLost" callbacks. These really do belong together, they manage a single connection between computers and how it gets mapped to events usable by a twisted application. So I like the convenience and suggestiveness of them all being methods on an object.
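The "missing methods are the empty implementation" dispatch described above might be sketched as (all names here are illustrative):

```python
def dispatch(handler, event_name, *args):
    # Look the callback up on a user-supplied object; a missing
    # method is treated as the empty implementation (no subclassing,
    # just a default when the attribute isn't there).
    method = getattr(handler, "on_" + event_name, None)
    if method is None:
        return None
    return method(*args)

class IRCBot:
    # Plain user object bundling related callbacks and shared state.
    def __init__(self):
        self.log = []

    def on_private_message(self, sender, text):
        self.log.append((sender, text))

    # No on_channel_join: the dispatcher treats it as a no-op.
```
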
I meant it as a factual explanation of what generator coroutines are in Twisted, not what they are in general. Sorry for the confusion. We are probably agreed here. After a cursory examination, I don't really understand Greg Ewing's thing. I'd have to dig deeper into the logs for when he first introduced it.
--snip--
How would you code that using Twisted Deferreds?
Well. I'd replace the @task in your NDB thing with @inlineCallbacks and call it a day. ;) (I think there's enough deferred examples above, and I'm getting tired and it's been a day since I started writing this damned email.)
Well. We are on Python-Ideas... :(
I probably lack the expertise to help too much with this. I can point out anything that sticks out, if/when an extended futures proposal is made. -- Devin

Devin Jeanpierre wrote:
That's one way to go about it, but it's not the only way. See here for my take on how it might work: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Exa... -- Greg

Devin Jeanpierre wrote (concerning callbacks):
IIUC, what Guido objects to is callbacks that are methods *of the I/O object*, so that you have to subclass the library-supplied object and override them. You seem to be talking about something slightly different -- an object that's entirely supplied by the user, and simply bundles a set of callbacks together. That doesn't seem so bad. -- Greg

On Sat, Oct 13, 2012 at 4:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Don't worry too much. I took essentially all Friday starting those four new threads. I am up at night thinking about the issues. I can't expect everyone else to have this much time to devote to Python!
Especially odd since you jumped into the discussion when I called Deferreds a bad name. :-)
I'm sorry, but that's not very readable at all. You needed a lambda (which, if there were anything more, would have to be expanded using 'def') and you're cheating by passing print as a callable (which saves you a second lambda, but only in this simple case). A readable version of this code should not have to use lambdas.
I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful.
Yeah, but things should be as simple as they can. If you can do everything using plain callbacks, Futures and coroutines, why add Deferreds even if you can? (Except for backward compatibility of course. That's a totally different topic. But we're first defining the API of the future.) If Greg Ewing had his way we'd even do without Futures -- I'm still considering that bid. (In the yield-from thread I'm asking for common patterns that the new API should be able to solve.)
No, they don't use deferreds. They use Futures. You've made it quite clear that they are very different.
My plan is that the Futures *will* be exposed -- this is what worked well in NDB.
And that's how NDB does it. I've got a question to Greg Ewing on how he does it.
How about this:

    f = <some future>
    reactor.timer(10, f.set_result, None)

Then whoever waits for f gets woken up in 10 seconds, and the reactor doesn't have to know what Futures are. But I believe your whole argument may be based on a misreading of my proposal. *I* want plain callbacks, Futures, and coroutines, and an event loop that only knows about plain callbacks and I/O objects (e.g. sockets).
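As a hedged illustration of that division of labor (ToyReactor and ToyFuture are toy stand-ins invented here, not any real reactor or Future API): the reactor schedules only plain callbacks, and a Future merely happens to sit on the receiving end of one of them.

```python
import heapq
import itertools
import time

class ToyReactor:
    """Toy event loop that knows only about plain callbacks."""
    def __init__(self):
        self._timers = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def timer(self, delay, fn, *args):
        heapq.heappush(self._timers,
                       (time.monotonic() + delay, next(self._seq), fn, args))

    def run(self):
        # Fire timers in deadline order until none remain.
        while self._timers:
            when, _, fn, args = heapq.heappop(self._timers)
            time.sleep(max(0.0, when - time.monotonic()))
            fn(*args)

class ToyFuture:
    """Toy Future: just enough state to receive a result."""
    def __init__(self):
        self.done = False
        self.result = None

    def set_result(self, value):
        self.done = True
        self.result = value

reactor = ToyReactor()
f = ToyFuture()
# The reactor never sees the Future; it just calls f.set_result later.
reactor.timer(0.01, f.set_result, "woken up")
reactor.run()
```

The point is that f.set_result is an ordinary callable to the reactor; the "Futures layer" lives entirely above the event loop.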
Let's ask Greg that. In NDB, I have a wait_any() function that you give a set of Futures and returns the first one that completes. It would be easy to build an iterator on top of this that takes a set of Futures and iterates over them in the order in which they are completed.
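Such an iterator can be sketched on top of a wait_any()-style primitive. The version below uses concurrent.futures as a stand-in, not NDB's actual wait_any (and note the stdlib already ships concurrent.futures.as_completed, so this is purely illustrative):

```python
import concurrent.futures as cf
import time

def wait_any(futures):
    """Block until at least one future completes; return one completed
    future (a stand-in for NDB's wait_any, built on cf.wait)."""
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    return next(iter(done))

def iter_completed(futures):
    """Yield futures in completion order, using only wait_any."""
    pending = set(futures)
    while pending:
        f = wait_any(pending)
        pending.discard(f)
        yield f

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    slow = pool.submit(lambda: (time.sleep(0.2), "slow")[1])
    fast = pool.submit(lambda: "fast")
    order = [f.result() for f in iter_completed([slow, fast])]
# order reflects completion order, not submission order
```

The helper needs nothing from the futures beyond what wait_any already requires, which is the point: completion-order iteration is easy to build once a first-completed primitive exists.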
Clearly we have an educational issue on our hands! :-)
You're barking up the wrong tree -- please badger Greg Ewing with use cases in the yield-from thread. With my approach all of these can be done. (See the yield-from thread for an example I just posted of a barrier, where multiple tasks wait for a single event.)
The thing that worries me most is reimplementing httplib, urllib and so on to use all this new machinery *and* keep the old synchronous APIs working *even* if some code is written using the old style and some other code wants to use the new style.
Agreed. Antoine made the same point elsewhere and I half conceded.
Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.)
(For the record, I hate IRC, the software, the culture, the interaction style. But maybe I'm unusual that way. :-)
I certainly wouldn't recommend collections of closures for that!
There's also a certain order to them, right? I'd think the state transition diagram is something like:

    connectionMade (1); dataReceived (*); connectionLost (1)

I wonder if there are any guarantees that they will only be called in this order, and who is supposed to enforce this? It would be awkward if the user code had to guard itself against this; also if the developer made an unwarranted assumption (e.g. that dataReceived is called at least once).
Please press him for explanations. Ask questions. He knows his dream best of all. We need to learn.
No problem. Same here. :-)
Somehow we got Itamar and Glyph to join, so I think we're covered!
You've done great in increasing my understanding of Twisted and Deferred. Thank you very much! -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum <guido@python.org> wrote:
A readable version of this code should not have to use lambdas.
In a lot of Twisted code, it happens with methods as callback methods, something like:

    d = self._doRPC(....)
    d.addCallbacks(self._formatResponse, self._formatException)
    d.addCallback(self._finish)

That doesn't talk about gatherResults, but hopefully it makes the idea clear. A lot of the legibility is dependent on making those method names sensible, though. Our in-house style guide asks for limiting functions to about ten lines, preferably half that. Works for us.

Another pattern that's frowned upon since it's a bit of an abuse of decorator syntax, but I still like because it tends to make things easier to read for inline callback definitions where you do need more than a lambda:

    d = somethingThatHappensLater()

    @d.addCallback
    def whenItsDone(result):
        doSomethingWith(result)
cheers lvh
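For readers outside Twisted, here is a toy sketch of why that decorator trick works. MiniDeferred is an illustrative stand-in, not Twisted's class; one known difference is that Twisted's real addCallback returns the deferred itself (so the decorated name ends up bound to the deferred, part of why the pattern is frowned upon), while this sketch returns the function.

```python
class MiniDeferred:
    """Toy stand-in for a Deferred, just to show the decorator trick."""
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, fn):
        # Decorator use works because addCallback takes one callable.
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return fn  # Twisted returns the deferred here instead.

    def callback(self, result):
        # Fire the chain: each callback's return feeds the next.
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)

d = MiniDeferred()

@d.addCallback            # registers whenItsDone as a callback
def whenItsDone(result):
    return result * 2

d.callback(21)            # fires the chain
```

The @d.addCallback line is just ordinary decorator syntax: it calls d.addCallback(whenItsDone) and binds the name to whatever comes back.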

On Sun, Oct 14, 2012 at 9:18 AM, Laurens Van Houtven <_@lvh.cc> wrote:
I quite understand that in your ecosystem you've found best practices for every imaginable use case. And I understand that once you're part of the community and have internalized the idioms and style, it's quite readable. But you haven't shaken my belief that we can do better with the current version of the language (3.3). (FWIW, I think it would be a good idea to develop a "reference implementation" of many of these ideas outside the standard library. Depending on whether we end up adopting yield <future> or yield from <generator> it might even support versions of Python 3 before 3.3. I certainly don't want to have to wait for 3.4 -- although that's the first opportunity for incorporating it into the stdlib.) -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 14, 2012 at 11:53 AM, Guido van Rossum <guido@python.org> wrote:
Did I mention how great AMP was? ;)
Sure. I probably erred in not using inlineCallbacks form, what I wanted to do was highlight the gatherResults function (which, as it happens, does something generators can't without invoking an external function.) My worry here was that generators are being praised for being more readable, which is true and reasonable, but I don't know that they're flexible enough to be the only way to do things. But you've stated now that you'd want futures to be there too, so... those are probably mostly flexible enough.
Haha, different in API and what they can do, but they are meant to do the same thing (represent delayed results). I meant to talk about futures and deferreds equally, and ask the same questions of both of them.
OK. I was confused when you said there would only be generators and simple callbacks (and so I posed questions about what happens when you have just generators, which you took to be questions aimed at Greg Ewing's thing.)
I know that Twisted has historically agreed with the idea that the reactor shouldn't know about futures/deferreds. I'm not sure I agree it's so important. If the universal way of writing asynchronous code is generator-coroutines, then the reactor should work well with this and not require extra effort.
You're correct.
I meant to be asking about the situation you were proposing. I thought it was just callbacks and generators, now we've added futures. Futures sans chaining can definitely implement this, just maybe not as nicely as how I'd do it. The issue is that it's a reasonable thing to want to escape the generator system in order to implement things that aren't "linear" the way generator coroutines are. And if we escape the system, it should be possible and easy to do a large variety of things. But, on the plus side, I'm convinced that it's possible, and that the necessary things will be exposed (even if it's very unpleasant, there's always helper functions...). Unless you do Greg's thing, then I'm worried again. I will read his stuff later today or tomorrow. (Unrelated: I'm not sure why I was so sure UnorderedEventList had to be that ugly. It can use a for loop... oops.)
(We're now deviating from futures and deferreds, but I think the part I was taking was drawing to a close anyway.)

Code that wants to use the old style can be integrated by calling it in a separate thread, and that's fine. If the results should be used in the asynchronous code, then have a thing that integrates with threading so that when the thread returns (or fails with an exception) it can notify a future/deferred of the outcome. Twisted has deferToThread for this. It also has blockingCallFromThread if the synchronous code wants to talk back to the asynchronous code.

And that leads me to this: imagine if, instead of having two implementations (one synchronous, one not), we had only one (asynchronous), and then had some wrappers to make it work as a synchronous implementation as well?

Here is an example of a synchronous program written in Python+Twisted, where I wrap deferLater to be a blocking function (so that it is similar to a time.sleep() followed by a function call). The reactor is started in a separate thread, and is left to die whenever the main thread dies (because thread daemons yay.)

    from __future__ import print_function

    import threading

    from twisted.internet import task, reactor
    from twisted.internet.threads import blockingCallFromThread

    def my_deferlater(reactor, time, callback, *args, **kwargs):
        return blockingCallFromThread(
            reactor, task.deferLater, reactor, time, callback,
            *args, **kwargs)

    # In reality, a global reactor for all threads is a terrible idea.
    # We'd want to instantiate a new reactor for the reactor thread,
    # and have a global REACTOR as well. We'll just use this reactor.
    # This code will not work with any other Twisted code because of
    # the global reactor shenanigans. (But it'd work if we were able
    # to have a reactor per thread.)
    REACTOR_THREAD = None

    def start_reactor():
        global REACTOR_THREAD
        if REACTOR_THREAD is not None:
            # Could be an error, or not, depending on how you feel
            # this should be.
            return
        REACTOR_THREAD = threading.Thread(target=reactor.run, kwargs=dict(
            # Signal handlers don't work if not in the main thread.
            installSignalHandlers=0))
        REACTOR_THREAD.daemon = True  # Probably really evil.
        REACTOR_THREAD.start()

    start_reactor()
    my_deferlater(reactor, 1, print, "This will print after 1 second!")
    my_deferlater(reactor, 1, print, "This will print after 2 seconds!")
    my_deferlater(reactor, 1, print, "This will print after 3 seconds!")

So maybe this is an option? It's really important that there not be just one global reactor, and that multiple reactors can run at the same time, for this to really work. But if that were done, then you could have a single global reactor responsible for being the back end of the new implementations of old synchronous APIs. Maybe it'd be started whenever the first call is made to a synchronous function. And maybe, to interoperate with some actual asynchronous code, you could have a way to change which reactor acts as the global reactor for synchronous APIs?

I did this once, because I needed to rewrite a blocking API and wanted to use Twisted, except that I made the mistake of starting the thread when the module was created instead of on first call. This led to a deadlock because of the global import lock... :( In principle I don't know why this would be a terrible awful idea, if it was done right, but maybe people with more experience with threaded code can correct me. (The whole thread-daemon thing necessary to make it act like a synchronous program might be terribly insane and therefore an idea killer. I'm not sure.)

I'm under the understanding that the global import lock won't cause this particular issue anymore as of Python 3.3, so perhaps starting a reactor on import is reasonable.
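The blockingCallFromThread idea can be mimicked without Twisted in a few lines (a toy sketch under my own names, not Twisted's implementation): the caller posts work to the loop thread and blocks on a per-call reply queue until the result or exception comes back.

```python
import queue
import threading

_work = queue.Queue()

def _loop_thread():
    """Stand-in for the reactor thread: runs posted callables forever."""
    while True:
        fn, args, reply = _work.get()
        if fn is None:            # shutdown sentinel
            break
        try:
            reply.put(("ok", fn(*args)))
        except Exception as exc:
            reply.put(("err", exc))

def blocking_call(fn, *args):
    """Run fn(*args) on the loop thread; block the caller for the result."""
    reply = queue.Queue()
    _work.put((fn, args, reply))
    status, value = reply.get()   # this is the "blocking" part
    if status == "err":
        raise value               # re-raise in the calling thread
    return value

t = threading.Thread(target=_loop_thread, daemon=True)
t.start()
result = blocking_call(pow, 2, 10)
_work.put((None, None, None))     # stop the loop thread
```

Exceptions raised on the loop thread are shuttled back and re-raised in the caller, which is the behavior a synchronous facade needs.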
The docs in Twisted don't spell it out, but they do say that connectionMade should be considered the initializer for the connection, and that upon connectionLost one should let the protocol be garbage collected. So that seems like a guarantee that they are called in that order. I don't think it can really be enforced in Python (unless you want to do some jiggery-pokery with model checking at runtime), but the responsibility for this failing in Twisted would be on the transport, as far as I understand it. If the transport calls back to the protocol in some invalid combination, it's the transport's fault for being broken. This is something that should be clearly documented. (It's an issue, also, regardless of whether a class is used to encapsulate the callbacks, or whether they are registered individually.) -- Devin
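A simple runtime guard is one shape such enforcement could take on the transport side (hypothetical code, not Twisted's): a wrapper that rejects any callback sequence other than connectionMade (1); dataReceived (*); connectionLost (1).

```python
class OrderingGuard:
    """Wraps a protocol-like object and enforces the call order
    connectionMade (1); dataReceived (*); connectionLost (1)."""
    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"

    def connectionMade(self):
        if self._state != "new":
            raise AssertionError("connectionMade out of order")
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        if self._state != "connected":
            raise AssertionError("dataReceived outside a connection")
        self._protocol.dataReceived(data)

    def connectionLost(self, reason=None):
        if self._state != "connected":
            raise AssertionError("connectionLost out of order")
        self._state = "closed"
        self._protocol.connectionLost(reason)

class Recorder:
    """Trivial protocol that records the events it sees."""
    def __init__(self):
        self.events = []
    def connectionMade(self):
        self.events.append("made")
    def dataReceived(self, data):
        self.events.append(("data", data))
    def connectionLost(self, reason):
        self.events.append("lost")

rec = Recorder()
p = OrderingGuard(rec)
p.connectionMade()
p.dataReceived(b"hello")
p.connectionLost()
```

A broken transport driving p out of order would fail loudly at the guard instead of silently corrupting the protocol's state, which is the documentation-vs-enforcement trade-off discussed above.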

On Oct 14, 2012 11:27 AM, "Devin Jeanpierre" <jeanpierreda@gmail.com> wrote:
Yeah, while a global import lock still exists, it's used just long enough to get a per-module lock. On top of that, the import system now uses importlib (read: pure Python) for most functionality, which has bearing on threading and ease of better accommodating async if needed. -import

Guido van Rossum wrote:
I think this could be handled the same way you alluded to before when talking about the App Engine. The base implementation is asynchronous, and you provide a synchronous API that sets up an async operation and then runs a nested event loop until it completes. -- Greg
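A toy sketch of that shape (ToyLoop and ToyFuture are illustrative stand-ins, not App Engine or any real API): the synchronous wrapper drives a nested event loop until the async result arrives, then returns it to the blocking caller.

```python
import collections

class ToyLoop:
    """Minimal callback-only event loop (illustrative)."""
    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, fn, *args):
        self._ready.append((fn, args))

    def run_once(self):
        # Run one batch of ready callbacks.
        for _ in range(len(self._ready)):
            fn, args = self._ready.popleft()
            fn(*args)

class ToyFuture:
    def __init__(self):
        self._done = False
        self._result = None

    def done(self):
        return self._done

    def set_result(self, value):
        self._done = True
        self._result = value

    def result(self):
        return self._result

def run_sync(loop, future):
    """Synchronous facade: spin the loop until the async op completes."""
    while not future.done():
        loop.run_once()
    return future.result()

loop = ToyLoop()
f = ToyFuture()
loop.call_soon(f.set_result, "done asynchronously")
answer = run_sync(loop, f)
```

The asynchronous implementation is the only one; run_sync is a thin adapter, which is the "one implementation, two APIs" idea.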

Okay, I hate to do this, but is there any chance someone can provide a quick summary of the solution we're referring to here? I just started watching python-ideas today, and have a lot of things going on, plus real bad ADD, so I'm having a hard time reassembling the solutions being referred to... (Maybe there's a web presentation that gives a better threaded presentation than my mail program? Or maybe I'm daft. Either way, this sounded interesting!) In summary, then, the Q/A below is referring to which approach? Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 5:23 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
participants (9)
- Antoine Pitrou
- Ben Darnell
- Devin Jeanpierre
- Eric Snow
- Greg Ewing
- Guido van Rossum
- Laurens Van Houtven
- Richard Oudkerk
- Shane Green