re-implementing Twisted for fun and profit

There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.

Would everyone interested in this please please please read <https://github.com/lvh/async-pep/blob/master/pep-3153.rst> several times? Especially this section: <https://github.com/lvh/async-pep/blob/master/pep-3153.rst#why-separate-proto...>. If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.

I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.

Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore. If it needs a name, I suppose I'd call my preferred style "event triggering".

Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.

-g

On Fri, Oct 12, 2012 at 9:46 PM, Glyph <glyph@twistedmatrix.com> wrote:
There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.
Would everyone interested in this please please please read <https://github.com/lvh/async-pep/blob/master/pep-3153.rst> several times? Especially this section: <https://github.com/lvh/async-pep/blob/master/pep-3153.rst#why-separate-proto...>. If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.
I am well aware of that section. But, like the rest of PEP 3153, it is sorely lacking in examples or specifications.
I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.
This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best.
Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore.
Is this about buffering? Because I think I understand buffering. Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer. The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now", or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run. But the common case is that they don't actually need to block because there is still data / space in the buffer. (You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.)
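A minimal sketch of the buffer object described here (class and method names are hypothetical, and "blocking" is modeled as an exception rather than real task suspension):

```python
class ReceiveBuffer:
    """Hypothetical sketch: level-triggered read callbacks fill the
    buffer via feed(); the application drains it via read()."""

    def __init__(self, limit=65536):
        self._data = bytearray()
        self.limit = limit

    def feed(self, chunk):
        # Called by the transport's level-triggered read callback.
        # Returns False once the limit is reached, i.e. "stop reading".
        self._data += chunk
        return len(self._data) < self.limit

    def read(self, min_bytes, max_bytes):
        # "Give me between X and Y bytes."  A real implementation
        # would suspend the calling task instead of raising.
        if len(self._data) < min_bytes:
            raise BlockingIOError("would block: not enough buffered")
        n = min(max_bytes, len(self._data))
        result = bytes(self._data[:n])
        del self._data[:n]
        return result

buf = ReceiveBuffer()
buf.feed(b"hello world")
assert buf.read(1, 5) == b"hello"
assert buf.read(1, 100) == b" world"
```

The common case Guido mentions is the non-raising path: data is already buffered, so read() returns immediately without any task switch.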
If it needs a name, I suppose I'd call my preferred style "event triggering".
But how does it work? What would typical user code in this style look like?
Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.
Well understood. (And I don't even want to get microthreading into the mix, although others may disagree -- I see Christian Tismer has jumped in...) But I also think that if we design these things in isolation it's likely that we'll find later that the pieces don't fit, and I don't want that to happen either. So I think we should consider these separate, but loosely coordinated efforts. -- --Guido van Rossum (python.org/~guido)

On 13.10.12 18:17, Guido van Rossum wrote:
I don't disagree but understand this, too. As long as we are talking Python 3.x, the topic is good compromises, usability and coordination. Pushing for microthreads would not be constructive for these threads (email-threads, of course ;-) . ciao - chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

On Oct 13, 2012, at 9:17 AM, Guido van Rossum <guido@python.org> wrote:
If that's what the problem is, I will do what I can to get those sections fleshed out ASAP.
I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.
This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best.
Edge triggering and level triggering both have to do with changes in boolean state. Edge triggering is "call me when this bit is changed"; level triggering is "call me (and keep calling me) when this bit is set". The metaphor extends very well from the electrical-circuit definition, but the distinction is not very meaningful to applications that want to subscribe to a semantic event and not the state of a bit. Applications want to know about particular bits of information, not state changes.

Obviously when data is available on the connection, it's the bytes that the application is interested in. When a new connection is available to be accept()-ed, the application wants to know that as a distinct notification. There's no way to deliver data or new connected sockets to the application as "edge-triggered"; if the bit is still set later, then there's more, different data to be delivered, which needs a separate notification. But, even in more obscure cases like "the socket is writable", the transport layer needs to disambiguate between "the connection has closed unexpectedly" and "you should produce some more data for writing now". (You might want to also know how much buffer space is available, although that is pretty fiddly.) The low-level event loop needs to have both kinds of callbacks, but avoid exposing the distinction to application code.

However, this doesn't mean all styles need to be implemented. If Python defines a core event loop interface specification, it doesn't have to provide every type of loop. Twisted can continue using its reactors, Tornado can continue using its IOLoop, and each can have transforming adapters to work with standard-library protocols.

When the "you should read some data" bit is set, an edge-triggered transport receives that notification and reads the data, which immediately clears that bit, so it responds to the next down->up edge notification in the same way.
The level-triggered transport does the same thing: it receives the notification that the bit is set, then immediately clears it by reading the data; therefore, if it gets another notification that the bit is high, that means it's high again, and more data needs to be read.
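The two transport styles described here can be sketched side by side; this is a hypothetical illustration (the Protocol class and handler names are made up, not any framework's actual API), and the point is that the protocol code is identical under either style:

```python
import socket

class Protocol:
    # The application-level object: it only ever sees bytes that
    # have already been recv()'d; it never touches the socket.
    def __init__(self):
        self.received = b""

    def data_received(self, data):
        self.received += data

def on_readable_level(sock, proto, bufsize=4096):
    # Level-triggered handler: invoked while the "readable" bit
    # stays set; one recv() per notification suffices, because
    # the loop will call again if data remains.
    data = sock.recv(bufsize)
    if data:
        proto.data_received(data)

def on_readable_edge(sock, proto, bufsize=4096):
    # Edge-triggered handler: invoked only on the down->up
    # transition, so the socket must be drained before returning.
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            return
        if not data:
            return
        proto.data_received(data)

a, b = socket.socketpair()
b.setblocking(False)
a.sendall(b"hi")
p = Protocol()
on_readable_edge(b, p)   # the protocol can't tell which style fed it
assert p.received == b"hi"
a.close(); b.close()
```

Either handler delivers the same calls to data_received(), which is exactly why the trigger style can stay a private detail of the networking layer.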
Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore.
Is this about buffering? Because I think I understand buffering. Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer.
In the current Twisted implementation, you just get bytes objects delivered; when it was designed, 'str' was really the only game in town. However, I think this still applies because the first thing you're going to do when parsing the contents of your buffer is to break it up into chunks by using some handy bytes method. In modern Python, you might want to get a bytearray plus an offset delivered instead, because a bytearray can use recv_into, and a bytearray might be reusable, and could possibly let you implement some interesting zero-copy optimizations. However, in order to facilitate that, bytearray would need to have zero-copy implementations of split() and re.search() and such. In my opinion, the prerequisite for using anything other than a bytes object in practical use would be a very sophisticated lazy-slicing data structure, with zero-copy implementations of everything, and a copy-on-write version of recv_into so that if the sliced-up version of the data structure is shared between loop iterations the copies passed off to other event handlers don't get stomped on. (Although maybe somebody's implemented this while I wasn't looking?) This kind of pay-only-for-what-you-need buffering is really cool, a lot of fun to implement, and it will give you very noticeable performance gains if you're trying to write a wire-speed proxy or router with almost no logic in it; however, I've never seen it really be worth the trouble in any other type of application. I'd say that if we can all agree on the version that delivers bytes, the version that re-uses a fixed-sized bytearray buffer could be an optional feature in the 2.0 version of the spec.
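The reusable-bytearray variant mentioned above might look roughly like this (a hypothetical sketch of the optional feature, using recv_into and a memoryview slice; not a settled API):

```python
import socket

# Hypothetical sketch: one reusable bytearray filled by recv_into(),
# so no new bytes object is allocated per read.  The protocol gets a
# memoryview slice and must copy anything it keeps past the callback,
# because the underlying buffer will be overwritten on the next read.
BUFSIZE = 65536
_read_buf = bytearray(BUFSIZE)

def pump_once(sock, data_received):
    n = sock.recv_into(_read_buf)
    if n:
        data_received(memoryview(_read_buf)[:n])
    return n

a, b = socket.socketpair()
a.sendall(b"zero-copy?")
chunks = []
pump_once(b, lambda mv: chunks.append(bytes(mv)))  # copy out to keep
assert chunks == [b"zero-copy?"]
a.close(); b.close()
```

The copy in the callback illustrates Glyph's caveat: without the sophisticated lazy-slicing structure he describes, consumers end up copying anyway, which is why plain bytes delivery is the reasonable 1.0 choice.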
The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now, or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run.
This is not how the application should talk to the receive buffer. Bytes should not necessarily be directly requested by the application: they simply arrive. If you have to model everything in terms of a request-for-bytes/response-to-request idiom, there are several problems:

1. You have to heap-allocate an additional thing-to-track-the-request object every time you ask for bytes, which adds non-trivial additional overhead to the processing of simple streams. (The C-level event object that e.g. IOCP uses to track the request is slightly different, because it's a single signaling event and you should only ever have one outstanding per connection, so you don't have to make a bunch of them.)

2. Multiple listeners might want to "read" from a socket at once; for example, if you have a symmetric protocol where the application is simultaneously waiting for a response message from its peer and also trying to handle new requests of its own. (This is required in symmetric protocols, like websockets and XMPP, and HTTP/2.0 seems to be moving in this direction too.)

3. Even assuming you deal with points 1 and 2 properly - they are possible to work around - error-handling becomes tricky and tedious. You can't effectively determine in your coroutine scheduler which errors are in the code that is reading or writing to a given connection (because the error may have been triggered by code that was reading or writing to a different connection), so sometimes your sockets will just go off into la-la land with nothing reading from them or writing to them. In Twisted, if a dataReceived handler causes an error, then we know it's time to shut down that connection and close that socket; there's no ambiguity.
Even if you want to write your protocol parsers in a yield-coroutine style, I don't think you want the core I/O layer to be written in that style; it should be possible to write everything as "raw" it's-just-a-method event handlers because that is really the lowest common denominator and therefore the lowest overhead; both in terms of performance and in terms of simplicity of debugging. It's easy enough to write a thing with a .data_received(data) method that calls send() on the appropriate suspended generator.
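The adapter described here, a plain data_received() method that calls send() on a suspended generator, can be sketched like this (class and function names are hypothetical):

```python
class GeneratorAdapter:
    """Hypothetical bridge from the raw event-handler style to a
    yield-coroutine parser: data_received() just resumes the
    suspended generator with the fresh bytes."""

    def __init__(self, gen):
        self.gen = gen
        next(self.gen)          # advance to the first yield

    def data_received(self, data):
        self.gen.send(data)     # resume the parser with new bytes

def line_parser(out):
    # Coroutine-style protocol parser: accumulates chunks and
    # appends complete CRLF-terminated lines to `out`.
    buf = b""
    while True:
        buf += yield
        while b"\r\n" in buf:
            line, buf = buf.split(b"\r\n", 1)
            out.append(line)

lines = []
adapter = GeneratorAdapter(line_parser(lines))
adapter.data_received(b"GET / HT")
adapter.data_received(b"TP/1.0\r\nHost: x\r\n")
assert lines == [b"GET / HTTP/1.0", b"Host: x"]
```

The I/O layer only ever sees the it's-just-a-method data_received() interface; the generator style is layered on top, which is the direction of dependency Glyph argues for.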
But the common case is that they don't actually need to block because there is still data / space in the buffer.
I don't think that this is necessarily the "common case". Certainly in bulk-transfer protocols or in any protocol that supports pipelining, you usually fill up the buffer completely on every iteration.
(You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.)
That's how Twisted works... sort of. If you call write(), it always just does its thing. That said, you can ask to be notified if you've written too much, so that you can slow down. (Flow-control is sort of a sore spot for the current Twisted API; what we have works, and it satisfies the core requirements, but the shape of the API is definitely not very convenient. <http://tm.tl/1956> outlines the next-generation streaming and flow-control primitives that we are currently working on. I'm very excited about those but they haven't been battle-tested yet.) If you're talking about "blocking" in a generator-coroutine style, then well-written code can do

    yield write(x)
    yield write(y)
    yield write(z)

and "lazy" code, that doesn't care about over-filling its buffer, can just do

    write(x)
    write(y)
    yield write(z)

There's no reason that the latter style ought to cause any sort of error.
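The "notify me when I've written too much" idea can be sketched as follows (hypothetical names, loosely inspired by the producer/consumer notification described above, not Twisted's actual API):

```python
class BufferedTransport:
    """Hypothetical sketch of a never-blocking write() with a
    high-water-mark notification for slowing the producer down."""

    def __init__(self, high_water=64 * 1024):
        self.buffer = bytearray()
        self.high_water = high_water
        self.paused = False

    def write(self, data):
        # Never blocks: the bytes are always accepted...
        self.buffer += data
        # ...but the producer is told to slow down once the
        # buffer passes the high-water mark.
        if len(self.buffer) > self.high_water and not self.paused:
            self.paused = True
            self.pause_producing()

    def pause_producing(self):
        # Application hook: stop calling write() until resumed.
        pass

t = BufferedTransport(high_water=4)
t.write(b"ab")
assert not t.paused
t.write(b"cdefgh")
assert t.paused
```

This is the compromise both sides describe: write() itself never errors or blocks, and flow control arrives as a separate notification that well-behaved code can honor.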
If it needs a name, I suppose I'd call my preferred style "event triggering".
But how does it work? What would typical user code in this style look like?
It really depends on the layer. You have to promote what methods get called at each semantic layer; but, at the one that's most interesting for interoperability, the thing that delivers bytes to protocol parsers, it looks something like this:

    def data_received(self, data):
        lines = (self.buf + data).split(b"\r\n")
        for line in lines[:-1]:
            self.line_received(line)
        self.buf = lines[-1]

At a higher level, you might have header_received, http_request_received, etc. The thing that calls data_received typically would look like this:

    def handle_read(self):
        try:
            data = self.socket.recv(self.buffer_size)
        except socket.error as se:
            if se.args[0] == EWOULDBLOCK:
                return
            else:
                return main.CONNECTION_LOST
        else:
            try:
                self.protocol.data_received(data)
            except:
                log_the_error()
                self.disconnect()

although it obviously looks a little different in the case of IOCP.
Great, glad to hear it. -g

On Sat, Oct 13, 2012 at 8:41 PM, Glyph <glyph@twistedmatrix.com> wrote:
I'd love that! Laurens seems burned-out from his previous attempts at authoring that PEP and has not volunteered any examples.
I am well aware of the terms' meanings in electrical circuits. It seems that, alas, I may have misunderstood how the terms are used in the world of callbacks. In my naivete, when they were brought up, I thought that edge-triggered meant "call this callback once, when this specific event happens" (e.g. a specific async read or write call completing) whereas level-triggered referred to "call this callback whenever a certain condition is true" (e.g. a socket is ready for reading or writing). But from your messages it seems that it's more a technical term for different ways of dealing with the latter, so that in either case it is about multiple-call callbacks. If this is indeed the case I profusely apologize for the confusion I have probably caused. (Hopefully most people glazed over anyway. :-)
Excuse my ignorance, but are there ioctl() calls to get at this kind of information, or do you just have to try to call send()/write() and interpret the error you get back?
I'm not 100% sure I follow this. I think you are saying that in some systems the system level (the kernel, say) has an edge-triggered API and in other systems it is level-triggered? And that it doesn't matter much since it's easy to turn either into the other? If I've got that right, do you have a preference for what style the standard-library interface should use? And why?
Makes sense. So they both refer to multi-call callbacks (I don't know what you call these). And it looks like a common application of either is buffered streams, and another is incoming connections to listening sockets. Both seem to belong to the world of transports. Right?
That sounds like a *very* low-level consideration to me, and as you suggest unrealistic given the other limitations. I would rather just get bytes objects and pay for the copying. I know some people care deeply about extra copies, and in certain systems they are probably right, but I doubt that those systems would be implemented in Python even if we *did* bend over backwards to avoid copies. And it really would make the interface much more painful to use. Possibly there could be a separate harder-to-use lower-level API that deals in bytearrays for a few connoisseurs, but we probably shouldn't promote it much, and since it's always possible to add APIs later, I'd rather avoid defining it for version 1.
Seems we are in perfect agreement (I wrote the above without reading this far :-).
(Thanks for writing this; this is the kind of insight I am hoping to get from you and others.)
I'll have to digest all this, but I'll be sure to think about this carefully. My kneejerk reactions are that (1) heap allocations are unavoidable anyway, (2) if there are multiple listeners there should be some other layer demultiplexing, and (3) nobody gets error handling right anyway; but I should be very suspicious of kneejerks, even my own.
I agree. In fact, the lowest level in NDB (my own big foray into async, albeit using App Engine's RPC instead of sockets) is written as an event loop with no references to generators or Futures -- all it knows about are RPCs and callback functions. (Given the way the RPC class is defined in App Engine, calling a designated method on the RPC object is out of the question, everything is callables plus *args plus **kwds.)
Another pragmatic observation that I wouldn't have been able to make on my own.
Good to know.
I see, I've written code like this many times, with many variations.
It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL, and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation.
Thanks for taking the time to respond! -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Not sure if this is relevant, but I'd just like to point out that the behaviour of select() in this respect is actually *edge triggered* by this definition. Once it has reported that a given file descriptor is ready, it *won't* report that file descriptor again until you do something with it. This can be a subtle source of bugs in select-based code if you're not aware of it. -- Greg

Not sure I follow, but yeah: select reports the state of the file-descriptor. While the descriptor is readable, every call to select will indicate that it's readable, etc. Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 5:48 AM, Richard Oudkerk <shibturn@gmail.com> wrote:
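Shane's description can be checked directly with a quick demo (POSIX socketpair assumed): while unread data remains, every select() call reports the descriptor as readable, and it stops being reported once the data is drained.

```python
import select
import socket

a, b = socket.socketpair()   # POSIX-only convenience
a.sendall(b"x")

# Two select() calls with no recv() in between: both report b as
# readable, i.e. select() reports the *level* of the descriptor.
r1, _, _ = select.select([b], [], [], 1.0)
r2, _, _ = select.select([b], [], [], 1.0)
assert b in r1 and b in r2

b.recv(1)                    # drain the buffered byte
r3, _, _ = select.select([b], [], [], 0)
assert b not in r3           # the level is low again
a.close(); b.close()
```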

Shane Green wrote:
Unless I have misunderstood you, the following example contradicts that:
It does indeed contradict me. It looks like this is implementation-dependent, because I distinctly remember encountering a bug once that I traced back to the fact that I wasn't servicing *all* the fds reported as ready before making another select call. Since then I've always been careful to do that, so it's possible that the behaviour has changed in the meantime and I haven't noticed. -- Greg

There are definitely bugs and system-dependent behaviour* in these areas. Just a couple years ago I ran into one that led me to write a "handle_expt()" method with this comment before it:
On Oct 14, 2012, at 4:55 PM, Richard Oudkerk <shibturn@gmail.com> wrote:

On Oct 13, 2012, at 9:49 PM, Guido van Rossum <guido@python.org> wrote:
It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL[1], and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation.
Hopefully I'll have time to reply to some of the other stuff in this message, but: Yes, absolutely. This is the most important core issue, for me. There's a little more to it than "data_received" (for example: listening for incoming connections, establishing outgoing connections, and scheduling timed calls) but this was the original motivation for the async PEP: to specify this interface. Again, I'll have to kick the appropriate people to try to get that started again. (Already started, at <https://twitter.com/glyph/status/256983396826378240>.) It's on github so anyone can contribute, so if other participants in this thread - especially those of you with connections to the Tornado community - would like to try fleshing some of it out, please go ahead. Even if you just have a question, or an area you think the PEP should address, file an issue (or comment on one already filed).
(Thanks for writing this; this is the kind of insight I am hoping to get from you and others.)
Thanks for the repeated prompts for Twisted representatives to participate. I was somewhat reluctant to engage with this thread at first, because it looked like a lot of meandering discussion of how to implement stuff that Twisted already deals with quite effectively and I wasn't sure what the point of it all was - why not just go look at Twisted's implementation? But, after writing that message, I'm glad I did, since I realize that many of these insights are not recorded anywhere and in many cases there's no reasonable way to back this information out of Twisted's implementation. In my (ha ha) copious spare time, I'll try to do some blogging about these topics. -glyph [1]: With one minor nitpick: IOCP and SSL should not be mutually exclusive. This was a problem for Twisted for a while, given the absolutely stupid way that OpenSSL thinks "sockets" work; but, we now have <http://twistedmatrix.com/trac/browser/trunk/twisted/protocols/tls.py> which could probably be adapted to be framework-neutral if this transport/event-loop level were standardized.

Hm, just jumping in out of turn (async ;-) here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor. Code ended up looking something like the following:

    # Server accepting incoming connections and spawning new HTTP/S channels…
    this.socket.accept().then(this.handle_connection)

    # With a handle_connection() kind of like…
    def handle_connection(conn):
        # Create new channel and add to socket map, then…
        if this.running():
            this.accept().then(this.handle_connection)

    # And HTTP/S channels with code like this…
    this.read_until("\r\n\r\n").then(this.handle_request)

    # And handle-request code that did stuff like…
    if this.chunked:
        get_content = this.read_until("\r\n").then(self.parse_chunk_size).then(this.read_bytes)
    else:
        get_content = this.read_bytes(this.content_length)
    return get_content.then(handle_content)

I'll look around for the code, because it's been well over a year and wasn't complete even then, but that should convey some of how it was shaping up.

On Oct 14, 2012, at 7:07 PM, Glyph <glyph@twistedmatrix.com> wrote:

On Oct 14, 2012, at 7:47 PM, Shane Green <shane@umbrellacode.com> wrote:
Hm, just jumping in out of turn (async ;-) here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor. Code ended up looking something like this following:
As I explained in a previous message, I think this is the wrong way to go, because:

1. It's error-prone. It's very easy to forget to call this.accept().then(...). What if you get an exception? How do you associate it with 'this'? (Why do you have to constantly have application code check 'this.running'?)

2. It's inefficient. You have to allocate a promise for every single operation. (Not a big deal for 'accept()' but kind of a big deal for 'recv()'.)

3. It's hard to share resources. What if multiple layers try to call .accept() or .read_until() from different promise contexts?

As a bonus fourth point, this uses some wacky new promise abstraction which isn't Deferreds, and therefore (most likely) forgets to implement some part of the callback-flow abstraction that you really need for layered, composable systems :). We implemented something very like this in Twisted in a module called "twisted.web2.stream" and it was a big problem and had very poor performance, and I just this week fixed yet another instance of the 'oops I forgot to call .read() again in my exception handler' bug in a system where it's still in use. Please don't repeat this mistake in the standard library. -glyph

Your points regarding performance are good ones. My tests indicated it was slightly slower than asyncore. The API I based it on is actually quite thorough, and addresses many of the shortcomings Deferreds (in Twisted) have. Namely, all callbacks registered with a given Promise instance receive the output of the original operation; chaining is fully supported, but explicitly (this.then(that).then(that)…), rather than having a Deferred whose value automatically assumes that of each callback, making callbacks necessarily dependent on the handlers fired before them, with the default guaranteed behaviour being that only the first one actually receives the output of the originating operation. I haven't come across many instances where one wants to chain their callback by accident, but many examples where multiple parties were interested in the same operation's output. Finally, I'm not sure your other points differ greatly from the gotchas of I/O programming in general. Uncoordinated access by multiple threads tends to be problematic. Again, though, your points about efficiency and the less than ideal "an instance for every" arrangement are good ones. Just throwing it out there as a source of ideas, and hopefully to unseat Deferreds as the de facto callback standard for discussion, because the promise pattern is more flexible and robust.

On Oct 15, 2012, at 12:45 AM, Glyph <glyph@twistedmatrix.com> wrote:
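A minimal sketch of the then() semantics Shane describes (a hypothetical Promise class written for illustration, not his actual implementation): every callback receives the original value, and then() returns a new Promise carrying the callback's result, so chaining stays explicit.

```python
class Promise:
    """Hypothetical minimal Promise: every callback registered via
    then() receives the ORIGINAL resolved value; then() returns a
    new Promise resolved with the callback's return value, so
    chaining is explicit rather than implicit as with Deferreds."""

    def __init__(self):
        self._callbacks = []
        self._resolved = False
        self._value = None

    def then(self, cb):
        nxt = Promise()
        def run(value):
            nxt.resolve(cb(value))
        if self._resolved:
            run(self._value)      # late registration fires at once
        else:
            self._callbacks.append(run)
        return nxt

    def resolve(self, value):
        self._resolved = True
        self._value = value
        for run in self._callbacks:
            run(value)

p = Promise()
seen = []
p.then(lambda v: seen.append(("a", v)))
p.then(lambda v: seen.append(("b", v)))   # also sees the original v
p.resolve(42)
assert seen == [("a", 42), ("b", 42)]
```

Contrast with a Deferred, where the second addCallback would have received the first callback's return value rather than 42.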

On Oct 15, 2012, at 1:03 AM, Shane Green <shane@umbrellacode.com> wrote:
Namely, all callbacks registered with a given Promise instance, receive the output of the original operation
This is somewhat tangential to the I/O loop discussion, and my hope for that discussion is that it won't involve Deferreds, or Futures, or Promises, or any other request/response callback management abstraction, because requests and responses are significantly higher level than accept() and recv() and do not belong within the same layer. The event loop ought to provide tools to experiment with event-driven abstractions so that users can use Deferreds and Promises - which are, fundamentally, perfectly interoperable - and still use standard library network protocol implementations.

What I think you were trying to say was that callback addition on Deferreds is a destructive operation; whereas your promises are (from the caller's perspective, at least) immutable. Sometimes I do think that the visibly mutable nature of Deferreds was a mistake. If I read you properly though, what you're saying is that you can do this:

    promise = ...
    promise.then(alpha).then(beta)
    promise.then(gamma).then(delta)

and in yield-coroutine style this is effectively:

    value = yield promise
    beta(yield alpha(value))
    delta(yield gamma(value))

This deficiency is reasonably easy to work around with Deferreds. You can just do:

    def fork(d):
        dprime = Deferred()
        def propagate(result):
            dprime.callback(result)
            return result
        d.addBoth(propagate)
        return dprime

and then:

    fork(x).addCallback(alpha).addCallback(beta)
    fork(x).addCallback(gamma).addCallback(delta)

Perhaps this function should be in Twisted; it's certainly come up a few times. But, the fact that the original result is immediately forgotten can also be handy, because it helps the unused result get garbage collected faster, even if multiple things are hanging on to the Deferred after the initial result has been processed. And it is actually pretty unusual to want to share the same result among multiple callers (which is why this function hasn't been added to the core yet). -glyph

You make an excellent point about the different levels being discussed. Yes, you understand my point well. For some reason I've always hated thinking of the promise as immutable, but that's the normal terminology. The reality is that a Promise represents the output of an operation, and will emit the output of that operation to all callers that register with it. The promise doesn't pass itself as the value to the callbacks, so its immutability is somewhat immaterial. I'm not arguing with you on that point, just the general description of the pattern. The more I think about it, the more I'm realizing how inappropriate something like a deferred or promise is to this discussion. Unfortunately my knowledge of coroutines is somewhat limited, and my time the last couple of days, and the next couple, is preventing me from giving it a good think through. I understand them well enough to know they're cool, and I'm pretty sure I like the idea of making them the event loop mechanism. I think it would be good for us all to continuously revisit concrete examples during the discussion, because the set of core I/O operations is small enough to revisit multiple times. If a much more general mechanism naturally falls out then great.

On Oct 15, 2012, at 8:51 AM, Glyph <glyph@twistedmatrix.com> wrote:

On Fri, Oct 12, 2012 at 9:46 PM, Glyph <glyph@twistedmatrix.com> wrote:
There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.
Would everyone interested in this please please please read <https://github.com/lvh/async-pep/blob/master/pep-3153.rst> several times? Especially this section: <https://github.com/lvh/async-pep/blob/master/pep-3153.rst#why-separate-proto...>. If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.
I am well aware of that section. But, like the rest of PEP 3153, it is sorely lacking in examples or specifications.
I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.
This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best.
Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore.
Is this about buffering? Because I think I understand buffering. Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer. The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now", or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run. But the common case is that they don't actually need to block because there is still data / space in the buffer. (You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.)
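[Editorial note: the buffer Guido describes can be sketched in a few lines. Everything below is invented for illustration, not any library's actual API: feed() is what a level-triggered "socket is readable" handler would call, and read_between() is the app-facing "give me between X and Y bytes" request.]

```python
class ReadBuffer:
    """Sketch of the buffering layer described above (illustrative names)."""

    def __init__(self, limit=65536):
        self._data = bytearray()
        self._limit = limit
        self._waiter = None  # (min_bytes, max_bytes, callback)

    def feed(self, chunk):
        # Called by the level-triggered "readable" handler with fresh bytes.
        self._data += chunk
        self._maybe_wake()

    def writable_space(self):
        # The event loop stops watching the fd once this hits zero.
        return max(0, self._limit - len(self._data))

    def read_between(self, x, y, callback):
        # "Give me between X and Y bytes, possibly blocking": the callback
        # fires as soon as at least x bytes are buffered, with at most y.
        self._waiter = (x, y, callback)
        self._maybe_wake()

    def _maybe_wake(self):
        if self._waiter is None:
            return
        x, y, cb = self._waiter
        if len(self._data) >= x:
            self._waiter = None
            chunk = bytes(self._data[:y])
            del self._data[:y]
            cb(chunk)

buf = ReadBuffer()
got = []
buf.read_between(4, 8, got.append)
buf.feed(b"ab")        # only 2 bytes buffered: callback not fired yet
buf.feed(b"cdefghij")  # 10 bytes buffered: callback gets at most 8
assert got == [b"abcdefgh"]
assert bytes(buf._data) == b"ij"
```

The app never touches recv(); it only ever sees bytes that the transport has already read on its behalf.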
If it needs a name, I suppose I'd call my preferred style "event triggering".
But how does it work? What would typical user code in this style look like?
Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.
Well understood. (And I don't even want to get microthreading into the mix, although others may disagree -- I see Christian Tismer has jumped in...) But I also think that if we design these things in isolation it's likely that we'll find later that the pieces don't fit, and I don't want that to happen either. So I think we should consider these separate, but loosely coordinated efforts. -- --Guido van Rossum (python.org/~guido)

On 13.10.12 18:17, Guido van Rossum wrote:
I don't disagree but understand this, too. As long as we are talking Python 3.x, the topic is good compromises, usability and coordination. Pushing for microthreads would not be constructive for these threads (email-threads, of course ;-) . ciao - chris

-- Christian Tismer :^) <mailto:tismer@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

On Oct 13, 2012, at 9:17 AM, Guido van Rossum <guido@python.org> wrote:
If that's what the problem is, I will do what I can to get those sections fleshed out ASAP.
I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.
This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best.
Edge triggering and level triggering both have to do with changes in boolean state. Edge triggering is "call me when this bit is changed"; level triggering is "call me (and keep calling me) while this bit is set". The metaphor extends very well from the electrical-circuit definition, but the distinction is not very meaningful to applications that want to subscribe to a semantic event, not to the state of a bit.

Applications want to know about particular bits of information, not state changes. Obviously, when data is available on the connection, it's the bytes that the application is interested in. When a new connection is available to be accept()-ed, the application wants to know that as a distinct notification. There's no way to deliver data or newly connected sockets to the application as "edge-triggered"; if the bit is still set later, then there's more, different data to be delivered, which needs a separate notification. But even in more obscure cases like "the socket is writable", the transport layer needs to disambiguate between "the connection has closed unexpectedly" and "you should produce some more data for writing now". (You might also want to know how much buffer space is available, although that is pretty fiddly.)

The low-level event loop needs to have both kinds of callbacks, but should avoid exposing the distinction to application code. However, this doesn't mean every triggering style needs to be implemented. If Python defines a core event loop interface specification, it doesn't have to provide every type of loop. Twisted can continue using its reactors, Tornado can continue using its IOLoop, and each can have transforming adapters to work with standard-library protocols.

When the "you should read some data" bit is set, an edge-triggered transport receives that notification and reads the data, which immediately clears that bit, so it responds to the next down->up edge notification in the same way.
The level-triggered transport does the same thing: it receives the notification that the bit is set, then immediately clears it by reading the data; therefore, if it gets another notification that the bit is high, that means it's high again, and more data needs to be read.
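[Editorial note: the "triggering style is a private detail" argument can be made concrete in a few lines. Whichever notification style the OS uses, the transport's job is the same: drain the socket and hand bytes to the protocol, so the protocol never observes the difference. All names below are invented for the sketch.]

```python
class FakeSocket:
    """Stand-in for a non-blocking socket; recv returns b"" when drained."""
    def __init__(self, chunks):
        self._chunks = list(chunks)
    def recv(self, n):
        return self._chunks.pop(0) if self._chunks else b""

class Transport:
    def __init__(self, sock, protocol):
        self.sock, self.protocol = sock, protocol

    def on_readable(self):
        # Works for *either* trigger style: a level-triggered loop calls
        # this while the readable bit stays set; an edge-triggered loop
        # calls it once per down->up transition.  Reading until the
        # socket is drained clears the bit either way, and the protocol
        # only ever sees data_received(bytes).
        while True:
            data = self.sock.recv(4096)
            if not data:
                break
            self.protocol.data_received(data)

class Collector:
    def __init__(self):
        self.received = []
    def data_received(self, data):
        self.received.append(data)

proto = Collector()
Transport(FakeSocket([b"spam ", b"eggs"]), proto).on_readable()
assert proto.received == [b"spam ", b"eggs"]
```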
Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore.
Is this about buffering? Because I think I understand buffering. Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer.
In the current Twisted implementation, you just get bytes objects delivered; when it was designed, 'str' was really the only game in town. However, I think this still applies because the first thing you're going to do when parsing the contents of your buffer is to break it up into chunks by using some handy bytes method.

In modern Python, you might want to get a bytearray plus an offset delivered instead, because a bytearray can use recv_into, and a bytearray might be reusable, and could possibly let you implement some interesting zero-copy optimizations. However, in order to facilitate that, bytearray would need to have zero-copy implementations of split() and re.search() and such. In my opinion, the prerequisite for using anything other than a bytes object in practical use would be a very sophisticated lazy-slicing data structure, with zero-copy implementations of everything, and a copy-on-write version of recv_into so that if the sliced-up version of the data structure is shared between loop iterations the copies passed off to other event handlers don't get stomped on. (Although maybe somebody's implemented this while I wasn't looking?)

This kind of pay-only-for-what-you-need buffering is really cool, a lot of fun to implement, and it will give you very noticeable performance gains if you're trying to write a wire-speed proxy or router with almost no logic in it; however, I've never seen it really be worth the trouble in any other type of application. I'd say that if we can all agree on the version that delivers bytes, the version that re-uses a fixed-size bytearray buffer could be an optional feature in the 2.0 version of the spec.
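[Editorial note: for reference, the recv_into/reusable-bytearray style mentioned above looks like this in plain stdlib Python. socketpair() is used here only to have a connected socket to read from; the variable names are illustrative.]

```python
import socket

a, b = socket.socketpair()
a.sendall(b"hello world")

buf = bytearray(4096)          # one allocation, reused across loop iterations
n = b.recv_into(buf)           # fills buf in place, returns the byte count
view = memoryview(buf)[:n]     # zero-copy window onto the filled region

assert n == 11
assert bytes(view) == b"hello world"
a.close(); b.close()
```

Note that the moment you call a method like split() on the received data, you are copying anyway, which is exactly Glyph's point about needing zero-copy implementations all the way up before this pays off.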
The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now", or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run.
This is not how the application should talk to the receive buffer. Bytes should not necessarily be requested directly by the application: they simply arrive. If you have to model everything in terms of a request-for-bytes/response-to-request idiom, there are several problems:

1. You have to heap-allocate an additional thing-to-track-the-request object every time you ask for bytes, which adds non-trivial overhead to the processing of simple streams. (The C-level event object that e.g. IOCP uses to track the request is slightly different, because it's a single signaling event and you should only ever have one outstanding per connection, so you don't have to make a bunch of them.)

2. Multiple listeners might want to "read" from a socket at once; for example, if you have a symmetric protocol where the application is simultaneously waiting for a response message from its peer and also trying to handle new requests of its own. (This is required in symmetric protocols like websockets and XMPP, and HTTP/2.0 seems to be moving in this direction too.)

3. Even assuming you deal with points 1 and 2 properly (they are possible to work around), error handling becomes tricky and tedious. You can't effectively determine in your coroutine scheduler which errors belong to the code that is reading or writing to a given connection (because the error may have been triggered by code that was reading or writing to a different connection), so sometimes your sockets will just go off into la-la land with nothing reading from them or writing to them. In Twisted, if a dataReceived handler causes an error, then we know it's time to shut down that connection and close that socket; there's no ambiguity.
Even if you want to write your protocol parsers in a yield-coroutine style, I don't think you want the core I/O layer to be written in that style; it should be possible to write everything as "raw" it's-just-a-method event handlers because that is really the lowest common denominator and therefore the lowest overhead; both in terms of performance and in terms of simplicity of debugging. It's easy enough to write a thing with a .data_received(data) method that calls send() on the appropriate suspended generator.
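[Editorial note: the adapter described above, a plain data_received method that send()s into a suspended generator, can be sketched as follows. The parser and class names are invented for illustration.]

```python
def line_parser(out):
    # A protocol parser written in yield-coroutine style: it is
    # resumed with each chunk of bytes via send().
    buf = b""
    while True:
        data = yield
        buf += data
        *lines, buf = buf.split(b"\r\n")
        out.extend(lines)

class GeneratorProtocol:
    """Raw it's-just-a-method event handler feeding a suspended generator."""
    def __init__(self, gen):
        self.gen = gen
        next(gen)            # advance the generator to its first yield

    def data_received(self, data):
        self.gen.send(data)

lines = []
p = GeneratorProtocol(line_parser(lines))
p.data_received(b"GET / HT")                  # no complete line yet
p.data_received(b"TP/1.0\r\nHost: x\r\n")     # completes two lines
assert lines == [b"GET / HTTP/1.0", b"Host: x"]
```

The event loop only ever knows about the method; whether a generator sits behind it is the protocol author's private business.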
But the common case is that they don't actually need to block because there is still data / space in the buffer.
I don't think that this is necessarily the "common case". Certainly in bulk-transfer protocols or in any protocol that supports pipelining, you usually fill up the buffer completely on every iteration.
(You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.)
That's how Twisted works... sort of. If you call write(), it always just does its thing. That said, you can ask to be notified if you've written too much, so that you can slow down. (Flow control is sort of a sore spot for the current Twisted API; what we have works, and it satisfies the core requirements, but the shape of the API is definitely not very convenient. <http://tm.tl/1956> outlines the next-generation streaming and flow-control primitives that we are currently working on. I'm very excited about those but they haven't been battle-tested yet.) If you're talking about "blocking" in a generator-coroutine style, then well-written code can do

    yield write(x)
    yield write(y)
    yield write(z)

and "lazy" code, that doesn't care about over-filling its buffer, can just do

    write(x)
    write(y)
    yield write(z)

There's no reason that the latter style ought to cause any sort of error.
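[Editorial note: a minimal sketch of the never-blocking write() with "you've written too much" notification described above. The pause/resume method names echo Twisted's producer/consumer vocabulary, but this is not Twisted's actual API; everything here is illustrative.]

```python
class WriteBuffer:
    def __init__(self, high_water=8, listener=None):
        self.pending = bytearray()
        self.high_water = high_water
        self.listener = listener
        self.paused = False

    def write(self, data):
        # Never blocks and never fails: the bytes are just queued.
        self.pending += data
        if not self.paused and len(self.pending) > self.high_water:
            self.paused = True
            if self.listener:
                self.listener.pause_producing()   # "slow down, please"

    def on_writable(self, n):
        # Event loop reports that the socket accepted n bytes.
        del self.pending[:n]
        if self.paused and len(self.pending) <= self.high_water:
            self.paused = False
            if self.listener:
                self.listener.resume_producing()  # "ok, go again"

class Recorder:
    def __init__(self):
        self.events = []
    def pause_producing(self):
        self.events.append("pause")
    def resume_producing(self):
        self.events.append("resume")

rec = Recorder()
wb = WriteBuffer(high_water=4, listener=rec)
wb.write(b"abc")       # 3 bytes buffered, under the mark: no notification
wb.write(b"defg")      # 7 bytes buffered, over the mark: pause
wb.on_writable(5)      # kernel drained 5 bytes, under the mark: resume
assert rec.events == ["pause", "resume"]
assert bytes(wb.pending) == b"fg"
```

A "lazy" caller that ignores the pause notification still works; it just buffers more memory, which is exactly the trade-off being discussed.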
If it needs a name, I suppose I'd call my preferred style "event triggering".
But how does it work? What would typical user code in this style look like?
It really depends on the layer. You have to promote what methods get called at each semantic layer; but, at the one that's most interesting for interoperability, the thing that delivers bytes to protocol parsers, it looks something like this:

    def data_received(self, data):
        lines = (self.buf + data).split("\r\n")
        for line in lines[:-1]:
            self.line_received(line)
        self.buf = lines[-1]

At a higher level, you might have header_received, http_request_received, etc. The thing that calls data_received typically would look like this:

    def handle_read(self):
        try:
            data = self.socket.recv(self.buffer_size)
        except socket.error, se:
            if se.args[0] == EWOULDBLOCK:
                return
            else:
                return main.CONNECTION_LOST
        else:
            try:
                self.protocol.data_received(data)
            except:
                log_the_error()
                self.disconnect()

although it obviously looks a little different in the case of IOCP.
Great, glad to hear it. -g

On Sat, Oct 13, 2012 at 8:41 PM, Glyph <glyph@twistedmatrix.com> wrote:
I'd love that! Laurens seems burned-out from his previous attempts at authoring that PEP and has not volunteered any examples.
I am well aware of the terms' meanings in electrical circuits. It seems that, alas, I may have misunderstood how the terms are used in the world of callbacks. In my naivete, when they were brought up, I thought that edge-triggered meant "call this callback once, when this specific event happens" (e.g. a specific async read or write call completing) whereas level-triggered referred to "call this callback whenever a certain condition is true" (e.g. a socket is ready for reading or writing). But from your messages it seems that it's more a technical term for different ways of dealing with the latter, so that in either case it is about multiple-call callbacks. If this is indeed the case I profusely apologize for the confusion I have probably caused. (Hopefully most people glazed over anyway. :-)
Excuse my ignorance, but are there ioctl() calls to get at this kind of information, or do you just have to try to call send()/write() and interpret the error you get back?
I'm not 100% sure I follow this. I think you are saying that in some systems the system level (the kernel, say) has an edge-triggered API and in other systems it is level-triggered? And that it doesn't matter much since it's easy to turn either into the other? If I've got that right, do you have a preference for what style the standard-library interface should use? And why?
Makes sense. So they both refer to multi-call callbacks (I don't know what you call these). And it looks like a common application of either is buffered streams, and another is incoming connections to listening sockets. Both seem to belong to the world of transports. Right?
That sounds like a *very* low-level consideration to me, and as you suggest unrealistic given the other limitations. I would rather just get bytes objects and pay for the copying. I know some people care deeply about extra copies, and in certain systems they are probably right, but I doubt that those systems would be implemented in Python even if we *did* bend over backwards to avoid copies. And it really would make the interface much more painful to use. Possibly there could be a separate harder-to-use lower-level API that deals in bytearrays for a few connoisseurs, but we probably shouldn't promote it much, and since it's always possible to add APIs later, I'd rather avoid defining it for version 1.
Seems we are in perfect agreement (I wrote the above without reading this far :-).
(Thanks for writing this; this is the kind of insight I am hoping to get from you and others.)
I'll have to digest all this, but I'll be sure to think about this carefully. My kneejerk reactions are that (1) heap allocations are unavoidable anyway, (2) if there are multiple listeners there should be some other layer demultiplexing, and (3) nobody gets error handling right anyway; but I should be very suspicious of kneejerks, even my own.
I agree. In fact, the lowest level in NDB (my own big foray into async, albeit using App Engine's RPC instead of sockets) is written as an event loop with no references to generators or Futures -- all it knows about are RPCs and callback functions. (Given the way the RPC class is defined in App Engine, calling a designated method on the RPC object is out of the question, everything is callables plus *args plus **kwds.)
Another pragmatic observation that I wouldn't have been able to make on my own.
Good to know.
I see, I've written code like this many times, with many variations.
It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL, and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation.
Thanks for taking the time to respond! -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Not sure if this is relevant, but I'd just like to point out that the behaviour of select() in this respect is actually *edge triggered* by this definition. Once it has reported that a given file descriptor is ready, it *won't* report that file descriptor again until you do something with it. This can be a subtle source of bugs in select-based code if you're not aware of it. -- Greg

Not sure I follow, but yeah: select reports the state of the file-descriptor. While the descriptor is readable, every call to select will indicate that it's readable, etc. Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 5:48 AM, Richard Oudkerk <shibturn@gmail.com> wrote:
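[Editorial note: Shane's claim is easy to check with a short, self-contained experiment. socketpair() just provides a connected socket to test against; a zero timeout makes select() non-blocking.]

```python
import select
import socket

s1, s2 = socket.socketpair()
s1.sendall(b"x")

# select() reports *state*: a readable socket keeps being reported
# until it is actually drained, i.e. it behaves level-triggered.
r_first, _, _ = select.select([s2], [], [], 0)
r_second, _, _ = select.select([s2], [], [], 0)  # nothing read in between
assert r_first == [s2] and r_second == [s2]      # reported both times

s2.recv(1)                                       # drain the byte
r_third, _, _ = select.select([s2], [], [], 0)
assert r_third == []                             # no longer reported
s1.close(); s2.close()
```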

Shane Green wrote:
Unless I have misunderstood you, the following example contradicts that:
It does indeed contradict me. It looks like this is implementation-dependent, because I distinctly remember encountering a bug once that I traced back to the fact that I wasn't servicing *all* the fds reported as ready before making another select call. Since then I've always been careful to do that, so it's possible that the behaviour has changed in the meantime and I haven't noticed. -- Greg

There are definitely bugs and system-dependent behaviour* in these areas. Just a couple years ago I ran into one that led to me writing a "handle_expt()" method with this comment before it:
Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 4:55 PM, Richard Oudkerk <shibturn@gmail.com> wrote:

On Oct 13, 2012, at 9:49 PM, Guido van Rossum <guido@python.org> wrote:
It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL[1], and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation.
Hopefully I'll have time to reply to some of the other stuff in this message, but: Yes, absolutely. This is the most important core issue, for me. There's a little more to it than "data_received" (for example: listening for incoming connections, establishing outgoing connections, and scheduling timed calls) but this was the original motivation for the async PEP: to specify this interface. Again, I'll have to kick the appropriate people to try to get that started again. (Already started, at <https://twitter.com/glyph/status/256983396826378240>.) It's on github so anyone can contribute, so if other participants in this thread - especially those of you with connections to the Tornado community - would like to try fleshing some of it out, please go ahead. Even if you just have a question, or an area you think the PEP should address, file an issue (or comment on one already filed).
(Thanks for writing this; this is the kind of insight I am hoping to get from you and others.)
Thanks for the repeated prompts for Twisted representatives to participate. I was somewhat reluctant to engage with this thread at first, because it looked like a lot of meandering discussion of how to implement stuff that Twisted already deals with quite effectively and I wasn't sure what the point of it all was - why not just go look at Twisted's implementation? But, after writing that message, I'm glad I did, since I realize that many of these insights are not recorded anywhere and in many cases there's no reasonable way to back this information out of Twisted's implementation. In my (ha ha) copious spare time, I'll try to do some blogging about these topics. -glyph [1]: With one minor nitpick: IOCP and SSL should not be mutually exclusive. This was a problem for Twisted for a while, given the absolutely stupid way that OpenSSL thinks "sockets" work; but, we now have <http://twistedmatrix.com/trac/browser/trunk/twisted/protocols/tls.py> which could probably be adapted to be framework-neutral if this transport/event-loop level were standardized.

Hm, just jumping in out of turn (async ;-) here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor. Code ended up looking something like the following:

    # Server accepting incoming connections and spawning new HTTP/S channels…
    this.socket.accept().then(this.handle_connection)

    # With a handle_connection() kind of like…
    def handle_connection(conn):
        # Create new channel and add to socket map, then…
        if this.running():
            this.accept().then(this.handle_connection)

    # And HTTP/S channels with code like this…
    this.read_until("\r\n\r\n").then(this.handle_request)

    # And handle-request code that did stuff like…
    if this.chunked:
        get_content = this.read_until("\r\n").then(self.parse_chunk_size).then(this.read_bytes)
    else:
        get_content = this.read_bytes(this.content_length)
    return get_content.then(handle_content)

I'll look around for the code, because it's been well over a year and it wasn't complete even then, but that should convey some of how it was shaping up. Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 14, 2012, at 7:07 PM, Glyph <glyph@twistedmatrix.com> wrote:

On Oct 14, 2012, at 7:47 PM, Shane Green <shane@umbrellacode.com> wrote:
Hm, just jumping in out of turn (async ;-) here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor. Code ended up looking something like this following:
As I explained in a previous message, I think this is the wrong way to go, because:

1. It's error-prone. It's very easy to forget to call this.accept().then(...). What if you get an exception? How do you associate it with 'this'? (Why do you have to constantly have application code check 'this.running'?)

2. It's inefficient. You have to allocate a promise for every single operation. (Not a big deal for 'accept()', but kind of a big deal for 'recv()'.)

3. It's hard to share resources. What if multiple layers try to call .accept() or .read_until() from different promise contexts?

As a bonus fourth point, this uses some wacky new promise abstraction which isn't Deferreds, and therefore (most likely) forgets to implement some part of the callback-flow abstraction that you really need for layered, composable systems :). We implemented something very like this in Twisted in a module called "twisted.web2.stream" and it was a big problem and had very poor performance, and I just this week fixed yet another instance of the 'oops, I forgot to call .read() again in my exception handler' bug in a system where it's still in use. Please don't repeat this mistake in the standard library. -glyph

Your points regarding performance are good ones. My tests indicated it was slightly slower than asyncore. The API I based it on is actually quite thorough, and addresses many of the shortcomings Deferreds (in Twisted) have. Namely, all callbacks registered with a given Promise instance receive the output of the original operation; chaining is fully supported, but explicitly (this.then(that).then(that)…), rather than having a Deferred whose value automatically assumes that of each callback, making the callbacks necessarily dependent on the handlers fired before them, with the default guaranteed behaviour being that only the first one actually receives the output of the originating operation. I haven't come across many instances where one wants to chain their callback by accident, but many examples where multiple parties were interested in the same operation's output. Finally, I'm not sure your other points differ greatly from the gotchas of I/O programming in general. Uncoordinated access by multiple threads tends to be problematic. Again, though, your point about efficiency and the less-than-ideal "an instance for every" arrangement are good ones. Just throwing it out there as a source of ideas, and hopefully to unseat Deferreds as the de facto callback standard for discussion, because the promise pattern is more flexible and robust. Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 15, 2012, at 12:45 AM, Glyph <glyph@twistedmatrix.com> wrote:
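[Editorial note: the broadcast semantics Shane describes, where every callback registered on a promise sees the original result and chaining is explicit via the promise then() returns, can be sketched with a minimal, purely illustrative Promise class; none of this is his actual implementation.]

```python
class Promise:
    def __init__(self):
        self._callbacks = []
        self._resolved = False
        self._value = None

    def then(self, callback):
        # Chaining is explicit: then() returns a *new* Promise that
        # resolves with callback's return value.  Callbacks registered
        # on *this* promise all see the original result.
        chained = Promise()
        def run(value):
            chained.resolve(callback(value))
        if self._resolved:
            run(self._value)
        else:
            self._callbacks.append(run)
        return chained

    def resolve(self, value):
        self._resolved, self._value = True, value
        for cb in self._callbacks:
            cb(value)

p = Promise()
seen = []
p.then(lambda v: seen.append(("alpha", v)))
p.then(lambda v: seen.append(("gamma", v)))   # sees the same original value
p.resolve(42)
assert seen == [("alpha", 42), ("gamma", 42)]
```

This is exactly the behavior Glyph's fork() helper recovers for Deferreds, which is why the two abstractions are interoperable at the bottom.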

On Oct 15, 2012, at 1:03 AM, Shane Green <shane@umbrellacode.com> wrote:
Namely, all callbacks registered with a given Promise instance, receive the output of the original operation
This is somewhat tangential to the I/O loop discussion, and my hope for that discussion is that it won't involve Deferreds, or Futures, or Promises, or any other request/response callback management abstraction, because requests and responses are significantly higher level than accept() and recv() and do not belong within the same layer. The event loop ought to provide tools to experiment with event-driven abstractions, so that users can use Deferreds and Promises (which are, fundamentally, perfectly interoperable) and still use standard library network protocol implementations.

What I think you were trying to say was that callback addition on Deferreds is a destructive operation, whereas your promises are (from the caller's perspective, at least) immutable. Sometimes I do think that the visibly mutable nature of Deferreds was a mistake. If I read you properly though, what you're saying is that you can do this:

    promise = ...
    promise.then(alpha).then(beta)
    promise.then(gamma).then(delta)

and in yield-coroutine style this is effectively:

    value = yield promise
    beta(yield alpha(value))
    delta(yield gamma(value))

This deficiency is reasonably easy to work around with Deferreds. You can just do:

    def fork(d):
        dprime = Deferred()
        def propagate(result):
            dprime.callback(result)
            return result
        d.addBoth(propagate)
        return dprime

and then:

    fork(x).addCallback(alpha).addCallback(beta)
    fork(x).addCallback(gamma).addCallback(delta)

Perhaps this function should be in Twisted; it's certainly come up a few times. But the fact that the original result is immediately forgotten can also be handy, because it helps the unused result get garbage-collected faster, even if multiple things are hanging on to the Deferred after the initial result has been processed. And it is actually pretty unusual to want to share the same result among multiple callers (which is why this function hasn't been added to the core yet). -glyph

You make an excellent point about the different levels being discussed. Yes, you understand my point well. For some reason I've always hated thinking of the promise as immutable, but that's the normal terminology. The reality is that a Promise represents the output of an operation, and will emit the output of that operation to all callers that register with it. The promise doesn't pass itself as the value to the callbacks, so its immutability is somewhat immaterial. I'm not arguing with you on that point, just the general description of the pattern. The more I think about it, the more I'm realizing how inappropriate something like a deferred or promise is to this discussion. Unfortunately my knowledge of coroutines is somewhat limited, and my time the last couple of days, and the next couple, is preventing me from giving it a good think-through. I understand them well enough to know they're cool, and I'm pretty sure I like the idea of making them the event loop mechanism. I think it would be good for us all to continuously revisit concrete examples during the discussion, because the set of core I/O operations is small enough to revisit multiple times. If a much more general mechanism naturally falls out, then great. Shane Green www.umbrellacode.com 805-452-9666 | shane@umbrellacode.com On Oct 15, 2012, at 8:51 AM, Glyph <glyph@twistedmatrix.com> wrote:
participants (6)
-
Christian Tismer
-
Glyph
-
Greg Ewing
-
Guido van Rossum
-
Richard Oudkerk
-
Shane Green