On Sat, Oct 13, 2012 at 8:41 PM, Glyph <glyph@twistedmatrix.com> wrote:

On Oct 13, 2012, at 9:17 AM, Guido van Rossum <guido@python.org> wrote:

On Fri, Oct 12, 2012 at 9:46 PM, Glyph <glyph@twistedmatrix.com> wrote:
There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days.  There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.

Would everyone interested in this please please please read <https://github.com/lvh/async-pep/blob/master/pep-3153.rst> several times?  Especially this section: <https://github.com/lvh/async-pep/blob/master/pep-3153.rst#why-separate-protocols-and-transports>.  If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.

I am well aware of that section. But, like the rest of PEP 3153, it is
sorely lacking in examples or specifications.

If that's what the problem is, I will do what I can to get those sections fleshed out ASAP.

I'd love that! Laurens seems burned-out from his previous attempts at authoring that PEP and has not volunteered any examples.

I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering.  Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.

This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best.

Edge triggering and level triggering both have to do with changes in boolean state.  Edge triggering is "call me when this bit is changed"; level triggering is "call me (and keep calling me) when this bit is set".  The metaphor extends very well from the electrical-circuit definition, but the distinction is not very meaningful to applications that want to subscribe to a semantic event and not the state of a bit.
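
For concreteness, this is how the two styles are requested from Linux epoll via Python's select module; this is just an illustration of the OS-level knob, not part of any proposal (select.EPOLLET is Linux-only, and kqueue and IOCP spell the same idea differently):

import select
import socket

sock = socket.socket()
sock.setblocking(False)
ep = select.epoll()

# Level-triggered (the default): poll() keeps reporting the fd for as
# long as unread data remains in the kernel buffer.
ep.register(sock.fileno(), select.EPOLLIN)

# Edge-triggered: poll() reports the fd only when it goes from "nothing
# to read" to "something to read"; the handler must then drain the
# socket or it will not be woken up again.
ep.modify(sock.fileno(), select.EPOLLIN | select.EPOLLET)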

I am well aware of the terms' meanings in electrical circuits. It seems that, alas, I may have misunderstood how the terms are used in the world of callbacks. In my naivete, when they were brought up, I thought that edge-triggered meant "call this callback once, when this specific event happens" (e.g. a specific async read or write call completing) whereas level-triggered referred to "call this callback whenever a certain condition is true" (e.g. a socket is ready for reading or writing).

But from your messages it seems that it's more a technical term for different ways of dealing with the latter, so that in either case it is about multiple-call callbacks. If this is indeed the case I profusely apologize for the confusion I have probably caused. (Hopefully most people glazed over anyway. :-)

Applications want to know about particular bits of information, not state changes.  Obviously when data is available on the connection, it's the bytes that the application is interested in.  When a new connection is available to be accept()-ed, the application wants to know that as a distinct notification.  There's no way to deliver data or new connected sockets to the application as "edge-triggered"; if the bit is still set later, then there's more, different data to be delivered, which needs a separate notification.  But, even in more obscure cases like "the socket is writable", the transport layer needs to disambiguate between "the connection has closed unexpectedly" and "you should produce some more data for writing now".  (You might want to also know how much buffer space is available, although that is pretty fiddly.)
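
The event-per-semantic-notification idea can be sketched as an interface roughly like this (illustrative placeholder names, not Twisted's exact spelling and not a proposed stdlib API):

class Protocol:
    """Sketch of an application-facing, event-triggered interface."""

    def connection_made(self, transport):
        """A freshly connected (or accept()-ed) transport is ready."""

    def data_received(self, data):
        """Some bytes arrived; there is no readiness bit to inspect."""

    def connection_lost(self, exc):
        """The connection went away, cleanly (exc is None) or not."""

    def resume_writing(self):
        """The outgoing buffer drained; it is a good time to write more."""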

Excuse my ignorance, but are there ioctl() calls to get at this kind of information, or do you just have to try to call send()/write() and interpret the error you get back?
 
The low-level event loop needs to have both kinds of callbacks, but avoid exposing the distinction to application code.  However, this doesn't mean all styles need to be implemented.  If Python defines a core event loop interface specification, it doesn't have to provide every type of loop.  Twisted can continue using its reactors, Tornado can continue using its IOLoop, and each can have transforming adapters to work with standard-library protocols.

I'm not 100% sure I follow this. I think you are saying that in some systems the system level (the kernel, say) has an edge-triggered API and in other systems it is level-triggered? And that it doesn't matter much since it's easy to turn either into the other?

If I've got that right, do you have a preference for what style the standard-library interface should use? And why?
 
When the "you should read some data" bit is set, an edge-triggered transport receives that notification, reads the data, which immediately clears that bit, so it responds to the next down->up edge notification in the same way.  The level-triggered transport does the same thing: it receives the notification that the bit is set, then immediately clears it by reading the data; therefore, if it gets another notification that the bit is high, that means it's high again, and more data needs to be read.

Makes sense. So they both refer to multi-call callbacks (I don't know what you call these). And it looks like a common application of either is buffered streams, and another is incoming connections to listening sockets. Both seem to belong to the world of transports. Right?
 
Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of a recv() that has already been called.  Since not all I/O libraries actually have a recv() to call, the application shouldn't be the one calling it.  This is perhaps the central design error of asyncore.

Is this about buffering? Because I think I understand buffering.  Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer.

In the current Twisted implementation, you just get bytes objects delivered; when it was designed, 'str' was really the only game in town.  However, I think this still applies because the first thing you're going to do when parsing the contents of your buffer is to break it up into chunks by using some handy bytes method.

In modern Python, you might want to get a bytearray plus an offset delivered instead, because a bytearray can use recv_into, and a bytearray might be reusable, and could possibly let you implement some interesting zero-copy optimizations.  However, in order to facilitate that, bytearray would need to have zero-copy implementations of split() and re.search() and such.
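
A minimal sketch of that recv_into idea, with a single preallocated buffer (hypothetical names; note the copy is only deferred, not eliminated, which is exactly the limitation described above):

class RecvIntoTransport:
    def __init__(self, sock, protocol, bufsize=65536):
        self.sock = sock
        self.protocol = protocol
        self.buf = bytearray(bufsize)   # allocated once, reused forever

    def handle_read(self):
        # recv_into() fills the existing buffer in place instead of
        # allocating a new bytes object for every read.
        nbytes = self.sock.recv_into(self.buf)
        if nbytes:
            # Handing an independent chunk to the protocol still copies;
            # a true zero-copy design would pass (buf, offset, length)
            # and need copy-on-write to stay safe across iterations.
            self.protocol.data_received(memoryview(self.buf)[:nbytes].tobytes())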

That sounds like a *very* low-level consideration to me, and as you suggest unrealistic given the other limitations. I would rather just get bytes objects and pay for the copying. I know some people care deeply about extra copies, and in certain systems they are probably right, but I doubt that those systems would be implemented in Python even if we *did* bend over backwards to avoid copies. And it really would make the interface much more painful to use. Possibly there could be a separate harder-to-use lower-level API that deals in bytearrays for a few connoisseurs, but we probably shouldn't promote it much, and since it's always possible to add APIs later, I'd rather avoid defining it for version 1.
 
In my opinion, the prerequisite for using anything other than a bytes object in practical use would be a very sophisticated lazy-slicing data structure, with zero-copy implementations of everything, and a copy-on-write version of recv_into so that if the sliced-up version of the data structure is shared between loop iterations the copies passed off to other event handlers don't get stomped on.  (Although maybe somebody's implemented this while I wasn't looking?)

This kind of pay-only-for-what-you-need buffering is really cool, a lot of fun to implement, and it will give you very noticeable performance gains if you're trying to write a wire-speed proxy or router with almost no logic in it; however, I've never seen it really be worth the trouble in any other type of application.  I'd say that if we can all agree on the version that delivers bytes, the version that re-uses a fixed-sized bytearray buffer could be an optional feature in the 2.0 version of the spec.

Seems we are in perfect agreement (I wrote the above without reading this far :-).

The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now", or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run.

This is not how the application should talk to the receive buffer.  Bytes should not necessarily be requested directly by the application: they simply arrive.  If you have to model everything in terms of a request-for-bytes/response-to-request idiom, there are several problems (a sketch contrasting the two shapes follows the list below):

(Thanks for writing this; this is the kind of insight I am hoping to get from you and others.)
 
1. You have to heap-allocate an additional thing-to-track-the-request object every time you ask for bytes, which adds non-trivial additional overhead to the processing of simple streams.  (The C-level event object that, e.g., IOCP uses to track the request is slightly different, because it's a single signaling event and you should only ever have one outstanding per connection, so you don't have to make a bunch of them.)

2. Multiple listeners might want to "read" from a socket at once; for example, if you have a symmetric protocol where the application is simultaneously waiting for a response message from its peer and also trying to handle new requests of its own.  (This is required in symmetric protocols, like websockets and XMPP, and HTTP/2.0 seems to be moving in this direction too.)

3. Even assuming you deal with part 1 and 2 properly - they are possible to work around - error-handling becomes tricky and tedious.  You can't effectively determine in your coroutine scheduler which errors are in the code that is reading or writing to a given connection (because the error may have been triggered by code that was reading or writing to a different connection), so sometimes your sockets will just go off into la-la land with nothing reading from them or writing to them.  In Twisted, if a dataReceived handler causes an error, then we know it's time to shut down that connection and close that socket; there's no ambiguity.
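
To make the contrast concrete, here is a hedged sketch of the two shapes (read_until is a hypothetical pull-style call, and the buffering is deliberately simplistic):

# Pull style: the application asks for bytes and suspends until they
# show up.  Every read needs a request object to track it, and only
# one reader at a time really makes sense.
def pull_style(transport):
    while True:
        line = yield transport.read_until(b"\r\n")   # hypothetical API
        print(line)

# Push style: the transport calls in whenever bytes happen to arrive.
# Nothing is allocated per read, several higher layers can be fed from
# the same callback, and an exception raised in here unambiguously
# belongs to this connection.
class PushStyleProtocol:
    buf = b""

    def data_received(self, data):
        lines = (self.buf + data).split(b"\r\n")
        for line in lines[:-1]:
            print(line)
        self.buf = lines[-1]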

I'll have to digest all this, but I'll be sure to think about this carefully. My kneejerk reactions are that (1) heap allocations are unavoidable anyway, (2) if there are multiple listeners there should be some other layer demultiplexing, and (3) nobody gets error handling right anyway; but I should be very suspicious of kneejerks, even my own.
 
Even if you want to write your protocol parsers in a yield-coroutine style, I don't think you want the core I/O layer to be written in that style; it should be possible to write everything as "raw" it's-just-a-method event handlers because that is really the lowest common denominator and therefore the lowest overhead, both in terms of performance and in terms of simplicity of debugging.  It's easy enough to write a thing with a .data_received(data) method that calls send() on the appropriate suspended generator.
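
A minimal sketch of that bridging trick (the convention that the parser requests each chunk with a bare "data = yield" is just one possibility, not a spec):

def line_parser():
    # A toy coroutine-style parser: suspend until the next chunk arrives.
    buf = b""
    while True:
        data = yield
        lines = (buf + data).split(b"\r\n")
        for line in lines[:-1]:
            print("line:", line)
        buf = lines[-1]

class GeneratorProtocolAdapter:
    """Feed a plain data_received() callback into a suspended generator."""

    def __init__(self, gen):
        self.gen = gen
        next(self.gen)           # advance to the first yield

    def data_received(self, data):
        try:
            self.gen.send(data)  # resume the parser with the new bytes
        except StopIteration:
            pass                 # parser finished; drop any further data

# adapter = GeneratorProtocolAdapter(line_parser())
# adapter.data_received(b"GET / HTTP/1.1\r\nHost: example.com\r\n")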

I agree. In fact, the lowest level in NDB (my own big foray into async, albeit using App Engine's RPC instead of sockets) is written as an event loop with no references to generators or Futures -- all it knows about are RPCs and callback functions. (Given the way the RPC class is defined in App Engine, calling a designated method on the RPC object is out of the question, everything is callables plus *args plus **kwds.)

But the common case is that they don't actually need to block because there is still data / space in the buffer.

I don't think that this is necessarily the "common case".  Certainly in bulk-transfer protocols or in any protocol that supports pipelining, you usually fill up the buffer completely on every iteration.

Another pragmatic observation that I wouldn't have been able to make on my own.
(You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.)

That's how Twisted works... sort of.  If you call write(), it always just does its thing.  That said, you can ask to be notified if you've written too much, so that you can slow down.

(Flow-control is sort of a sore spot for the current Twisted API; what we have works, and it satisfies the core requirements, but the shape of the API is definitely not very convenient.  <http://tm.tl/1956> outlines the next-generation streaming and flow-control primitives that we are currently working on.  I'm very excited about those but they haven't been battle-tested yet.)
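
One generic shape for that "you've written too much" notification, as a sketch with made-up names and thresholds (not Twisted's actual producer/consumer interface, and not a proposal):

class BufferedWriteTransport:
    HIGH_WATER = 64 * 1024   # illustrative thresholds
    LOW_WATER = 16 * 1024

    def __init__(self, protocol):
        self.protocol = protocol
        self.buffer = bytearray()
        self.paused = False

    def write(self, data):
        # write() itself never blocks; it only grows the buffer.
        self.buffer.extend(data)
        if not self.paused and len(self.buffer) > self.HIGH_WATER:
            self.paused = True
            self.protocol.pause_writing()    # "slow down, please"

    def bytes_flushed(self, nbytes):
        # Called by the event loop after some bytes hit the socket.
        del self.buffer[:nbytes]
        if self.paused and len(self.buffer) < self.LOW_WATER:
            self.paused = False
            self.protocol.resume_writing()   # "go ahead and write more"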

If you're talking about "blocking" in a generator-coroutine style, then well-written code can do

yield write(x)
yield write(y)
yield write(z)

and "lazy" code, that doesn't care about over-filling its buffer, can just do

write(x)
write(y)
yield write(z)

there's no reason that the latter style ought to cause any sort of error.

Good to know.

If it needs a name, I suppose I'd call my preferred style "event triggering".

But how does it work? What would typical user code in this style look like?

It really depends on the layer.  Each semantic layer has its own set of methods that get called; but, at the one that's most interesting for interoperability, the thing that delivers bytes to protocol parsers, it looks something like this:

def data_received(self, data):
    lines = (self.buf + data).split(b"\r\n")
    for line in lines[:-1]:
        self.line_received(line)
    self.buf = lines[-1]

I see, I've written code like this many times, with many variations.
 
At a higher level, you might have header_received, http_request_received, etc.
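
For example, one more layer stacked on top of the line splitter above, with illustrative names (only headers, no request line or body handling):

class HTTPishHeaderProtocol:
    buf = b""

    def data_received(self, data):
        lines = (self.buf + data).split(b"\r\n")
        for line in lines[:-1]:
            self.line_received(line)
        self.buf = lines[-1]

    def line_received(self, line):
        if not line:
            self.headers_complete()          # blank line ends the headers
        elif b":" in line:
            name, value = line.split(b":", 1)
            self.header_received(name.strip(), value.strip())

    def header_received(self, name, value):
        print(name, value)

    def headers_complete(self):
        print("all headers received")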

The thing that calls data_received typically would look like this:

def handle_read(self):
    try:
        data = self.socket.recv(self.buffer_size)
    except socket.error as se:
        if se.args[0] == EWOULDBLOCK:
            return
        else:
            return main.CONNECTION_LOST
    else:
        try:
            self.protocol.data_received(data)
        except Exception:
            log_the_error()
            self.disconnect()

although it obviously looks a little different in the case of IOCP.

It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL, and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation.

Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once.  If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.

Well understood. (And I don't even want to get microthreading into the
mix, although others may disagree -- I see Christian Tismer has jumped
in...) But I also think that if we design these things in isolation
it's likely that we'll find later that the pieces don't fit, and I
don't want that to happen either. So I think we should consider these
separate, but loosely coordinated efforts.

Great, glad to hear it.
 
Thanks for taking the time to respond!

--
--Guido van Rossum (python.org/~guido)